Citations

20

Article Abstract

Biomedical Named Entity Recognition (BioNER) aims to identify biomedical entities such as genes, proteins, diseases, and chemical compounds in text. However, because of ethical and privacy constraints and the highly specialized nature of biomedical data, BioNER suffers from a more severe shortage of high-quality labeled data than the general domain, especially at the token level. Given this extremely limited supply of labeled biomedical data, this work studies gazetteer-based BioNER, which aims to build a BioNER system from scratch: the system must identify the entities in given sentences with zero token-level annotations available for training. Previous works usually solve NER or BioNER with sequence labeling models and, when full annotations are unavailable, obtain weakly labeled data from gazetteers. However, this weakly labeled data is quite noisy, since a label is needed for every token and the entity coverage of gazetteers is limited. Here we propose to formulate BioNER as a textual entailment problem and to solve it via Textual Entailment with Dynamic Contrastive learning (TEDC). TEDC not only alleviates the noisy labeling issue but also transfers knowledge from pre-trained textual entailment models. In addition, the dynamic contrastive learning framework contrasts entities and non-entities in the same sentence, improving the model's discrimination ability. Experiments on two real-world biomedical datasets show that TEDC achieves state-of-the-art performance for gazetteer-based BioNER.
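
To make the entailment formulation concrete, here is a minimal sketch of how candidate spans in a sentence might be turned into premise/hypothesis pairs with weak labels taken from a gazetteer. It is an illustration only, not the TEDC implementation: the gazetteer entries, the hypothesis template wording, the span enumeration, and the names `GAZETTEER`, `TEMPLATE`, `candidate_spans`, and `build_entailment_pairs` are all assumptions, since the abstract does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lower-cased entity string -> entity type.
GAZETTEER: Dict[str, str] = {
    "breast cancer": "disease",
    "brca1": "gene",
}

# Hypothetical hypothesis template; the abstract does not give the wording TEDC uses.
TEMPLATE = "{span} is a {etype}."


def candidate_spans(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Enumerate every contiguous token span of at most max_len tokens."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans


def build_entailment_pairs(
    sentence: str, etypes: List[str], max_len: int = 3
) -> List[Tuple[str, str, int]]:
    """Return (premise, hypothesis, label) triples for one sentence.

    label is 1 when the gazetteer lists the span under that entity type,
    otherwise 0. The 0 labels are weak: a span missing from the gazetteer
    may still be a real entity, because gazetteer coverage is limited.
    """
    tokens = sentence.split()  # whitespace tokenization, for illustration only
    pairs = []
    for start, end in candidate_spans(tokens, max_len):
        span = " ".join(tokens[start:end])
        for etype in etypes:
            hypothesis = TEMPLATE.format(span=span, etype=etype)
            label = 1 if GAZETTEER.get(span.lower()) == etype else 0
            pairs.append((sentence, hypothesis, label))
    return pairs


pairs = build_entailment_pairs(
    "BRCA1 mutations raise breast cancer risk", ["gene", "disease"]
)
positives = [p for p in pairs if p[2] == 1]
```

In this sketch each pair could be scored by a pre-trained textual entailment model, with the sentence as the premise. Spans labeled 0 in the same sentence are the kind of non-entity candidates that a contrastive objective might contrast against the gazetteer matches, although the abstract does not describe how TEDC selects them.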

Source
http://dx.doi.org/10.1109/TCBB.2023.3236477

Publication Analysis

Top Keywords

textual entailment: 16
labeled data: 12
biomedical named: 8
named entity: 8
entity recognition: 8
bioner aims: 8
biomedical data: 8
gazetteer-based bioner: 8
bioner task: 8
task textual: 8

Similar Publications

Background: The presence of stigmatizing language within electronic health records (EHRs) poses significant risks to patient care by perpetuating biases. While numerous studies have explored the use of supervised machine learning models to detect stigmatizing language automatically, these models require large, annotated datasets, which may not always be readily available. In-context learning (ICL) has emerged as a data-efficient alternative, allowing large language models to adapt to tasks using only instructions and examples.

Natural language inference (NLI) is a fundamental task in natural language processing that focuses on determining the relationship between pairs of sentences. In this article, we present a simple and straightforward approach to evaluate the effectiveness of various transformer-based models such as bidirectional encoder representations from transformers (BERT), Generative Pre-trained Transformer (GPT), robustly optimized BERT approach (RoBERTa), and XLNet in generating sentence embeddings for NLI. We conduct comprehensive experiments with different pooling techniques and evaluate the embeddings using different norms across multiple layers of each model.

Given the visual-semantic hierarchy between images and texts, hyperbolic embeddings have been employed for visual-semantic representation learning, leveraging the advantages of hierarchy modeling in hyperbolic space. This approach demonstrates notable advantages in zero-shot learning tasks. However, unlike general image-text alignment tasks, textual data in the medical domain often comprises complex sentences describing various conditions or diseases, posing challenges for vision language models to comprehend free-text medical reports.

Article Synopsis
  • The goal of relationship classification (RC) is to identify the semantic relationship between entities in sentences, but current approaches mostly rely on predefined relationships, making it hard to recognize new ones, a challenge known as zero-shot relationship classification (ZSRC).
  • Existing ZSRC methods struggle with autonomy and often require manual definitions, so researchers propose a new framework called inference on category attributes (ICA) to improve how models understand unseen relationships.
  • The ICA framework uses hypothesis templates based on relationship descriptions to convert RC data into a textual entailment format, enhancing a model's ability to generalize knowledge to new classes, and has shown strong performance on benchmark datasets like FewRel and Wiki-ZSL.

Background: The regulatory affairs (RA) division in a pharmaceutical establishment is the point of contact between regulatory authorities and pharmaceutical companies. They are delegated the crucial and strenuous task of extracting and summarizing relevant information in the most meticulous manner from various search systems. An artificial intelligence (AI)-based intelligent search system that can significantly bring down the manual efforts in the existing processes of the RA department while maintaining and improving the quality of final outcomes is desirable.
