Transferring From Textual Entailment to Biomedical Named Entity Recognition.

Tingting Liang, Congying Xia, Ziqiang Zhao, Yixuan Jiang, Yuyu Yin, Philip S Yu

IEEE/ACM Trans Comput Biol Bioinform

Published: November 2023

Biomedical Named Entity Recognition (BioNER) aims to identify biomedical entities such as genes, proteins, diseases, and chemical compounds in text. However, because of ethical and privacy constraints and the highly specialized nature of biomedical data, BioNER suffers more severely than the general domain from a lack of high-quality labeled data, especially at the token level. Facing extremely limited labeled biomedical data, this work studies gazetteer-based BioNER, which aims to build a BioNER system from scratch: the system must identify the entities in given sentences with zero token-level annotations available for training. Previous works usually solve NER or BioNER with sequence labeling models and, when full annotations are unavailable, obtain weakly labeled data from gazetteers. However, such labels are quite noisy, because sequence labeling requires a label for every token and the entity coverage of gazetteers is limited. Here we propose to formulate BioNER as a textual entailment problem and solve it via Textual Entailment with Dynamic Contrastive learning (TEDC). TEDC not only alleviates the noisy labeling issue but also transfers knowledge from pre-trained textual entailment models. Additionally, the dynamic contrastive learning framework contrasts entities and non-entities in the same sentence, improving the model's ability to discriminate between them. Experiments on two real-world biomedical datasets show that TEDC can achieve state-of-the-art performance for gazetteer-based BioNER.
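The entailment formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' TEDC implementation: the hypothesis template `"{span} is a {etype} entity."`, the maximum span length, the toy gazetteer, and the helper names (`candidate_spans`, `build_pairs`, `in_sentence_contrastive_loss`) are all hypothetical. Each candidate span of a sentence becomes a premise/hypothesis pair, labeled as entailment only when an exact gazetteer lookup assigns that span the hypothesized type. The loss function is a generic InfoNCE-style objective that contrasts entity scores against non-entity scores from the same sentence; it stands in for, and does not reproduce, the paper's dynamic contrastive learning.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical hypothesis template (assumption); TEDC's actual template may differ.
TEMPLATE = "{span} is a {etype} entity."


@dataclass
class EntailmentPair:
    premise: str      # the original sentence
    hypothesis: str   # template filled with a candidate span and an entity type
    label: int        # 1 = entailment (span has this type), 0 = not entailment


def candidate_spans(tokens: Sequence[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Enumerate every contiguous span [start, end) of at most max_len tokens."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans


def build_pairs(
    tokens: Sequence[str],
    gazetteer: Dict[str, str],
    etypes: Sequence[str],
    max_len: int = 3,
) -> List[EntailmentPair]:
    """Turn one sentence into span-level entailment pairs with gazetteer-derived labels."""
    premise = " ".join(tokens)
    pairs = []
    for start, end in candidate_spans(tokens, max_len):
        span = " ".join(tokens[start:end])
        matched_type = gazetteer.get(span.lower())  # exact, case-insensitive lookup
        for etype in etypes:
            label = 1 if matched_type == etype else 0
            hypothesis = TEMPLATE.format(span=span, etype=etype)
            pairs.append(EntailmentPair(premise, hypothesis, label))
    return pairs


def in_sentence_contrastive_loss(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    temperature: float = 0.1,
) -> float:
    """Generic InfoNCE-style loss (a stand-in, not TEDC's objective): each entity
    (positive) score is contrasted with all non-entity (negative) scores from the
    same sentence. Lower is better."""
    if not pos_scores:
        return 0.0
    total = 0.0
    for p in pos_scores:
        logits = [p / temperature] + [n / temperature for n in neg_scores]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[0]  # equals -log softmax of the positive logit
    return total / len(pos_scores)


if __name__ == "__main__":
    tokens = ["Aspirin", "reduces", "the", "risk", "of", "stroke", "."]
    gazetteer = {"aspirin": "chemical", "stroke": "disease"}  # toy gazetteer
    pairs = build_pairs(tokens, gazetteer, ["chemical", "disease"])
    for pair in pairs:
        if pair.label == 1:
            print(pair.hypothesis)
    # prints "Aspirin is a chemical entity." then "stroke is a disease entity."
```

In this toy setup, labels attach to whole spans rather than to individual tokens, which mirrors the abstract's point that token-level weak labels are a source of noise; a span the gazetteer misses still becomes a false-negative pair.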

DOI: http://dx.doi.org/10.1109/TCBB.2023.3236477

Similar Publications

Efficient Detection of Stigmatizing Language in Electronic Health Records via In-Context Learning: Comparative Analysis and Validation Study.

JMIR Med Inform

August 2025

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada.

Hongbo Chen, Myrtede Alfred, Eldan Cohen

Background: The presence of stigmatizing language within electronic health records (EHRs) poses significant risks to patient care by perpetuating biases. While numerous studies have explored the use of supervised machine learning models to detect stigmatizing language automatically, these models require large, annotated datasets, which may not always be readily available. In-context learning (ICL) has emerged as a data-efficient alternative, allowing large language models to adapt to tasks using only instructions and examples.

The geometry of meaning: evaluating sentence embeddings from diverse transformer-based models for natural language inference.

PeerJ Comput Sci

June 2025

Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia.

Mohammed Alsuhaibani

Natural language inference (NLI) is a fundamental task in natural language processing that focuses on determining the relationship between pairs of sentences. In this article, we present a simple and straightforward approach to evaluate the effectiveness of various transformer-based models such as bidirectional encoder representations from transformers (BERT), Generative Pre-trained Transformer (GPT), robustly optimized BERT approach (RoBERTa), and XLNet in generating sentence embeddings for NLI. We conduct comprehensive experiments with different pooling techniques and evaluate the embeddings using different norms across multiple layers of each model.

Hyperbolic vision language representation learning on chest radiology images.

Health Inf Sci Syst

December 2025

Department of Anesthesiology, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030 China.

Zuojing Zhang, Zhi Qiao, Linbin Han, Hong Yang, Zhen Qian

Given the visual-semantic hierarchy between images and texts, hyperbolic embeddings have been employed for visual-semantic representation learning, leveraging the advantages of hierarchy modeling in hyperbolic space. This approach demonstrates notable advantages in zero-shot learning tasks. However, unlike general image-text alignment tasks, textual data in the medical domain often comprises complex sentences describing various conditions or diseases, posing challenges for vision language models to comprehend free-text medical reports.

Zero-Shot Relation Classification Through Inference on Category Attributes.

IEEE Trans Neural Netw Learn Syst

July 2025

Yan Xiao, Yaochu Jin, Bin Wang, Yan Zhang, Kuangrong Hao

Article Synopsis

The goal of relation classification (RC) is to identify the semantic relation between entities in a sentence. Current approaches mostly rely on predefined relation types, which makes it hard to recognize new ones; this challenge is known as zero-shot relation classification (ZSRC).
Existing ZSRC methods lack autonomy and often require manually written definitions, so the researchers propose a new framework, inference on category attributes (ICA), to improve how models understand unseen relations.
The ICA framework uses hypothesis templates based on relationship descriptions to convert RC data into a textual entailment format, enhancing a model's ability to generalize knowledge to new classes, and has shown strong performance on benchmark datasets like FewRel and Wiki-ZSL.

Identifying the Question Similarity of Regulatory Documents in the Pharmaceutical Industry by Using the Recognizing Question Entailment System: Evaluation Study.

JMIR AI

September 2023

Eli Lilly and Company, Indianapolis, IN, United States.

Nidhi Saraswat, Chuqin Li, Min Jiang

Background: The regulatory affairs (RA) division in a pharmaceutical establishment is the point of contact between regulatory authorities and pharmaceutical companies. The division is tasked with the crucial and strenuous work of meticulously extracting and summarizing relevant information from various search systems. An artificial intelligence (AI)-based intelligent search system that significantly reduces the manual effort in the RA department's existing processes while maintaining or improving the quality of final outcomes is desirable.