Integrating high dimensional bi-directional parsing models for gene mention tagging.

Chun-Nan Hsu, Yu-Ming Chang, Cheng-Ju Kuo, Yu-Shi Lin, Han-Shen Huang, I-Fang Chung

Bioinformatics

Institute of Information Science, Academia Sinica, Taipei, Taiwan.

Published: July 2008

Motivation: Tagging gene and gene product mentions in scientific text is an important initial step of literature mining. In this article, we describe in detail the gene mention tagger that we entered in the BioCreative 2 challenge and analyze what contributes to its good performance. Our tagger is based on the conditional random fields (CRF) model, the most prevalent method for the gene mention tagging task in BioCreative 2. Our tagger is of interest because it achieved the highest F-score among CRF-based methods and the second highest overall. Moreover, we obtained our results mostly by applying open source packages, which makes them easy to reproduce.

Results: We first describe in detail how we developed our CRF-based tagger. We designed a very high dimensional feature set that includes most of the information that may be relevant. We trained bi-directional CRF models with the same set of features, one applying forward parsing and the other backward parsing, and integrated the two models based on their output scores and dictionary filtering. One of the most prominent factors contributing to the good performance of our tagger is the integration of the additional backward parsing model. However, from the definition of CRF, it appears that a CRF model is symmetric, so that forward and backward parsing models would produce the same results. We show that, due to different feature settings, a CRF model can be asymmetric, and that the feature setting of our tagger in BioCreative 2 not only produces different results but also gives backward parsing models a slight but consistent advantage over forward parsing models. To fully explore the potential of integrating bi-directional parsing models, we applied different asymmetric feature settings to generate many bi-directional parsing models and integrated them based on their output scores. Experimental results show that this integrated model can achieve an even higher F-score based solely on the training corpus for gene mention tagging.
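
The bi-directional idea above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the left-only context window, the example sentence, the labels, and the scores are all assumptions chosen for illustration. It shows why an asymmetric feature window makes forward and backward parsing see different context for the same token, and one simple way to integrate two outputs by score (dictionary filtering is omitted).

```python
from typing import List, Tuple


def window_features(tokens: List[str], i: int, left: int = 2, right: int = 0) -> List[str]:
    """Context features for position i from an asymmetric offset window.

    The default (two tokens to the left, none to the right) is a
    hypothetical setting, not the one used in the paper.
    """
    feats = [f"w[0]={tokens[i]}"]
    for k in range(1, left + 1):
        if i - k >= 0:
            feats.append(f"w[-{k}]={tokens[i - k]}")
    for k in range(1, right + 1):
        if i + k < len(tokens):
            feats.append(f"w[+{k}]={tokens[i + k]}")
    return feats


def backward_view(tokens: List[str]) -> List[str]:
    """Backward parsing is modeled here as processing the reversed sentence."""
    return list(reversed(tokens))


def restore_order(labels: List[str]) -> List[str]:
    """Map labels predicted on the reversed sentence back to reading order."""
    return list(reversed(labels))


def integrate(candidates: List[Tuple[List[str], float]]) -> List[str]:
    """Return the labeling with the highest output score (assumed rule)."""
    return max(candidates, key=lambda c: c[1])[0]


# Hypothetical example sentence.
tokens = ["the", "p53", "tumor", "suppressor", "gene"]

# Same token, same feature template, different context per direction.
forward_feats = window_features(tokens, tokens.index("tumor"))
reversed_tokens = backward_view(tokens)
backward_feats = window_features(reversed_tokens, reversed_tokens.index("tumor"))

# Made-up model outputs: (labels, score). The backward labels are
# given in reversed order and restored before integration.
forward_output = (["O", "B", "O", "O", "O"], 0.61)
backward_output = (restore_order(["I", "I", "I", "B", "O"]), 0.74)
chosen = integrate([forward_output, backward_output])
```

With a left-only window, the forward model describes "tumor" by the tokens before it, while the backward model describes it by the tokens after it in the original sentence, so the two models are trained on different evidence even though the feature template is the same.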

Availability: Data sets, programs and an on-line service of our gene mention tagger can be accessed at http://aiia.iis.sinica.edu.tw/biocreative2.htm.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718659
DOI: http://dx.doi.org/10.1093/bioinformatics/btn183
