Automated Methods Enable Direct Computation on Phenotypic Descriptions for Novel Candidate Gene Prediction.

Ian R Braun, Carolyn J Lawrence-Dill

Front Plant Sci

Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States.

Published: January 2020

Natural language descriptions of plant phenotypes are a rich source of information for genetics and genomics research. We computationally translated descriptions of plant phenotypes into structured representations that can be analyzed to identify biologically meaningful associations. These representations include the entity-quality (EQ) formalism, which uses terms from biological ontologies to represent phenotypes in a standardized, semantically rich format, as well as numerical vector representations generated using natural language processing (NLP) methods (such as the bag-of-words approach and document embedding). We compared the resulting phenotype similarity measures to those derived from manually curated data to determine the performance of each method. Computationally derived EQ and vector representations recapitulated biological truth about as successfully as representations created through manual EQ statement curation. Moreover, NLP methods for generating vector representations of phenotypes scale to large quantities of text because they require no human input. These results indicate that it is now possible to computationally and automatically produce and populate large-scale information resources that enable researchers to query phenotypic descriptions directly.
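
As an illustration of the bag-of-words idea mentioned above, the following minimal Python sketch turns free-text phenotype descriptions into word-count vectors and ranks genes by cosine similarity to a query gene. It is not the authors' pipeline: the gene names, the example descriptions, the tokenizer, and the helper functions (`bag_of_words`, `cosine_similarity`, `rank_similar`) are all assumptions made for illustration, and the study's actual preprocessing and similarity measures may differ.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, extract alphabetic tokens, and count each one."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query, vectors):
    """Return (gene, score) pairs for all other genes, most similar first."""
    scores = [
        (gene, cosine_similarity(vectors[query], vec))
        for gene, vec in vectors.items()
        if gene != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical phenotype descriptions, invented for illustration only.
descriptions = {
    "gene_a": "dwarf plants with short internodes and narrow leaves",
    "gene_b": "plants are dwarf with short internodes",
    "gene_c": "pale yellow seedlings that die early",
}

vectors = {gene: bag_of_words(text) for gene, text in descriptions.items()}

for gene, score in rank_similar("gene_a", vectors):
    print(f"{gene}: {score:.3f}")
```

Genes whose descriptions share more vocabulary with the query rank higher, which is the sense in which such vectors allow phenotype descriptions to be queried directly. Raw counts are the simplest choice here; weighting schemes or document embeddings would slot into the same ranking step.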

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6965352
DOI: http://dx.doi.org/10.3389/fpls.2019.01629
