Automated Methods Enable Direct Computation on Phenotypic Descriptions for Novel Candidate Gene Prediction.

Front Plant Sci

Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States.

Published: January 2020


Category Ranking

98%

Total Visits

921

Avg Visit Duration

2 minutes

Citations

20

Article Abstract

Natural language descriptions of plant phenotypes are a rich source of information for genetics and genomics research. We computationally translated descriptions of plant phenotypes into structured representations that can be analyzed to identify biologically meaningful associations. These representations include the entity-quality (EQ) formalism, which uses terms from biological ontologies to represent phenotypes in a standardized, semantically rich format, as well as numerical vector representations generated using natural language processing (NLP) methods (such as the bag-of-words approach and document embedding). We compared resulting phenotype similarity measures to those derived from manually curated data to determine the performance of each method. Computationally derived EQ and vector representations were comparably successful in recapitulating biological truth to representations created through manual EQ statement curation. Moreover, NLP methods for generating vector representations of phenotypes are scalable to large quantities of text because they require no human input. These results indicate that it is now possible to computationally and automatically produce and populate large-scale information resources that enable researchers to query phenotypic descriptions directly.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6965352PMC
http://dx.doi.org/10.3389/fpls.2019.01629DOI Listing

Publication Analysis

Top Keywords

vector representations
12
phenotypic descriptions
8
natural language
8
descriptions plant
8
plant phenotypes
8
nlp methods
8
representations
6
automated methods
4
methods enable
4
enable direct
4

Similar Publications

With the rapid development of genomic sequencing technologies, there is an increasing demand for efficient and accurate sequence analysis methods. However, existing methods face challenges in handling long, variable-length sequences and large-scale datasets. To address these issues, we propose a novel encoding method-Energy Entropy Vector (EEV).

View Article and Find Full Text PDF

This paper presents a novel multiscale signal processing framework for power quality disturbance (PQD) and cyber intrusion detection in smart grids, combining Non-Subsampled Contourlet Transform (NSCT), Split Augmented Lagrangian Shrinkage Algorithm (SALSA), and Morphological Component Analysis (MCA). A key innovation lies in an adaptive weighting mechanism within NSCT's directional sub bands, enabling dynamic energy redistribution and enhanced representation of both low-frequency anomalies (e.g.

View Article and Find Full Text PDF

Accurate prediction of hospital length of stay (LoS) is a vital component in optimizing clinical workflows, resource allocation, and patient care. This study presents a comprehensive evaluation of machine learning models for both binary and multi-class LoS classification tasks using structured clinical variables, physiological measurements, and unstructured clinical notes. Seven data configurations were constructed from combinations of structured features (Z), including diagnoses, procedures, medications, laboratory tests, and microbiology results; MeSH-based symptoms (S); physiological signals (F); and textual representations (E): Z, F, E, ZS, ZSF, ZSE, and ZSEF.

View Article and Find Full Text PDF

Dyadic linear programming and extensions.

Math Program

October 2024

Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Canada.

A rational number is if it has a finite binary representation , where is an integer and is a nonnegative integer. Dyadic rationals are important for numerical computations because they have an exact representation in floating-point arithmetic on a computer. A vector is if all its entries are dyadic rationals.

View Article and Find Full Text PDF

DiffRaman: A conditional latent denoising diffusion probabilistic model for enhancing bacterial identification via Raman spectra generation under limited data.

Anal Chim Acta

October 2025

State Key Laboratory of Precision Measurement Technology and Instruments, Tsinghua University, Beijing, 100084, China. Electronic address:

Raman spectroscopy has attracted significant attention in various biochemical detection fields, especially in the rapid identification of pathogenic bacteria. The integration of this technology with deep learning to facilitate automated bacterial Raman spectroscopy diagnosis has emerged as a key focus in recent research. However, the diagnostic performance of existing deep learning methods largely depends on a sufficient dataset, and in scenarios where there is a limited availability of Raman spectroscopy data, it is inadequate to fully optimize the numerous parameters of deep neural networks.

View Article and Find Full Text PDF