Natural language descriptions of plant phenotypes are a rich source of information for genetics and genomics research. We computationally translated descriptions of plant phenotypes into structured representations that can be analyzed to identify biologically meaningful associations. These representations include the entity-quality (EQ) formalism, which uses terms from biological ontologies to represent phenotypes in a standardized, semantically rich format, as well as numerical vector representations generated using natural language processing (NLP) methods (such as the bag-of-words approach and document embedding). We compared the resulting phenotype similarity measures to those derived from manually curated data to determine the performance of each method. Computationally derived EQ and vector representations recapitulated biological truth about as successfully as representations created through manual EQ statement curation. Moreover, NLP methods for generating vector representations of phenotypes scale to large quantities of text because they require no human input. These results indicate that it is now possible to automatically produce and populate large-scale information resources that enable researchers to query phenotypic descriptions directly.
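To make the bag-of-words idea in the abstract concrete, here is a minimal sketch of how free-text phenotype descriptions might be turned into count vectors and compared by cosine similarity. It is not the authors' pipeline: the tokenizer, the function names `bag_of_words` and `cosine_similarity`, the gene labels, and the three descriptions are all assumptions invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on non-letters, and count each token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical phenotype descriptions, invented for illustration only.
descriptions = {
    "gene_a": "dwarf plant with short internodes and small leaves",
    "gene_b": "short plant with small narrow leaves",
    "gene_c": "pale yellow kernels with reduced starch content",
}

vectors = {gene: bag_of_words(text) for gene, text in descriptions.items()}
sim_ab = cosine_similarity(vectors["gene_a"], vectors["gene_b"])
sim_ac = cosine_similarity(vectors["gene_a"], vectors["gene_c"])
```

In this toy data, the two descriptions that share growth-habit vocabulary score higher than the pair that shares only a function word. A realistic setup would likely add stop-word removal, term weighting, or document embeddings, which this sketch leaves out.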
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6965352 | PMC |
http://dx.doi.org/10.3389/fpls.2019.01629 | DOI Listing |
Brief Bioinform
September 2025
Beijing Institute of Mathematical Sciences and Applications (BIMSA), Beijing 101408, P. R. China.
With the rapid development of genomic sequencing technologies, there is an increasing demand for efficient and accurate sequence analysis methods. However, existing methods face challenges in handling long, variable-length sequences and large-scale datasets. To address these issues, we propose a novel encoding method: Energy Entropy Vector (EEV).
This paper presents a novel multiscale signal processing framework for power quality disturbance (PQD) and cyber intrusion detection in smart grids, combining Non-Subsampled Contourlet Transform (NSCT), Split Augmented Lagrangian Shrinkage Algorithm (SALSA), and Morphological Component Analysis (MCA). A key innovation lies in an adaptive weighting mechanism within NSCT's directional sub bands, enabling dynamic energy redistribution and enhanced representation of both low-frequency anomalies (e.g.
Accurate prediction of hospital length of stay (LoS) is a vital component in optimizing clinical workflows, resource allocation, and patient care. This study presents a comprehensive evaluation of machine learning models for both binary and multi-class LoS classification tasks using structured clinical variables, physiological measurements, and unstructured clinical notes. Seven data configurations were constructed from combinations of structured features (Z), including diagnoses, procedures, medications, laboratory tests, and microbiology results; MeSH-based symptoms (S); physiological signals (F); and textual representations (E): Z, F, E, ZS, ZSF, ZSE, and ZSEF.
Math Program
October 2024
Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Canada.
A rational number is dyadic if it has a finite binary representation p/2^k, where p is an integer and k is a nonnegative integer. Dyadic rationals are important for numerical computations because they have an exact representation in floating-point arithmetic on a computer. A vector is dyadic if all its entries are dyadic rationals.
Anal Chim Acta
October 2025
State Key Laboratory of Precision Measurement Technology and Instruments, Tsinghua University, Beijing, 100084, China.
Raman spectroscopy has attracted significant attention in various biochemical detection fields, especially in the rapid identification of pathogenic bacteria. The integration of this technology with deep learning to facilitate automated bacterial Raman spectroscopy diagnosis has emerged as a key focus in recent research. However, the diagnostic performance of existing deep learning methods largely depends on a sufficient dataset, and in scenarios where there is a limited availability of Raman spectroscopy data, it is inadequate to fully optimize the numerous parameters of deep neural networks.