Article Abstract

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, a comprehensive user guide, and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.
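The builder-and-validation workflow described in the abstract can be illustrated with a minimal, self-contained Java sketch. Every name in it (`PhenopacketSketch`, `Packet`, `Builder`, `OntologyTerm`, `validate`) is hypothetical and chosen for illustration only; none of it is the phenopacket-tools API or the Phenopacket Schema itself, and the checks shown are a simplified stand-in for real syntax and semantic validation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these classes are NOT the phenopacket-tools API.
public class PhenopacketSketch {

    /** An ontology class: a term identifier (CURIE) plus a human-readable label. */
    static final class OntologyTerm {
        final String id;
        final String label;

        OntologyTerm(String id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    /** A heavily simplified, immutable phenopacket-like record. */
    static final class Packet {
        final String id;
        final String subjectId;
        final List<OntologyTerm> features;

        Packet(String id, String subjectId, List<OntologyTerm> features) {
            this.id = id;
            this.subjectId = subjectId;
            this.features = features;
        }
    }

    /** A concise fluent builder, in the spirit of the builders the abstract mentions. */
    static final class Builder {
        private final String id;
        private String subjectId;
        private final List<OntologyTerm> features = new ArrayList<>();

        Builder(String id) {
            this.id = id;
        }

        Builder subject(String subjectId) {
            this.subjectId = subjectId;
            return this;
        }

        Builder addFeature(String termId, String label) {
            features.add(new OntologyTerm(termId, label));
            return this;
        }

        Packet build() {
            return new Packet(id, subjectId, List.copyOf(features));
        }
    }

    /** Returns a list of human-readable problems; an empty list means the packet passed. */
    static List<String> validate(Packet p) {
        List<String> errors = new ArrayList<>();
        if (p.id == null || p.id.isBlank()) {
            errors.add("missing packet id");
        }
        if (p.subjectId == null || p.subjectId.isBlank()) {
            errors.add("missing subject id");
        }
        for (OntologyTerm t : p.features) {
            // Assumed rule for this sketch: term ids look like PREFIX:digits, e.g. HP:0001250.
            if (t.id == null || !t.id.matches("[A-Za-z]+:\\d+")) {
                errors.add("malformed term id: " + t.id);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Packet ok = new Builder("packet-1")
                .subject("patient-1")
                .addFeature("HP:0001250", "Seizure")
                .build();
        System.out.println(validate(ok));

        // No subject and a term id without an ontology prefix: both are reported.
        Packet bad = new Builder("packet-2")
                .addFeature("seizure", "Seizure")
                .build();
        System.out.println(validate(bad));
    }
}
```

Returning a list of problems rather than throwing on the first one lets a caller report every issue in a record at once, which is a common design for data validation. The actual library's classes and validation rules are documented at the project URL given in the abstract.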

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10191354
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0285433

Publication Analysis

Top Keywords

library command-line: 12
phenopacket schema: 8
clinical data: 8
java library: 8
command-line application: 8
validate phenopackets: 8
phenopackets: 6
phenopacket-tools: 5
library: 5
phenopacket-tools building: 4

Similar Publications

We introduce Taskblaster, a generic and lightweight Python framework for composing, executing, and managing computational workflows with automated error handling. Taskblaster supports dynamic workflows including flow control using branches and iteration, making the system Turing complete. Taskblaster aims to promote modular designs, where workflows are composed of reusable sub-workflows, and to simplify data maintenance as projects evolve and change.

Motivation: Knowledge graphs (KGs) are powerful tools for structuring and analyzing biological information due to their ability to represent data and improve queries across heterogeneous datasets. However, constructing KGs from unstructured literature remains challenging due to the cost and expertise required for manual curation. Prior works have explored text-mining techniques to automate this process, but have limitations that impact their ability to capture complex relationships fully.

Standardized analysis pipelines contribute to making data bioinformatics research compliant with the paradigm of Findability, Accessibility, Interoperability, and Reusability (FAIR), and facilitate collaboration. Nextflow and Snakemake, two popular command-line solutions, are increasingly adopted by users, complementing GUI-based platforms such as Galaxy. We report recent developments of the nf-core framework with the new Nextflow Domain-Specific Language (DSL2).

Motivation: Reconstructing the evolutionary history of tumors from bulk DNA sequencing of multiple tissue samples remains a challenging computational problem, requiring simultaneous deconvolution of the tumor tissue and inference of its evolutionary history. Recently, phylogenetic reconstruction methods have made significant progress by breaking the reconstruction problem into two parts: a regression problem over a fixed topology and a search over tree space. While effective techniques have been developed for the latter search problem, the regression problem remains a bottleneck in both method design and implementation due to the lack of fast, specialized algorithms.

NeuralTSNE: A Python Package for the Dimensionality Reduction of Molecular Dynamics Data Using Neural Networks.

J Chem Inf Model

July 2025

Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland.

Unsupervised machine learning has recently gained much attention in the field of molecular dynamics (MD). Particularly, dimensionality reduction techniques have been regularly employed to analyze large volumes of high-dimensional MD data to gain insight into hidden information encoded in MD trajectories. Among many such techniques, t-distributed stochastic neighbor embedding (t-SNE) is especially popular.