NGS read classification using AI.

Benjamin Voigt, Oliver Fischer, Christian Krumnow, Christian Herta, Piotr Wojciech Dabrowski

PLoS One

Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany.

Published: January 2022

Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient's sample. This allows the detection of pathogens that would slip through the cracks of classical, target-specific assays. However, because metagenomic sequencing is untargeted, the sequencing itself generates a huge amount of non-specific data, and the diagnosis only takes place at the data analysis stage, where the relevant sequences are picked out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well for detecting pathogens that are represented in the databases used, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: how can one determine whether the data truly contain no evidence of a pathogen, or whether the pathogen's genome is simply absent from the database so that its sequences in the dataset could not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on coding sequences labeled by taxonomic domain and use it to predict the taxonomic classification of sequences that cannot be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.
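
The abstract does not spell out how the encoded protein segments are obtained from a read. One common way to get candidate segments is a six-frame translation split at stop codons; the sketch below illustrates that idea only and should not be read as the authors' pipeline. The function names, the `min_len` threshold, and the use of the standard genetic code are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; codons not in the table (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def protein_segments(read, min_len=10):
    """Collect stop-free protein segments from all six reading frames.

    min_len is a hypothetical cutoff that drops very short segments.
    """
    read = read.upper()
    segments = []
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    segments.append(segment)
    return segments
```

Under these assumptions, segments like those returned by `protein_segments` would be the kind of input a classifier could label by taxonomic domain; how the published network encodes and consumes its input is not described in the abstract.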

Full text:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8694450
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261548
