Explainable artificial intelligence in forensic DNA analysis: Alleles identification in challenging electropherograms using supervised machine learning methods.

Mengyu Tan, Yuxuan Tan, Haoyan Jiang, Jiaming Xue, Qiushuo Wu, Yazi Zheng, Guihong Liu, Yuanyuan Xiao, Meili Lv, Miao Liao, Lin Zhang, Shengqiu Qu, Weibo Liang

Forensic Sci Int Genet

Department of Forensic Genetics, West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu, China.

Published: June 2025

Challenging samples in capillary electrophoresis (CE)-based short tandem repeat (STR) analysis often produce artefactual signals that cannot be completely filtered out by expert electropherogram (EPG) reading systems, complicating allele interpretation. Previous studies have demonstrated the potential of artificial intelligence (AI) to address this issue by accurately distinguishing allele signals from artefacts in EPGs. Traditional machine learning models offer significant advantages in enhancing the interpretability and transparency of AI models used in DNA analysis, particularly in criminal investigations and legal contexts. In this study, five traditional machine learning algorithms were employed to train and construct models using EPG signal datasets from single-source low-template EPGs, mixture EPGs, and combined datasets. Performance evaluation and validation with additional datasets demonstrated the feasibility of these models in improving the reportability of potential information in EPGs. However, further optimization is needed for mixture EPGs to enhance classification accuracy. Implementing receiver operating characteristic (ROC) curve analysis and prediction probability thresholds effectively reduced false positive classifications. Additionally, a user-friendly platform was developed for EPG signal classification based on machine learning and ensemble learning, allowing any signal dataset to be classified with traditional machine learning models and the prediction results of multiple models to be combined. This platform will provide analysts with more robust, better-optimized results. This study shows that machine-learning-based EPG signal classification models can significantly enhance the efficiency of sample analysis and interpretation, providing a solid foundation for future research.
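
The two techniques named in the abstract, combining the predictions of several models and applying a prediction probability threshold chosen from ROC analysis, can be illustrated with a minimal sketch. It is not the authors' platform or code: the function names (`soft_vote`, `roc_points`, `pick_threshold`), the three model outputs, and the labels are hypothetical, and averaging probabilities (soft voting) is only one assumed way of combining models. Plain Python is used so the example has no dependencies.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-signal allele probabilities of several models."""
    n_models = len(prob_lists)
    return [sum(col) / n_models for col in zip(*prob_lists)]


def roc_points(labels: Sequence[int],
               scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for every distinct score used as a cut-off.

    A signal is called an allele when its score is >= the threshold.
    Labels: 1 = true allele, 0 = artefact.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one allele and one artefact label")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((t, fp / neg, tp / pos))
    return points


def pick_threshold(labels: Sequence[int], scores: Sequence[float],
                   max_fpr: float) -> Optional[Tuple[float, float, float]]:
    """Pick the ROC point with the highest TPR whose FPR is <= max_fpr.

    Ties on TPR are broken in favour of the higher (stricter) threshold.
    Returns None if no threshold satisfies the FPR limit.
    """
    ok = [p for p in roc_points(labels, scores) if p[1] <= max_fpr]
    if not ok:
        return None
    return max(ok, key=lambda p: (p[2], p[0]))


# Hypothetical data: 8 EPG signals, 1 = allele, 0 = artefact.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical allele probabilities from three separately trained models.
model_a = [0.90, 0.80, 0.60, 0.80, 0.20, 0.10, 0.95, 0.40]
model_b = [0.80, 0.90, 0.50, 0.60, 0.30, 0.20, 0.90, 0.30]
model_c = [0.85, 0.70, 0.70, 0.70, 0.10, 0.30, 0.85, 0.50]

ensemble = soft_vote([model_a, model_b, model_c])

# Tolerate up to 25% false positives, then tolerate none.
print(pick_threshold(labels, ensemble, max_fpr=0.25))
print(pick_threshold(labels, ensemble, max_fpr=0.0))
```

In this toy data, raising the threshold from about 0.6 to about 0.8 removes the one false positive at the cost of one missed allele, which is the trade-off an ROC curve makes visible.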

DOI: http://dx.doi.org/10.1016/j.fsigen.2025.103289

Top Keywords: machine learning, traditional machine, epg signal, artificial intelligence, dna analysis, learning models, signal datasets, mixture epgs, signal classification, models

Similar Publications

Letter to editor about "Utilizing explainable machine learning for progression-free survival prediction in high-grade serous ovarian cancer: insights from a prospective cohort study".

Int J Surg

September 2025

Shenzhen Traditional Chinese Medicine Hospital, The Fourth Clinical Medical College of Guangzhou University of Chinese Medicine, Shenzhen, People's Republic of China.

Mengying Bai, Wenbo Wu, Yuehui Zheng

Unveiling molecular signatures for precision drug design: machine learning insights from trypanothione reductase, PKC-θ, and CB1.

Mol Divers

September 2025

Department of Biotechnology, National Institute of Technology Raipur, Raipur, Chhattisgarh, 492001, India.

Sunil Sahu, Adarsh Anmol, Tushar Nishad, Satya Eswari Jujjavarapu

Traditional drug discovery methods like high-throughput screening and molecular docking are slow and costly. This study introduces a machine learning framework to predict bioactivity (pIC₅₀) and identify key molecular properties and structural features for targeting Trypanothione reductase (TR), Protein kinase C theta (PKC-θ), and Cannabinoid receptor 1 (CB1) using data from the ChEMBL database. Molecular fingerprints, generated via PaDEL-Descriptor and RDKit, encoded structural features as binary vectors.

Oral bioavailability property prediction based on task similarity transfer learning.

Mol Divers

September 2025

Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, 211198, China.

Chen Zeng, Chengcheng Xu, Yingxu Liu, Yunya Jiang, Lidan Zheng

Drug absorption significantly influences pharmacokinetics. Accurately predicting human oral bioavailability (HOB) is essential for optimizing drug candidates and improving clinical success rates. HOB is commonly obtained by experiment, but experimental methods are time-consuming and costly.

Decoding binocular color differences via EEG signals: linking ERP dynamics to chromatic disparity in CIELAB space.

Exp Brain Res

September 2025

School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China.

Famiao Mou, Zhineng Lv, Xuesong Jin, Jijun Pan, Lijun Yun

This study explores how differences in colors presented separately to each eye (binocular color differences) can be identified through EEG signals, a method of recording electrical activity from the brain. Four distinct levels of green-red color differences, defined in the CIELAB color space with constant luminance and chroma, are investigated in this study. Analysis of Event-Related Potentials (ERPs) revealed a significant decrease in the amplitude of the P300 component as binocular color differences increased, suggesting a measurable brain response to these differences.

Using Medication Dispensation Data to Identify Clusters with Similar Prescribing Patterns in Older Adults Living with Dementia.

Drugs Aging

September 2025

Dalla Lana School of Public Health, University of Toronto, V1 06, 2075 Bayview Avenue, Toronto, ON, M4N 3M5, Canada.

Abby Emdin, Therese A Stukel, Jennifer Bethell, Xuesong Wang, Andrea Iaboni

Background and Objectives: Older adults living with dementia are a heterogeneous group, which can make studying optimal medication management challenging. Unsupervised machine learning is a group of computing methods that rely on unlabeled data; that is, the algorithm itself discovers patterns without researchers needing to label the data with a known outcome. These methods may help us to better understand complex prescribing patterns in this population.
