Utilizing machine learning and hemagglutinin sequences to identify likely hosts of influenza H3Nx viruses.

Prev Vet Med

Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada; Centre for Public Health and Zoonoses, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada. Electronic address: zpoljak@uoguelph

Published: December 2024


Article Abstract

Influenza is a disease that represents both a public health and agricultural risk with pandemic potential. Among the subtypes of influenza A virus, H3 influenza virus can infect many avian and mammalian species and is therefore a virus of interest to human and veterinary public health. The primary goal of this study was to train and validate classifiers for the identification of the most likely host species using the hemagglutinin gene segment of H3 viruses. A five-step process was implemented, which included training four machine learning classifiers, testing the classifiers on the validation dataset, and further exploration of the best-performing model on three additional datasets. The gradient boosting machine classifier showed the highest host-classification accuracy, with a 98.0% (95% CI [97.01, 98.73]) correct classification rate on an independent validation dataset. The classifications were further analyzed using the predicted probability score, which highlighted sequences of particular interest. These included both correctly and incorrectly classified sequences that showed considerable predicted probability for multiple hosts. This showed the potential of using these classifiers for rapid sequence classification and for highlighting sequences of interest. Additionally, the classifiers were tested on a separate swine dataset composed of H3N2 sequences from 1998 to 2003 from the United States of America, and on a separate canine dataset composed of canine H3N2 sequences of avian origin. These two datasets were used to examine the applications of predicted probability and host convergence over time. Lastly, the classifiers were applied to an independent dataset of environmental sequences to explore their host identification. The results show the potential for machine learning to be used as a host identification technique for viruses of unknown origin at a species-specific level.
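The use of predicted probability scores to highlight sequences of interest can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `flag_ambiguous`, the 0.2 runner-up threshold, the sequence identifiers, the host labels, and all probability values are hypothetical, chosen only to show how a sequence with considerable probability for more than one host might be picked out of a classifier's output.

```python
from typing import Dict, List, Tuple


def flag_ambiguous(
    probs: Dict[str, Dict[str, float]],
    runner_up_threshold: float = 0.2,  # assumed cut-off, not taken from the study
) -> List[Tuple[str, str, str, float]]:
    """Flag sequences whose second most likely host is still fairly probable.

    `probs` maps a sequence ID to its per-host predicted probabilities
    (at least two hosts per sequence). Returns tuples of
    (sequence_id, top_host, runner_up_host, runner_up_probability).
    """
    flagged = []
    for seq_id, host_probs in probs.items():
        # Rank hosts from most to least probable.
        ranked = sorted(host_probs.items(), key=lambda kv: kv[1], reverse=True)
        (top_host, _), (second_host, second_p) = ranked[0], ranked[1]
        if second_p >= runner_up_threshold:
            flagged.append((seq_id, top_host, second_host, second_p))
    return flagged


# Hypothetical classifier output for three sequences.
example = {
    "seq_A": {"avian": 0.97, "swine": 0.02, "human": 0.01},
    "seq_B": {"avian": 0.55, "canine": 0.40, "swine": 0.05},
    "seq_C": {"swine": 0.60, "human": 0.35, "avian": 0.05},
}

flagged = flag_ambiguous(example)
for row in flagged:
    print(row)
```

In this toy input, `seq_A` is assigned to a single host with high probability and is not flagged, while `seq_B` and `seq_C` split their probability between two hosts and would be set aside for closer inspection. A real workflow would take the probabilities from the trained classifier and would need a threshold justified by the data.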

Source
http://dx.doi.org/10.1016/j.prevetmed.2024.106351

