Citations

20

Article Abstract

Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations, called embeddings, of nodes and other graph elements. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes. For this reason, classification procedures are forced to assume that the vast majority of unlabeled edges are negative. Existing approaches sample negative edges for training and evaluating classifiers by uniformly sampling pairs of nodes.
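As an illustration of the uniform strategy described above, the following is a minimal Python sketch, not the authors' code. The function name `sample_uniform_negatives` and its parameters are hypothetical, and the sketch assumes a simple undirected graph whose node labels can be sorted.

```python
import random


def sample_uniform_negatives(nodes, positive_edges, k, seed=0):
    """Draw k distinct node pairs uniformly at random, skipping known positive edges.

    Hypothetical helper for illustration; assumes an undirected graph
    without self-loops and sortable node labels.
    """
    rng = random.Random(seed)
    known = {frozenset(edge) for edge in positive_edges}
    max_pairs = len(nodes) * (len(nodes) - 1) // 2
    if k > max_pairs - len(known):
        raise ValueError("not enough unlabeled node pairs to sample from")

    negatives = set()
    while len(negatives) < k:
        # Every node is equally likely to be picked, regardless of its degree.
        u, v = rng.sample(nodes, 2)
        pair = frozenset((u, v))
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)
```

Because every node is equally likely to be drawn, low-degree nodes, which usually make up most of a knowledge graph, tend to dominate the sampled negatives.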

Results: We show here that this sampling strategy typically leads to sets of positive and negative examples with imbalanced node degree distributions. Using a representative heterogeneous biomedical knowledge graph and random walk-based graph machine learning, we show that this strategy substantially impacts classification performance. If users of graph machine-learning models apply the models to prioritize examples that are drawn from approximately the same distribution as the positive examples, then the performance of models as estimated in the validation phase may be artificially inflated. We present a degree-aware node sampling approach that mitigates this effect and is simple to implement.
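One possible way to make negative sampling degree-aware is sketched below in Python. This is an assumption-laden illustration, not necessarily the procedure the authors implemented: it draws each endpoint with probability proportional to its degree in the positive edge set, so that the sampled negatives involve high-degree nodes about as often as the positives do. The names `node_degrees` and `sample_degree_aware_negatives`, and the `max_tries` cap, are hypothetical.

```python
import random
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def sample_degree_aware_negatives(positive_edges, k, seed=0, max_tries=100_000):
    """Sample up to k negative pairs, picking endpoints in proportion to degree.

    Illustrative sketch only; assumes an undirected graph and sortable
    node labels. May return fewer than k pairs if max_tries is exhausted.
    """
    rng = random.Random(seed)
    known = {frozenset(edge) for edge in positive_edges}
    degrees = node_degrees(positive_edges)
    nodes = list(degrees)
    weights = [degrees[node] for node in nodes]

    negatives = set()
    for _ in range(max_tries):
        if len(negatives) == k:
            break
        # Weighted draw: a node with degree d is d times as likely as a degree-1 node.
        u, v = rng.choices(nodes, weights=weights, k=2)
        pair = frozenset((u, v))
        if u != v and pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)
```

Comparing `node_degrees` over the positive edges with `node_degrees` over the sampled negatives gives a quick check of whether the two sets have similar degree profiles.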

Availability And Implementation: Our code and data are publicly available at https://github.com/monarch-initiative/negativeExampleSelection.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10994718
DOI: http://dx.doi.org/10.1093/bioadv/vbae036

Publication Analysis

Top Keywords

classification performance: 8
random walk-based: 8
walk-based graph: 8
graph representation: 8
representation learning: 8
edges represent: 8
negative edges: 8
graph: 7
sampling: 5
node-degree aware: 4

Similar Publications

Use of artificial intelligence for classification of fractures around the elbow in adults according to the 2018 AO/OTA classification system.

BMC Musculoskelet Disord

September 2025

Department of Clinical Sciences at Danderyds Hospital, Department of Orthopedic Surgery, Karolinska Institutet, Stockholm, 182 88, Sweden.

Background: This study evaluates the accuracy of an Artificial Intelligence (AI) system, specifically a convolutional neural network (CNN), in classifying elbow fractures using the detailed 2018 AO/OTA fracture classification system.

Methods: A retrospective analysis of 5,367 radiograph exams visualizing the elbow from adult patients (2002-2016) was conducted using a deep neural network. Radiographs were manually categorized according to the 2018 AO/OTA system by orthopedic surgeons.

Sustainable urban development requires actionable insights into the thermal consequences of land transformation. This study examines the impact of land use and land cover (LULC) changes on land surface temperature (LST) in Ho Chi Minh City, Vietnam, between 1998 and 2024. Using Google Earth Engine (GEE), three machine learning algorithms, namely random forest (RF), support vector machine (SVM), and classification and regression tree (CART), were applied for LULC classification.

Measurable neuromotor control deficits during functional task performance could provide objective criteria to aid in concussion diagnosis. However, many tools that measure these constructs are unidimensional and not clinically feasible. The purpose of this study was to assess the classification accuracy of a machine learning model using features measured by a clinically feasible movement-based assessment system, the Mizzou Point-of-care Assessment System (MPASS), between athletes with and without concussion.

Analyzing Reddit Social Media Content in the United States Related to H5N1: Sentiment and Topic Modeling Study.

J Med Internet Res

September 2025

Artificial Intelligence and Mathematical Modeling Lab, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Background: The H5N1 avian influenza A virus represents a serious threat to both animal and human health, with the potential to escalate into a global pandemic. Effective monitoring of social media during H5N1 avian influenza outbreaks could potentially offer critical insights to guide public health strategies. Social media platforms like Reddit, with their diverse and region-specific communities, provide a rich source of data that can reveal collective attitudes, concerns, and behavioral trends in real time.

Machine learning based classification of imagined speech electroencephalogram data from the amplitude and phase spectrum of frequency domain EEG signal.

Biomed Phys Eng Express

September 2025

Electrical Engineering Department, Indian Institute of Technology Roorkee, Research Wing, Electrical Department, Roorkee, Uttarakhand, 247664, India.

Imagined speech classification involves decoding brain signals to recognize verbalized thoughts or intentions without actual speech production. This technology has significant implications for individuals with speech impairments, offering a means to communicate through neural signals. The prime objective of this work is to propose an innovative machine learning (ML) based classification methodology that combines electroencephalogram (EEG) data augmentation using a sliding window technique with statistical feature extraction from the amplitude and phase spectrum of frequency domain EEG segments.
