Graph neural networks (GNNs) have become a popular approach for semi-supervised graph representation learning. GNN research has generally focused on improving methodological details, whereas less attention has been paid to the importance of labeling the data. However, for semi-supervised learning, the quality of the training data is vital. In this paper, we first introduce and elaborate on the problem of training data selection for GNNs. More specifically, focusing on node classification, we aim to select representative nodes from a graph to train GNNs for the best performance. To solve this problem, we draw inspiration from the popular lottery ticket hypothesis, typically applied to sparse architectures, and propose the following subset hypothesis for graph data: "There exists a core subset that, when a fixed-size dataset is selected from the dense training dataset, can represent the properties of the full dataset, such that GNNs trained on this core subset achieve a better graph representation." Equipped with this subset hypothesis, we present an efficient algorithm to identify the core data in the graph for GNNs. Extensive experiments demonstrate that the selected data (as a training set) yield performance improvements across various datasets and GNN architectures.
DOI: http://dx.doi.org/10.1016/j.neunet.2024.106635
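The abstract above describes selecting a fixed-size, representative core subset of nodes but does not specify the algorithm. As an illustrative sketch only (not the paper's method), one generic coreset-style heuristic is greedy k-center (farthest-point) sampling over node features; the function name `greedy_k_center` and the toy features below are assumptions for demonstration.

```python
import numpy as np

def greedy_k_center(features, k, seed=0):
    """Pick k representative node indices by greedy farthest-point sampling.

    A generic coreset-style heuristic: repeatedly add the node farthest
    from the current selection, so the chosen subset covers the feature
    space. Illustrative only; not the algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    selected = [int(rng.integers(n))]
    # Distance of every node to its nearest already-selected node.
    dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist))  # farthest node from current selection
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# Toy node features: 100 nodes, 8 dimensions (hypothetical data).
X = np.random.default_rng(1).normal(size=(100, 8))
core = greedy_k_center(X, k=10)
```

The returned indices would then serve as the labeled training set for a GNN; in practice, graph structure (not just raw features) would also inform the selection.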
J Chem Inf Model
September 2025
Songshan Lake Materials Laboratory, Dongguan 523808, PR China.
Large language models (LLMs) have demonstrated transformative potential for materials discovery in condensed matter systems, but their full utility requires both broader application scenarios and integration with ab initio crystal structure prediction (CSP), density functional theory (DFT) methods, and domain knowledge to benefit future inverse material design. Here, we develop an integrated computational framework combining language model-guided materials screening with genetic algorithm (GA) and graph neural network (GNN)-based CSP methods to predict new photovoltaic materials. This LLM + CSP + DFT approach successfully identifies a previously overlooked oxide material with unexpected photovoltaic potential.
J Chem Theory Comput
September 2025
Department of Materials Science and Engineering, City University of Hong Kong, Kowloon 999077, Hong Kong, China.
Coarse-grained (CG) lipid models enable efficient simulations of large-scale membrane events. However, achieving both speed and atomic-level accuracy remains challenging. Graph neural networks (GNNs) trained on all-atom (AA) simulations can serve as CG force fields, which have demonstrated success in CG simulations of proteins.
Front Hum Neurosci
August 2025
Baptist Medical Center, Department of Behavioral Health, Jacksonville, FL, United States.
Introduction: This study investigates four subdomains of executive functioning-initiation, cognitive inhibition, mental shifting, and working memory-using task-based functional magnetic resonance imaging (fMRI) data and graph analysis.
Methods: We used healthy adults' functional magnetic resonance imaging (fMRI) data to construct brain connectomes and network graphs for each task and analyzed global and node-level graph metrics.
Results: The bilateral precuneus and right medial prefrontal cortex emerged as pivotal hubs and influencers, emphasizing their crucial regulatory role in all four subdomains of executive function.
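The Methods and Results above mention global and node-level graph metrics used to identify hubs, without naming a toolkit. A minimal sketch, assuming `networkx` and a toy small-world graph standing in for a thresholded fMRI connectome (not the study's data or pipeline):

```python
import networkx as nx

# Toy "connectome": a small-world graph as a stand-in for a thresholded
# fMRI correlation network (illustrative only).
G = nx.watts_strogatz_graph(n=30, k=4, p=0.1, seed=42)

# Global metric: efficiency of information transfer across the network.
global_eff = nx.global_efficiency(G)

# Node-level metrics commonly used to characterize hubs and influencers.
degree_cent = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

# Candidate hubs: nodes ranking highest in betweenness centrality.
hubs = sorted(betweenness, key=betweenness.get, reverse=True)[:3]
```

In a real analysis, one graph per task condition would be built from region-of-interest time series, and metrics compared across the four executive-function subdomains.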
Proc Mach Learn Res
November 2024
Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges posed by feature heterogeneity and structural heterogeneity. Recent efforts have addressed feature heterogeneity via Large Language Models (LLMs) on text-attributed graphs (TAGs) by generating fixed-length text representations as node features.
Nat Biomed Eng
September 2025
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Phenotype-driven approaches identify disease-counteracting compounds by analysing the phenotypic signatures that distinguish diseased from healthy states. Here we introduce PDGrapher, a causally inspired graph neural network model that predicts combinatorial perturbagens (sets of therapeutic targets) capable of reversing disease phenotypes. Unlike methods that learn how perturbations alter phenotypes, PDGrapher solves the inverse problem and predicts the perturbagens needed to achieve a desired response by embedding disease cell states into networks, learning a latent representation of these states, and identifying optimal combinatorial perturbations.