Graph neural networks (GNNs) have become a popular approach for semi-supervised graph representation learning. GNN research has generally focused on improving methodological details, whereas less attention has been paid to the importance of labeling the data. However, for semi-supervised learning, the quality of the training data is vital. In this paper, we first introduce and elaborate on the problem of training data selection for GNNs. More specifically, focusing on node classification, we aim to select representative nodes from a graph to train GNNs for the best performance. To solve this problem, we draw inspiration from the popular lottery ticket hypothesis, typically applied to sparse architectures, and propose the following subset hypothesis for graph data: "There exists a core subset that, when a fixed-size dataset is selected from the dense training dataset, can represent the properties of the full dataset, such that GNNs trained on this core subset achieve a better graph representation." Equipped with this subset hypothesis, we present an efficient algorithm to identify the core data in the graph for GNNs. Extensive experiments demonstrate that the selected data (as a training set) yield performance improvements across various datasets and GNN architectures.
DOI: http://dx.doi.org/10.1016/j.neunet.2024.106635
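The abstract above describes selecting a fixed-size, representative core subset of nodes but does not specify the algorithm. As an illustrative sketch only (not the paper's method), one generic coreset-style heuristic is greedy k-center (farthest-point) sampling over node features; the function name `greedy_k_center` and the toy features below are assumptions for demonstration.

```python
import numpy as np

def greedy_k_center(features, k, seed=0):
    """Pick k representative node indices by greedy farthest-point sampling.

    A generic coreset-style heuristic: repeatedly add the node farthest
    from the current selection, so the chosen subset covers the feature
    space. Illustrative only; not the algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    selected = [int(rng.integers(n))]
    # Distance of every node to its nearest already-selected node.
    dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist))  # farthest node from current selection
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# Toy node features: 100 nodes, 8 dimensions (hypothetical data).
X = np.random.default_rng(1).normal(size=(100, 8))
core = greedy_k_center(X, k=10)
```

The returned indices would then serve as the labeled training set for a GNN; in practice, graph structure (not just raw features) would also inform the selection.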
J Chem Inf Model
September 2025
Songshan Lake Materials Laboratory, Dongguan 523808, PR China.
Large language models (LLMs) have demonstrated transformative potential for materials discovery in condensed matter systems, but their full utility requires both broader application scenarios and integration with ab initio crystal structure prediction (CSP), density functional theory (DFT) methods, and domain knowledge to benefit future inverse material design. Here, we develop an integrated computational framework combining language model-guided materials screening with genetic algorithm (GA) and graph neural network (GNN)-based CSP methods to predict new photovoltaic materials. This LLM + CSP + DFT approach successfully identifies a previously overlooked oxide material with unexpected photovoltaic potential.
J Chem Theory Comput
September 2025
Department of Materials Science and Engineering, City University of Hong Kong, Kowloon 999077, Hong Kong, China.
Coarse-grained (CG) lipid models enable efficient simulations of large-scale membrane events. However, achieving both speed and atomic-level accuracy remains challenging. Graph neural networks (GNNs) trained on all-atom (AA) simulations can serve as CG force fields, which have demonstrated success in CG simulations of proteins.
Front Hum Neurosci
August 2025
Baptist Medical Center, Department of Behavioral Health, Jacksonville, FL, United States.
Introduction: This study investigates four subdomains of executive functioning-initiation, cognitive inhibition, mental shifting, and working memory-using task-based functional magnetic resonance imaging (fMRI) data and graph analysis.
Methods: We used healthy adults' functional magnetic resonance imaging (fMRI) data to construct brain connectomes and network graphs for each task and analyzed global and node-level graph metrics.
Results: The bilateral precuneus and right medial prefrontal cortex emerged as pivotal hubs and influencers, emphasizing their crucial regulatory role in all four subdomains of executive function.
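The Methods and Results above mention global and node-level graph metrics used to identify hubs, without naming a toolkit. A minimal sketch, assuming `networkx` and a toy small-world graph standing in for a thresholded fMRI connectome (not the study's data or pipeline):

```python
import networkx as nx

# Toy "connectome": a small-world graph as a stand-in for a thresholded
# fMRI correlation network (illustrative only).
G = nx.watts_strogatz_graph(n=30, k=4, p=0.1, seed=42)

# Global metric: efficiency of information transfer across the network.
global_eff = nx.global_efficiency(G)

# Node-level metrics commonly used to characterize hubs and influencers.
degree_cent = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

# Candidate hubs: nodes ranking highest in betweenness centrality.
hubs = sorted(betweenness, key=betweenness.get, reverse=True)[:3]
```

In a real analysis, one graph per task condition would be built from region-of-interest time series, and metrics compared across the four executive-function subdomains.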
Proc Mach Learn Res
November 2024
Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges posed by feature heterogeneity and structural heterogeneity. Recent efforts have addressed feature heterogeneity via Large Language Models (LLMs) on text-attributed graphs (TAGs) by generating fixed-length text representations as node features.
Nat Biomed Eng
September 2025
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Phenotype-driven approaches identify disease-counteracting compounds by analysing the phenotypic signatures that distinguish diseased from healthy states. Here we introduce PDGrapher, a causally inspired graph neural network model that predicts combinatorial perturbagens (sets of therapeutic targets) capable of reversing disease phenotypes. Unlike methods that learn how perturbations alter phenotypes, PDGrapher solves the inverse problem and predicts the perturbagens needed to achieve a desired response by embedding disease cell states into networks, learning a latent representation of these states, and identifying optimal combinatorial perturbations.