Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges represented by feature heterogeneity and structural heterogeneity. Recent efforts have been made to address feature heterogeneity via Large Language Models (LLMs) on text-attributed graphs (TAGs) by generating fixed-length text representations as node features. These high-quality features reduce the previously critical role of graph structure, resulting in a modest performance gap between Graph Neural Networks (GNNs) and structure-agnostic Multi-Layer Perceptrons (MLPs). Motivated by this, we introduce a feature-centric pretraining perspective by treating graph structure as a prior and leveraging the rich, unified feature space to learn refined interaction patterns that generalize across graphs. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walk and employs masked feature reconstruction to capture pairwise proximity in the LLM-unified feature space using a standard Transformer. By utilizing unified text representations rather than varying structures, GSPT alleviates structural heterogeneity and achieves significantly better transferability among graphs within the same domain. Our approach can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets. The source code is publicly available at https://github.com/SongYYYY/GSPT.
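The abstract sketches the pretraining recipe: sample node contexts with random walks over the graph, mask a fraction of the LLM-derived node features along each walk, and train a standard Transformer to reconstruct the masked features. Below is a minimal PyTorch sketch of that masked-feature-reconstruction step; the class name, hyperparameters, and the assumption that random walks and LLM embeddings are precomputed are illustrative only and are not taken from the authors' released code (see the repository linked above for the actual implementation).

```python
# Hedged sketch of masked feature reconstruction over random-walk node sequences,
# assuming each walk is already converted to a sequence of LLM text embeddings.
# Positional encodings and the random-walk sampler are omitted for brevity.
import torch
import torch.nn as nn

class GraphSequencePretrainer(nn.Module):
    def __init__(self, feat_dim: int, d_model: int = 256, n_layers: int = 4, n_heads: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))  # learned placeholder for masked positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, feat_dim)

    def forward(self, walk_feats: torch.Tensor, mask_ratio: float = 0.3) -> torch.Tensor:
        # walk_feats: (batch, walk_len, feat_dim) -- LLM embeddings of the nodes on each walk
        x = self.in_proj(walk_feats)
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio      # positions to hide
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        recon = self.out_proj(self.encoder(x))
        # reconstruction loss computed only on masked positions
        return ((recon - walk_feats) ** 2).mean(dim=-1)[mask].mean()

# Usage with dummy data: 32 walks of length 16, 768-d node features
model = GraphSequencePretrainer(feat_dim=768)
loss = model(torch.randn(32, 16, 768))
loss.backward()
```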
Download full-text PDF | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416796 | PMC
Proc Mach Learn Res
November 2024
J Affect Disord
September 2025
College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China.
Major Depressive Disorder (MDD) poses a significant global health threat, impairing individual functioning and increasing socioeconomic burden. Accurate diagnosis is crucial for improving treatment outcomes. This study proposes Time-Frequency Text-Attributed DeepWalk (TF-TADW), a framework for MDD classification using resting-state functional MRI data.
Artif Intell Med
October 2025
Public Health Wales, United Kingdom.
As a systemic problem, public health cannot be addressed without considering other policy dimensions. Hence, a holistic approach across public policy areas is necessary to incorporate Health-for-All values into decision-making. However, such multisectoral interventions require public budgets that are effectively mapped into public health outcomes and indicators of their wider determinants.
Entropy (Basel)
March 2025
Electronic Information School, Wuhan University, Wuhan 430072, China.
Social networks contain complex graph structures and rich textual information. Text provides important information for various tasks, while graph structures offer multilevel context for the semantics of the text. Contemporary researchers tend to represent these kinds of data as text-attributed graphs (TAGs).
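To make the notion of a TAG concrete, the sketch below shows one minimal way such data can be represented: nodes carrying raw text attributes plus a set of edges. The class and field names are illustrative only and do not correspond to any standard library API.

```python
# Minimal illustration of a text-attributed graph (TAG): nodes carry raw text,
# edges carry the structure. Names are hypothetical, for exposition only.
from dataclasses import dataclass, field

@dataclass
class TextAttributedGraph:
    node_text: dict = field(default_factory=dict)  # node id -> raw text attribute
    edges: set = field(default_factory=set)        # undirected edges stored as (u, v) with u <= v

    def add_node(self, nid: int, text: str) -> None:
        self.node_text[nid] = text

    def add_edge(self, u: int, v: int) -> None:
        self.edges.add((min(u, v), max(u, v)))

# Example: two users connected by a follow edge, each described by a post.
g = TextAttributedGraph()
g.add_node(0, "Excited about graph pretraining!")
g.add_node(1, "LLM embeddings make great node features.")
g.add_edge(0, 1)
```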
bioRxiv
December 2024
Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA.
Artificial intelligence (AI) is revolutionizing scientific discovery because of its capability, following neural scaling laws, to integrate and analyze large-scale datasets and mine knowledge from them. Foundation models, including large language models (LLMs) and large vision models (LVMs), are among the most important foundations paving the way for general AI by pre-training on massive domain-specific datasets. Unlike the well-annotated, well-formatted, and integrated large textual and image datasets used for LLMs and LVMs, biomedical knowledge and datasets in the field of AI for Precision Health and Medicine (AI4PHM) are fragmented, with data scattered across publications and inconsistent databases that often use diverse nomenclature systems.