MKG-GC: A multi-task learning-based knowledge graph construction framework with personalized application to gastric cancer.

Yang Yang, Yuwei Lu, Zixuan Zheng, Hao Wu, Yuxin Lin, Fuliang Qian, Wenying Yan

Comput Struct Biotechnol J

Department of Bioinformatics, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China.

Published: December 2024

Over the past decade, information relevant to precision medicine has accumulated largely in the form of text. To make effective use of this expanding medical literature, we proposed MKG, a multi-task learning framework for knowledge graph construction based on hard parameter sharing, and then used it to automatically extract gastric cancer (GC)-related biomedical knowledge from the literature and to identify GC drug candidates. MKG comprises three separate modules, MT-BGIPN, MT-SGTF, and MT-ScBERT, for entity recognition, entity normalization, and relation classification, respectively. To address the challenges posed by the long and irregular names of medical entities, MT-BGIPN used a bidirectional gated recurrent unit and an interactive pointer network, significantly improving entity recognition accuracy to an average F1 value of 84.5% across datasets. MT-SGTF employed term frequency-inverse document frequency and a gated attention unit, which together combine the semantic and characteristic features of entities, resulting in an average Hits@1 score of 94.5% across five datasets. MT-ScBERT integrated cross-text, entity, and context features, yielding an average F1 value of 86.9% across 11 relation classification datasets. Using MKG, we then built a GC-specific knowledge graph (MKG-GC) that contains 9129 entities and 88,482 triplets. MKG-GC was then used to predict potential GC drugs with a pre-trained language model, BioKGE-BERT, and a CNN-BiLSTM-based drug-disease discriminant model. Nine of the top ten predicted drugs have previously been reported as effective for gastric cancer treatment. Finally, an online platform for exploring and visualizing MKG-GC was created at https://www.yanglab-mi.org.cn/MKG-GC/.
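
As a rough illustration of two ideas named in the abstract, TF-IDF features for entity normalization and the Hits@1 metric, the following is a minimal sketch and not the authors' MT-SGTF module. It assumes character trigrams as the TF-IDF terms and cosine similarity for ranking, and it omits the gated attention unit entirely. All function names and the toy dictionary are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def normalise(vec):
    """Scale a sparse vector (dict) to unit length; leave empty vectors as-is."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else vec


def build_tfidf(names, n=3):
    """Build unit-length TF-IDF vectors for dictionary names, plus the IDF table."""
    docs = [Counter(char_ngrams(name, n)) for name in names]
    df = Counter(gram for doc in docs for gram in doc)
    # Smoothed IDF (an assumption): common n-grams keep a small positive weight.
    idf = {g: math.log((1 + len(names)) / (1 + c)) + 1.0 for g, c in df.items()}
    vectors = [normalise({g: tf * idf[g] for g, tf in doc.items()}) for doc in docs]
    return vectors, idf


def rank_candidates(mention, names, vectors, idf, n=3):
    """Rank dictionary names by cosine similarity to the mention."""
    counts = Counter(char_ngrams(mention, n))
    # n-grams that never occur in the dictionary contribute nothing.
    query = normalise({g: tf * idf[g] for g, tf in counts.items() if g in idf})
    scores = [sum(w * vec.get(g, 0.0) for g, w in query.items()) for vec in vectors]
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order]


def hits_at_k(ranked_lists, gold, k=1):
    """Fraction of mentions whose gold name is among the top-k candidates."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


# Toy dictionary and mentions (hypothetical, for illustration only).
dictionary = ["gastric cancer", "gastric ulcer", "breast cancer", "cisplatin"]
vectors, idf = build_tfidf(dictionary)
mentions = ["Gastric cancers", "cis-platin", "breast carcinoma"]
gold = ["gastric cancer", "cisplatin", "breast cancer"]
ranked = [rank_candidates(m, dictionary, vectors, idf) for m in mentions]
print(hits_at_k(ranked, gold, k=1))
```

Hits@1 here is simply the share of mentions whose top-ranked dictionary entry is the gold concept, which is how the 94.5% figure should be read. A surface-only ranker like this one would be expected to struggle with synonyms that share few characters, which is presumably why the paper pairs TF-IDF with semantic features.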

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10995799
DOI: http://dx.doi.org/10.1016/j.csbj.2024.03.021

Top Keywords: knowledge graph; gastric cancer; multi-task learning-based; graph construction; entity recognition; relation classification; mkg-gc; mkg-gc multi-task; knowledge; learning-based knowledge

Similar Publications

A Pure Transformer Pretraining Framework on Text-attributed Graphs.

Proc Mach Learn Res

November 2024

Michigan State University.

Yu Song, Haitao Mao, Jiachen Xiao, Jingzhe Liu, Zhikai Chen

Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges represented by feature heterogeneity and structural heterogeneity. Recent efforts have been made to address feature heterogeneity via Large Language Models (LLMs) on text-attributed graphs (TAGs) by generating fixed-length text representations as node features.

Enhancing Knowledge Retention by Simulation-Based Learning Among First-Year Medical Students.

Cureus

August 2025

Physiology, SGT University, Gurugram, IND.

Nimarpreet Kaur, Bhupendra Yadav, Deepti Dwivedi, Harminder Kaur, Pragyashaa Chaudhary

Introduction: Simulation-based training has been a vital part of medical education since Competency-Based Medical Education (CBME) was introduced, and guidelines issued since 2023 have made simulation a mandatory teaching methodology. This method enables learners to build and develop both technical and non-technical abilities in a safe and controlled setting, enhancing their preparedness for real-life medical scenarios. In contrast with traditional teaching, simulation-based training improves skill acquisition and retention, enhances learners' confidence, reduces anxiety, reinforces learning, corrects errors, and promotes reflective practice.

KG-MACNF: A nonlinear cross-modal fusion model for predicting drug-target interactions via multi-relational embedding and fine-grained structure.

PLoS One

September 2025

School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao, Shandong, China.

Yihan Feng, Xixin Yang, Yuanlin Guan, Jinyao Zhang, Hang Yang

Drug-target interaction (DTI) prediction is essential for the development of novel drugs and the repurposing of existing ones. However, when drug and target features are applied to biological networks, the relational features of drug-target interactions are often not captured. Moreover, the corresponding multimodal models mainly depend on shallow fusion strategies, which results in suboptimal performance when capturing complex interaction relationships.

A Scorecard for Information Synthesis in Multiple Experimental Conditions: Application to Bacterial Biofilm Matrix Transcriptomics.

Curr Microbiol

September 2025

Department of Health Sciences, Università del Piemonte Orientale UPO, Corso Trieste 15/A, 28100, Novara, Italy.

Mauro Nascimben, Lia Rimondini

A Python-scripted software tool has been developed to help study the heterogeneity of gene changes, markedly or moderately expressed, when several experimental conditions are compared. The analysis workflow encloses a scorecard that groups genes based on relative fold-change and statistical significance, providing additional functions that facilitate knowledge extraction. The scorecard reports highlight unique patterns of gene regulation, such as genes whose expression is consistently up- or down-regulated across experiments, all of which are supported by graphs and summaries to characterize the dataset under investigation.

FmH2ST: foundation model-based spatial transcriptomics generation from histological images.

Nucleic Acids Res

September 2025

School of Software, Shandong University, Jinan 250101, Shandong, China.

Yuequn Wang, Jun Wang, Yanyu Xu, Ning Liu, Bin Liu

Spatial transcriptomics (ST) reveals gene expression distributions within tissues. Yet, predicting spatial gene expression from histological images still faces the challenges of limited ST data that lack prior knowledge, and insufficient capturing of inter-slice heterogeneity and intra-slice complexity. To tackle these challenges, we introduce FmH2ST, a foundation model-based method for spatial gene expression prediction.
