The prevention of chronic disease is a long-term battle that requires continual fine-tuning to adapt to the course of the disease. Without comprehensive insights, prescriptions may prioritize short-term gains but deviate from trajectories toward long-term survival. Here we introduce Duramax, an evidence-based framework empowered by reinforcement learning to optimize long-term preventive strategies.
Background: Cardiovascular disease (CVD) is the leading cause of mortality and morbidity in China and worldwide, yet a validated primary prevention model developed specifically for Chinese populations is lacking. To identify CVD high-risk individuals for early intervention, we created and validated a primary prevention risk prediction model, Personalized CARdiovascular DIsease risk Assessment for Chinese (1°P-CARDIAC), in contemporary Chinese cohorts in Hong Kong.
Methods: Patients without any history of CVD were categorized into derivation and validation cohorts based on the geographical location of their residence in Hong Kong.
Motivation: Rare diseases affect over 300 million people worldwide and are often caused by genetic variants. While variant detection has become cost-effective, interpreting these variants, particularly collecting literature-based evidence such as ACMG/AMP PM3, remains complex and time-consuming.
Results: We present AutoPM3, a method that automates PM3 evidence extraction from the literature using open-source large language models (LLMs).
Long-read sequencing technologies have great potential for the comprehensive discovery of structural variations (SVs). However, accurate genotype assignment for SVs remains challenging due to unavoidable sequencing errors, limited coverage, and the complexity of SVs. Herein, we propose cuteFC, which employs self-adaptive clustering along with multiallele-aware clustering to achieve accurate SV regenotyping through a force-calling approach.
Objective: Cholelithiasis and gastroesophageal reflux disease (GERD) contribute to significant health concerns. We aimed to investigate the potential observational, causal, and genetic relationships between cholelithiasis and GERD.
Design: The observational correlations were assessed based on the prospective cohort study from UK Biobank.
Nat Struct Mol Biol
July 2025
Differential high-order chromatin interactions between homologous chromosomes affect many biological processes. Traditional chromatin conformation capture genome analysis methods mainly identify two-way interactions and cannot provide comprehensive haplotype information, especially for low-heterozygosity organisms such as humans. Here, we present a pipeline of methods to delineate diploid high-order chromatin interactions from noisy Pore-C outputs.
Variant calling using long-read RNA sequencing (lrRNA-seq) can be applied to diverse tasks, such as capturing full-length isoforms and gene expression profiling. It poses challenges, however, due to higher error rates than DNA data, the complexities of transcript diversity, RNA editing events, etc. In this paper, we propose Clair3-RNA, the first deep learning-based variant caller tailored for lrRNA-seq data.
Brief Bioinform
November 2024
A vast amount of single-cell RNA sequencing (SC) data have been accumulated via various studies and consortiums, but the lack of spatial information limits its analysis of complex biological activities. To bridge this gap, we introduce CellContrast, a computational method for reconstructing spatial relationships among SC cells from spatial transcriptomics (ST) reference. By adopting a contrastive learning framework and training with ST data, CellContrast projects gene expressions into a hidden space where proximate cells share similar representation values.
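The idea that spatially proximate cells should share similar representations can be illustrated with a small contrastive objective. The following is a minimal sketch under stated assumptions, not CellContrast's actual loss or code: the function name `spatial_contrastive_loss`, the distance `radius` used to define positive pairs, the cosine similarity, and the `temperature` value are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def spatial_contrastive_loss(embeddings, coords, radius=1.5, temperature=0.5):
    """InfoNCE-style loss (illustrative assumption, not the published method).

    For each anchor cell, cells within `radius` in the ST coordinates are
    treated as positives and all other cells as negatives. The loss is low
    when positives are more similar to the anchor than negatives are.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        pos_weight, total_weight = 0.0, 0.0
        for j in range(n):
            if i == j:
                continue
            w = math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            total_weight += w
            if math.dist(coords[i], coords[j]) <= radius:
                pos_weight += w
        # Anchors with no spatial neighbor contribute nothing to the loss.
        if pos_weight > 0.0:
            losses.append(-math.log(pos_weight / total_weight))
    return sum(losses) / len(losses) if losses else 0.0


# Toy data: two spatial clusters of two cells each (hypothetical values).
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

print(spatial_contrastive_loss(aligned, coords))
print(spatial_contrastive_loss(shuffled, coords))
```

In this toy setup, embeddings that agree with the spatial layout yield a lower loss than embeddings that do not, which is the property a trained encoder would be pushed toward.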
Background: Irritable bowel syndrome (IBS) significantly impacts individuals due to its prevalence and negative effect on quality of life. Current genome-wide association studies (GWAS) have only identified a small number of crucial single nucleotide polymorphisms (SNPs), not fully elucidating IBS's pathogenesis.
Objective: To identify genomic loci at which common genetic variation influences IBS susceptibility.
Transcriptional regulation, critical for cellular differentiation and adaptation to environmental changes, involves coordinated interactions among DNA sequences, regulatory proteins, and chromatin architecture. Despite extensive data from consortia like ENCODE, understanding the dynamics of cis-regulatory elements (CREs) in gene expression remains challenging. Deep learning is a powerful tool for learning gene expression and epigenomic signals from DNA sequences, exhibiting superior performance compared to conventional machine learning approaches.
Cell Biosci
August 2024
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder for which current treatments are limited and drug development costs are prohibitive. Identifying drug targets for ASD is crucial for the development of targeted therapies. Summary-level data of expression quantitative trait loci obtained from GTEx, protein quantitative trait loci data from the ROSMAP project, and two ASD genome-wide association studies datasets were utilized for discovery and replication.
Insights Imaging
June 2024
Objectives: The clinical choice between surgery alone (SA) and surgery followed by postoperative adjuvant chemotherapy (SPOCT) in esophageal squamous cell carcinoma (ESCC) remains controversial. We aim to propose a pre-therapy PET/CT image-based deep learning approach to improve the survival benefit and clinical management of ESCC patients.
Methods: This retrospective multicenter study included 837 ESCC patients from three institutions.
Int J Surg
September 2024
Background: Sleep problems are prevalent. However, the impact of sleep patterns on digestive diseases remains uncertain. Moreover, the interaction between sleep patterns and genetic predisposition with digestive diseases has not been comprehensively explored.
Aims: Cardiovascular disease (CVD) is a leading cause of mortality, especially in developing countries. This study aimed to develop and validate a CVD risk prediction model, Personalized CARdiovascular DIsease risk Assessment for Chinese (P-CARDIAC), for recurrent cardiovascular events using machine learning techniques.
Methods and Results: Three cohorts of Chinese patients with established CVD were included if they had used any of the public healthcare services provided by the Hong Kong Hospital Authority (HA) since 2004, and were categorized by their geographical locations.
Background & Aims: Inflammatory bowel disease (IBD) is commonly associated with extraintestinal complications, including autoimmune liver disease. The co-occurrence of IBD and primary biliary cholangitis (PBC) has been increasingly observed, but the underlying relationship between these conditions remains unclear.
Methods: Using summary statistics from genome-wide association studies (GWAS), we investigated the causal effects between PBC and IBD, including Crohn's disease (CD) and ulcerative colitis (UC).
Background: Emerging evidence suggests that Rho GTPases play a crucial role in tumorigenesis and metastasis, but their involvement in the tumor microenvironment (TME) and prognosis of hepatocellular carcinoma (HCC) is not well understood.
Methods: We aim to develop a tumor prognosis prediction system called the Rho GTPases-related gene score (RGPRG score) using Rho GTPase signaling genes and further bioinformatic analyses.
Results: Our work found that HCC patients with a high RGPRG score had significantly worse survival and increased immunosuppressive cell fractions compared to those with a low RGPRG score.
Summary: Third-generation long-read sequencing is an increasingly utilized technique for profiling human immunodeficiency virus (HIV) quasispecies and detecting drug resistance mutations due to its ability to cover the entire viral genome in individual reads. Recently, the ClusterV tool has demonstrated accurate detection of HIV quasispecies from Nanopore long-read sequencing data. However, the need for scripting skills and a computational environment may act as a barrier for many potential users.
Stem Cell Res Ther
September 2023
Aims: Dissecting complex interactions among transcription factors (TFs), microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) is central to understanding heart development and function. Although computational approaches and platforms have been described to infer relationships among regulatory factors and genes, current approaches do not adequately account for how highly diverse, interacting regulators that include noncoding RNAs (ncRNAs) control cardiac gene expression dynamics over time.
Methods: To overcome this limitation, we devised an integrated framework, cardiac gene regulatory modeling (CGRM), that integrates LogicTRN and regulatory component analysis bioinformatics modeling platforms to infer complex regulatory mechanisms.
Background: HIV infections often develop drug resistance mutations (DRMs), which can increase the risk of virological failure. However, it has been difficult to determine if minor mutations occur in the same genome or in different virions using Sanger sequencing and short-read sequencing methods. Oxford Nanopore Technologies (ONT) sequencing may improve antiretroviral resistance profiling by allowing for long-read clustering.
BMC Bioinformatics
August 2023
Background: With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data.
Results: We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform).