Publications by authors named "Hongru Shen"

Tumor-type classification is critical for effective cancer treatment, yet current methods based on genomic alterations lack flexibility and have limited performance. Here, we introduce OncoChat, an artificial intelligence (AI) model designed to classify 69 tumor types by integrating diverse genomic alterations. Developed on genomic data from 158,836 tumors sequenced with targeted cancer gene panels, OncoChat demonstrates superior performance, achieving a micro-averaged precision-recall area under the curve (PRAUC) of 0.

Atmospheric nanoplastic particles (NPPs) are an emerging environmental concern due to their potential adverse effects on human and ecosystem health. Many recently identified sources involve subjecting plastic materials to elevated temperatures; however, fundamental understanding of airborne emissions is limited. This study is the first systematic characterization of particle and volatile organic compound emissions from plastic smoldering combustion.

Protein-RNA interactions play pivotal roles in regulating transcription, translation, and RNA metabolism. Characterizing these interactions offers key insights into RNA dysregulation mechanisms. Here, we introduce Reformer, a deep learning model that predicts protein-RNA binding affinity from sequence data.

Deep learning has revolutionized cancer diagnostics, shifting from pixel-based image analysis to more comprehensive, patient-centric care. This opinion article explores recent advancements in neural network architectures, highlighting their evolution in biomedical research and their impact on medical imaging interpretation and multimodal data integration. We emphasize the need for domain-specific artificial intelligence (AI) systems capable of handling complex clinical tasks, advocating for the development of multimodal large language models that can integrate diverse data sources.

Early cancer diagnosis from bisulfite-treated cell-free DNA (cfDNA) fragments requires tedious data analytical procedures. Here, we present a deep-learning-based approach for early cancer interception and diagnosis (DECIDIA) that can achieve accurate cancer diagnosis exclusively from bisulfite-treated cfDNA sequencing fragments. DECIDIA relies on transformer-based representation learning of DNA fragments and weakly supervised multiple-instance learning for classification.

Instruction-tuned large language models (LLMs) demonstrate exceptional ability to align with human intentions. We present an LLM-based model, instruction-tuned LLM for assessment of cancer (iLLMAC), that can detect cancer using cell-free deoxyribonucleic acid (cfDNA) end-motif profiles. Developed on plasma cfDNA sequencing data from 1135 cancer patients and 1106 controls across three datasets, iLLMAC achieved an area under the receiver operating curve (AUROC) of 0.

Accurate discrimination between patients with and without cancer from cfDNA is crucial for early cancer diagnosis. Herein, we develop and validate a deep-learning-based model entitled end-motif inspection via transformer (EMIT) for discriminating individuals with and without cancer by learning feature representations from cfDNA end-motifs. EMIT is a self-supervised learning approach that models rankings of cfDNA end-motifs.

Immune checkpoint inhibitors (ICIs) represent a promising treatment for hepatocellular carcinoma (HCC) due to their capacity for abundant lymphocyte infiltration. However, some patients with HCC respond poorly to ICI therapy due to the presence of various immunosuppressive factors in the tumor microenvironment. Our research reveals that a macrophage-coated tumor cluster (MCTC) signifies a unique spatial structural organization in HCC correlating with diminished recurrence-free survival and overall survival in a total of 572 HCC cases from 3 internal cohorts and 2 independent external validation cohorts.

We present a language model, Affordable Cancer Interception and Diagnostics (ACID), that can achieve high classification performance in the diagnosis of cancer exclusively from raw cfDNA sequencing reads. We formulate ACID as an autoregressive language model. ACID is pretrained with language sentences that are obtained by concatenating raw sequencing reads and diagnostic labels.

Article Synopsis
  • The text discusses a new method called WSI inspection via transformer (WIT) for analyzing gigapixel whole-slide images (WSIs) to aid in cancer diagnosis.
  • WIT improves slide-level classification by effectively modeling the relationships between different image patches, achieving notable accuracy in detecting various cancer types.
  • The method outperforms existing benchmarks significantly and has the ability to identify key regions in WSIs that influence diagnostic decisions, marking a promising shift in computational pathology.

Exponential accumulation of single-cell transcriptomes poses a great challenge for efficient assimilation. Here, we present an approach entitled generative pretraining from transcriptomes for learning feature representation of transcriptomes. The approach is conceptually simple in that it autoregressively models the ranking of a gene in the context of its preceding neighbors.
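As a rough illustration of the ranking idea in this abstract, the sketch below orders the genes of one cell by descending expression and builds (preceding genes, next gene) pairs of the kind an autoregressive model could be trained on. The function names, the toy expression values, and the tie-breaking rule are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def rank_genes(expression: Dict[str, float], top_k: int) -> List[str]:
    """Return the top_k expressed gene names, highest expression first.

    Genes with zero expression are dropped. Ties are broken alphabetically,
    an arbitrary choice made here only to keep the output deterministic.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    return [gene for gene, _ in expressed[:top_k]]


def next_gene_pairs(ranked: List[str]) -> List[Tuple[List[str], str]]:
    """Build (preceding genes, next gene) pairs from a ranked gene list."""
    return [(ranked[:i], ranked[i]) for i in range(1, len(ranked))]


# Toy cell with hypothetical expression values.
cell = {"GAPDH": 120.0, "ACTB": 95.0, "CD3E": 0.0, "MALAT1": 300.0, "B2M": 95.0}
ranked = rank_genes(cell, top_k=4)
pairs = next_gene_pairs(ranked)
```

Each pair gives a model the genes ranked so far as context and the next-ranked gene as the prediction target; the model itself is out of scope for this sketch.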

Acral melanoma is a dismal subtype of melanoma occurring in glabrous acral skin, and has a higher incidence in East Asians. We perform single-cell RNA sequencing for 63,394 cells obtained from 5 acral and 3 cutaneous melanoma samples to investigate tumor heterogeneity and immune environment. We define 5 orthogonal functional cell clusters that are involved in TGF-beta signaling, Type I interferon, Wnt signaling, Cell cycle, and Cholesterol efflux signaling.

Article Synopsis
  • Secondary organic aerosol (SOA) significantly impacts air quality and climate, primarily formed from the oxidation of volatile organic compounds like biogenic monoterpenes.
  • A study reveals that for α-pinene, the most common monoterpene, hydrogen abstraction by hydroxyl radicals is the key pathway for forming highly oxygenated organic molecules (HOMs), contrary to previous assumptions.
  • The findings indicate that this minor reaction pathway is crucial for rapid HOM formation during the day, suggesting its importance for SOA growth and its subsequent effects on air quality and climate.

Objectives: The postoperative early recurrence (ER) rate of hepatocellular carcinoma (HCC) is 50%, and no highly reliable predictive tool has been developed yet. The aim of this study was to develop and validate a predictive model with radiomics analysis based on multiparametric magnetic resonance (MR) images to predict early recurrence of HCC.

Methods: In total, 302 patients (training dataset: n = 211; validation dataset: n = 91) with pathologically confirmed HCC who underwent preoperative MR imaging were enrolled in this study.

The comprehensive regulatory effect of eRNA on tumor immune cell infiltration and outcome remains obscure. We comprehensively identify the eRNA-mediated immune infiltration patterns of gastric cancer (GC) samples. We propose a random forest machine-learning (ML) algorithm to map eRNA to mRNA expression patterns.

Integration of accumulating large-scale single-cell transcriptomes requires scalable batch-correction approaches. Here we propose Fugue, a simple and efficient batch-correction method that is scalable for integrating super large-scale single-cell transcriptomes from diverse sources. The core idea of the method is to encode batch information as trainable parameters and add them to the single-cell expression profile; subsequently, a contrastive learning approach is used to learn a feature representation of the additive expression profile.
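The additive step described in this abstract can be sketched as follows: each batch gets its own parameter vector, which is added element-wise to the expression profile of every cell from that batch. This is a minimal sketch under stated assumptions; the helper names, the small random initialization, and the toy profile are hypothetical, and the training of the parameters and the contrastive learning stage are not shown.

```python
import random
from typing import Dict, List


def init_batch_embeddings(
    batch_ids: List[str], n_genes: int, seed: int = 0
) -> Dict[str, List[float]]:
    """Create one parameter vector per batch, one value per gene.

    In a real model these values would be updated during training;
    here they are only initialized with small random numbers.
    """
    rng = random.Random(seed)
    return {b: [rng.gauss(0.0, 0.01) for _ in range(n_genes)] for b in batch_ids}


def add_batch_embedding(profile: List[float], embedding: List[float]) -> List[float]:
    """Add a batch vector element-wise to one cell's expression profile."""
    if len(profile) != len(embedding):
        raise ValueError("profile and embedding must have the same length")
    return [x + e for x, e in zip(profile, embedding)]


# Toy example: two hypothetical batches and a four-gene expression profile.
embeddings = init_batch_embeddings(["batch_a", "batch_b"], n_genes=4)
profile = [1.5, 0.0, 2.3, 0.7]
additive_profile = add_batch_embedding(profile, embeddings["batch_a"])
```

The resulting additive profile is what a downstream encoder would receive as input in this sketch.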

TP53 mutations correlate with inferior survival in many cancers. APR-246 is a compound to shift mutant p53 and exhibits anti-cancer effects. Among its effects, APR-246 facilitates the binding of restored p53 mutants to target genes and their transcription.

Hashimoto's thyroiditis (HT) is the main cause of hypothyroidism. We develop a deep learning model called HTNet for diagnosis of HT by training on 106,513 thyroid ultrasound images from 17,934 patients and test its performance on 5051 patients from 2 datasets of static images and 1 dataset of video data. HTNet achieves an area under the receiver operating curve (AUC) of 0.

Gastric cancer is the fifth most common type of human cancer and the third leading cause of cancer-related death. The purpose of this study is to investigate the immune infiltration signatures of gastric cancer and their relation to prognosis. We identified two distinct subtypes of gastric cancer (C1/C2) characterized by different immune infiltration signatures.

Advances in single-cell RNA sequencing have led to exponential accumulation of single-cell expression data. However, there is still a lack of tools that can integrate this ever-growing body of single-cell expression data. Here, we present a universal approach, iSEEEK, for integrating super large-scale single-cell expression via exploring expression rankings of top-expressing genes.

The reactions of biogenic volatile organic compounds (BVOC) with nitrate radicals (NO₃) are major night-time sources of organic nitrates and secondary organic aerosols (SOA) in regions influenced by BVOC and anthropogenic emissions. In this study, the formation of gas-phase highly oxygenated organic molecules-organic nitrates (HOM-ON) from NO₃-initiated oxidation of a representative monoterpene, β-pinene, was investigated in the SAPHIR chamber (Simulation of Atmosphere PHotochemistry In a large Reaction chamber). Six monomer (C = 7-10, N = 1-2, O = 6-16) and five accretion product (C = 17-20, N = 2-4, O = 9-22) families were identified and further classified into first- or second-generation products based on their temporal behavior.

Background: Brain tumor ranks as the most devastating cancer type. The complex tumor immune microenvironment prevents brain tumor from receiving therapeutic benefits. The purpose of this study was to stratify brain tumors based on their distinct immune infiltration signatures to facilitate better clinical decision making and prognosis prediction.

We developed Miscell, a self-supervised learning approach with a deep neural network as latent feature encoder for mining information from single-cell transcriptomes. We demonstrated the capability of Miscell with canonical single-cell analysis tasks, including delineation of single-cell clusters and identification of cluster-specific marker genes. We evaluated Miscell along with three state-of-the-art methods on three heterogeneous datasets.

Background And Aims: Computed tomography (CT) scan is frequently used to detect hepatocellular carcinoma (HCC) in routine clinical practice. The aim of this study is to develop a deep-learning AI system to improve the diagnostic accuracy of HCC by analysing liver CT imaging data.

Methods: We developed a deep-learning AI system by training on CT images from 7512 patients at Henan Provincial People's Hospital.
