Publications by authors named "Shuai Cheng Li"

Spatial transcriptomics has emerged as a groundbreaking tool for the study of intercellular ligand-receptor interactions (LRIs) that exhibit spatial variability. To identify spatially variable LRIs with activation evidence, we present SPIDER, which constructs cell-cell interaction interfaces constrained by cellular interaction capacity, then profiles and identifies spatially variable interaction (SVI) signals supported by downstream transcription factors using multiple probabilistic models. SPIDER demonstrates superior performance regarding accuracy, specificity, and spatial variance relative to existing methods.

Spatial trajectory inference models cell differentiation and state dynamics within tissues by integrating spatial information. Existing spatial trajectory inference methods depend on similarity-based cell graphs constructed from spatial proximity, with less attention to the Markovian property in cell state transitions. In this study, we introduce CASCAT, a tree-shaped structural causal model with the Markovian property integrated to infer a unique cell differentiation trajectory, addressing challenges posed by Markov equivalence in high-dimensional and nonlinear data.

Motivation: B-cell lineage trees describe the evolutionary process of immunoglobulin genes during affinity maturation. Existing methods for building B-cell lineage trees generally do not guarantee the parent-to-child inheritance and accumulation of advantageous mutations under successive rounds of somatic hypermutation (SHM) and selection, and are often incompatible with repertoire input.

Results: To address previous limitations, we developed AffMB (Affinity Maturation of B-cell receptor), a comprehensive toolkit for tracking affinity maturation through the generation and visualization of SHM-ordered, inheritance-based B-cell lineage trees from single-cell or bulk B-cell receptor sequencing data.

Entropy quantifies the limits of information compression and provides a theoretical foundation for exploring complex structures in large-scale graphs. However, effective metrics are needed to capture the intricate structural details in biological graphs. In this paper, we introduce an entropy-based measure to quantify the complexity of biological graphs and show that minimizing the associated entropy is equivalent to optimal graph partitioning.

Advances in single-cell RNA sequencing (scRNA-seq) enable detailed analysis of cellular heterogeneity, but existing clustering methods often fail to capture the complex hierarchical structures of cell types and subtypes. To address this challenge, CeiTEA, a novel algorithm for adaptive hierarchical clustering based on topological entropy (TE), is introduced. CeiTEA constructs a multi-nary partition tree that optimally represents relationships and diversity among cell types by minimizing TE.

The complexity of T cell receptor (TCR) sequences, particularly within the complementarity-determining region 3 (CDR3), requires efficient embedding methods for applying machine learning to immunology. While various TCR CDR3 embedding strategies have been proposed, the absence of a systematic evaluation has caused confusion in the community. Here, we extracted CDR3 embedding models from 19 existing methods and benchmarked these models on four curated datasets by assessing their impact on the performance of downstream TCR tasks, including TCR-epitope binding affinity prediction, epitope-specific TCR identification, TCR clustering, and visualization analysis.

The human respiratory microbiome plays a crucial role in respiratory health, but there is no comprehensive respiratory genome catalogue (RGC) for studying the microbiome. In this study, we collected whole-metagenome shotgun sequencing data from 4067 samples and sequenced long reads of 124 samples, yielding 9.08 and 0.

Metagenomic studies have revealed the critical roles of complex microbial interactions, including horizontal gene transfer (HGT) and functional redundancy (FR), in shaping the gut microbiome's functional capacity and resilience. However, the lack of comprehensive data integration and systematic analysis approaches has limited the in-depth exploration of HGT and FR dynamics across large-scale gut microbiome datasets. To address this gap, we present GutMetaNet (https://gutmetanet.

Article Synopsis
  • TCRclub is a new method that combines single-cell RNA and TCR sequencing to group similar T cells, called 'clubs'
  • It has shown superior performance in clustering T cells, especially in a large dataset with over 400 verified peptide-MHC combinations
  • TCRclub has also provided insights into T cell behavior in various conditions, revealing transitions in T cell states and potential pathways for improving cancer therapies, as well as analyzing responses in COVID-19 patients.

Plasmids are extrachromosomal genetic molecules that replicate independently of chromosomes in bacteria, archaea, and eukaryotic organisms. They contain diverse functional elements and are capable of horizontal gene transfer among hosts. While existing plasmid databases have archived plasmid sequences isolated from individual microorganisms or natural environments, there is a need for a comprehensive, standardized, and annotated plasmid database to address the vast accumulation of plasmid sequences.

Article Synopsis
  • Scientists are using a new method called STAGUE to study how cells interact in space by creating special graphs from data.
  • This method helps them better understand how different cells work together, especially in human breast cancer tissues.
  • STAGUE is better than older methods because it finds new patterns in the data and reveals important information about genes that help cells communicate.

High-throughput chromosome conformation capture (Hi-C) technology captures spatial interactions of DNA sequences as matrices, and software tools have been developed to identify topologically associating domains (TADs) from these Hi-C matrices. Building on structural information theory, SuperTAD adopted a dynamic programming approach to find the TAD hierarchy with minimal structural entropy. However, the algorithm suffers from high time complexity.

Horizontal gene transfer (HGT) phenomena pervade the gut microbiome and significantly impact human health. Yet, no current method can accurately identify complete HGT events, including the transferred sequence and the associated deletion and insertion breakpoints from shotgun metagenomic data. Here, we develop LocalHGT, which facilitates the reliable and swift detection of complete HGT events from shotgun metagenomic data, delivering an accuracy of 99.

Esophageal squamous cell carcinoma (ESCC) is a cancer type with poor prognosis and extensive intra- and inter-patient heterogeneity in both genomic variations and the tumor microenvironment (TME). However, the patterns and drivers of spatial genomic and microenvironmental heterogeneity in ESCC remain largely unknown. Here, we generated a spatial multi-omic atlas by whole-exome, transcriptome, and methylome sequencing of 507 tumor samples from 103 patients.

Motivation: Genome sequencing technologies reveal a huge amount of genomic sequences. Neural network-based methods can be prime candidates for retrieving insights from these sequences because of their applicability to large and diverse datasets. However, the highly variable lengths of genome sequences severely impair the presentation of sequences as input to the neural network.

Efficient translation mediated by the 5' untranslated region (5' UTR) is essential for the robust efficacy of mRNA vaccines. However, the 1-methyl-pseudouridine (m1Ψ) modification of mRNA can impact the translation efficiency of the 5' UTR. We discovered that the optimal 5' UTR for m1Ψ-modified mRNA (m1Ψ5' UTR) differs significantly from its unmodified counterpart, highlighting the need for a specialized tool for designing m1Ψ5' UTRs rather than directly utilizing high-expression endogenous gene 5' UTRs.

Numerous studies have shown that immune checkpoint inhibitor (ICI) immunotherapy has great potential as a cancer treatment, leading to significant clinical improvements in many cases. However, it benefits only a minority of patients, underscoring the importance of discovering reliable biomarkers that can be used to screen for potential beneficiaries and ultimately reduce the risk of overtreatment. Our comprehensive review focuses on the latest advancements in predictive biomarkers for ICI therapy, particularly emphasizing those relevant to the efficacy of programmed cell death protein 1 (PD-1)/programmed cell death-ligand 1 (PD-L1) inhibitor and cytotoxic T-lymphocyte antigen-4 (CTLA-4) inhibitor immunotherapies.

Article Synopsis
  • There is a lack of population-based research on how HPV infection affects the vaginal environment, which can influence the risk of long-term HPV infections.
  • The study aims to explore the relationship between vaginal microbiota and vaginal metabolites in response to changes in HPV infection status.
  • Findings suggest that analyzing the vaginal metabolome could be a more effective way to assess the effects of HPV infection on the vaginal microenvironment than looking at vaginal microbiota alone.

Common loci are a distinct set of human genome sites that harbor genetic variants found in at least 1% of the population. Small somatic mutations occur at both common and non-common loci, i.e.

Given the shortage of cytologists, women in low-resource regions have inequitable access to cervical cytology, which plays a pivotal role in cervical cancer screening. Emerging studies indicate the potential of AI-assisted systems to promote the implementation of cytology in resource-limited settings. However, few studies have evaluated how much AI assistance improves cytologists' work efficiency.

Bacteriophages are viruses that infect bacteria or archaea. Understanding the diverse and intricate genomic architectures of phages is essential for studying microbial ecosystems and developing phage therapy strategies. However, existing phage databases lack meticulous annotations.

Reconstructing diploid sequences of human leukocyte antigen (HLA) genes, i.e., full-resolution HLA typing, from sequencing data is challenging.

Breakage-fusion-bridge (BFB) is a complex rearrangement that leads to tumor malignancy. Existing models for detecting BFBs rely on the ideal BFB hypothesis, ruling out the possibility of BFBs entangled with other structural variations, that is, complex BFBs. We propose an algorithm, Ambigram, to identify complex BFBs and reconstruct the rearranged structure of the local genome during cancer subclone evolution.
