Nat Genet
September 2025
Gene expression is modulated jointly by transcriptional regulation and messenger RNA stability, yet the latter is often overlooked in studies on genetic variants. Here, leveraging metabolic labeling data (Bru/BruChase-seq) and a new computational pipeline, RNAtracker, we categorize genes as allele-specific RNA stability (asRS) or allele-specific RNA transcription events. We identify more than 5,000 asRS variants among 665 genes across a panel of 11 human cell lines.
The false discovery rate (FDR) controlling method by Benjamini and Hochberg (BH) is a popular choice in the omics fields. Here, we demonstrate that in datasets with a large degree of dependencies between features, FDR correction methods like BH can sometimes counter-intuitively report very high numbers of false positives, potentially misleading researchers. We urge researchers to use suitable multiple testing strategies and approaches such as synthetic null data (negative controls) to identify and minimize caveats related to false discoveries because, in the cases where false findings do occur, they may be numerous.
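For readers unfamiliar with the BH procedure mentioned above, the following is a minimal sketch of the standard step-up rule: sort the m p-values, find the largest rank k with p_(k) <= k * alpha / m, and reject the k smallest. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not taken from the article, and the sketch does not reproduce the article's analysis of dependent features.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Standard BH step-up rule at target FDR level `alpha`.
    (Illustrative helper; the name and signature are assumptions.)
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight features.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, alpha=0.05))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold. This is one reason rejections can arrive in large groups when features are strongly correlated.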
A central task in expression quantitative trait locus analysis is to identify cis-eGenes, i.e., genes whose expression levels are regulated by at least one local genetic variant.
Transcriptomes provide highly informative molecular phenotypes that, combined with gene perturbation, can connect genotype to phenotype. An ultimate goal is to perturb every gene and measure transcriptome changes; however, this is challenging, especially in whole animals. Here, we present 'Worm Perturb-Seq (WPS)', a method that provides high-resolution RNA-sequencing profiles for hundreds of replicate perturbations at a time in living animals.
Nat Genet
April 2025
Genetic mutation and drift, coupled with natural and human-mediated selection and migration, have produced a wide variety of genotypes and phenotypes in farmed animals. We here introduce the Farm Animal Genotype-Tissue Expression (FarmGTEx) Project, which aims to elucidate the genetic determinants of gene expression across 16 terrestrial and aquatic domestic species under diverse biological and environmental contexts. For each species, we aim to collect multiomics data, particularly genomics and transcriptomics, from 50 tissues of 1,000 healthy adults and 200 additional animals representing a specific context.
Nat Cell Biol
March 2025
Understanding how cells respond differently to perturbation is crucial in cell biology, but existing methods often fail to accurately quantify and interpret heterogeneous single-cell responses. Here we introduce the perturbation-response score (PS), a method to quantify diverse perturbation responses at a single-cell level. Applied to single-cell perturbation datasets such as Perturb-seq, PS outperforms existing methods in quantifying partial gene perturbations.
The transcriptome provides a highly informative molecular phenotype to connect genotype to phenotype and is most frequently measured by RNA-sequencing (RNA-seq). Therefore, an ultimate goal is to perturb every gene and measure changes in the transcriptome. However, this remains challenging, especially in intact organisms due to different experimental and computational challenges.
High-throughput sequencing data lie at the heart of modern microbiome research. Effective analysis of these data requires careful preprocessing, modeling, and interpretation to detect subtle signals and avoid spurious associations. In this review, we discuss how simulation can serve as a sandbox to test candidate approaches, creating a setting that mimics real data while providing ground truth.
In the analysis of spatially resolved transcriptomics data, detecting spatially variable genes (SVGs) is crucial. Numerous computational methods exist, but varying SVG definitions and methodologies lead to incomparable results. We review 34 state-of-the-art methods, classifying SVGs into three categories: overall, cell-type-specific, and spatial-domain-marker SVGs.
Genome Biol
December 2024
Background: Plasma cell-free DNA (cfDNA) is derived from cellular death in various tissues. Investigating the tissue origin of cfDNA through cell type deconvolution, we can detect changes in tissue homeostasis that occur during disease progression or in response to treatment. Consequently, cfDNA has emerged as a valuable noninvasive biomarker for disease detection and treatment monitoring.
Motivated by the pressing needs for dissecting heterogeneous relationships in gene expression data, here we generalize the squared Pearson correlation to capture a mixture of linear dependences between two real-valued variables, with or without an index variable that specifies the line memberships. We construct the generalized Pearson correlation squares by focusing on three aspects: variable exchangeability, no parametric model assumptions, and inference of population-level parameters. To compute the generalized Pearson correlation square from a sample without a line-membership specification, we develop a K-lines clustering algorithm to find clusters that exhibit distinct linear dependences, where K can be chosen in a data-adaptive way.
Nat Commun
November 2024
Genomic profiling often fails to predict therapeutic outcomes in cancer. This failure is, in part, due to a myriad of genetic alterations and the plasticity of cancer signaling networks. Functional profiling, which ascertains signaling dynamics, is an alternative method to anticipate drug responses.
Two correspondences raised concerns or comments about our analyses regarding exaggerated false positives found by differential expression (DE) methods. Here, we discuss the points they raise and explain why we agree or disagree with these points. We add new analysis to confirm that the Wilcoxon rank-sum test remains the most robust method compared to the other five DE methods (DESeq2, edgeR, limma-voom, dearseq, and NOISeq) in two-condition DE analyses after considering normalization and winsorization, the data preprocessing steps discussed in the two correspondences.
Genetic variation can alter brain structure and, consequently, function. Comparative statistical analysis of mouse brains across genetic backgrounds requires spatial, single-cell, atlas-scale data, in replicates, a challenge for current technologies. We introduce Atlas-scale Transcriptome Localization using Aggregate Signatures (ATLAS), a scalable tissue mapping method.
Genomics Proteomics Bioinformatics
July 2024
Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. The state-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which convert mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs.
In the analysis of spatially resolved transcriptomics data, detecting spatially variable genes (SVGs) is crucial. Numerous computational methods exist, but varying SVG definitions and methodologies lead to incomparable results. We review 33 state-of-the-art methods, categorizing SVGs into three types: overall, cell-type-specific, and spatial-domain-marker SVGs.
In droplet-based single-cell and single-nucleus RNA-seq assays, systematic contamination of ambient RNA molecules biases the quantification of gene expression levels. Existing methods correct the contamination for all genes globally. However, a specific evaluation of correction efficacy for varying contamination levels is lacking.
RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders; yet, the role of cell type-specific splicing and transcript-isoform diversity during human brain development has not been systematically investigated. In this work, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone and cortical plate regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 distinct isoforms, of which 72.
Circadian clock genes are emerging targets in many types of cancer, but their mechanistic contributions to tumor progression are still largely unknown. This makes it challenging to stratify patient populations and develop corresponding treatments. In this work, we show that in breast cancer, the disrupted expression of circadian genes has the potential to serve as biomarkers.
The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, despite the fact that TRs constitute ∼6% of our genome and are linked to over 50 human diseases. Here, we introduce the TR-gnomAD (https://wlcb.oit.
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are commonly used for visualizing cell clusters; however, it is well known that t-SNE and UMAP's 2D embeddings might not reliably inform the similarities among cell clusters. Motivated by this challenge, we present a statistical method, scDEED, for detecting dubious cell embeddings output by a 2D-embedding method.
Spatially resolved transcriptomics offers unprecedented insight by enabling the profiling of gene expression within the intact spatial context of cells, effectively adding a new and essential dimension to data interpretation. To efficiently detect spatial structure of interest, an essential step in analyzing such data involves identifying spatially variable genes. Despite researchers having developed several computational methods to accomplish this task, the lack of a comprehensive benchmark evaluating their performance remains a considerable gap in the field.
Benchmarking single-cell RNA-seq (scRNA-seq) and single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) computational tools demands simulators to generate realistic sequencing reads. However, none of the few read simulators aim to mimic real data. To fill this gap, we introduce scReadSim, a single-cell RNA-seq and ATAC-seq read simulator that allows user-specified ground truths and generates synthetic sequencing reads (in a FASTQ or BAM file) by mimicking real data.