Viral infections and cancers are driven by evolution of populations of highly mutable genomic variants. A key evolutionary process in these populations is their migration or spread via transmission or metastasis. Understanding this process is crucial for research, clinical practice, and public health, yet tracing spread pathways is challenging.
Bioinformatics
July 2025
Motivation: Reconstructing the evolutionary history of tumors from bulk DNA sequencing of multiple tissue samples remains a challenging computational problem, requiring simultaneous deconvolution of the tumor tissue and inference of its evolutionary history. Recently, phylogenetic reconstruction methods have made significant progress by breaking the reconstruction problem into two parts: a regression problem over a fixed topology and a search over tree space. While effective techniques have been developed for the latter search problem, the regression problem remains a bottleneck in both method design and implementation due to the lack of fast, specialized algorithms.
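The regression step over a fixed topology can be illustrated with the commonly used perfect-phylogeny mixture model. In this model the mutation frequencies F (samples × mutations) are approximated by U·B, where U holds per-sample clone proportions and B is the clone-by-mutation ancestry matrix implied by the tree. The sketch below is a minimal illustration under stated assumptions: one mutation cluster per clone, noiseless frequencies, a tree encoded as a hypothetical parent array, and an unconstrained least-squares solve. It is not the specialized algorithm the abstract refers to.

```python
import numpy as np


def ancestry_matrix(parent):
    """Return B with B[i, j] = 1 if clone j is clone i or one of its ancestors.

    parent[i] is the parent of clone i, and -1 marks the root. Each clone is
    assumed to introduce exactly one mutation cluster, indexed like the clone.
    """
    n = len(parent)
    B = np.zeros((n, n))
    for i in range(n):
        j = i
        while j != -1:  # walk from clone i up to the root
            B[i, j] = 1.0
            j = parent[j]
    return B


def fit_clone_proportions(F, B):
    """Least-squares estimate of U in F ~ U @ B for a fixed tree.

    This unconstrained solve ignores the usual constraints U >= 0 and
    row sums <= 1, so it is only a starting point for noisy data.
    """
    # F = U B  <=>  F^T = B^T U^T, so solve for U^T.
    Ut, *_ = np.linalg.lstsq(B.T, F.T, rcond=None)
    return Ut.T


# Hypothetical example: root 0 with children 1 and 2; clone 3 descends from 1.
parent = [-1, 0, 0, 1]
B = ancestry_matrix(parent)
U_true = np.array([[0.1, 0.2, 0.3, 0.4],
                   [0.5, 0.1, 0.4, 0.0]])
F = U_true @ B  # mutation frequencies per sample (samples x mutations)
U_hat = fit_clone_proportions(F, B)
print(np.round(U_hat, 3))
```

Because B is unit lower triangular for this clone ordering, it is invertible and the noiseless proportions are recovered exactly. With real data the constrained version of this regression is what makes fast specialized solvers valuable.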
Bioinformatics
July 2025
Motivation: Perturbations in biological tissues (e.g., due to inflammation, disease, or drug treatment) alter the composition of cell types and cell states in the tissue.
Motivation: Gene expression varies across a tissue due to both the organization of the tissue into spatial domains, i.e. discrete regions of a tissue with distinct cell type composition, and continuous spatial gradients of gene expression within different spatial domains.
Castration-resistant prostate cancer (CRPC) is an aggressive disease exhibiting multiple epigenomic subtypes: androgen receptor-dependent CRPC-AR, and lineage plastic subtypes CRPC-SCL (stem cell-like), CRPC-WNT (Wnt-dependent), and CRPC-NE (neuroendocrine). Using transcriptomic profiling of tissue and whole-genome sequencing (WGS) of tissue and cell-free DNA (cfDNA) from 500 samples, we relate genomic variants to epigenomic state. We find that lineage plasticity is associated with higher epigenomic and genomic heterogeneity.
Dynamic lineage tracing technologies combine genome editing with single-cell sequencing to track cell divisions. We introduce Lineage Analysis via Maximum Likelihood (LAML) to infer a maximum likelihood time-resolved cell lineage tree under the Probabilistic Mixed-type Missing model, which we derive to describe key features of dynamic lineage tracing systems. LAML produces accurate tree topologies with branch lengths representing experimental time between ancestral cells.
Cancer Res
August 2025
Digitized healthcare data, high-throughput profiling technologies, and data repositories have facilitated the emergence of a new era of cancer research. Each data stream requires specialized analysis methods for interpretation. The data-driven era of cancer research requires the development, enhancement, and sustainment of informatics technology software infrastructure, including fundamental methodology development in artificial intelligence and data science.
Motivation: Single-cell RNA sequencing (scRNA-seq) measures the transcriptional state of individual cells, enabling more precise characterization of cell types, cell states, and developmental trajectories. Because of the high dimensionality of scRNA-seq data, a standard first step in scRNA-seq analysis is to perform dimensionality reduction. PCA and many other commonly used dimensionality reduction techniques are unsupervised, meaning that they do not incorporate any prior knowledge of the data being analyzed.
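For reference, the unsupervised PCA baseline can be computed from the singular value decomposition of the centered expression matrix. This minimal sketch assumes a dense cells × genes matrix that has already been normalized and log-transformed; the simulated counts are hypothetical. Note that nothing in it uses prior knowledge about the cells or genes, which is the limitation the abstract points out.

```python
import numpy as np


def pca(X, k):
    """Project rows of X (cells x genes) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)  # center each gene
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]  # k x genes loading vectors (orthonormal rows)
    scores = Xc @ components.T  # cells x k low-dimensional embedding
    explained = S[:k] ** 2 / np.sum(S ** 2)  # fraction of variance per component
    return scores, components, explained


# Hypothetical data: 100 cells x 50 genes of simulated counts, log1p-transformed.
rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(2.0, size=(100, 50)).astype(float))
scores, components, explained = pca(X, k=10)
print(scores.shape, np.round(explained.sum(), 3))
```

Singular values from `np.linalg.svd` are returned in descending order, so the first k rows of `Vt` are the leading components.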
Spatially resolved transcriptomics (SRT) technologies measure gene expression across thousands of spatial locations within a tissue slice. Multiple SRT technologies are currently available and others are in active development; these technologies differ in spatial resolution (subcellular, single-cell, or multicellular regions), gene coverage (targeted vs. whole-transcriptome), and sequencing depth per location.
Deriving the sequence of transitions between cell types, or differentiation events, that occur during organismal development is one of the fundamental challenges in developmental biology. Single-cell and spatial sequencing of samples from different developmental timepoints provide data to investigate differentiation, but inferring a sequence of differentiation events requires: (1) finding trajectories, or ancestor:descendant relationships, between cells from consecutive timepoints; and (2) coarse-graining these trajectories into a collection of transitions between cell types rather than individual cells. We introduce Hidden-Markov Optimal Transport (HM-OT), an algorithm that simultaneously groups cells into cell types and learns transitions between these cell types from developmental transcriptomics time series.
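The coarse-graining in step (2) can be illustrated on its own: given a cell-level coupling matrix P between two consecutive timepoints (for example, one produced by optimal transport) and a cell-type label for every cell, summing P over the cells of each type gives a type-to-type transition matrix. This is a minimal sketch that assumes hard labels are already known, and the coupling and labels are hypothetical. HM-OT learns the grouping and the transitions jointly, so this shows only the aggregation idea, not that algorithm.

```python
import numpy as np


def coarse_grain(P, labels_t, labels_s, n_types_t, n_types_s):
    """Aggregate a cell-level coupling P (cells_t x cells_s) into a
    type-level transition matrix T (types_t x types_s).

    T[a, b] sums P[i, j] over cells i of type a at the earlier timepoint
    and cells j of type b at the later timepoint.
    """
    A = np.eye(n_types_t)[labels_t]  # one-hot membership, cells_t x types_t
    B = np.eye(n_types_s)[labels_s]  # one-hot membership, cells_s x types_s
    return A.T @ P @ B


# Hypothetical coupling between 3 cells at time t and 3 cells at time t+1.
P = np.array([[0.20, 0.05, 0.00],
              [0.05, 0.20, 0.00],
              [0.00, 0.00, 0.50]])
labels_t = np.array([0, 0, 1])
labels_s = np.array([0, 1, 1])
T = coarse_grain(P, labels_t, labels_s, 2, 2)
# Row-normalizing gives, for each earlier type, the fraction of mass sent to each later type.
T_cond = T / T.sum(axis=1, keepdims=True)
print(T, T_cond, sep="\n")
```

The aggregation preserves total mass, so T sums to the same value as P.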
Motivation: Reconstructing unobserved ancestral states of a phylogenetic tree provides insight into the history of evolving systems and is one of the fundamental problems in phylogenetics. For a fixed phylogenetic tree, the most parsimonious ancestral reconstruction (a solution to the small parsimony problem) can be efficiently found using the dynamic programming algorithms of Fitch-Hartigan and Sankoff. Ancestral reconstruction is important in many applications, including inferring the routes of metastases in cancer, deriving the transmission history of viruses, determining the direction of cellular differentiation in organismal development, and detecting recombination and horizontal gene transfer in phylogenetic networks.
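The Sankoff dynamic program mentioned above can be sketched for a single character on a rooted tree. Each node stores the minimum cost of its subtree for every possible state, computed bottom-up from its children, and a top-down pass recovers one optimal labeling. The tree encoding (a children dictionary), the state set, and the unit cost matrix are illustrative assumptions. With 0/1 costs, the minimum cost equals the parsimony score that Fitch-Hartigan computes.

```python
import math


def sankoff(children, root, leaf_states, states, cost):
    """Small parsimony for one character via Sankoff's algorithm.

    children: dict node -> list of child nodes (leaves map to []).
    leaf_states: dict leaf -> observed state.
    cost[a][b]: cost of changing from parent state a to child state b.
    Returns (minimum total cost, dict node -> assigned state).
    """
    score = {}

    def up(v):
        # Bottom-up pass: score[v][s] = min cost of the subtree at v if v has state s.
        if not children[v]:
            score[v] = {s: (0.0 if s == leaf_states[v] else math.inf) for s in states}
            return
        for c in children[v]:
            up(c)
        score[v] = {
            s: sum(min(cost[s][t] + score[c][t] for t in states) for c in children[v])
            for s in states
        }

    up(root)
    # Ties are broken by the order of `states`.
    labeling = {root: min(states, key=lambda s: score[root][s])}

    def down(v):
        # Top-down pass: give each child a state that attains the minimum for its parent's state.
        for c in children[v]:
            p = labeling[v]
            labeling[c] = min(states, key=lambda t: cost[p][t] + score[c][t])
            down(c)

    down(root)
    return score[root][labeling[root]], labeling


# Hypothetical tree ((a, b), (c, d)) with nucleotide states at the leaves.
states = ["A", "C", "G", "T"]
cost = {s: {t: 0 if s == t else 1 for t in states} for s in states}
children = {"r": ["x", "y"], "x": ["a", "b"], "y": ["c", "d"],
            "a": [], "b": [], "c": [], "d": []}
leaf_states = {"a": "A", "b": "C", "c": "A", "d": "A"}
total, labeling = sankoff(children, "r", leaf_states, states, cost)
print(total, labeling)
```

Replacing the unit costs with a general matrix, for example costs between anatomical sites for metastasis migration, is what distinguishes Sankoff from Fitch-Hartigan.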
Epistasis, the interaction between alleles at different genetic loci, plays a fundamental role in biology. However, several recent approaches quantify epistasis using a chimeric formula that measures deviations from a multiplicative fitness model on an additive scale, thus mixing two scales. Here, we show that for pairwise interactions, the chimeric formula yields a different magnitude but the same sign of epistasis compared to the multiplicative formula, which measures both fitness and deviations on a multiplicative scale.
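The sign agreement for pairwise interactions follows directly from the two formulas. With fitness measured relative to the wild type (f_wt = 1), the multiplicative formula compares f_AB with the product f_A·f_B on a ratio scale, while the chimeric formula takes the difference f_AB - f_A·f_B. Because f_A·f_B > 0, the difference is positive exactly when the ratio exceeds 1. The signs therefore agree, while the magnitudes generally differ. The log-ratio convention, function names, and fitness values below are illustrative assumptions.

```python
import math


def multiplicative_epistasis(f_a, f_b, f_ab):
    """Deviation from the multiplicative model on a multiplicative (log) scale.

    Fitness values are relative to the wild type (f_wt = 1); the result is
    positive when f_ab exceeds the product f_a * f_b.
    """
    return math.log(f_ab / (f_a * f_b))


def chimeric_epistasis(f_a, f_b, f_ab):
    """Deviation from the multiplicative model measured on an additive scale."""
    return f_ab - f_a * f_b


# Hypothetical (f_a, f_b, f_ab) triples.
cases = [(1.2, 0.9, 1.2), (0.8, 0.7, 0.5), (1.1, 1.0, 1.0)]
for f_a, f_b, f_ab in cases:
    m = multiplicative_epistasis(f_a, f_b, f_ab)
    c = chimeric_epistasis(f_a, f_b, f_ab)
    print(f"multiplicative {m:+.4f}  chimeric {c:+.4f}")
```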
Spatially resolved transcriptomics (SRT) measures mRNA transcripts at thousands of locations within a tissue slice, revealing spatial variations in gene expression and cell types. SRT has been applied to tissue slices from multiple time points during the development of an organism. We introduce developmental spatiotemporal optimal transport (DeST-OT), a method to align spatiotemporal transcriptomics data using optimal transport (OT).
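As background on OT-based alignment, the standard entropy-regularized OT problem can be solved with Sinkhorn iterations. Given spot weights a and b and a cost matrix C between spots of two slices, the algorithm returns a coupling P whose row and column sums match a and b. This is a generic OT sketch only. DeST-OT's own formulation is not reproduced here, and the squared-distance cost on hypothetical expression profiles, the regularization strength `eps`, and the iteration count are arbitrary choices.

```python
import numpy as np


def sinkhorn(a, b, C, eps=0.1, n_iter=1000):
    """Entropy-regularized optimal transport via Sinkhorn scaling.

    a: source weights (n,), b: target weights (m,), both summing to 1.
    C: cost matrix (n, m). Returns the coupling P (n, m).
    """
    K = np.exp(-C / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)  # rescale to match row marginals
        v = b / (K.T @ u)  # rescale to match column marginals
    return u[:, None] * K * v[None, :]


# Hypothetical data: slice 2 is a slightly perturbed copy of slice 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))  # expression profiles of 5 spots, slice 1
Y = X + 0.01 * rng.normal(size=(5, 3))  # expression profiles of 5 spots, slice 2
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
C = C / C.max()  # scale costs to [0, 1] so exp(-C / eps) stays well conditioned
a = np.full(5, 1 / 5)
b = np.full(5, 1 / 5)
P = sinkhorn(a, b, C)
print(np.round(P, 3))
```

Smaller values of `eps` give sharper, closer-to-unregularized couplings but need more iterations and are more prone to numerical underflow in `K`.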
Nat Methods
February 2025
Spatially resolved transcriptomics technologies provide high-throughput measurements of gene expression in a tissue slice, but the sparsity of these data complicates analysis of spatial gene expression patterns. We address this issue by deriving a topographic map of a tissue slice, analogous to a map of elevation in a landscape, using a quantity called the isodepth. Contours of constant isodepth enclose domains with distinct cell type composition, while gradients of the isodepth indicate spatial directions of maximum change in expression.
PLoS Comput Biol
December 2024
Motivation: DNA sequencing of multiple bulk samples from a tumor provides the opportunity to investigate tumor heterogeneity and reconstruct a phylogeny of a patient's cancer. However, since bulk DNA sequencing of tumor tissue measures thousands of cells from a heterogeneous mixture of distinct sub-populations, accurate reconstruction of the tumor phylogeny requires simultaneous deconvolution of cancer clones and inference of ancestral relationships, leading to a challenging computational problem. Many existing methods for phylogenetic reconstruction from bulk sequencing data do not scale to large datasets, such as recent datasets containing upwards of ninety samples with dozens of distinct sub-populations.
To study the spatial interactions among cancer and non-cancer cells, here we examined a cohort of 131 tumour sections from 78 cases across 6 cancer types by Visium spatial transcriptomics (ST). This was combined with 48 matched single-nucleus RNA sequencing samples and 22 matched co-detection by indexing (CODEX) samples. To describe tumour structures and habitats, we defined 'tumour microregions' as spatially distinct cancer cell clusters separated by stromal components.
Cancer Discov
February 2025
During development, multipotent cells differentiate through a hierarchy of increasingly restricted progenitor cell types until they reach specialized cell types. A cell differentiation map describes this hierarchy, and inferring these maps is an active area of research, ranging from traditional single-marker lineage studies to data-driven trajectory inference methods applied to single-cell RNA-seq data. Recent high-throughput lineage tracing technologies profile lineages and cell types at scale, but current methods to infer cell differentiation maps from these data rely on simple models with restrictive assumptions about the developmental process.
Bioinform Adv
August 2024
Epistasis, or interactions in which alleles at one locus modify the fitness effects of alleles at other loci, plays a fundamental role in genetics, protein evolution, and many other areas of biology. Epistasis is typically quantified by computing the deviation from the expected fitness under an additive or multiplicative model using one of several formulae. However, these formulae are not all equivalent.
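A small numerical case shows why these formulae are not interchangeable: the additive and multiplicative formulae can assign opposite signs to the same interaction. With fitness relative to the wild type (f_wt = 1), the additive formula measures f_AB - (f_A + f_B - 1), and the multiplicative formula is written here as log(f_AB / (f_A·f_B)). The fitness values are hypothetical and chosen only to expose the disagreement.

```python
import math


def additive_epistasis(f_a, f_b, f_ab, f_wt=1.0):
    """Deviation from the additive expectation f_a + f_b - f_wt."""
    return f_ab - (f_a + f_b - f_wt)


def multiplicative_epistasis(f_a, f_b, f_ab, f_wt=1.0):
    """Log-ratio deviation from the multiplicative expectation f_a * f_b / f_wt."""
    return math.log(f_ab * f_wt / (f_a * f_b))


# Hypothetical fitness values: two beneficial single mutants and their double mutant.
f_a, f_b, f_ab = 1.2, 1.2, 1.42
eps_add = additive_epistasis(f_a, f_b, f_ab)
eps_mult = multiplicative_epistasis(f_a, f_b, f_ab)
print(f"additive: {eps_add:+.3f}, multiplicative: {eps_mult:+.3f}")
```

Here the double mutant exceeds the additive expectation (1.40) but falls short of the multiplicative one (1.44). The additive formula reports positive epistasis, while the multiplicative formula reports negative epistasis.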
Motivation: Recently developed spatial lineage tracing technologies induce somatic mutations at specific genomic loci in a population of growing cells and then measure these mutations in the sampled cells along with the physical locations of the cells. These technologies enable high-throughput studies of developmental processes over space and time. However, these applications rely on accurate reconstruction of a spatial cell lineage tree describing both past cell divisions and cell locations.
Motivation: Cell-cell interactions (CCIs), in which cells exchange signals with themselves and neighboring cells by expressing ligand and receptor molecules, play a key role in cellular development, tissue homeostasis, and other critical biological functions. Since direct measurement of CCIs is challenging, multiple methods have been developed to infer CCIs by quantifying correlations between the gene expression of the ligands and receptors that mediate CCIs, originally from bulk RNA-sequencing data and more recently from single-cell or spatially resolved transcriptomics (SRT) data. SRT has a particular advantage over single-cell approaches, since ligand-receptor correlations can be computed between cells or spots that are physically close in the tissue.
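The SRT advantage described above, restricting ligand-receptor correlations to physically close spots, can be sketched in a few lines. For each spot, average the receptor expression of its neighbors within a radius and correlate that average with the spot's own ligand expression. The radius, the neighbor averaging, the Pearson correlation, and the simulated grid are all illustrative assumptions, not the method of the paper.

```python
import numpy as np


def spatial_lr_correlation(coords, ligand, receptor, radius):
    """Pearson correlation between ligand expression at each spot and the
    mean receptor expression of its neighbors within `radius`.

    coords: (n, 2) spot positions; ligand, receptor: (n,) expression values.
    Spots with no neighbors are excluded.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    W = ((d <= radius) & (d > 0)).astype(float)  # neighbor indicator, excluding self
    n_nb = W.sum(axis=1)
    keep = n_nb > 0
    nb_receptor = (W @ receptor)[keep] / n_nb[keep]  # mean receptor over neighbors
    return np.corrcoef(ligand[keep], nb_receptor)[0, 1]


# Hypothetical 20 x 20 grid of spots with a receptor gradient along x.
xs, ys = np.meshgrid(np.arange(20.0), np.arange(20.0))
coords = np.column_stack([xs.ravel(), ys.ravel()])
receptor = coords[:, 0] / 19.0
rng = np.random.default_rng(0)
ligand = receptor + 0.1 * rng.normal(size=receptor.size)  # co-varies spatially with receptor
r = spatial_lr_correlation(coords, ligand, receptor, radius=1.0)
print(round(r, 3))
```

With `radius=1.0` on a unit grid, each interior spot has its four orthogonal neighbors. Increasing the radius trades spatial specificity for less noisy neighbor averages.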
Motivation: Eukaryotic cells contain organelles called mitochondria that have their own genome. Most cells contain thousands of mitochondria, which replicate, even in nondividing cells, by means of a relatively error-prone process that results in somatic mutations in their genome. Because of their higher mutation rate compared to the nuclear genome, mitochondrial mutations have been used to track cellular lineage, particularly using single-cell sequencing that measures mitochondrial mutations in individual cells.
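A common first step in mitochondrial lineage tracing is to turn per-cell read counts into variant allele frequencies (VAF = alt / (alt + ref)). Cells can then be compared by which mutations they share. This minimal sketch makes several assumptions: a hypothetical VAF threshold of 0.1 for calling a mutation present, Jaccard distance as the similarity measure, and made-up counts for three cells at three mitochondrial sites. It is not the method of the paper.

```python
import numpy as np


def vaf_matrix(alt, ref):
    """Per-cell variant allele frequency alt / (alt + ref); NaN where depth is 0."""
    depth = alt + ref
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(depth > 0, alt / depth, np.nan)


def jaccard_distance(vaf, threshold=0.1):
    """Pairwise Jaccard distance between cells based on which mutations they carry."""
    carries = np.nan_to_num(vaf, nan=0.0) >= threshold  # cells x mutations, boolean
    inter = carries.astype(int) @ carries.T.astype(int)  # shared mutations per cell pair
    counts = carries.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(union > 0, 1.0 - inter / union, 0.0)


# Hypothetical alt/ref read counts: 3 cells x 3 mitochondrial sites.
alt = np.array([[9, 0, 0], [8, 3, 0], [0, 0, 7]])
ref = np.array([[1, 10, 10], [2, 7, 10], [10, 10, 3]])
vaf = vaf_matrix(alt, ref)
D = jaccard_distance(vaf)
print(np.round(D, 2))
```

Cells 0 and 1 share the first mutation and come out closer to each other than either is to cell 2. A distance matrix like this could then feed a clustering or tree-building step.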