Publications by authors named "David Knowles"

Increased availability of whole-genome sequencing (WGS) has facilitated the study of rare variants (RVs) in complex diseases. Multiple RV association tests are available to study the relationship between genotype and phenotype, but most do not fully leverage the availability of variant-level functional annotations. We propose genome-wide rare variant enrichment evaluation (gruyere), an empirical Bayesian framework that complements existing methods by learning global, trait-specific weights for functional annotations to improve variant prioritization.
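
To make the idea of trait-specific annotation weights concrete, here is a minimal sketch of an annotation-weighted burden score for one gene in one individual. It assumes, purely for illustration, that each variant's weight is a logistic function of a linear combination of its functional annotations. The parameter names (`alpha`, `intercept`), the variant IDs, and the annotation values are hypothetical. gruyere's empirical Bayesian procedure for learning these weights is not reproduced here.

```python
import math
from typing import Dict, List


def variant_weight(annotations: List[float], alpha: List[float], intercept: float) -> float:
    """Weight in (0, 1) from a logistic function of annotation scores (an assumed link)."""
    z = intercept + sum(a * w for a, w in zip(annotations, alpha))
    return 1.0 / (1.0 + math.exp(-z))


def weighted_burden(
    genotypes: Dict[str, int],
    annotations: Dict[str, List[float]],
    alpha: List[float],
    intercept: float = 0.0,
) -> float:
    """Sum of alternate-allele counts (0, 1 or 2) scaled by each variant's annotation weight."""
    return sum(
        count * variant_weight(annotations[variant], alpha, intercept)
        for variant, count in genotypes.items()
    )


# Hypothetical example: two rare variants, each with two annotation scores.
annotations = {"var_a": [1.0, 0.0], "var_b": [0.0, 1.0]}
genotypes = {"var_a": 1, "var_b": 2}
alpha = [2.0, 0.0]  # illustrative weights: only the first annotation matters
print(weighted_burden(genotypes, annotations, alpha))
```

Raising a coefficient in `alpha` up-weights variants that score highly on the corresponding annotation, which is the sense in which learned weights can reprioritize variants within a gene.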

Background: Neuroanatomical variation in individuals with bipolar disorder (BD) has been previously described in observational studies. However, the causal dynamics of these relationships remain unexplored.

Methods: We performed Mendelian Randomization of 297 structural and functional neuroimaging phenotypes from the UK Biobank and BD using GWAS summary statistics.
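
The abstract does not say which Mendelian Randomization estimator was used. As a hedged illustration of how GWAS summary statistics enter such an analysis, the sketch below implements the fixed-effect inverse-variance weighted (IVW) estimator, one standard choice: each SNP instruments the exposure (here, a neuroimaging phenotype), and the estimate is the sum of b_X * b_Y / se_Y^2 divided by the sum of b_X^2 / se_Y^2. The function name and toy numbers are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) causal estimate and its standard error.

    Each index is one independent genetic instrument (SNP): its effect on the
    exposure, its effect on the outcome, and the standard error of the latter.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have one entry per instrument")
    numerator = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    standard_error = 1.0 / math.sqrt(denominator)
    return estimate, standard_error


# Toy instruments whose outcome effects are exactly half their exposure effects.
est, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
print(est, se)
```

Real analyses usually add sensitivity estimators such as MR-Egger or the weighted median to probe horizontal pleiotropy; those are not shown.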

Pre- and post-transcriptional mechanisms, including alternative promoters, termination signals, and splicing, play essential roles in diversifying protein output by generating distinct RNA and protein isoforms. Two major challenges in characterizing the cellular function of alternative isoforms are the lack of experimental methods to specifically and efficiently modulate isoform expression and the lack of computational tools for complex experimental design and analysis. To address these gaps, we develop and methodically test an isoform-specific knockdown strategy that pairs the RNA-targeting CRISPR/Cas13d system with guide RNAs that span exon-exon junctions.
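
The junction-spanning idea can be sketched as enumerating every target window that straddles an exon-exon junction and taking its reverse complement as the guide spacer, since the Cas13d spacer base-pairs with the target RNA. The 23-nt spacer length, the `min_overlap` requirement on each exon, the function names, and the demo sequences are assumptions for illustration; the crRNA direct repeat and any activity scoring are omitted.

```python
from typing import List, Tuple

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Uppercase a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    return seq.translate(_RNA_COMPLEMENT)[::-1]


def junction_spanning_spacers(
    exon_up: str, exon_down: str, spacer_len: int = 23, min_overlap: int = 5
) -> List[Tuple[int, str]]:
    """Return (nt_from_upstream_exon, spacer) for every window crossing the junction.

    A window qualifies only if at least `min_overlap` of its nucleotides come
    from each exon, so the guide cannot fully pair with either exon alone.
    """
    up, down = to_rna(exon_up), to_rna(exon_down)
    junction = len(up)
    mature = up + down
    first = max(0, junction - spacer_len + min_overlap)
    last = min(junction - min_overlap, len(mature) - spacer_len)
    spacers = []
    for start in range(first, last + 1):
        target = mature[start : start + spacer_len]
        spacers.append((junction - start, reverse_complement_rna(target)))
    return spacers


for n_up, spacer in junction_spanning_spacers("ATGGCTACGTTAGCATCGATCGATTGC", "GGATCCTAGCTAGGCATCGATGCA"):
    print(n_up, spacer)
```

Requiring overlap on both sides is what makes such a guide specific to the spliced isoform: a transcript that skips or retains either exon lacks the contiguous target sequence.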

Transcript diversity, including splicing and alternative 3' end usage, is crucial for cellular identity and adaptation, yet its spatial coordination remains poorly understood. Here, we present SPLISOSM (SpatiaL ISOform Statistical Modeling), a computational framework for detecting isoform-resolution patterns from spatial transcriptomics data. SPLISOSM leverages multivariate testing to account for spot- and isoform-level dependencies, demonstrating robust and theoretically grounded performance on sparse data.
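
SPLISOSM's multivariate tests are not described here in enough detail to reproduce. As a deliberately simpler stand-in, the sketch below computes per-spot isoform usage ratios for one gene and then a univariate Moran's I on one isoform's ratio over a spot adjacency matrix. Moran's I is a swapped-in statistic, not SPLISOSM's test, and the toy counts and adjacency are hypothetical.

```python
import numpy as np


def isoform_ratios(counts: np.ndarray) -> np.ndarray:
    """Per-spot isoform usage ratios for one gene (spots x isoforms); NaN where the gene is undetected."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, counts / totals, np.nan)


def morans_i(values: np.ndarray, weights: np.ndarray) -> float:
    """Moran's I of one variable over spots, ignoring spots with NaN values (undefined if constant)."""
    keep = ~np.isnan(values)
    x = values[keep]
    w = weights[np.ix_(keep, keep)]
    z = x - x.mean()
    return float((len(x) / w.sum()) * (z @ w @ z) / (z @ z))


# Toy tissue: six spots in a row; isoform 0 dominates the left half, isoform 1 the right half.
counts = np.array([[4, 0], [4, 0], [4, 0], [0, 4], [0, 4], [0, 4]])
adjacency = np.zeros((6, 6))
for i in range(5):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
ratios = isoform_ratios(counts)
print(morans_i(ratios[:, 0], adjacency))
```

A univariate statistic like this tests one isoform at a time and ignores the fact that a gene's isoform ratios sum to one, which is one reason a multivariate test that models isoform-level dependencies is attractive.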

Long-read sequencing (LRS) has revealed a far greater diversity of RNA isoforms than earlier technologies, increasing the critical need to determine which, and how many, isoforms per gene are biologically meaningful. To define the space of relevant isoforms from LRS, many existing analysis pipelines rely on arbitrary expression cutoffs, but a single threshold cannot accommodate the broad variability in isoform complexity across genes, cell types, and disease states captured by LRS. To address this, we propose an interpretable, entropy-derived measure that quantifies the effective number of isoforms per gene from the full, unfiltered isoform ratio distribution.
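
One natural "effective number" derived from entropy is the exponential of the Shannon entropy of the isoform ratios (the Hill number of order 1). The sketch below assumes that definition; the paper's exact measure may differ, and the function name and counts are illustrative.

```python
import math
from typing import Sequence


def effective_isoform_number(isoform_counts: Sequence[float]) -> float:
    """exp(Shannon entropy) of one gene's isoform ratio distribution (assumed definition)."""
    total = sum(isoform_counts)
    if total <= 0:
        raise ValueError("gene has no reads")
    ratios = [c / total for c in isoform_counts if c > 0]  # 0 * log(0) is taken as 0
    entropy = -sum(p * math.log(p) for p in ratios)
    return math.exp(entropy)


print(effective_isoform_number([10, 10, 10, 10]))
print(effective_isoform_number([97, 1, 1, 1]))
print(effective_isoform_number([50, 0, 0]))
```

Four equally used isoforms give about 4, while one dominant isoform plus three trace isoforms give about 1.18, all without choosing an expression cutoff.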

Structural integrity for fusion is an integrated, multidisciplinary subject spanning materials science, technology, engineering, health monitoring, and simulation methods and algorithms, all aimed at assuring reliable fusion reactor performance from the whole-plant design phase through operation to decommissioning. Structural integrity is essential for maintaining high standards of public, environmental, and investment protection and for maximizing economic benefits. While fusion shares many of the structural integrity challenges faced by other industries, it also presents unique complexities.

Mosaic chromosomal alterations (mCAs) in blood, a form of clonal hematopoiesis, have been linked to various diseases, but their role in Alzheimer's disease (AD) remains unclear. We analyzed blood whole-genome sequencing (WGS) data from 24,049 individuals in the Alzheimer's Disease Sequencing Project and found that autosomal mCAs were significantly associated with increased AD risk (odds ratio = 1.27; P = 1.

Pathogenic variants in the neuronal Na/K ATPase transmembrane ion transporter (ATP1A3) cause a spectrum of neurological disorders including alternating hemiplegia of childhood (AHC). The most common de novo pathogenic variants in AHC are p.D801N (∼40 % of patients) and p.

Groups of complex diseases, such as coronary heart diseases, neuropsychiatric disorders, and cancers, often display overlapping clinical symptoms and pharmacological treatments. The shared associations of genetic variants across diseases have the potential to explain their underlying biological processes, but these shared associations remain poorly understood. To address this, we model the matrix of summary statistics of trait-associated genetic variants as the sum of a low-rank component, representing shared biological processes, and a sparse component, representing unique processes and arbitrarily corrupted or contaminated components.
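
The low-rank plus sparse model can be illustrated with principal component pursuit (robust PCA), solved here with a basic alternating-direction augmented Lagrangian loop. The abstract does not specify the objective, penalty weight, or solver, so the choices below (nuclear norm plus L1 objective, lambda = 1/sqrt(max(m, n)), a fixed step size `mu`) are assumptions borrowed from the standard robust-PCA formulation. In this setting the input would be a variants x traits matrix of summary statistics, for example z-scores; the demo uses synthetic data instead.

```python
from typing import Optional, Tuple

import numpy as np


def _soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Elementwise shrinkage: proximal operator of tau * L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def _singular_value_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Shrink singular values: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * _soft_threshold(s, tau)) @ vt


def robust_pca(
    m: np.ndarray, lam: Optional[float] = None, tol: float = 1e-7, max_iter: int = 1000
) -> Tuple[np.ndarray, np.ndarray]:
    """Split m into low_rank + sparse by principal component pursuit (ADMM iterations)."""
    m = np.asarray(m, dtype=float)
    n_rows, n_cols = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n_rows, n_cols))
    mu = n_rows * n_cols / (4.0 * np.abs(m).sum())
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        low_rank = _singular_value_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = _soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual += mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse


# Synthetic check: rank-3 "shared" signal plus 5% large, sparse "trait-specific" entries.
rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 80))
specific = np.zeros((100, 80))
mask = rng.random((100, 80)) < 0.05
specific[mask] = rng.choice([-10.0, 10.0], size=mask.sum())
low_rank, sparse = robust_pca(shared + specific)
print(np.linalg.norm(low_rank - shared) / np.linalg.norm(shared))
```

The L1 penalty is what lets a few large, trait-specific or contaminated entries be absorbed by the sparse part instead of distorting the shared low-rank factors, which ordinary PCA cannot do.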

Given the large number of genes significantly associated with risk for neuropsychiatric disorders, a critical unanswered question is the extent to which diverse mutations, sometimes affecting the same gene, will require tailored therapeutic strategies. Here we consider this in the context of rare neuropsychiatric disorder-associated copy number variants (2p16.3) resulting in heterozygous deletions in NRXN1, which encodes a presynaptic cell-adhesion protein that serves as a critical synaptic organizer in the brain.

Most genetic risk variants for neurological diseases are located in non-coding regulatory regions, where they may often act as expression quantitative trait loci (eQTLs), modulating gene expression and influencing disease susceptibility. However, eQTL studies in bulk brain tissue or specific cell types lack the resolution to capture the brain's cellular diversity. Single-nucleus RNA sequencing (snRNA-seq) offers high-resolution mapping of eQTLs across diverse brain cell types.
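
A common simple way to map cell-type-specific eQTLs from snRNA-seq, assumed here because the abstract does not give the model, is to average each gene's expression per donor within one cell type (a pseudobulk) and regress it on genotype dosage. The sketch below does that with ordinary least squares and no covariates; real pipelines add normalization and covariates such as expression principal components. The data and the function names are hypothetical.

```python
import numpy as np


def pseudobulk(expr, donor, cell_type, target_type, donors):
    """Mean expression of one gene per donor, using only cells labelled `target_type`."""
    return np.array(
        [expr[(donor == d) & (cell_type == target_type)].mean() for d in donors]
    )


def eqtl_ols(dosage, expression):
    """OLS slope and t-statistic for expression ~ intercept + dosage (no covariates)."""
    x = np.asarray(dosage, dtype=float)
    y = np.asarray(expression, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    sxx = xc @ xc
    beta = (xc @ yc) / sxx
    residual = yc - beta * xc
    sigma2 = (residual @ residual) / (x.size - 2)  # two fitted parameters
    return beta, beta / np.sqrt(sigma2 / sxx)


# Hypothetical data: 6 donors with alt-allele dosage 0/1/2; one neuron cell is filtered out.
donors = ["d1", "d2", "d3", "d4", "d5", "d6"]
dosage = [0, 1, 2, 0, 1, 2]
donor = np.array(["d1", "d1", "d2", "d3", "d3", "d4", "d5", "d6", "d6", "d1"])
cell_type = np.array(["microglia"] * 9 + ["neuron"])
expr = np.array([0.9, 1.1, 1.6, 1.8, 2.0, 1.1, 1.4, 2.0, 2.2, 9.0])
microglia = pseudobulk(expr, donor, cell_type, "microglia", donors)
beta, t_stat = eqtl_ols(dosage, microglia)
print(beta, t_stat)
```

Running the same regression separately per cell type is what allows an eQTL to appear in, say, microglia but not neurons, a distinction that bulk tissue averages away.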

Computed tomography plays an ever-increasing role in the management of fractures and dislocations because it efficiently provides multiplanar reformats and three-dimensional volume-rendered images. It can reveal findings that are occult on plain radiography and therefore allows more accurate decisions about fracture classification and management. Clinical radiologists play a critical role in processing the imaging to provide adequate reformats in the desired planes and to produce three-dimensional images, but most crucially in identifying pertinent findings that inform the choice between nonoperative and operative management and may influence surgical technique.

Neuroanatomical variation in individuals with bipolar disorder (BD) has been previously described in observational studies. However, the causal dynamics of these relationships remain unexplored. We performed Mendelian Randomization of 297 structural and functional neuroimaging phenotypes from the UK Biobank and BD using genome-wide association study summary statistics.

The increasing availability of whole-genome sequencing (WGS) has begun to elucidate the contribution of rare variants (RVs), both coding and non-coding, to complex disease. Multiple RV association tests are available to study the relationship between genotype and phenotype, but most are restricted to per-gene models and do not fully leverage the availability of variant-level functional annotations. We propose Genome-wide Rare Variant EnRichment Evaluation (gruyere), a Bayesian probabilistic model that complements existing methods by learning global, trait-specific weights for functional annotations to improve variant prioritization.

This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ).
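
The specific numerical representations used in the paper are not described here. As one common example of turning flat FASTA strings into an organized numerical form, the sketch below parses FASTA text and one-hot encodes a sequence. The parser is a minimal illustration (it does not handle FASTQ quality lines), and the record names are hypothetical.

```python
from typing import Dict, List

BASES = "ACGT"


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record ID (first word of the '>' header) to its concatenated sequence."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def one_hot(seq: str) -> List[List[int]]:
    """L x 4 one-hot matrix with columns A, C, G, T; ambiguous bases (e.g. N) become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


fasta = ">gene_x example\nACG\nTN\n>gene_y\nGG\n"
records = parse_fasta(fasta)
print(records)
print(one_hot(records["gene_x"]))
```

Unlike the raw string, the matrix has a fixed column meaning per position, so it can be fed directly to array-based tools and models.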

The success of machine learning models relies heavily on effectively representing high-dimensional data. However, ensuring data representations capture human-understandable concepts remains difficult, often requiring the incorporation of prior knowledge and decomposition of data into multiple subspaces. Traditional linear methods fall short in modeling more than one space, while more expressive deep learning approaches lack interpretability.

Characterizing cell-cell communication and tracking its variability over time are crucial for understanding the coordination of biological processes mediating normal development, disease progression, and responses to perturbations such as therapies. Existing tools fail to capture time-dependent intercellular interactions and primarily rely on databases compiled from limited contexts. We introduce DIISCO, a Bayesian framework designed to characterize the temporal dynamics of cellular interactions using single-cell RNA-sequencing data from multiple time points.
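
DIISCO is a Bayesian model and is not reproduced here. To make "tracking cell-cell communication over time" concrete, the sketch below computes a common non-Bayesian baseline: at each time point, the product of a ligand's mean expression in a sender cell type and its receptor's mean expression in a receiver cell type. The gene names, cell-type labels, and data layout are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# (time point, cell type, gene) -> per-cell expression values
Expression = Dict[Tuple[str, str, str], List[float]]


def lr_score_trajectory(
    expr: Expression,
    times: Sequence[str],
    ligand: str,
    receptor: str,
    sender: str,
    receiver: str,
) -> List[float]:
    """Ligand (in sender) x receptor (in receiver) mean-expression product at each time point."""
    return [
        mean(expr[(t, sender, ligand)]) * mean(expr[(t, receiver, receptor)])
        for t in times
    ]


expr = {
    ("day0", "T_cell", "LIG1"): [1.0, 3.0],
    ("day0", "macrophage", "REC1"): [2.0],
    ("day7", "T_cell", "LIG1"): [4.0],
    ("day7", "macrophage", "REC1"): [1.0, 2.0],
}
print(lr_score_trajectory(expr, ["day0", "day7"], "LIG1", "REC1", "T_cell", "macrophage"))
```

A baseline like this inherits the limitations the abstract describes: it scores only ligand-receptor pairs supplied from a database and treats each time point independently.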

Spatial omics technologies can help identify spatially organized biological processes, but existing computational approaches often overlook structural dependencies in the data. Here, we introduce Smoother, a unified framework that integrates positional information into non-spatial models via modular priors and losses. In simulated and real datasets, Smoother enables accurate data imputation, cell-type deconvolution, and dimensionality reduction with remarkable efficiency.
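
One generic way to integrate positional information through priors and losses is a graph-Laplacian penalty built from a k-nearest-neighbor graph over spot coordinates. The sketch below builds such a graph, evaluates the penalty x^T L x (the sum of squared differences across neighboring spots), and computes the smoothed vector minimizing ||z - y||^2 + lam * z^T L z, which solves (I + lam L) z = y. This is an assumed, textbook construction; Smoother's actual priors and losses may differ, and `k` and `lam` are illustrative.

```python
import numpy as np


def knn_adjacency(coords: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 adjacency linking each spot to its k nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # never pick a spot as its own neighbor
    adj = np.zeros_like(dist)
    for i, row in enumerate(dist):
        adj[i, np.argsort(row)[:k]] = 1.0
    return np.maximum(adj, adj.T)


def laplacian(adj: np.ndarray) -> np.ndarray:
    return np.diag(adj.sum(axis=1)) - adj


def spatial_penalty(x: np.ndarray, adj: np.ndarray) -> float:
    """x^T L x = sum over neighboring pairs of squared differences."""
    return float(x @ laplacian(adj) @ x)


def smooth(y: np.ndarray, adj: np.ndarray, lam: float) -> np.ndarray:
    """argmin_z ||z - y||^2 + lam * z^T L z, solved as (I + lam L) z = y."""
    return np.linalg.solve(np.eye(len(y)) + lam * laplacian(adj), y)


coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [6.0, 0.0]])
adj = knn_adjacency(coords, k=1)
y = np.array([0.0, 0.0, 1.0, 1.0])
print(spatial_penalty(y, adj), spatial_penalty(smooth(y, adj, lam=1.0), adj))
```

Because the penalty is just an extra term added to a loss, the same construction can be attached to an otherwise non-spatial model, which is the modular pattern the abstract describes.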

Characterizing cell-cell communication and tracking its variability over time is essential for understanding the coordination of biological processes mediating normal development, progression of disease, or responses to perturbations such as therapies. Existing tools lack the ability to capture time-dependent intercellular interactions, such as those influenced by therapy, and primarily rely on existing databases compiled from limited contexts. We present DIISCO, a Bayesian framework for characterizing the temporal dynamics of cellular interactions using single-cell RNA-sequencing data from multiple time points.

Given the large number of genes significantly associated with risk for neuropsychiatric disorders, a critical unanswered question is the extent to which diverse mutations, sometimes impacting the same gene, will require tailored therapeutic strategies. Here we consider this in the context of rare neuropsychiatric disorder-associated copy number variants (2p16.3) resulting in heterozygous deletions in NRXN1, which encodes a presynaptic cell-adhesion protein that serves as a critical synaptic organizer in the brain.

Inference of directed biological networks is an important but notoriously challenging problem. We introduce an approach to learning causal networks that leverages large-scale intervention-response data. Applied to 788 genes from the genome-wide Perturb-seq dataset, the approach helps elucidate the network architecture of blood traits.
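
The method itself is not described in enough detail here to reproduce. What the sketch below shows is the linear algebra that links intervention responses to a directed network under an idealized assumption: a linear structural equation model x = x G + e, where G[i, j] is the direct effect of gene i on gene j. Perturbing gene i shifts all genes by row i of the total-effect matrix T = (I - G)^-1, so noise-free total effects determine the direct effects via G = I - T^-1. Real perturbation data are noisy and incomplete, and the network below is hypothetical.

```python
import numpy as np


def total_effects(direct: np.ndarray) -> np.ndarray:
    """Total (direct + indirect) effects T = (I - G)^-1 in a linear SEM x = x G + e."""
    n = direct.shape[0]
    return np.linalg.inv(np.eye(n) - direct)


def direct_effects(total: np.ndarray) -> np.ndarray:
    """Invert the relation above: G = I - T^-1."""
    n = total.shape[0]
    return np.eye(n) - np.linalg.inv(total)


# Hypothetical 3-gene network with a feedback loop: 0 -> 1 -> 2 -> 0.
direct = np.array([[0.0, 0.5, 0.0], [0.0, 0.0, -0.3], [0.2, 0.0, 0.0]])
total = total_effects(direct)
print(np.round(total, 3))
print(np.allclose(direct_effects(total), direct))
```

The total effect of gene 0 on gene 2 is nonzero even though there is no direct edge, which is why perturbation responses alone do not reveal network structure without this kind of inversion.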

Alternative splicing is an essential mechanism for diversifying proteins, in which mature RNA isoforms produce proteins with potentially distinct functions. Two major challenges in characterizing the cellular function of isoforms are the lack of experimental methods to specifically and efficiently modulate isoform expression and the lack of computational tools for complex experimental design. To address these gaps, we developed and methodically tested a strategy that pairs the RNA-targeting CRISPR/Cas13d system with guide RNAs that span exon-exon junctions in the mature RNA.

Article Synopsis
  • Multi-omics datasets are increasingly popular, creating a need for integration methods to unlock their potential, which is addressed by a new technique called multi-set correlation and factor analysis (MCFA) that aids in analyzing complex genomic data.
  • MCFA was applied to various biological data (methylation, protein, RNA, and metabolite levels) from 614 samples, revealing strong clustering by ancestry without the need for genetic data and highlighting unique technical variations in individual datasets.
  • The study also incorporated genetic data through a genome-wide association study (GWAS), identifying several factors linked to genetic traits and metabolic diseases, thereby setting a groundwork for future research using large multi-modal genomic datasets.
Article Synopsis
  • Genes encoding RNA splicing factors are frequently mutated in blood disorders such as myelodysplastic syndrome (MDS), affecting how blood cells develop, but the role of these mutations in blood formation is still not fully understood.
  • Researchers used a new method, GoT-Splice, which combines gene profiling and advanced single-cell analysis to study how mutations in a specific splicing factor (SF3B1) influence blood progenitor cells.
  • Their findings showed that SF3B1 mutations lead to abnormal splicing patterns and an increase in specific blood cell types before MDS is clinically evident, highlighting the importance of understanding these mutations in early disease progression.

Transcriptome engineering applications in living cells with RNA-targeting CRISPR effectors depend on accurate prediction of on-target activity and off-target avoidance. Here we design and test ~200,000 RfxCas13d guide RNAs targeting essential genes in human cells with systematically designed mismatches and insertions and deletions (indels). We find that mismatches and indels have a position- and context-dependent impact on Cas13d activity, and mismatches that result in G-U wobble pairings are better tolerated than other single-base mismatches.
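
To make the distinction between ordinary mismatches and G-U wobble pairs concrete, the sketch below labels each position of a guide spacer against its target RNA. It handles substitutions only (no indels) and does not model position or context effects on activity; the function name and the toy 4-nt sequences are illustrative, not the paper's guide designs.

```python
from typing import List

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pairs(spacer: str, target: str) -> List[str]:
    """Label each spacer position as 'match', 'wobble' (G-U) or 'mismatch'.

    Both sequences are given 5'->3'; they pair antiparallel, so spacer
    position i faces target position len - 1 - i.
    """
    spacer = spacer.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    if len(spacer) != len(target):
        raise ValueError("this sketch handles substitutions only, not indels")
    labels = []
    for i, s in enumerate(spacer):
        pair = (s, target[len(target) - 1 - i])
        if pair in WATSON_CRICK:
            labels.append("match")
        elif pair in WOBBLE:
            labels.append("wobble")
        else:
            labels.append("mismatch")
    return labels


target = "AACG"  # toy 4-nt target; real RfxCas13d spacers are much longer
print(classify_pairs("CGUU", target))  # perfect complement
print(classify_pairs("UGUU", target))  # spacer U faces target G: G-U wobble
print(classify_pairs("AGUU", target))  # spacer A faces target G: mismatch
```

Separating "wobble" from "mismatch" matters for off-target prediction: per the finding above, a G-U pair at a given position is better tolerated than other single-base mismatches, so it should not be scored as a full mismatch.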
