Brief Bioinform
September 2024
We propose a supervised learning bioinformatics tool, Biological gRoup guIded muLtivariate muLtiple lIneAr regression with peNalizaTion (Brilliant), designed for feature selection and outcome prediction in genomic data with multi-phenotypic responses. Brilliant specifically incorporates genome and/or phenotype grouping structures, as well as phenotype correlation structures, in feature selection, effect estimation, and outcome prediction under a penalized multi-response linear regression model. Extensive simulations demonstrate its superior performance compared to competing methods.
Cellular deconvolution aims to estimate cell type fractions from bulk transcriptomic and other omics data. Most existing deconvolution methods fail to account for the heterogeneity in cell type-specific (CTS) expression across bulk samples, ignore discrepancies between CTS expression in bulk and cell type reference data, and provide no guidance on cell type reference selection or integration. To address these issues, we introduce BLEND, a hierarchical Bayesian method that leverages multiple reference datasets.
Commun Biol
January 2024
The proliferation of single-cell RNA-sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell-type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent developments in single-cell DNA methylation (scDNAm), there are emerging opportunities for deconvolving bulk DNAm data, particularly for solid tissues like brain that lack cell-type references.
Background: Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging.
The proliferation of single-cell RNA sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent development in single-cell DNA methylation (scDNAm), new avenues have been opened for deconvolving bulk DNAm data, particularly for solid tissues like the brain that lack cell-type references.
Alzheimers Dement
January 2024
Introduction: Our previously developed blood-based transcriptional risk scores (TRS) showed associations with diagnosis and neuroimaging biomarkers for Alzheimer's disease (AD). Here, we developed brain-based TRS.
Methods: We integrated AD genome-wide association study summary and expression quantitative trait locus data to prioritize target genes using Mendelian randomization.
Bulk transcriptomics in tissue samples reflects the average expression levels across different cell types and is highly influenced by cellular fractions. As such, it is critical to estimate cellular fractions to both deconfound differential expression analyses and infer cell type-specific differential expression. Since experimentally counting cells is infeasible in most tissues and studies, cellular deconvolution methods have been developed as an alternative.
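The idea that bulk expression is a fraction-weighted average of cell type profiles can be illustrated with a minimal sketch. It assumes a known signature matrix and noise-free data; the matrix values, the fractions, and the `deconvolve` helper are hypothetical and are not taken from the article. Clipping negative coefficients and renormalizing is a simplification standing in for a properly constrained solver.

```python
import numpy as np


def deconvolve(bulk, signature):
    """Estimate cell type fractions for one bulk sample.

    bulk:      (genes,) bulk expression vector
    signature: (genes, cell_types) reference expression per cell type
    """
    # Ordinary least squares fit of bulk ~ signature @ fractions.
    coef, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
    # Simplification: clip negatives and rescale so fractions sum to 1.
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated coefficients are non-positive")
    return coef / total


# Hypothetical signature: 4 genes (rows) by 3 cell types (columns).
signature = np.array([
    [10.0, 1.0, 0.5],
    [0.5, 8.0, 1.0],
    [1.0, 0.5, 12.0],
    [4.0, 4.0, 4.0],
])
true_fractions = np.array([0.6, 0.3, 0.1])

# Bulk expression is the fraction-weighted average of the cell type profiles.
bulk = signature @ true_fractions

estimated = deconvolve(bulk, signature)
print(estimated)
```

With real data the fit is noisy and the reference may not match the bulk tissue, which is the setting that dedicated deconvolution methods are built for.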
Posttranscriptional RNA modifications by adenosine-to-inosine (A-to-I) editing are abundant in the brain, yet elucidating functional sites remains challenging. To bridge this gap, we investigate spatiotemporal and genetically regulated A-to-I editing sites across prenatal and postnatal stages of human brain development. More than 10,000 spatiotemporally regulated A-to-I sites were identified that occur predominately in 3' UTRs and introns, as well as 37 sites that recode amino acids in protein coding regions with precise changes in editing levels across development.
J Expo Sci Environ Epidemiol
March 2023
Background: Phthalate exposure in pregnancy is typically estimated using maternal urinary phthalate metabolite levels. Our aim was to evaluate the association of urinary and placental tissue phthalates, and to explore the role of maternal and pregnancy characteristics that may bias estimates.
Methods: Fifty pregnancies were selected from the CANDLE Study, recruited from 2006 to 2011 in Tennessee.
Understanding lung immunity requires an unbiased profiling of tissue-resident T cells at their precise anatomical locations within the lung, but such information has not been characterized in the immunized mouse model. In this pilot study, using the 10x Genomics Chromium and Visium platforms, we performed an integrative analysis of spatial transcriptome with single-cell RNA-seq and single-cell ATAC-seq on lung cells from mice after immunization using a well-established infection model. We built an optimized deconvolution pipeline to accurately decipher specific cell-type compositions by anatomic location.
Transl Psychiatry
August 2022
DNA methylation (DNAm), the addition of a methyl group to a cytosine in DNA, plays an important role in the regulation of gene expression. Single-nucleotide polymorphisms (SNPs) associated with schizophrenia (SZ) by genome-wide association studies (GWAS) often influence local DNAm levels. Thus, DNAm alterations, acting through effects on gene expression, represent one potential mechanism by which SZ-associated SNPs confer risk.
Motivation: Tissue-level omics data such as transcriptomics and epigenomics are an average across diverse cell types. To extract cell-type-specific (CTS) signals, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue-level data. However, these methods produce vastly different results under various real data settings.
The clinical diagnosis of Alzheimer's disease, at its early stage, remains a difficult task. Advanced imaging technologies and laboratory assays to detect Aβ peptides Aβ42 and Aβ40, total and phosphorylated tau in CSF provide a set of biomarkers of developing AD brain pathology and facilitate the diagnostic process. The search for biofluid biomarkers, other than in CSF, and the development of biomarker assays have accelerated significantly and now represent the fastest-growing field in AD research.
Background: Prevalence rates of opioid use disorder (OUD) have increased dramatically, accompanied by a surge of overdose deaths. While opioid dependence has been extensively studied in preclinical models, an understanding of the biological alterations that occur in the brains of people who chronically use opioids and who are diagnosed with OUD remains limited. To address this limitation, RNA sequencing was conducted on the dorsolateral prefrontal cortex and nucleus accumbens, regions heavily implicated in OUD, from postmortem brains in subjects with OUD.
J Autism Dev Disord
August 2022
Little is known about the financial well-being of families raising children with autism spectrum disorders (ASD). Family financial well-being has important impacts on the development of children with ASD. The study uses a 2019 survey collected from Chinese families raising a child with ASD (N = 3064) to examine their financial well-being and its association with health expenditures for children.
Bioinformatics
October 2021
Motivation: Marker genes, defined as genes that are expressed primarily in a single-cell type, can be identified from the single-cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern.
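The correlation argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' method: the sample size, Dirichlet proportions, expression levels, and noise scale are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_cell_types, markers_per_type = 200, 3, 3

# Hypothetical cell type proportions per bulk sample (each row sums to 1).
proportions = rng.dirichlet([2.0] * n_cell_types, size=n_samples)

# Hypothetical per-cell expression levels of the markers of each cell type.
levels = np.array([10.0, 20.0, 30.0])

blocks = []
for k in range(n_cell_types):
    # A marker's bulk expression scales with the proportion of its cell type.
    signal = proportions[:, [k]] * levels[None, :]
    noise = rng.normal(0.0, 0.2, size=signal.shape)
    blocks.append(signal + noise)

# Bulk expression matrix: samples by marker genes.
expression = np.hstack(blocks)

# Gene-by-gene correlation across samples.
corr = np.corrcoef(expression, rowvar=False)

# Cell type label of each marker gene column.
labels = np.repeat(np.arange(n_cell_types), markers_per_type)
same_type = labels[:, None] == labels[None, :]
off_diagonal = ~np.eye(labels.size, dtype=bool)

within = corr[same_type & off_diagonal].mean()  # markers of the same cell type
across = corr[~same_type].mean()                # markers of different cell types

print(f"mean correlation within a cell type: {within:.3f}")
print(f"mean correlation across cell types:  {across:.3f}")
```

Markers of the same cell type track a shared proportion and so correlate strongly, while markers of different cell types do not, which is the pattern a correlation-based search would look for.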
When assessed over a large number of samples, bulk RNA sequencing provides reliable data for gene expression at the tissue level. Single-cell RNA sequencing (scRNA-seq) deepens those analyses by evaluating gene expression at the cellular level. Both data types lend insights into disease etiology.
Obsessive-compulsive disorder (OCD) is a chronic and severe psychiatric disorder for which effective treatment options are limited. Structural and functional neuroimaging studies have consistently implicated the orbitofrontal cortex (OFC) and striatum in the pathophysiology of the disorder. Recent genetic evidence points to involvement of components of the excitatory synapse in the etiology of OCD.
Bioinformatics
September 2021
Motivation: Trans-acting expression quantitative trait loci (eQTLs) collectively explain a substantial proportion of expression variation, yet are challenging to detect and replicate since their effects are often individually weak. A large proportion of genetic effects on distal genes are mediated through cis-gene expression. Cis-association (between SNP and cis-gene) and gene-gene correlation conditional on SNP genotype could establish trans-association (between SNP and trans-gene).
Bioinformatics
August 2021
Motivation: Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods.
Background: Whole-exome sequencing studies have been useful for identifying genes that, when mutated, affect risk for autism spectrum disorder (ASD). Nonetheless, the association signal primarily arises from de novo protein-truncating variants, as opposed to the more common missense variants. Despite their commonness in humans, determining which missense variants affect phenotypes and how remains a challenge.