Publications by authors named "Michael Eberle"

Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms.
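
The core idea of inheritance-based filtering can be sketched for a single parent-child trio. This is a simplified illustration that assumes unphased diploid genotypes encoded as allele indices; it is not the pedigree-wide procedure used in the study, and the function name and example calls are hypothetical.

```python
from itertools import product

def is_mendelian_consistent(child, father, mother):
    """Return True if the child's unphased genotype can be formed by
    taking one allele from the father and one from the mother."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )

# Hypothetical calls: genotypes are tuples of allele indices (0 = reference).
calls = [
    {"site": "chr1:1000", "child": (0, 1), "father": (0, 0), "mother": (0, 1)},
    {"site": "chr1:2000", "child": (1, 1), "father": (0, 0), "mother": (0, 1)},
]

# Keep only the sites whose genotypes are consistent with inheritance.
kept = [
    c["site"]
    for c in calls
    if is_mendelian_consistent(c["child"], c["father"], c["mother"])
]
print(kept)  # ['chr1:1000']
```

The second call is dropped because a homozygous-alternate child cannot inherit an alternate allele from a homozygous-reference father. A real filter would also have to allow for de novo mutations and genotyping errors.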

Understanding the human de novo mutation (DNM) rate requires complete sequence information. Here, using five complementary short-read and long-read sequencing technologies, we phased and assembled more than 95% of each diploid human genome in a four-generation, twenty-eight-member family (CEPH 1463). We estimate 98-206 DNMs per transmission, including 74.

Motivation: Structural variants (SVs) play an important role in evolutionary and functional genomics but are challenging to characterize. High-accuracy, long-read sequencing can substantially improve SV characterization when coupled with effective calling methods. While state-of-the-art long-read SV callers are highly accurate, further improvements are achievable by systematically modeling local haplotypes during SV discovery and genotyping.

Variant calling is hindered in segmental duplications by sequence homology. We developed Paraphase, a HiFi-based informatics method that resolves highly similar genes by phasing all haplotypes of paralogous genes together. We applied Paraphase to 160 long (>10 kb) segmental duplication regions across the human genome with high (>99%) sequence similarity, encoding 316 genes.

Tandem repeats are a highly polymorphic class of genomic variation that play causal roles in rare diseases but are notoriously difficult to sequence using short-read techniques. Most previous studies profiling tandem repeats genome-wide have reduced the description of each locus to the singular value of the length of the entire repetitive locus. Here we introduce a comprehensive database of 3.

Clinical short-read exome and genome sequencing approaches have positively impacted diagnostic testing for rare diseases. Yet, technical limitations associated with short reads challenge their use for the detection of disease-associated variation in complex regions of the genome. Long-read sequencing (LRS) technologies may overcome these challenges, potentially qualifying as a first-tier test for all rare diseases.

Pharmacogenomics is central to precision medicine, informing medication safety and efficacy. Pharmacogenomic diplotyping of complex genes requires full-length DNA sequences and detection of structural rearrangements. We introduce StarPhase, a tool that leverages PacBio HiFi sequence data to diplotype 21 CPIC Level A pharmacogenes and provides detailed haplotypes and supporting visualizations for , , and .

The abundance of Lp(a) protein holds significant implications for the risk of cardiovascular disease (CVD), which is directly impacted by the copy number (CN) of KIV-2, a 5.5 kbp sub-region. KIV-2 is highly polymorphic in the population and accurate analysis is challenging.

Background: Genetic testing for Huntington's disease (HD) initially returned mostly positive results, but the proportion of negative results has since increased. Patients with negative HD tests are described as having HD phenocopy syndromes (HDPC). This study examines their clinical characteristics and investigates the genetic causes of HDPC.

Methods: Clinical data from neurogenetics clinics and HDPC gene-panel data were analysed.

Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463), allowing us to systematically assess de novo mutations (DNMs) and recombination. From this family, we estimate an average of 192 DNMs per generation, including 75.5 single-nucleotide variants (SNVs), 7.

Article Synopsis
  • The study investigates the factors affecting the expansion of tandem repeats, focusing on the FGF14 (GAA)·(TTC) repeat locus in a large sample of 2,530 individuals through advanced sequencing techniques.
  • Researchers discovered a prevalent 5'-flanking variant present in over 70% of alleles, which is linked to nonpathogenic alleles and the ancestral lineage of this genetic marker.
  • This common variant is associated with greater stability of the tandem repeat during inheritance and improved accessibility of chromatin, suggesting a role in preventing pathological expansion.

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies.

Comprehending the mechanism behind human diseases with an established heritable component represents the forefront of personalized medicine. Nevertheless, numerous medically important genes are inaccurately represented in short-read sequencing data analysis because of their complexity and repetitiveness, or because they lie in the so-called 'dark regions' of the human genome. The advent of PacBio as a long-read platform has provided new insights, yet the cost of HiFi whole-genome sequencing (WGS) is frequently prohibitive.

Short tandem repeats (STRs) are a class of repetitive elements, composed of tandem arrays of 1-6 base pair sequence motifs, that comprise a substantial fraction of the human genome. STR expansions can cause a wide range of neurological and neuromuscular conditions, known as repeat expansion disorders, whose age of onset, severity, penetrance and/or clinical phenotype are influenced by the length of the repeats and their sequence composition. The presence of non-canonical motifs, depending on the type, frequency and position within the repeat tract, can alter clinical outcomes by modifying somatic and intergenerational repeat stability, gene expression and mutant transcript-mediated and/or protein-mediated toxicities.
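
To make the definition concrete, the sketch below scans a DNA string for perfect tandem arrays of 1-6 bp motifs. It is an illustration only: the three-copy minimum and the function name are assumptions, and it reports exact repeats only, so a non-canonical interruption would split a tract into separate hits.

```python
import re

# Shortest motif of 1-6 bp followed by at least two more exact copies
# (three copies in total; this threshold is an assumption for illustration).
STR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")

def find_perfect_strs(seq):
    """Return (start, motif, copies) for each perfect tandem array in seq."""
    hits = []
    for m in STR_PATTERN.finditer(seq.upper()):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        hits.append((m.start(), motif, copies))
    return hits

print(find_perfect_strs("TTCAGCAGCAGCAGTT"))  # [(2, 'CAG', 4)]
```

The lazy quantifier makes the regex prefer the shortest motif that repeats, so a run such as `AAAAAA` is reported as six copies of `A` and not as three copies of `AA`.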

Motivation: In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also excel at calling larger classes of variation, such as structural or tandem repeat variants.
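
The phasing problem can be illustrated with a greedy toy example that links adjacent heterozygous sites by counting the reads spanning both. This is a minimal sketch under simplifying assumptions (biallelic sites, alleles coded 0/1, hypothetical function and site names); it is not the algorithm of the tool described in this abstract.

```python
def phase_adjacent(sites, reads):
    """Greedy pairwise phasing of heterozygous sites.

    sites: ordered list of site identifiers
    reads: list of dicts mapping site id -> observed allele (0 = ref, 1 = alt)
    Returns haplotype 1 as a dict site -> allele; haplotype 2 is its complement.
    """
    hap1 = {sites[0]: 0}  # arbitrarily place the reference allele on haplotype 1
    for prev, cur in zip(sites, sites[1:]):
        cis = trans = 0
        for r in reads:
            if prev in r and cur in r:
                if r[prev] == r[cur]:
                    cis += 1    # read carries the same allele code at both sites
                else:
                    trans += 1  # read carries opposite allele codes
        if cis == trans:
            raise ValueError(f"no decisive read evidence linking {prev} and {cur}")
        hap1[cur] = hap1[prev] if cis > trans else 1 - hap1[prev]
    return hap1

# Hypothetical reads covering three heterozygous sites A, B and C.
reads = [
    {"A": 0, "B": 1},
    {"A": 1, "B": 0},
    {"B": 1, "C": 1},
    {"A": 1, "B": 0, "C": 0},
]
print(phase_adjacent(["A", "B", "C"], reads))  # {'A': 0, 'B': 1, 'C': 1}
```

Longer reads span more pairs of sites, which is why long, accurate reads tend to yield longer phase blocks than short reads.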

Article Synopsis
  • Tandem repeat (TR) variation is linked to gene expression and rare genetic diseases, and there's a demand for better tools to analyze these variations across genomes.
  • The Tandem Repeat Genotyping Tool (TRGT) is introduced as a computational method designed to determine consensus sequences and methylation levels of TRs using PacBio HiFi sequencing data.
  • TRGT demonstrated high accuracy with a 98.38% Mendelian concordance and successfully identified known repeat expansions and their methylation status in samples, while also providing access to a database of TR sequences and methylation levels from 100 genomes.

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples.

Article Synopsis
  • Researchers studied the SCA27B (GAA)•(TTC) repeat locus in over 2,500 individuals to understand factors leading to the expansion of tandem repeats.
  • They found a common 17-bp deletion-insertion variation that was present in about 70% of the alleles analyzed.
  • This variation was mostly found on alleles with fewer than 30 GAA repeats and contributed to increased stability during meiosis.

Spinal muscular atrophy, a leading cause of early infant death, is caused by bi-allelic mutations of SMN1. Sequence analysis of SMN1 is challenging due to high sequence similarity with its paralog SMN2. Both genes have variable copy numbers across populations.

Expansion of a single repetitive DNA sequence, termed a tandem repeat (TR), is known to cause more than 50 diseases. However, repeat expansions are often not explored beyond neurological and neurodegenerative disorders. In some cancers, mutations accumulate in short tracts of TRs, a phenomenon termed microsatellite instability; however, larger repeat expansions have not been systematically analysed in cancer.

Adult-onset cerebellar ataxias are a group of neurodegenerative conditions that challenge both genetic discovery and molecular diagnosis. In this study, we identified an intronic (GAA) repeat expansion in fibroblast growth factor 14 (FGF14). Genetic analysis of 95 Australian individuals with adult-onset ataxia identified four (4.

Background: Expansions of short tandem repeats are the cause of many neurogenetic disorders including familial amyotrophic lateral sclerosis, Huntington disease, and many others. Multiple methods have been recently developed that can identify repeat expansions in whole genome or exome sequencing data. Despite the widely recognized need for visual assessment of variant calls in clinical settings, current computational tools lack the ability to produce such visualizations for repeat expansions.

Carriers of GBA variants are at increased risk of Parkinson's disease (PD) and Lewy body dementia (LBD). The presence of the pseudogene GBAP1 predisposes to structural variants, complicating genetic analysis. We present two methods to resolve recombinant alleles and other variants in GBA: Gauchian, a tool for short-read, whole-genome sequencing data analysis, and Oxford Nanopore sequencing after PCR enrichment.
