Publications by Nathan D Olson

Correction: Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair.

Jennifer H McDaniel, Vaidehi Patel, Nathan D Olson, Hua-Jun He, Zhiyong He

Sci Data

August 2025

The Platinum Pedigree: a long-read benchmark for genetic variants.

Zev Kronenberg, Cillian Nolan, David Porubsky, Tom Mokveld, William J Rowell, Nathan D Olson

Nat Methods

August 2025

Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms.

Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair.

Jennifer H McDaniel, Vaidehi Patel, Nathan D Olson, Hua-Jun He, Zhiyong He

Sci Data

July 2025

The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first explicitly consented for public dissemination of genomic data and cell lines. Here, we describe a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line and matched normal cells from duodenal and pancreatic tissues. Data for the tumor-normal matched samples comes from seventeen distinct state-of-the-art whole genome measurement technologies, including high depth short and long-read bulk whole genome sequencing (WGS), single cell WGS, Hi-C, and karyotyping.

Small variant benchmark from a complete assembly of X and Y chromosomes.

Justin Wagner, Nathan D Olson, Jennifer McDaniel, Lindsay Harris, Brendan J Pinto

Nat Commun

January 2025

The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions and across short and long read callsets.

A robust benchmark for detecting low-frequency variants in the HG002 Genome In A Bottle NIST reference material.

Camille A Daniels, Adetola Abdulkadir, Megan H Cleveland, Jennifer H McDaniel, David Jáspez, Nathan D Olson

bioRxiv

December 2024

Somatic mosaicism is an important cause of disease, but mosaic and somatic variants are often challenging to detect because they exist in only a fraction of cells. To address the need for benchmarking subclonal variants in normal cell populations, we developed a benchmark containing mosaic variants in the Genome in a Bottle Consortium (GIAB) HG002 reference material DNA from a large batch of a normal lymphoblastoid cell line. First, we used a somatic variant caller with high coverage (300x) Illumina whole genome sequencing data from the Ashkenazi Jewish trio to detect variants in HG002 not detected in at least 5% of cells from the combined parental data.

The GIAB genomic stratifications resource for human reference genomes.

Nathan Dwarshuis, Divya Kalra, Jennifer McDaniel, Philippe Sanio, Pilar Alvarez Jerez, Nathan D Olson

Nat Commun

October 2024

Article Synopsis

* The authors introduce "stratifications," BED files that delineate different genomic contexts, for GRCh37/38 and the new T2T-CHM13 reference, which includes regions that were previously challenging to sequence.
* They also compare the performance of sequencing benchmarks across these references, showing how difficult regions in CHM13 affect overall performance, and provide a Snakemake pipeline for generating stratifications to aid in optimizing sequencing platforms.

StratoMod: predicting sequencing and variant calling errors with interpretable machine learning.

Nathan Dwarshuis, Peter Tonner, Nathan D Olson, Fritz J Sedlazeck, Justin Wagner

Commun Biol

October 2024

Article Synopsis

* Current genomic variant calling pipelines are not one-size-fits-all, requiring developers and researchers to make subjective tradeoffs based on their specific applications.
* StratoMod is introduced as a machine-learning tool that predicts germline variant calling errors in a data-driven way, improving the accuracy of variant detection, especially in complex genomic regions.
* It offers insights into the impact of different reference methods on recall rates and helps identify clinically relevant variants that might be overlooked by existing pipelines, facilitating better decision-making in pipeline design.

High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.

Jonas A Gustafson, Sophia B Gibson, Nikhita Damaraju, Miranda P G Zalusky, Kendra Hoekzema, Nathan D Olson

Genome Res

November 2024

Article Synopsis

* The 1000 Genomes Project and Oxford Nanopore Technologies are working together to produce long-read sequencing (LRS) data from at least 800 samples to enhance the identification of genetic variations and better understand human genetic diversity.
* Initial analysis of 100 samples shows high accuracy in detecting genetic variants, including structural variants that disrupt gene function, and provides valuable data for the clinical genetics community to advance research on pathogenic variations.

Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair.

Jennifer H McDaniel, Vaidehi Patel, Nathan D Olson, Hua-Jun He, Zhiyong He

bioRxiv

June 2025

Article Synopsis

* The Genome in a Bottle Consortium (GIAB) is creating matched tumor-normal samples that are explicitly consented for public sharing of genomic data and cell lines, beginning with a pancreatic ductal adenocarcinoma (PDAC) tumor cell line.
* They provide a comprehensive genomic dataset from the first individual, covering tumor and matched normal cells measured with high-depth whole genome sequencing and other genome measurement technologies.
* This open-access resource aims to help develop benchmarks for detecting genetic variants in cancer, fostering innovation in genome measurement and analysis tools.

Analysis and benchmarking of small and large genomic variants across tandem repeats.

Adam C English, Egor Dolzhenko, Helyaneh Ziaei Jam, Sean K McKenzie, Nathan D Olson

Nat Biotechnol

March 2025

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies.

Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation.

Jonas A Gustafson, Sophia B Gibson, Nikhita Damaraju, Miranda P G Zalusky, Kendra Hoekzema, Nathan D Olson

medRxiv

March 2024

Article Synopsis

* The 1000 Genomes Project ONT Sequencing Consortium is working to generate long-read sequencing (LRS) data from at least 800 samples to better understand human genetic variation and improve variant detection.
* Initial data from the first 100 samples show high accuracy in identifying structural variants and methylation signatures, creating a useful public resource for finding disease-related genetic changes.

Editorial: Methods in computational genomics.

Lei Chen, Nathan D Olson

Front Genet

January 2024

Benchmarking of small and large variants across tandem repeats.

Adam English, Egor Dolzhenko, Helyaneh Ziaei Jam, Sean McKenzie, Nathan D Olson

bioRxiv

November 2023

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples.

The complete sequence of a human Y chromosome.

Arang Rhie, Sergey Nurk, Monika Cechova, Savannah J Hoyt, Dylan J Taylor, Nathan D Olson

Nature

September 2023

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region.

Rapid production and free distribution of a synthetic RNA material to support SARS-CoV-2 molecular diagnostic testing.

Megan H Cleveland, Erica L Romsos, Carolyn R Steffen, Nathan D Olson, Stephanie L Servetas

Biologicals

May 2023

In response to the COVID-19 pandemic, the National Institute of Standards and Technology released a synthetic RNA material for SARS-CoV-2 in June 2020. The goal was to rapidly produce a material to support molecular diagnostic testing applications. This material, referred to as Research Grade Test Material 10169, was shipped free of charge to laboratories across the globe to provide a non-hazardous material for assay development and assay calibration.

A draft human pangenome reference.

Wen-Wei Liao, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Nathan D Olson

Nature

May 2023

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels.

Variant calling and benchmarking in an era of complete human genome sequences.

Nathan D Olson, Justin Wagner, Nathan Dwarshuis, Karen H Miga, Fritz J Sedlazeck

Nat Rev Genet

July 2023

Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations.

Precision engineering of biological function with large-scale measurements and machine learning.

Drew S Tack, Peter D Tonner, Abe Pressman, Nathan D Olson, Sasha F Levy

PLoS One

March 2023

Article Synopsis

* The article discusses the growing need for precise engineering of biological functions in synthetic biology, especially for programmed sensing that regulates gene expression based on stimuli.
* It introduces two methods, in silico selection and machine-learning-enabled forward engineering, that leverage a comprehensive dataset to develop genetic sensors with specifically defined dose-response characteristics.
* The methods demonstrate the capability to fine-tune genetic sensors for various performance metrics, such as sensitivity and output, and to predictively engineer new sensor mutations beyond the existing dataset.

Benchmarking challenging small variants with linked and long reads.

Justin Wagner, Nathan D Olson, Lindsay Harris, Ziad Khan, Jesse Farek

Cell Genom

May 2022

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as .

Semi-automated assembly of high-quality diploid human reference genomes.

Erich D Jarvis, Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, Nathan D Olson

Nature

November 2022

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome.

PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions.

Nathan D Olson, Justin Wagner, Jennifer McDaniel, Sarah H Stephens, Samuel T Westreich

Cell Genom

May 2022

The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications.

A complete reference genome improves analysis of human genetic variation.

Sergey Aganezov, Stephanie M Yan, Daniela C Soto, Melanie Kirsche, Samantha Zarate, Nathan D Olson

Science

April 2022

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery.

The complete sequence of a human genome.

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V Bzikadze, Nathan D Olson

Science

April 2022

Article Synopsis

* The Telomere-to-Telomere Consortium has completed the human reference genome, addressing the previously unfinished heterochromatic regions and offering a sequence of 3.055 billion base pairs.
* This new genome assembly, T2T-CHM13, includes gapless sequences for nearly all chromosomes, correcting errors found in earlier genome references.
* The update introduces nearly 200 million new base pairs and includes important genomic features like centromeric satellite arrays and gene predictions, enabling more comprehensive genetic studies.

Curated variation benchmarks for challenging medically relevant autosomal genes.

Justin Wagner, Nathan D Olson, Lindsay Harris, Jennifer McDaniel, Haoyu Cheng

Nat Biotechnol

May 2022

The repetitive nature and complexity of some medically relevant genes poses a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly.

The genotype-phenotype landscape of an allosteric protein.

Drew S Tack, Peter D Tonner, Abe Pressman, Nathan D Olson, Sasha F Levy

Mol Syst Biol

December 2021
