Publications by Kim D Pruitt

GenBank 2025 update.

Eric W Sayers, Mark Cavanaugh, Linda Frisse, Kim D Pruitt, Valerie A Schneider

Nucleic Acids Res

January 2025

GenBank® (https://www.ncbi.nlm.

The international nucleotide sequence database collaboration (INSDC): enhancing global participation.

Ilene Karsch-Mizrachi, Masanori Arita, Tony Burdett, Guy Cochrane, Yasukazu Nakamura, Kim D Pruitt

Nucleic Acids Res

January 2025

The members of the International Nucleotide Sequence Database Collaboration (INSDC; https://insdc.org) have built systems to collect, archive and disseminate sequence data for more than four decades. The three collaborating organizations, the National Library of Medicine, National Center for Biotechnology Information (NLM-NCBI) in the United States; the Research Organization of Information and Systems, National Institute of Genetics (ROIS-NIG) in Japan; and the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), formalized their relationship through the adoption of an arrangement which documents their commitment to free and open access to genomic sequences.

NCBI RefSeq: reference sequence standards through 25 years of curation and annotation.

Tamara Goldfarb, Vamsi K Kodali, Shashikant Pujar, Vyacheslav Brover, Barbara Robbertse, Kim D Pruitt

Nucleic Acids Res

January 2025

Reference sequences and annotations serve as the foundation for many lines of research today, from organism and sequence identification to providing a core description of the genes, transcripts and proteins found in an organism's genome. Interpretation of data including transcriptomics, proteomics, sequence variation and comparative analyses based on reference gene annotations informs our understanding of gene function and possible disease mechanisms, leading to new biomedical discoveries. The Reference Sequence (RefSeq) resource created at the National Center for Biotechnology Information (NCBI) leverages both automatic processes and expert curation to create a robust set of reference sequences of genomic, transcript and protein data spanning the tree of life.

Database resources of the National Center for Biotechnology Information in 2025.

Eric W Sayers, Jeffrey Beck, Evan E Bolton, J Rodney Brister, Jessica Chan, Kim D Pruitt

Nucleic Acids Res

January 2025

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence repository and the PubMed® repository of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 31 distinct repositories and knowledgebases. The E-utilities serve as the programming interface for most of these.

Biomedical Data Repository Concepts and Management Principles.

Dawei Lin, Matthew McAuliffe, Kim D Pruitt, Anupama Gururaj, Christine Melchior

Sci Data

June 2024

Article Synopsis

The demand for open data and open science is growing due to the need for transparency and reproducibility in research, supported by policies from organizations like the U.S. National Institutes of Health.
The paper highlights the crucial role of data repositories in managing, preserving, and sharing biomedical research data.
It aims to educate readers about the functions and evaluation of data repositories, helping researchers and policymakers choose effective options for data management and fostering open data practices.

The NCBI Comparative Genome Viewer (CGV) is an interactive visualization tool for the analysis of whole-genome eukaryotic alignments.

Sanjida H Rangwala, Dmitry V Rudnev, Victor V Ananiev, Dong-Ha Oh, Andrea Asztalos, Kim D Pruitt

PLoS Biol

May 2024

We report a new visualization tool for analysis of whole-genome assembly-assembly alignments, the Comparative Genome Viewer (CGV) (https://ncbi.nlm.nih.

Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows.

Ryan Connor, Migun Shakya, David A Yarmosh, Wolfgang Maier, Ross Martin, Kim D Pruitt

Viruses

March 2024

Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants.

Rapid and sensitive detection of genome contamination at scale with FCS-GX.

Alexander Astashyn, Eric S Tvedte, Deacon Sweeney, Victor Sapojnikov, Nathan Bouk, Kim D Pruitt

Genome Biol

February 2024

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.

Interactive visualization of whole eukaryote genome alignments using NCBI's Comparative Genome Viewer (CGV).

Sanjida H Rangwala, Dmitry V Rudnev, Victor V Ananiev, Andrea Asztalos, Barrett Benica, Kim D Pruitt

bioRxiv

November 2023

We report a new visualization tool for analysis of whole genome assembly-assembly alignments, the Comparative Genome Viewer (CGV) (https://ncbi.nlm.nih.

Database resources of the National Center for Biotechnology Information.

Eric W Sayers, Jeff Beck, Evan E Bolton, J Rodney Brister, Jessica Chan, Kim D Pruitt

Nucleic Acids Res

January 2024

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases.

GenBank 2024 Update.

Eric W Sayers, Mark Cavanaugh, Karen Clark, Kim D Pruitt, Stephen T Sherry

Nucleic Acids Res

January 2024

GenBank® (https://www.ncbi.nlm.

The status of the human gene catalogue.

Paulo Amaral, Silvia Carbonell-Sala, Francisco M De La Vega, Tiago Faial, Adam Frankish, Kim D Pruitt

Nature

October 2023

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years.

Rapid and sensitive detection of genome contamination at scale with FCS-GX.

Alexander Astashyn, Eric S Tvedte, Deacon Sweeney, Victor Sapojnikov, Nathan Bouk, Kim D Pruitt

bioRxiv

June 2023

Article Synopsis

FCS-GX is a new tool developed by NCBI to quickly identify and remove contamination from genomic sequences.
It efficiently screens genomes in a short time (0.1-10 minutes) and has high sensitivity (>95%) and specificity (>99.93%) for detecting various contaminant species.
The tool was used to analyze 1.6 million GenBank assemblies, uncovering 36.8 Gbp of contamination, which led to improved genome accuracy in NCBI's databases.

The status of the human gene catalogue.

Paulo Amaral, Silvia Carbonell-Sala, Francisco M De La Vega, Tiago Faial, Adam Frankish, Kim D Pruitt

ArXiv

March 2023

Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function.

Towards increased accuracy and reproducibility in SARS-CoV-2 next generation sequence analysis for public health surveillance.

Ryan Connor, David A Yarmosh, Wolfgang Maier, Migun Shakya, Ross Martin, Kim D Pruitt

bioRxiv

November 2022

During the COVID-19 pandemic, SARS-CoV-2 surveillance efforts integrated genome sequencing of clinical samples to identify emergent viral variants and to support rapid experimental examination of genome-informed vaccine and therapeutic designs. Given the broad range of methods applied to generate new viral genomes, it is critical that consensus and variant calling tools yield consistent results across disparate pipelines. Here we examine the impact of sequencing technologies (Illumina and Oxford Nanopore) and 7 different downstream bioinformatic protocols on SARS-CoV-2 variant calling as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) Tracking Resistance and Coronavirus Evolution (TRACE) initiative, a public-private partnership established to address the COVID-19 outbreak.

Database resources of the National Center for Biotechnology Information in 2023.

Eric W Sayers, Evan E Bolton, J Rodney Brister, Kathi Canese, Jessica Chan, Kim D Pruitt

Nucleic Acids Res

January 2023

GenBank 2023 update.

Eric W Sayers, Mark Cavanaugh, Karen Clark, Kim D Pruitt, Stephen T Sherry

Nucleic Acids Res

January 2023

GenBank® (https://www.ncbi.nlm.

A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.

Joannella Morales, Shashikant Pujar, Jane E Loveland, Alex Astashyn, Ruth Bennett, Kim D Pruitt

Nature

April 2022

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE and RefSeq launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins.

Standards recommendations for the Earth BioGenome Project.

Mara K N Lawniczak, Richard Durbin, Paul Flicek, Kerstin Lindblad-Toh, Xiaofeng Wei, Kim D Pruitt

Proc Natl Acad Sci U S A

January 2022

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals.

RefSeq Functional Elements as experimentally assayed nongenic reference standards and functional interactions in human and mouse.

Catherine M Farrell, Tamara Goldfarb, Sanjida H Rangwala, Alexander Astashyn, Olga D Ermolaeva, Kim D Pruitt

Genome Res

January 2022

Eukaryotic genomes contain many nongenic elements that function in gene regulation, chromosome organization, recombination, repair, or replication, and mutation of those elements can affect genome function and cause disease. Although numerous epigenomic studies provide high coverage of gene regulatory regions, those data are not usually exposed in traditional genome annotation and can be difficult to access and interpret without field-specific expertise. The National Center for Biotechnology Information (NCBI) therefore provides RefSeq Functional Elements (RefSeqFEs), which represent experimentally validated human and mouse nongenic elements derived from the literature.

GenBank.

Eric W Sayers, Mark Cavanaugh, Karen Clark, Kim D Pruitt, Conrad L Schoch

Nucleic Acids Res

January 2022

GenBank® (https://www.ncbi.nlm.

Database resources of the national center for biotechnology information.

Eric W Sayers, Evan E Bolton, J Rodney Brister, Kathi Canese, Jessica Chan, Kim D Pruitt

Nucleic Acids Res

January 2022

The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases.

GenBank.

Eric W Sayers, Mark Cavanaugh, Karen Clark, Kim D Pruitt, Conrad L Schoch

Nucleic Acids Res

January 2021

GenBank® (https://www.ncbi.nlm.

Database resources of the National Center for Biotechnology Information.

Eric W Sayers, Jeffrey Beck, Evan E Bolton, Devon Bourexis, James R Brister, Kim D Pruitt

Nucleic Acids Res

January 2021

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 34 distinct databases. The E-utilities serve as the programming interface for the Entrez system.

GenBank.

Eric W Sayers, Mark Cavanaugh, Karen Clark, James Ostell, Kim D Pruitt

Nucleic Acids Res

January 2020

GenBank® (www.ncbi.nlm.
