A Vision of How Low-Coverage Sequence Data Should Contribute to Genetic Evaluation in the Future.

R Mark Thallman, J E Borgert, Bailey N Engle, John W Keele, Warren M Snelling, Cedric Gondro, Larry A Kuehn

J Anim Sci

USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933 USA.

Published: September 2025

Low-coverage sequencing refers to sequencing the DNA of individuals to a low depth of coverage (e.g., 0.5X) and imputing their full genomic sequence from reference haplotypes of individuals sequenced to a high depth of coverage (e.g., ≥ 10X). It has been proposed as an alternative to genotyping by SNP arrays, and at least one commercial product based on it is available for agricultural species. Two concerns limit adoption in its current form: 1) the cost of storing the huge volume of data it generates and 2) whether that additional data will result in improved accuracy of genetic evaluation. This work envisions a future implementation of low-coverage sequencing that reduces storage costs and enhances genetic evaluations by leveraging the additional information in the full sequence of the pangenome to account for more genetic variation. We propose addressing the storage issue by representing the genomic sequence of an individual as a pair of haplotype arrays, with each element pointing to an enumerated haplotype of the sequence within one of approximately 50,000 defined genome segments. Assuming 60 million genomic variants, the infrastructure required to translate the identifier of any enumerated haplotype into its genomic sequence would require less than 10 gigabytes of binary storage. Each haplotype array element would require 2 bytes, so the marginal binary storage required to represent the genomic sequence of an individual would be about 200 kilobytes (KB) (2 haplotype arrays × 50,000 segments × 2 bytes), similar to the genotypes from a SNP array with 200,000 markers. This assumes no pedigree and no ambiguity in the imputation, though the latter is unrealistic. Strategies to minimize and, when necessary, to manage and efficiently represent ambiguity are proposed. The genomic sequence of an individual could be stored in about 1 KB (binary) if both parents have unambiguous sequence stored as described above.
The proposed system for representing the pangenome includes algorithms for read mapping and imputation intended to leverage all known genetic variation in the target population. It is also designed to use sequencing reads generated for imputing genomic sequence of new individuals to identify unrecognized mutations, crossovers, and structural variants, thus continuously improving the genome representation, especially if widespread use of low-coverage sequencing in livestock industries is realized. This could make improved genetic merit and management of livestock feasible without computational burden.
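
The storage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the layout of the catalog (a list of per-segment lists of sequence strings), and the toy sequences are all assumptions made for illustration. Only the figures of about 50,000 segments and 2 bytes per element come from the abstract.

```python
from array import array

N_SEGMENTS = 50_000  # approximate number of genome segments (from the abstract)

def new_haplotype_pair(n_segments=N_SEGMENTS):
    """Return two arrays of 16-bit haplotype identifiers, one per parental haplotype.

    Type code "H" is an unsigned short, which is 2 bytes on common platforms.
    """
    return array("H", [0]) * n_segments, array("H", [0]) * n_segments

def storage_bytes(pair):
    """Marginal binary storage for one individual's pair of haplotype arrays."""
    return sum(len(hap) * hap.itemsize for hap in pair)

def decode_haplotype(catalog, hap_array):
    """Translate haplotype identifiers into sequence.

    `catalog` is a hypothetical lookup structure: catalog[segment][hap_id]
    holds the sequence of that enumerated haplotype within that segment.
    """
    return "".join(catalog[seg][hap_id] for seg, hap_id in enumerate(hap_array))

# Toy catalog with 2 segments (hypothetical sequences, for illustration only).
TOY_CATALOG = [
    ["ACGT", "ACGA"],        # enumerated haplotypes of segment 0
    ["TTG", "TAG", "TTC"],   # enumerated haplotypes of segment 1
]

if __name__ == "__main__":
    pair = new_haplotype_pair()
    print(storage_bytes(pair))  # 2 arrays x 50,000 elements x 2 bytes

    first = array("H", [1, 0])
    second = array("H", [0, 2])
    print(decode_haplotype(TOY_CATALOG, first))
    print(decode_haplotype(TOY_CATALOG, second))
```

With 2-byte elements, each segment can distinguish up to 65,536 enumerated haplotypes, and the per-individual cost depends only on the number of segments, not on the number of variants within them.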

DOI: http://dx.doi.org/10.1093/jas/skaf294

Publication Analysis

Top Keywords

genomic sequence

low-coverage sequencing

sequence individual

sequence

genetic evaluation

depth coverage

genetic variation

enumerated haplotype

binary storage

genomic

Similar Publications

Distinct codon usage signatures reflecting evolutionary and pathogenic adaptation in the Acinetobacter baumannii complex.

Eur J Clin Microbiol Infect Dis

September 2025

School of Bioengineering and Biosciences, Department of Biochemistry, Lovely Professional University, Punjab, 144411, India.

Ujwal Dahal, Anuj Sharma, Karan Paul, Anu Bansal, Shelly Gupta

Purpose: This study investigates codon usage and amino acid usage bias in the genus Acinetobacter to uncover the evolutionary forces shaping these patterns and their implications for pathogenicity and biotechnology.

Methods: Codon usage patterns were examined in representative genomes of the genus Acinetobacter using standard codon bias indices, including GC content, relative synonymous codon usage (RSCU), effective number of codons (ENC), and codon adaptation index (CAI). Neutrality and parity plots were employed to evaluate the relative influence of mutational pressure and natural selection on codon preferences.

The plastid genome of the critically endangered Valeriana trinervis (= Centranthus trinervis) and insights from comparison with other Valeriana plastomes (Caprifoliaceae).

Planta

September 2025

Department of Biology, University of Naples Federico II, Via Cinthia 26, 80126, Naples, Italy.

Daniele De Luca, Olga De Castro

The first complete plastid genome of the critically endangered species Valeriana trinervis was sequenced, assembled and compared with other published Valeriana plastomes. In this study, we assembled the plastid genome of the critically endangered, endemic species Valeriana trinervis (= Centranthus trinervis) and compared it with all published plastomes of Valeriana. We found differences in the inverted repeat boundaries and in the type and abundance of repeats, but also similarities in codon usage and microsatellite numbers.

Integrative analysis identifies FERMT3 as a key regulator of metabolic reprogramming in keloid scarring and metabolic syndrome.

Funct Integr Genomics

September 2025

Department of Plastic Surgery, the First Affiliated Hospital of Fujian Medical University, Fuzhou, 350005, China.

Qian Lin, Beichen Cai, Feng Dong, Ruonan Ke, Xiuying Shan

Keloid scarring and Metabolic Syndrome (MS) are distinct conditions marked by chronic inflammation and tissue dysregulation, suggesting shared pathogenic mechanisms. Identifying common regulatory genes could unveil novel therapeutic targets. Methods.

Construction of a Core Germplasm and Identification of Candidate SNPs Associated with Growth Performance of Epinephelus tukula by Whole-Genome Resequencing.

Mar Biotechnol (NY)

September 2025

Yazhou Bay Innovation Institute, Hainan Tropical Ocean University, Sanya, China.

Liu Cao, Jun Ma, Ningyiming Hong, Jiaoli Yao, Yan Lu

Epinephelus tukula is an economically important aquaculture animal, and a major parent in grouper crossbreeding. To better preserve and exploit E. tukula germplasm resources, a core collection (containing 34 individuals derived from 10 genetic groups) was first constructed based on phenotypic growth traits and whole-genome resequencing (WGS) data.

Key copper homeostasis genes and inflammatory mechanisms in ischemic stroke: A bioinformatics and experimental study.

Funct Integr Genomics

September 2025

The First Clinical Medical College, Yunnan University of Chinese Medicine, Kunming, China.

Ting Shi, Zhifeng Wang, Jiao Yang, Pengfen He, Daman Tian

Ischemic stroke (IS) has high morbidity/mortality with limited treatments. This study screened core copper homeostasis-related genes in IS and validated their function as precise intervention targets. Human IS gene chip data were retrieved from GEO, and copper homeostasis genes from multiple databases.
