Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories.

Dandan Peng, Obadiah J Mulder, Michael D Edge

bioRxiv

Department of Quantitative and Computational Biology, University of Southern California.

Published: December 2024

Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures. This suggests that specific features of the ancestral recombination graph (ARG) may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ARG. Here we use simulations to examine seven ARG estimation methods (ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle, ASMC-clust, and SINGER), supplying their estimated coalescent trees to the downstream methods and measuring bias, mean squared error (MSE), confidence interval coverage, and Type I and Type II error rates. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust used samples ten or more times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust are of greatest importance when the recent past is of interest; further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
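
The closing point, that larger contemporary samples matter mostly for the recent past, can be illustrated with a standard Kingman-coalescent calculation. The sketch below is not part of the benchmarked methods; it assumes a single constant-size population with time in coalescent units (every pair of lineages coalesces at rate 1), and the function names are hypothetical. While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2, so the expected time during which more than m of n sampled lineages remain telescopes to 2(1/m - 1/n).

```python
from fractions import Fraction


def expected_time_above(n: int, m: int) -> Fraction:
    """Expected time (coalescent units) that a Kingman coalescent started from
    n sampled lineages spends with more than m lineages remaining.

    Assumes a single constant-size population in which every pair of lineages
    coalesces at rate 1, so with k lineages the waiting time to the next
    coalescence is exponential with rate k(k-1)/2 and mean 2/(k(k-1)).
    """
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    # Sum the expected waiting times of every epoch that has k > m lineages.
    return sum((Fraction(2, k * (k - 1)) for k in range(m + 1, n + 1)), Fraction(0))


def expected_time_above_closed_form(n: int, m: int) -> Fraction:
    """Telescoped form of the same sum: 2 * (1/m - 1/n)."""
    return 2 * (Fraction(1, m) - Fraction(1, n))


if __name__ == "__main__":
    for n in (100, 1000):
        tmrca = expected_time_above(n, 1)  # expected time to the most recent common ancestor
        t10 = expected_time_above(n, 10)   # expected time until only 10 lineages remain
        print(n, float(tmrca), float(t10))
```

Under these assumptions, increasing the sample tenfold from n = 100 to n = 1000 raises the expected TMRCA only from 1.98 to 1.998 coalescent units, and the expected time until just 10 lineages remain from 0.18 to 0.198. The additional lineages contribute almost entirely to the part of the tree near the present.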

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11160635
DOI: http://dx.doi.org/10.1101/2024.05.24.595829

Similar Publications

Estimating waiting distances between genealogy changes under a Multi-Species Extension of the Sequentially Markov Coalescent.

Syst Biol

September 2025

Department of Ecology, Evolution, and Environmental Biology, Columbia University, New York, NY 10027, USA.

Patrick F McKenzie, Deren A R Eaton

Genomes are composed of a mosaic of segments inherited from different ancestors, each separated by past recombination events. Consequently, genealogical relationships among multiple genomes vary spatially across different genomic regions. Genealogical variation among unlinked (uncorrelated) genomic regions is well described for either a single population (coalescent) or multiple structured populations (multispecies coalescent).

Insights Into the Almond Domestication History.

Evol Appl

September 2025

INRAE, Biologie du Fruit et Pathologie, UMR 1332, PrADAm Université de Bordeaux Villenave d'Ornon France.

Stephane Decroocq, Amandine Cornille, Naïma Dlalah, Henri Duval, David Tricon

Understanding crop domestication offers crucial insights into the evolutionary processes that drive population divergence and adaptation. It also informs the identification of genetically diverse wild germplasm, which is essential for breeding and conservation efforts. While domestication has been extensively studied in many Mediterranean fruit trees, the evolutionary history of the almond remains comparatively underexplored.

The Impact of Sequencing and Genotyping Errors on Bayesian Analysis of Genomic Data under the Multispecies Coalescent Model.

Mol Biol Evol

July 2025

Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK.

Jiayi Ji, Paschalia Kapli, Tomáš Flouri, Ziheng Yang

The multispecies coalescent (MSC) model accounts for genealogical fluctuations across the genome and provides a framework for analyzing genomic data from closely related species to estimate species phylogenies and divergence times, infer interspecific gene flow, and delineate species boundaries. As the MSC model assumes correct sequences, sequencing and genotyping errors at low read depths may be a serious concern. Here, we use computer simulation to assess the impact of genotyping errors in phylogenomic data on Bayesian inference of the species tree and population parameters such as species split times, population sizes, and the rate of gene flow.

Improvement to GLASS/Maximum Tree Method of Species Tree Inference Using Measurement Error Modified Single Linkage Clustering.

IEEE Trans Comput Biol Bioinform

January 2025

Sarah K Alver, Fletcher G W Christensen, James H Degnan

The Global LAteSt Split (GLASS)/Maximum Tree (MT) method of species tree inference, implemented in the software STEM, performs well and is statistically consistent when inferring a species tree from known gene trees. When input gene trees are estimated from DNA sequences, it can perform poorly, with possibly unrealistic sufficient conditions for statistical consistency. We propose a modification to STEM, called genX, which replaces its estimated pairwise coalescence times with randomly generated realizations from an estimated distribution of the true coalescence times.

Genealogical Analysis of Replicate Flower Colour Hybrid Zones in Antirrhinum.

Mol Ecol

August 2025

Institute of Science and Technology Austria, Klosterneuburg, Austria.

Arka Pal, Daria Shipilina, Alan Le Moan, Adrian J McNairn, Jennifer K Grenier

A major goal of speciation research is identifying loci that underpin barriers to gene flow. Population genomics takes a 'bottom-up' approach, scanning the genome for molecular signatures of processes that drive or maintain divergence. However, interpreting the 'genomic landscape' of speciation is complicated, because genome scans conflate multiple processes, most of which are not informative about gene flow.
