Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easy to predict from general performance measures. This suggests that specific features of estimated ancestral recombination graphs (ARGs) may matter for specific downstream applications. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ARG. Here we use simulations to evaluate seven ARG estimation methods (ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle, ASMC-clust, and SINGER), supplying their estimated coalescent trees to the downstream methods and measuring bias, mean squared error (MSE), confidence interval coverage, and Type I and II error rates. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust used samples ten or more times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust matter most when the recent past is of interest; further back in time, most of the tree has coalesced, so differences in contemporary sample size are less salient.
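The benchmark summaries named above (bias, MSE, confidence interval coverage, and Type I and II error rates) are standard functions of simulation replicates. A minimal sketch of how they might be computed with NumPy follows. It is not the authors' code. The function names `benchmark_metrics` and `rejection_rate` are hypothetical, and the sketch assumes each replicate yields a point estimate of some PGS-history quantity, a confidence interval, and a p-value from the downstream test.

```python
import numpy as np


def benchmark_metrics(estimates, truth, ci_lower, ci_upper):
    """Bias, MSE, and CI coverage of replicate estimates of one quantity.

    Hypothetical helper. `truth` may be a scalar or an array holding one
    true value per replicate (NumPy broadcasting handles both).
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    ci_lower = np.asarray(ci_lower, dtype=float)
    ci_upper = np.asarray(ci_upper, dtype=float)
    errors = estimates - truth
    # A replicate "covers" if its interval contains the true value.
    covered = (ci_lower <= truth) & (truth <= ci_upper)
    return {
        "bias": float(np.mean(errors)),
        "mse": float(np.mean(errors ** 2)),
        "coverage": float(np.mean(covered)),
    }


def rejection_rate(p_values, alpha=0.05):
    """Fraction of replicates whose p-value falls below alpha.

    On replicates simulated under the null this estimates the Type I error
    rate. On replicates simulated under the alternative it estimates power,
    so the Type II error rate is 1 minus the returned value.
    """
    p_values = np.asarray(p_values, dtype=float)
    return float(np.mean(p_values < alpha))
```

A nominal 95% interval procedure should give `coverage` near 0.95, and `rejection_rate` on null replicates should sit near `alpha`. Departures from either indicate miscalibration rather than inaccuracy alone.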
| Source | Type |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11160635 | PMC |
| http://dx.doi.org/10.1101/2024.05.24.595829 | DOI Listing |
Syst Biol
September 2025
Department of Ecology, Evolution, and Environmental Biology, Columbia University, New York, NY 10027, USA.
Genomes are composed of a mosaic of segments inherited from different ancestors, each separated by past recombination events. Consequently, genealogical relationships among multiple genomes vary spatially across different genomic regions. Genealogical variation among unlinked (uncorrelated) genomic regions is well described for either a single population (coalescent) or multiple structured populations (multispecies coalescent).
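For a single population, the standard (Kingman) coalescent gives a simple closed form for how quickly a sample's genealogy collapses. With j lineages, the waiting time to the next coalescence is exponential with rate j(j-1)/2. The sketch below sums those expected waiting times. It assumes a constant population size and measures time in coalescent units (2N generations for a diploid population of size N). The function names are hypothetical.

```python
def expected_time_to_k_lineages(n, k):
    """Expected time for n sampled lineages to coalesce down to k lineages.

    Assumes the standard Kingman coalescent with constant population size;
    time is in coalescent units (2N generations for diploid size N).
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    # With j lineages, the time to the next coalescence is exponential with
    # rate j*(j-1)/2, so its expected value is 2/(j*(j-1)).
    return sum(2.0 / (j * (j - 1)) for j in range(k + 1, n + 1))


def expected_time_closed_form(n, k):
    """Same quantity: the sum above telescopes to 2*(1/k - 1/n)."""
    return 2.0 * (1.0 / k - 1.0 / n)
```

For example, the expected time until 10 lineages remain is 2(1/10 - 1/100) = 0.18 units for a sample of 100 and 2(1/10 - 1/10000) = 0.1998 units for a sample of 10,000. A 100-fold larger sample therefore changes the older part of the genealogy very little. This is one way to see why contemporary sample size matters mostly for the recent past.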
Evol Appl
September 2025
INRAE, Biologie du Fruit et Pathologie, UMR 1332, PrADAm Université de Bordeaux Villenave d'Ornon France.
Understanding crop domestication offers crucial insights into the evolutionary processes that drive population divergence and adaptation. It also informs the identification of genetically diverse wild germplasm, which is essential for breeding and conservation efforts. While domestication has been extensively studied in many Mediterranean fruit trees, the evolutionary history of the almond remains comparatively underexplored.
Mol Biol Evol
July 2025
Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK.
The multispecies coalescent (MSC) model accounts for genealogical fluctuations across the genome and provides a framework for analyzing genomic data from closely related species to estimate species phylogenies and divergence times, infer interspecific gene flow, and delineate species boundaries. As the MSC model assumes correct sequences, sequencing and genotyping errors at low read depths may be a serious concern. Here, we use computer simulation to assess the impact of genotyping errors in phylogenomic data on Bayesian inference of the species tree and population parameters such as species split times, population sizes, and the rate of gene flow.
IEEE Trans Comput Biol Bioinform
January 2025
The Global LAteSt Split (GLASS)/Maximum Tree (MT) method of species tree inference, implemented in the software STEM, performs well and is statistically consistent when inferring a species tree from known gene trees. When input gene trees are estimated from DNA sequences, it can perform poorly, with possibly unrealistic sufficient conditions for statistical consistency. We propose a modification to STEM, called genX, which replaces its estimated pairwise coalescence times with randomly generated realizations from an estimated distribution of the true coalescence times.
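For known gene trees, the GLASS/Maximum Tree idea can be sketched as single-linkage clustering on the minimum coalescence time observed for each species pair across loci. The Python below is an illustrative sketch under that assumption, not the implementation in STEM or genX. Note the following:

- `glass_tree` and its input format (one dict per locus, mapping a `frozenset` species pair to a coalescence time) are hypothetical.
- Branch lengths are omitted.
- genX's replacement of estimated times with random draws is not shown.

```python
from itertools import combinations


def glass_tree(locus_times, taxa):
    """Sketch of GLASS / Maximum Tree clustering (topology only).

    locus_times: list with one dict per locus, mapping frozenset({a, b})
    to the coalescence time of species a and b in that gene tree.
    Returns the species tree as nested tuples.
    """
    # Minimum coalescence time for each species pair across all loci.
    min_time = {
        frozenset(pair): min(times[frozenset(pair)] for times in locus_times)
        for pair in combinations(taxa, 2)
    }
    # Each cluster is (set of member taxa, subtree built so far).
    clusters = [({t}, t) for t in taxa]
    while len(clusters) > 1:
        # Single linkage: join the two clusters whose closest members have
        # the smallest minimum coalescence time.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(
                min_time[frozenset((a, b))]
                for a in clusters[i][0]
                for b in clusters[j][0]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = (clusters[i][0] | clusters[j][0], (clusters[i][1], clusters[j][1]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0][1]
```

Because each pair's value is a minimum across loci, a single underestimated coalescence time from a poorly estimated gene tree can pull two species together. This is one plausible intuition for why the method can perform poorly on estimated rather than known gene trees.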
Mol Ecol
August 2025
Institute of Science and Technology Austria, Klosterneuburg, Austria.
A major goal of speciation research is identifying loci that underpin barriers to gene flow. Population genomics takes a 'bottom-up' approach, scanning the genome for molecular signatures of processes that drive or maintain divergence. However, interpreting the 'genomic landscape' of speciation is complicated, because genome scans conflate multiple processes, most of which are not informative about gene flow.