Article Abstract

Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ARG may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ancestral recombination graph (ARG). Here we examine the performance in simulation of seven ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle, ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error (MSE), confidence interval coverage, and Type I and II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust used samples ten or more times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust are of greatest importance when the recent past is of interest: further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
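The abstract names bias, mean squared error, and confidence interval coverage as benchmark criteria. As a rough illustration only, the sketch below shows one common way to compute such summaries across simulation replicates at a single time point. The function name `summarize_replicates`, its arguments, and the normal-approximation interval (estimate ± z × standard error) are assumptions made for this example; they are not taken from the paper or from any of the benchmarked tools.

```python
def summarize_replicates(estimates, std_errors, truths, z=1.96):
    """Summarize estimator performance over simulation replicates.

    Hypothetical helper: each list holds one value per replicate, e.g. the
    estimated population-mean PGS at one historical time point, its
    standard error, and the true simulated value.
    """
    if not (len(estimates) == len(std_errors) == len(truths)) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(estimates)
    # Signed error of each replicate: estimate minus truth.
    errors = [est - tru for est, tru in zip(estimates, truths)]

    # Bias is the mean signed error; MSE is the mean squared error.
    bias = sum(errors) / n
    mse = sum(err ** 2 for err in errors) / n

    # Coverage: fraction of replicates whose interval
    # [estimate - z*se, estimate + z*se] contains the truth.
    # This assumes a symmetric normal-approximation interval.
    covered = sum(abs(err) <= z * se for err, se in zip(errors, std_errors))
    coverage = covered / n

    return {"bias": bias, "mse": mse, "coverage": coverage}


if __name__ == "__main__":
    # Toy numbers, made up purely to exercise the function.
    summary = summarize_replicates(
        estimates=[0.5, 1.5, 3.0, 1.0],
        std_errors=[0.5, 0.5, 0.5, 0.5],
        truths=[1.0, 1.0, 1.0, 1.0],
    )
    print(summary)
```

In a benchmark of this kind, a summary like this would typically be computed separately for each estimation method and each time point, so that methods can be compared over the recent and the distant past.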


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11160635
DOI: http://dx.doi.org/10.1101/2024.05.24.595829

Similar Publications

Genomes are composed of a mosaic of segments inherited from different ancestors, each separated by past recombination events. Consequently, genealogical relationships among multiple genomes vary spatially across different genomic regions. Genealogical variation among unlinked (uncorrelated) genomic regions is well described for either a single population (coalescent) or multiple structured populations (multispecies coalescent).


Understanding crop domestication offers crucial insights into the evolutionary processes that drive population divergence and adaptation. It also informs the identification of genetically diverse wild germplasm, which is essential for breeding and conservation efforts. While domestication has been extensively studied in many Mediterranean fruit trees, the evolutionary history of the almond remains comparatively underexplored.


The multispecies coalescent (MSC) model accounts for genealogical fluctuations across the genome and provides a framework for analyzing genomic data from closely related species to estimate species phylogenies and divergence times, infer interspecific gene flow, and delineate species boundaries. As the MSC model assumes correct sequences, sequencing and genotyping errors at low read depths may be a serious concern. Here, we use computer simulation to assess the impact of genotyping errors in phylogenomic data on Bayesian inference of the species tree and population parameters such as species split times, population sizes, and the rate of gene flow.


The Global LAteSt Split (GLASS)/Maximum Tree (MT) method of species tree inference, implemented in the software STEM, performs well and is statistically consistent when inferring a species tree from known gene trees. When input gene trees are estimated from DNA sequences, it can perform poorly, with possibly unrealistic sufficient conditions for statistical consistency. We propose a modification to STEM, called genX, which replaces its estimated pairwise coalescence times with randomly generated realizations from an estimated distribution of the true coalescence times.


A major goal of speciation research is identifying loci that underpin barriers to gene flow. Population genomics takes a 'bottom-up' approach, scanning the genome for molecular signatures of processes that drive or maintain divergence. However, interpreting the 'genomic landscape' of speciation is complicated, because genome scans conflate multiple processes, most of which are not informative about gene flow.
