Robust and accurate Bayesian inference of genome-wide genealogies for hundreds of genomes.

Yun Deng , Rasmus Nielsen , Yun S Song

Nat Genet

Department of Statistics, University of California, Berkeley, CA, USA.

Published: September 2025

The Ancestral Recombination Graph (ARG), which describes the genealogical history of a sample of genomes, is a vital tool in population genomics and biomedical research. Recent advancements have substantially increased ARG reconstruction scalability, but they rely on approximations that can reduce accuracy, especially under model misspecification. Moreover, they reconstruct only a single ARG topology and cannot quantify the considerable uncertainty associated with ARG inferences. Here, to address these challenges, we introduce SINGER (sampling and inferring of genealogies with recombination), a method that accelerates ARG sampling from the posterior distribution by two orders of magnitude, enabling accurate inference and uncertainty quantification for hundreds of whole-genome sequences. Through extensive simulations, we demonstrate SINGER's enhanced accuracy and robustness to model misspecification compared to existing methods. We demonstrate the utility of SINGER by applying it to individuals of British and African descent within the 1000 Genomes Project, identifying signals of population differentiation, archaic introgression and strong support for ancient polymorphism in the human leukocyte antigen region shared across primates.

DOI: http://dx.doi.org/10.1038/s41588-025-02317-9

Similar Publications

Monocyte Anisocytosis Can Discriminate Between Sepsis and Sterile Inflammation, but not Mortality, in Critically Ill Surgical/Trauma Patients: A Secondary Prospective Analysis.

Crit Care Explor

September 2025

Department of Biostatistics, University of Florida Colleges of Medicine and Public Health and Health Professions, Gainesville, FL.

Miguel Hernández-Ríos , Ruoxuan Wu , Valerie A Polcz , Rachel D Burnside , Lael M Yonker

Objectives/Background: Monocyte anisocytosis (monocyte distribution width [MDW]) has been previously validated to predict sepsis and outcome in patients presenting in the emergency department and mixed-population ICUs. Determining sepsis in a critically ill surgical/trauma population is often difficult due to concomitant inflammation and stress. We examined whether MDW could identify sepsis among patients admitted to a surgical/trauma ICU and predict clinical outcome.

Integrative rank-based regression for multi-source high-dimensional data with multi-type responses.

J Appl Stat

January 2025

Department of Statistics and Data Science, School of Economics, Xiamen University, Xiamen, People's Republic of China.

Fuzhi Xu , Shuangge Ma , Qingzhao Zhang

In practice, response types often differ across datasets from multiple sources, reflecting distinct attributes or characteristics. In this paper, an integrative rank-based regression is proposed to facilitate information sharing among varied datasets with multi-type responses. By exploiting rank-based regression, the proposed approach handles differences in the magnitude of loss functions.

REFINE2: A simplified simulation tool to help epidemiologists evaluate the suitability and sensitivity of effect estimation within user-specified data.

Am J Epidemiol

September 2025

Department of Public Health Sciences, Thompson School of Social Work & Public Health, University of Hawaii at Manoa, Honolulu, Hawaii.

Xiang Meng , Jonathan Y Huang

Epidemiologists have access to various methods to reduce bias and improve statistical efficiency in effect estimation, from standard multivariable regression to state-of-the-art doubly-robust efficient estimators paired with highly flexible, data-adaptive algorithms ("machine learning"). However, due to numerous assumptions and trade-offs, epidemiologists face practical difficulties in recognizing which method, if any, may be suitable for their specific data and hypotheses. Importantly, relative advantages are necessarily context-specific (data structure, algorithms, model misspecification), limiting the utility of universal guidance.

Federated Adaptive Causal Estimation (FACE) of Target Treatment Effects.

J Am Stat Assoc

March 2025

Department of Biostatistics, Harvard University.

Larry Han , Jue Hou , Kelly Cho , Rui Duan , Tianxi Cai

Federated learning of causal estimands may greatly improve estimation efficiency by leveraging data from multiple study sites, but robustness to heterogeneity and model misspecifications is vital for ensuring validity. We develop a Federated Adaptive Causal Estimation (FACE) framework to incorporate heterogeneous data from multiple sites to provide treatment effect estimation and inference for a flexibly specified target population of interest. FACE accounts for site-level heterogeneity in the distribution of covariates through density ratio weighting.
