Inference of a species network from genomic data remains a difficult problem, with recent progress mostly limited to the level-1 case. However, inference of the Tree of Blobs of a network, showing only the network's cut edges, can be performed for any network by TINNiK, suggesting a divide-and-conquer approach to network inference where the tree's multifurcations are individually resolved to give more detailed structure. Here we develop a method, , to quickly perform such a level-1 resolution.
Transfer RNAs (tRNAs) are among the most highly conserved and frequently transcribed genes. Recent studies have identified transcription-associated mutagenesis (TAM) as a significant contributor to sequence variation around tRNA loci. However, the extent to which TAM drives allelic variation in tRNAs remains unclear, largely due to the confounding effects of strong selection pressures to maintain their structural integrity.
The eukaryote Tree of Life (eToL) depicts the relationships among all eukaryotic organisms; its root represents the Last Eukaryotic Common Ancestor (LECA) from which all extant complex lifeforms are descended. Locating this root is crucial for reconstructing the features of LECA, both as the endpoint of eukaryogenesis and the start point for the evolution of the myriad complex traits underpinning the diversification of living eukaryotes. However, the position of the root remains contentious due to pervasive phylogenetic artefacts stemming from inadequate evolutionary models, poor taxon sampling and limited phylogenetic signal.
ASTRAL is a powerful and widely used tool for species tree inference, known for its computational speed and robustness under incomplete lineage sorting. It has often been used as an initial step in species network inference, providing a backbone tree to which hybridization events are later added via other methods. However, we show, empirically and theoretically, that this approach can yield flawed results.
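ASTRAL's objective is usually described as finding the species tree that shares the largest number of induced quartet topologies with the input gene trees. The brute-force Python sketch below only makes that score concrete. It makes two assumptions: trees are written as nested tuples of taxon-name strings, and they are read as unrooted. The function names are hypothetical. ASTRAL itself does not enumerate quartets this way; it optimizes the score with dynamic programming over a constrained set of bipartitions.

```python
from itertools import combinations

def leaves(tree):
    """Leaf set of a tree written as nested tuples of taxon-name strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Leaf sets below every internal node; each (except the full set) is an unrooted split."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clusters(child)
    return found

def quartet_split(tree_clusters, quartet):
    """Split {{a, b}, {c, d}} induced on four taxa, or None if the tree leaves it unresolved."""
    for cluster in tree_clusters:
        side = cluster & quartet
        if len(side) == 2:
            return frozenset([side, quartet - side])
    return None

def quartet_score(species_tree, gene_trees):
    """Count (gene tree, quartet) pairs whose induced topology matches the species tree."""
    sp_clusters = clusters(species_tree)
    genes = [(clusters(g), leaves(g)) for g in gene_trees]
    score = 0
    for four in combinations(sorted(leaves(species_tree)), 4):
        q = frozenset(four)
        target = quartet_split(sp_clusters, q)
        if target is None:
            continue
        for g_clusters, g_leaves in genes:
            if q <= g_leaves and quartet_split(g_clusters, q) == target:
                score += 1
    return score
```

For example, `quartet_score((("a", "b"), ("c", "d")), gene_trees)` counts how many gene trees display ab|cd. The loop is O(n^4) per gene tree, so it is only practical for a handful of taxa.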
Algorithms Mol Biol
November 2024
The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transfer of genetic information occur. By isolating such regions of a network, inference of the tree of blobs can serve as a starting point for a more detailed investigation, or indicate the limit of what may be inferrable without additional assumptions. Building on our theoretical work on the identifiability of the tree of blobs from gene quartet distributions under the Network Multispecies Coalescent model, we develop an algorithm, TINNiK, for statistically consistent tree of blobs inference.
Mol Biol Evol
September 2024
When hybridization or other forms of lateral gene transfer have occurred, evolutionary relationships of species are better represented by phylogenetic networks than by trees. While inference of such networks remains challenging, several recently proposed methods are based on quartet concordance factors: the probabilities that the gene tree of a gene sampled from the species displays each of the possible 4-taxon relationships. Building on earlier results, we investigate what level-1 network features are identifiable from concordance factors under the network multispecies coalescent model.
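Concordance factors are probabilities, but in practice they are estimated as the fraction of gene trees displaying each of the three resolved topologies on a chosen set of four taxa. The minimal Python sketch below does that estimation. It assumes gene trees are written as nested tuples of taxon names and read as unrooted. Gene trees that leave the quartet unresolved are skipped, which is only one of several possible conventions. The function names are hypothetical.

```python
from collections import Counter

def clusters(tree):
    """Leaf sets below each internal node of a nested-tuple tree (taxa are strings)."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found

def concordance_factors(gene_trees, quartet):
    """Empirical CFs of (ab|cd, ac|bd, ad|bc) for quartet = (a, b, c, d)."""
    a, b, c, d = quartet
    q = frozenset(quartet)
    counts = Counter()
    for tree in gene_trees:
        for cluster in clusters(tree):
            side = cluster & q
            if len(side) == 2:
                # The taxon grouped with `a` identifies which of the three topologies this is.
                pair = side if a in side else q - side
                (partner,) = pair - {a}
                counts[partner] += 1
                break
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no gene tree resolves this quartet")
    return tuple(counts[x] / total for x in (b, c, d))
```

With three gene trees showing ab|cd and one showing ac|bd, the estimate is (0.75, 0.25, 0.0).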
J Math Biol
February 2024
Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations.
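For a network on four taxa, this definition amounts to comparing the three quartet frequencies. Each topology displayed in the network is checked against each topology that is not displayed. The minimal Python sketch below performs that comparison. It assumes the gene-tree frequencies are already known, keyed by hypothetical topology labels. Computing those frequencies under the Network Multispecies Coalescent is not shown.

```python
def is_anomalous_quartet(freqs, displayed):
    """freqs: topology label -> gene tree frequency; displayed: labels displayed in the network.

    Returns True if some displayed topology is rarer than some non-displayed one.
    """
    shown = [freqs[t] for t in displayed]
    hidden = [f for t, f in freqs.items() if t not in displayed]
    return bool(shown and hidden) and min(shown) < max(hidden)
```

For instance, with made-up frequencies {"ab|cd": 0.40, "ac|bd": 0.25, "ad|bc": 0.35}, a network displaying ab|cd and ac|bd would be anomalous. A network displaying ab|cd and ad|bc would not.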
Biochemical constraints on the admissible amino acids at specific sites in proteins lead to heterogeneity of the amino acid substitution process over sites in alignments. It is well known that phylogenetic models of protein sequence evolution that do not account for site heterogeneity are prone to long-branch attraction (LBA) artifacts. Profile mixture models were developed to model heterogeneity of preferred amino acids at sites via a finite distribution of site classes each with a distinct set of equilibrium amino acid frequencies.
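In a profile mixture model, each site's likelihood is a weighted sum over site classes: L(site) = Σ_c w_c L_c(site). Here L_c is the likelihood computed with class c's equilibrium amino acid frequencies. The minimal Python sketch below combines the per-class values. It assumes the per-class site log-likelihoods are already computed, for example by Felsenstein's pruning algorithm (not shown). It uses log-sum-exp for numerical stability and also returns posterior class probabilities for the site. The function names are hypothetical.

```python
import math

def mixture_site_loglik(class_logliks, weights):
    """class_logliks[c] = log L_c(site); weights[c] = w_c (summing to 1).

    Returns (log L(site), posterior probability of each class at this site).
    """
    terms = [math.log(w) + ll for w, ll in zip(weights, class_logliks) if w > 0]
    m = max(terms)
    total = m + math.log(sum(math.exp(t - m) for t in terms))  # log-sum-exp
    posteriors = [math.exp(math.log(w) + ll - total) if w > 0 else 0.0
                  for w, ll in zip(weights, class_logliks)]
    return total, posteriors

def alignment_loglik(per_site_class_logliks, weights):
    """Sites are assumed independent, so site log-likelihoods add."""
    return sum(mixture_site_loglik(site, weights)[0] for site in per_site_class_logliks)
```

With two equally weighted classes whose site likelihoods are 0.2 and 0.6, the mixture likelihood is 0.4. The class posteriors are 0.25 and 0.75.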
J Math Biol
December 2022
Inference of species networks from genomic data under the Network Multispecies Coalescent Model is currently severely limited by heavy computational demands. It also remains unclear how complicated networks can be for consistent inference to be possible. As a step toward inferring a general species network, this work considers its tree of blobs, in which non-cut edges are contracted to nodes, so only tree-like relationships between the taxa are shown.
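Contracting every non-cut edge amounts to collapsing each 2-edge-connected component ("blob") of the network's underlying undirected graph to a single node. The minimal Python sketch below does this in two steps. It assumes an undirected edge list whose nodes are hashable labels. First, it finds the cut edges (bridges) with a depth-first search using discovery times and low-links. Then it merges the endpoints of all remaining edges with union-find. Any further cleanup, such as suppressing degree-2 nodes, is omitted. The function names are hypothetical.

```python
def cut_edges(nodes, edges):
    """Indices of bridges in an undirected graph, via DFS discovery times and low-links."""
    adj = {v: [] for v in nodes}
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc, low, bridges = {}, {}, set()
    counter = [0]

    def dfs(u, via_edge):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v, i in adj[u]:
            if i == via_edge:
                continue  # do not walk back along the edge we arrived by
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, i)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(i)  # no back edge from v's subtree reaches u or above

    for v in nodes:
        if v not in disc:
            dfs(v, None)
    return bridges

def tree_of_blobs(nodes, edges):
    """Contract every non-cut edge; return (node -> blob representative, tree edges)."""
    cut = cut_edges(nodes, edges)
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, (u, v) in enumerate(edges):
        if i not in cut:
            parent[find(u)] = find(v)
    blob = {v: find(v) for v in nodes}
    return blob, [(blob[u], blob[v]) for i, (u, v) in enumerate(edges) if i in cut]
```

For a network whose only cycle is x-y-h-z, the four cycle nodes collapse into one blob and every other edge survives as a tree edge.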
IEEE/ACM Trans Comput Biol Bioinform
April 2023
As genomic-scale datasets motivate research on species tree inference, simulators of the multispecies coalescent (MSC) process have become essential for the testing and evaluation of new inference methods. However, the simulators themselves must be tested to ensure that they give valid samples. This work develops methods for checking whether a collection of gene trees is in accord with the MSC model on a given species tree.
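Consider a four-taxon species tree whose internal branch has length t in coalescent units. Under the MSC, the quartet topology matching the species tree has probability 1 - (2/3)e^(-t), and each of the other two has probability (1/3)e^(-t). One simple way to check a simulator is a Pearson chi-square goodness-of-fit test of simulated quartet counts against these probabilities. This is an illustrative assumption, not necessarily the procedure developed in this work. With t treated as known and three categories, the statistic has 2 degrees of freedom. For 2 degrees of freedom, the chi-square survival function is exactly exp(-x/2), so no statistics library is needed. The function names are hypothetical.

```python
import math

def msc_quartet_probs(t):
    """(matching, other, other) quartet topology probabilities on a 4-taxon species tree
    whose internal branch has length t >= 0 in coalescent units."""
    minor = math.exp(-t) / 3
    return (1 - 2 * minor, minor, minor)

def msc_fit_test(counts, t):
    """Pearson chi-square goodness of fit; counts are ordered as in msc_quartet_probs."""
    n = sum(counts)
    expected = [n * p for p in msc_quartet_probs(t)]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Three categories and no estimated parameters: 2 degrees of freedom,
    # and P(chi2 with 2 df > x) = exp(-x / 2).
    return stat, math.exp(-stat / 2)
```

A small p-value flags simulated counts that are implausible under the MSC with the stated branch length. Testing many quartets at once would also require a multiple-testing correction.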
Inference of network-like evolutionary relationships between species from genomic data must address the interwoven signals from both gene flow and incomplete lineage sorting. The heavy computational demands of standard approaches to this problem severely limit the size of datasets that may be analyzed, in both the number of species and the number of genetic loci. Here we provide a theoretical pointer to more efficient methods, by showing that logDet distances computed from genomic-scale sequences retain sufficient information to recover network relationships in the level-1 ultrametric case.
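For two aligned DNA sequences x and y, one common (paralinear) form of the logDet distance is d(x, y) = -(1/4) [ ln det F - (1/2) ln(det Π_x · det Π_y) ]. Here F is the 4×4 matrix of relative frequencies of aligned base pairs, and Π_x, Π_y are diagonal matrices of the two sequences' base frequencies. The minimal NumPy sketch below computes that formula. The work above may use a different normalization. The sketch assumes gaps and ambiguity codes have already been removed. The function name is hypothetical.

```python
import numpy as np

def logdet_distance(x, y):
    """Paralinear/logDet distance between two aligned DNA strings over A, C, G, T."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    index = {base: i for i, base in enumerate("ACGT")}
    F = np.zeros((4, 4))
    for bx, by in zip(x.upper(), y.upper()):
        F[index[bx], index[by]] += 1  # joint counts of aligned base pairs
    F /= len(x)
    sign, log_det_f = np.linalg.slogdet(F)
    if sign <= 0:
        raise ValueError("det F must be positive (e.g. every base must occur)")
    pi_x, pi_y = F.sum(axis=1), F.sum(axis=0)  # marginal base frequencies
    log_det_pis = np.log(pi_x).sum() + np.log(pi_y).sum()
    return float(-0.25 * (log_det_f - 0.5 * log_det_pis))
```

Identical sequences containing all four bases are at distance 0. A single substitution in "AACCGGTT" gives a distance of about 0.137.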
Summary: MSCquartets is an R package for species tree hypothesis testing, inference of species trees and inference of species networks under the Multispecies Coalescent model of incomplete lineage sorting and its network analog. Inputs to these analyses are collections of metric or topological locus trees, which are then summarized by the quartets they display. Results of hypothesis tests at user-supplied significance levels are displayed as color-coded points in a simplex plot.
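A simplex plot places each quartet's concordance-factor triple (p1, p2, p3), with p1 + p2 + p3 = 1, as a point inside an equilateral triangle. The triangle's corners correspond to the three resolved topologies, and its centroid is the point (1/3, 1/3, 1/3). The minimal Python sketch below performs the barycentric-to-Cartesian mapping behind such a plot. The corner layout is an arbitrary choice and need not match MSCquartets' own R plots. The function name is hypothetical.

```python
import math

# Corners of an equilateral triangle with unit side (an arbitrary layout choice).
CORNERS = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))

def simplex_point(p):
    """Map nonnegative quartet counts or frequencies (p1, p2, p3) to 2-D plot coordinates."""
    total = sum(p)
    if total <= 0 or any(v < 0 for v in p):
        raise ValueError("need nonnegative values with a positive sum")
    w = [v / total for v in p]  # normalize, so raw counts are accepted too
    x = sum(wi * cx for wi, (cx, _) in zip(w, CORNERS))
    y = sum(wi * cy for wi, (_, cy) in zip(w, CORNERS))
    return x, y
```

Raw quartet counts can be passed directly. For example, `simplex_point((30, 30, 30))` lands on the centroid.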
Algorithms Mol Biol
December 2019
Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the network multispecies coalescent model, individual gene trees arising from a network can have any topology, but arise with frequencies dependent on the network structure and numerical parameters. We propose a new algorithm for statistical inference of a level-1 species network under this model, from data consisting of gene tree topologies, and provide the theoretical justification for it.
Mixtures of group-based Markov models of evolution correspond to joins of toric varieties. In this paper, we establish a large number of cases for which these phylogenetic join varieties realize their expected dimension, meaning that they are nondefective. Nondefectiveness is not only interesting from a geometric point of view, but has also been used to establish combinatorial identifiability for several classes of phylogenetic mixture models.
Bull Math Biol
February 2019
We show that many topological features of level-1 species networks are identifiable from the distribution of the gene tree quartets under the network multi-species coalescent model. In particular, every cycle of size at least 4 and every hybrid node in a cycle of size at least 5 are identifiable. This is a step toward justifying the inference of such networks which was recently implemented by Solís-Lemus and Ané.