A scalable method for identifying frequent subtrees in sets of large phylogenetic trees.

BMC Bioinformatics

Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA.

Published: October 2012



Article Abstract

Background: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees.

Results: We develop a heuristic solution that aims to find MFASTs in large sets of large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees, which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that the frequent agreement subtree it reports is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle datasets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods.

Conclusions: Although this heuristic is not guaranteed to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical datasets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
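
The seed, extend, and post-process phases described in the Results can be illustrated with a toy sketch. This is not the authors' implementation: the tree encoding (nested tuples of taxon names), the function names, the exhaustive enumeration of three-taxon seeds, and the greedy extension rule are all assumptions made for illustration, and the exhaustive seeding would not scale to a thousand taxa. The sketch also assumes rooted trees that all share the same taxon set. A taxon set counts as frequent when at least a fraction `min_freq` of the input trees induce the same topology on it.

```python
from collections import Counter
from itertools import combinations

def leaves(tree):
    """Return the set of taxon names in a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def restrict(tree, taxa):
    """Restrict a tree to `taxa`, suppress unary nodes, and return a
    canonical form (children sorted) so equal topologies compare equal."""
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kids = [r for r in (restrict(child, taxa) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(sorted(kids, key=repr))

def support(trees, taxa):
    """Return the most common induced topology on `taxa` and its frequency."""
    counts = Counter(restrict(t, taxa) for t in trees)
    topology, n = counts.most_common(1)[0]
    return topology, n / len(trees)

def extend(trees, seed, all_taxa, min_freq):
    """Greedily add taxa while the set stays frequent. The loop stops only
    when no single taxon can be added; since any subset of an agreeing
    taxon set also agrees, the result is not contained in a larger
    frequent set (the role of the post-processing check)."""
    current = set(seed)
    grew = True
    while grew:
        grew = False
        for taxon in sorted(all_taxa - current):
            if support(trees, current | {taxon})[1] >= min_freq:
                current.add(taxon)
                grew = True
    return frozenset(current)

def find_frequent_agreement_taxa(trees, min_freq, seed_size=3):
    """Hypothetical driver: enumerate small frequent seeds, extend each,
    and return the largest taxon set found."""
    all_taxa = leaves(trees[0])
    seeds = [s for s in combinations(sorted(all_taxa), seed_size)
             if support(trees, set(s))[1] >= min_freq]
    candidates = {extend(trees, s, all_taxa, min_freq) for s in seeds}
    return max(candidates, key=len, default=frozenset())

# Three small example trees on the taxa A, B, C, D.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
print(sorted(find_frequent_agreement_taxa(trees, min_freq=1.0)))  # ['A', 'B', 'D']
```

In this example, only the taxon set {A, B, D} induces the same topology in all three trees, and adding C breaks the agreement, so it is returned when `min_freq` is 1.0. Like the heuristic in the abstract, a greedy extension of this kind carries no guarantee of finding the largest frequent agreement subtree in general.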


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3543182
DOI: http://dx.doi.org/10.1186/1471-2105-13-256

