Article Abstract

Eukaryotic diversity is largely microbial, with macroscopic lineages (plants, animals, and fungi) nesting among a plethora of diverse protists. Our understanding of the evolutionary relationships among eukaryotes is rapidly advancing through 'omics analyses, but phylogenomic analyses are challenging for microeukaryotes, particularly uncultivable lineages, as single-cell sequencing approaches generate a mixture of sequences from hosts, associated microbiomes, and contaminants. Moreover, many analyses of eukaryotic gene families and phylogenies rely on boutique data sets and methods that are challenging for other research groups to replicate. To address these challenges, we present EukPhylo v.1.0, a modular, user-friendly pipeline that enables effective data curation through phylogeny-informed contamination removal, estimation of homologous gene families (GFs), and generation of both multisequence alignments and gene trees. For the GF assignment, we provide the "Hook Database" of ~15,000 ancient GFs, which users can easily replace with a set of gene families of interest. We demonstrate the power of EukPhylo, including a suite of stand-alone utilities, through phylogenomic analyses of 500 conserved GFs sampled from 1,000 diverse species of eukaryotes, bacteria, and archaea. We show improvements in estimates of the eukaryotic tree of life, recovering clades that are well established in the literature, through successive rounds of curation using the EukPhylo contamination loop. The final trees corroborate numerous hypotheses in the literature (e.g., Opisthokonta, Rhizaria, Amoebozoa) while challenging others (e.g., CRuMs, Obazoa, Diaphoretickes). The flexibility and transparency of EukPhylo set new standards for curation of 'omics data for future studies.

IMPORTANCE

Illuminating the diversity of microbial lineages is essential for estimating the tree of life and characterizing principles of genome evolution. However, analyses of microbial eukaryotes (e.g., flagellates, amoebae) are complicated by both the paucity of reference genomes and the prevalence of contamination (e.g., by symbionts, microbiomes). EukPhylo v.1.0 enables taxon-rich analyses "on the fly" as users can choose optimal gene families for their focal taxa and then use replicable approaches to curate data in estimating both gene and species trees. With multiple entry points and curated data sets from up to 15,000 gene families from 1,000 taxa ready for use, EukPhylo provides a powerful launching point for researchers interested in the evolution of eukaryotes.

Source
http://dx.doi.org/10.1128/mbio.01770-25

Similar Publications

Background: Most RNA-seq datasets harbor genes with extreme expression levels in some samples. Such extreme outliers are usually treated as technical errors and are removed from the data before further statistical analysis. Here we focus on the patterns of such outlier gene expression to investigate whether they provide insights into the underlying biology.

Whole genome sequence analysis of low-density lipoprotein cholesterol across 246 K individuals.

Genome Biol

September 2025

Center for Genomic Medicine, Cardiovascular Research Center, Massachusetts General Hospital Simches Research Center, 185 Cambridge Street, CPZN 5.238, Boston, MA, 02114, USA.

Background: Rare genetic variation provided by whole genome sequence datasets has been relatively less explored for its contributions to human traits. Meta-analysis of sequencing data offers advantages by integrating larger sample sizes from diverse cohorts, thereby increasing the likelihood of discovering novel insights into complex traits. Furthermore, emerging methods in genome-wide rare variant association testing further improve power and interpretability.

X-Linked Hypophosphatemia: Role of Fibroblast Growth Factor 23 on Human Skeletal Muscle-Derived Cells.

Calcif Tissue Int

September 2025

FirmoLab, Fondazione F.I.R.M.O. Onlus and Stabilimento Chimico Farmaceutico Militare (SCFM), 50141, Florence, Italy.

X-linked hypophosphatemia (XLH) is a rare and progressive disease, due to inactivating mutations in the phosphate-regulating endopeptidase homolog X-linked (PHEX) gene. These pathogenic variants result in elevated circulating levels of fibroblast growth factor 23 (FGF23), responsible for the main clinical manifestations of XLH, such as hypophosphatemia, skeletal deformities, and mineralization defects. However, XLH also involves muscular disorders (muscle weakness, pain, reduced muscle density, peak strength, and power).

Personalised genomic strategies improve diagnostic yield in inherited retinal dystrophies: a stepwise, patient-centred approach.

Eye (Lond)

September 2025

Genetics Laboratory, Metropolitan South Clinical Laboratory, Bellvitge University Hospital, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain.

Background: Inherited retinal dystrophies (IRDs) are a genetically heterogeneous group of conditions, with approximately 40% of cases remaining unresolved after initial genetic testing. This study aimed to assess the impact of a personalised genomic approach integrating whole-exome sequencing (WES) reanalysis, whole-genome sequencing (WGS), customised gene panels and functional assays to improve diagnostic yield in unresolved cases.

Subjects/methods: We retrospectively reviewed a cohort of 597 individuals with IRDs, including 525 probands and 72 affected relatives.
