Aligning Protein-Coding Nucleotide Sequences with MACSE.

Methods Mol Biol

Institut des Sciences de l'Evolution de Montpellier (ISEM), CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France.

Published: April 2021

Article Abstract

Most comparative genomic and evolutionary analyses rely on accurate multiple sequence alignments. Because of their underlying codon structure, protein-coding nucleotide sequences pose a specific challenge for multiple sequence alignment. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that provided the first automatic solution for aligning protein-coding gene datasets containing both functional and nonfunctional sequences (pseudogenes). Through its unique features, MACSE can build reliable codon alignments in the presence of frameshifts and stop codons, and these alignments are suitable for subsequent analyses of selection based on the ratio of nonsynonymous to synonymous substitutions. Here we offer a practical overview of, and guidelines for, the use of MACSE v2. This major update of the initial algorithm now comes with a graphical interface that provides user-friendly access to the different subprograms for handling multiple alignments of protein-coding sequences. We also present new pipelines, built on MACSE v2 subprograms and distributed as Singularity containers, for handling large datasets. MACSE and the associated pipelines are available at: https://bioweb.supagro.inra.fr/macse/ .
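To illustrate why frameshifts and stop codons matter for downstream selection analyses, the sketch below screens an already-built codon alignment for the problems that break codon-based (dN/dS) methods: a length that is not a multiple of 3, frameshift markers, and internal stop codons. This is not MACSE's alignment algorithm; it is a hypothetical post-alignment check. It assumes gaps are written as `-`, that frameshifts are marked with `!` (the convention we believe MACSE uses in its nucleotide output; check the MACSE documentation), and that the standard genetic code applies. The function name `screen_codon_alignment` is illustrative only.

```python
# Hypothetical helper (not part of MACSE): screens a codon alignment before a
# dN/dS analysis. Assumes '-' for gaps, '!' for frameshifts, standard genetic code.

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def screen_codon_alignment(alignment: dict[str, str]) -> dict[str, list[str]]:
    """Return, for each sequence name, a list of problems found in its aligned sequence."""
    report = {}
    for name, seq in alignment.items():
        problems = []
        if len(seq) % 3 != 0:
            problems.append("length not a multiple of 3")
        if "!" in seq:
            problems.append("frameshift marker '!' present")
        # Split into complete codons only (any trailing partial codon is ignored).
        full_len = len(seq) - len(seq) % 3
        codons = [seq[p:p + 3].upper() for p in range(0, full_len, 3)]
        # Indices of codons that are not fully gapped.
        coding = [i for i, codon in enumerate(codons) if codon != "---"]
        # Check internal codons only: a stop at the last coding codon is expected.
        # Codons containing '-' or '!' are not in CODON_TABLE and are skipped.
        for i in coding[:-1]:
            if CODON_TABLE.get(codons[i]) == "*":
                problems.append(f"internal stop codon at codon {i + 1}")
        report[name] = problems
    return report


if __name__ == "__main__":
    toy = {
        "seqA": "ATGAAATTTGGGTAA",  # clean: terminal stop only
        "seqB": "ATGTAATTTGGGTAA",  # premature stop, as in a pseudogene
        "seqC": "ATGAA!TTTGGGTAA",  # frameshift marker inside codon 2
    }
    for name, problems in screen_codon_alignment(toy).items():
        print(name, problems or "OK")
```

Sequences flagged this way (typically pseudogenes) are exactly the cases that MACSE is designed to align without discarding them; whether to keep, mask, or drop them before a dN/dS analysis remains a user decision.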

Source
http://dx.doi.org/10.1007/978-1-0716-1036-7_4