A highly optimized grid deployment: the metagenomic analysis example.

Stud Health Technol Inform

Universidad Politécnica de Valencia - ITACA.

Published: October 2008


Article Abstract

Computational resources and the demands of computationally expensive processes are not growing at the same rate. The availability of large amounts of computing resources in Grid infrastructures does not make efficiency unimportant. It is necessary to analyze the whole process to improve partitioning and submission schemas, especially in the most critical experiments. Metagenomic analysis is one such case, and this text presents the work done to optimize a Grid deployment, which has reduced both the response time and the failure rate.

Metagenomic studies aim at processing samples of multiple specimens to extract the genes and proteins that belong to the different species. In many cases, sequencing the DNA of microorganisms is hindered by the impossibility of growing significant samples of isolated specimens. Many bacteria cannot survive alone and require interaction with other organisms. In such cases, the available DNA information belongs to different kinds of organisms.

One important stage in metagenomic analysis consists of extracting fragments and then comparing them and analyzing their function. Fragments can be classified by comparing them to existing chains whose function is well known. This process is computationally intensive and requires several iterations of alignment and phylogeny classification steps. Source samples contain several million sequences, each of which can be up to thousands of nucleotides long. These sequences are compared to a selected part of the "Non-redundant" database, which includes only the information from eukaryotic species. After this first analysis, a refinement step is performed and the alignment analysis is restarted from its results. The whole process requires several CPU years. The article describes and analyzes the difficulties of fragmenting, automating and checking the above operations in current Grid production environments.
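To make the fragmentation step concrete, the following is a minimal sketch, not the authors' implementation, of how a FASTA-style sample could be split into fixed-size jobs. The names `read_fasta`, `partition` and `job_size` are hypothetical and do not come from the article, which gives no optimal job size in this abstract.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence); an assumed representation


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(parts)


def partition(records: Iterable[Record], job_size: int) -> Iterator[List[Record]]:
    """Group records into jobs of at most job_size sequences each."""
    if job_size < 1:
        raise ValueError("job_size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, job_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    sample = [">seq1", "ACGT", "AC", ">seq2", "GG", ">seq3", "TT"]
    for number, job in enumerate(partition(read_fasta(sample), job_size=2)):
        print(number, [header for header, _ in job])
```

Both functions are generators, so a sample with millions of sequences would never need to be held in memory at once. In practice the job size would be chosen from measurements such as those the article describes.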
This environment has been tuned on the basis of an experimental study that identified the most efficient and reliable resources, the optimal job size, and the overhead of data transfer and database reindexing. The environment should resubmit faulty jobs, detect endless tasks, and ensure that results are correctly retrieved and the workflow stays synchronised. The paper outlines the structure of the system and the preparation steps performed for this experiment.
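The resubmission and endless-task requirements can be illustrated with a short sketch. It assumes, purely for illustration, that each job is a local command run with Python's `subprocess.run`. A real Grid deployment would submit through its middleware instead, which the abstract does not name. The names `run_local`, `run_all`, `timeout_s` and `max_attempts` are hypothetical.

```python
import subprocess
from collections import deque
from typing import Callable, Iterable, List, Sequence, Tuple

Job = Sequence[str]  # a command line; an assumed stand-in for a Grid job


def run_local(job: Job, timeout_s: float) -> bool:
    """Run one job as a local process and report whether it succeeded.

    A job that exceeds timeout_s is killed by subprocess.run and is
    reported as failed, which is how "endless" tasks are detected here.
    """
    try:
        done = subprocess.run(list(job), timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return False
    return done.returncode == 0


def run_all(
    jobs: Iterable[Job],
    runner: Callable[[Job, float], bool],
    timeout_s: float,
    max_attempts: int = 3,
) -> Tuple[List[Job], List[Job]]:
    """Run every job, resubmitting failures up to max_attempts times.

    Returns (completed, abandoned) so the caller can check that every
    result was retrieved before the next workflow stage starts.
    """
    pending = deque((job, 1) for job in jobs)
    completed: List[Job] = []
    abandoned: List[Job] = []
    while pending:
        job, attempt = pending.popleft()
        if runner(job, timeout_s):
            completed.append(job)
        elif attempt < max_attempts:
            pending.append((job, attempt + 1))  # resubmit at the back of the queue
        else:
            abandoned.append(job)
    return completed, abandoned
```

Passing the runner as a parameter keeps the retry policy separate from the submission mechanism, so the same loop could wrap a middleware client or a test double.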


Similar Publications

Human-associated metagenomic data often contain human nucleic acid information, which can affect the accuracy of microbial classification or raise ethical concerns. These reads are typically removed through alignment to the human genome using various metagenomic mapping tools or human reference genomes, followed by filtration before metagenomic analysis. In this study, we conducted a comprehensive analysis to identify the optimal combination of alignment software and human reference genomes using benchmarking data.

The metalloid tellurium (Te) is toxic to bacteria; however, the element is also extremely rare. Thus, most bacteria will never encounter Te in their environment. Nonetheless, significant research has been performed on bacterial Te resistance because of the medical applications of the element.

Background: Increasing evidence suggests a potential role of the gut microbiota in Parkinson's disease (PD). However, the relationship between the gut microbiome (GM) and PD dementia (PDD) remains debated, with their causal effects and underlying mechanisms not yet fully understood.

Methods: Utilizing data from large-scale genome-wide association studies (GWASs), this study applied bidirectional and mediating Mendelian randomization (MR) to investigate the causal relationship and underlying mechanisms between the GM and PDD.

Background And Aim: Silage plays a pivotal role in ruminant nutrition, significantly influencing rumen fermentation, animal productivity, and environmental sustainability. Despite extensive research on silage and fermentation, a comprehensive synthesis of global trends and collaborations in this domain has not been systematically explored. This study aimed to conduct a bibliometric analysis of global research on silage feed and its effects on rumen fermentation in ruminants.

Arbuscular mycorrhizal fungi enhance nitrate ammonification in hyphosphere soil.

New Phytol

September 2025

State Key Laboratory of Nutrient Use and Management, College of Resources and Environmental Sciences, National Academy of Agriculture Green Development, China Agricultural University, Beijing, 100193, China.

Microbial nitrate ammonification is a crucial process to retain nitrogen (N) in soils, thereby reducing N loss. Nitrate ammonification has been studied in enrichment and axenic bacterial cultures but so far has been largely ignored in environmental studies. In particular, the capability of arbuscular mycorrhizal fungi (AMF) to regulate nitrate ammonification has not yet been explored.