Improve your Galaxy text life: The Query Tabular Tool.

James E Johnson, Praveen Kumar, Caleb Easterly, Mark Esler, Subina Mehta, Arthur C Eschenlauer, Adrian D Hegeman, Pratik D Jagtap, Timothy J Griffin

F1000Res

Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA.

Published: October 2018

Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Such workflows have many applications, including those that integrate software from different 'omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, complicating the process further. In some cases, the limitations of existing text manipulation tools are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of even advanced users and developers. For users with some SQL knowledge, these text operations could be combined into a single, concise query on a relational database. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be used for more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as examples, we demonstrate how the Query Tabular tool streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
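
The approach described in the abstract (load tabular text into SQLite, then express several text manipulation steps as one SQL query, optionally with regular expressions) can be sketched in a few lines of Python. This is an illustrative sketch only, not the Query Tabular tool's implementation: the table names, column names, sample rows, and the helper `load_tabular` are all hypothetical, and the `REGEXP` function is registered by hand because SQLite does not supply one by default.

```python
import csv
import io
import re
import sqlite3

# Hypothetical tab-separated inputs standing in for two intermediate workflow outputs.
PEPTIDES_TSV = "PEPTIDEK\tP12345\t0.001\nSAMPLER\tP67890\t0.2\nELVISK\tP12345\t0.004\n"
PROTEINS_TSV = "P12345\tALBU_HUMAN\nP67890\tCONTAM_TRYP\n"


def load_tabular(conn, table, text, columns):
    """Create a table and fill it from tab-separated text (values stored as text)."""
    cols = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in columns)
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


def regexp(pattern, value):
    """Back SQLite's 'value REGEXP pattern' operator with Python's re module."""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
load_tabular(conn, "peptides", PEPTIDES_TSV, ["sequence", "accession", "q_value"])
load_tabular(conn, "proteins", PROTEINS_TSV, ["accession", "name"])

# One query replaces separate join, filter, column-selection, and sort steps.
QUERY = """
SELECT p.sequence, r.name
FROM peptides AS p
JOIN proteins AS r ON p.accession = r.accession
WHERE CAST(p.q_value AS REAL) < 0.01
  AND NOT (r.name REGEXP '^CONTAM_')
ORDER BY p.sequence
"""
rows = conn.execute(QUERY).fetchall()

# Write the result back out as tab-separated text for a downstream step.
output = "\n".join("\t".join(row) for row in rows)
print(output)
```

Here the join, the numeric filter, the regular-expression filter, the column selection, and the sort would otherwise each be a separate text manipulation step in a workflow.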

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6248266
DOI: http://dx.doi.org/10.12688/f1000research.16450.2

Similar Publications

Dotplotic: a lightweight visualization tool for BLAST+ alignments and genomic annotations.

BMC Bioinformatics

August 2025

Department of Anatomy and Structural Biology, University of Yamanashi, Yamanashi, Japan.

Hideyuki Miyazawa, Toshiyuki Oda

With the development of sequencing technologies, chromosome-level genome assemblies have become increasingly common across various organisms, including non-model species. BLAST+ is one of the most widely used bioinformatics tools for computing sequence alignments, offering numerous optimizations for speed and scalability. Dot plots, which visualize the similarity between two sequences, are widely used in biological research.

CDE-Mapper: Using retrieval-augmented language models for linking clinical data elements to controlled vocabularies.

Comput Biol Med

September 2025

Institute of Data Science, Maastricht University, Maastricht, Netherlands.

Komal Gilani, Marlo Verket, Christof Peters, Michel Dumontier, Hans-Peter Brunner-La Rocca

The standardization of clinical data elements (CDEs) aims to ensure consistent and comprehensive patient information across various healthcare systems. Existing methods often falter when standardizing CDEs of varying representation and complex structure, impeding data integration and interoperability in clinical research. This paper presents CDE-Mapper, a framework that combines a retrieval-augmented generation strategy with large language models to automate the alignment of CDEs with controlled vocabularies.

The development and use of data warehousing in clinical settings: a scoping review.

Front Digit Health

June 2025

Software Systems and Cybersecurity Department, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia.

Shiyang Lyu, Simon Craig, Gerard O'Reilly, David Taniar

Introduction: The emergence of data warehousing in clinical settings has greatly enhanced data analysis capabilities, facilitating the accurate and comprehensive extraction of valuable information. This scoping review explores the contributions of data warehouses in clinical settings by analysing the strengths, challenges and implications of each type of data warehouse, with a particular focus on general and specialised types.

Methods: This scoping review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

SQL on FHIR - Tabular views of FHIR data using FHIRPath.

NPJ Digit Med

June 2025

Antidote Solutions, Lancaster, PA, USA.

John Grimes, Ryan Brush, Nikolai Rhyzhikov, Piotr Szul, Joshua Mandel

Challenges exist with the adoption of Fast Healthcare Interoperability Resources (FHIR) within analytics, including the difficulty in transforming complex data structures, and performance issues when querying large datasets in their native JSON or XML formats. In 2023, an international working group began work on a solution to this problem that would be easier to implement than existing approaches. Over the course of 18 months, the group authored a new specification and validated it through the development and testing of multiple independent implementations.

ProTaxoVis-protein taxonomic visualisation of presence.

BMC Bioinformatics

May 2025

Department of Arctic and Marine Biology, Faculty of Biosciences, Fisheries and Economics, UiT Arctic University of Norway, 9037, Tromsø, Norway.

Yin-Chen Hsieh, Mathias Bockwoldt, Ines Heiland

Background: Protein presence information is an essential component of biological pathway identification. Presence of certain enzymes in an organism points towards the metabolic pathways that occur within it, whereas the absence of these enzymes indicates either the existence of alternative pathways or a lack of these pathways altogether. The same inference applies to regulatory pathways such as gene regulation and signal transduction.
