Metadata practices for simulation workflows.

José Villamar, Matthias Kelbling, Heather L More, Michael Denker, Tom Tetzlaff, Johanna Senk, Stephan Thober

Sci Data

Department of Computational Hydrosystems, Helmholtz-Centre for Environmental Research, Leipzig, Germany.

Published: June 2025

Computer simulations are an essential pillar of knowledge generation in science. Exploring, understanding, reproducing, and sharing the results of simulations rely on tracking and organizing the metadata describing the numerical experiments. The models used to understand real-world systems, and the computational machinery required to simulate them, are typically complex, and produce large amounts of heterogeneous metadata. Here, we present general practices for acquiring and handling metadata that are agnostic to software and hardware, and highly flexible for the user. These consist of two steps: 1) recording and storing raw metadata, and 2) selecting and structuring metadata. As a proof of concept, we develop the Archivist, a Python tool to help with the second step, and use it to apply our practices to distinct high-performance computing use cases from neuroscience and hydrology. Our practices and the Archivist can readily be applied to existing workflows without the need for substantial restructuring. They support sustainable numerical workflows, fostering replicability, reproducibility, data exploration, and data sharing in simulation-based research.
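The two steps named in the abstract can be illustrated with a minimal Python sketch. This is not the Archivist's API, which the abstract does not describe. The function names (`record_raw_metadata`, `select_and_structure`), the file names, and the parameter keys are all hypothetical and use only the standard library. Step 1 writes raw metadata files next to a simulation run. Step 2 reads them back and keeps only selected keys in one structured record.

```python
import json
import platform
import sys
import tempfile
from pathlib import Path


def record_raw_metadata(run_dir: Path, parameters: dict) -> None:
    """Step 1 (hypothetical helper): store raw metadata as files in run_dir."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # Simulation parameters exactly as passed in by the caller.
    (run_dir / "parameters.json").write_text(json.dumps(parameters))
    # A small snapshot of the software environment.
    environment = {
        "python_version": sys.version,
        "platform": platform.platform(),
    }
    (run_dir / "environment.json").write_text(json.dumps(environment))


def select_and_structure(run_dir: Path, wanted: dict) -> dict:
    """Step 2 (hypothetical helper): keep selected keys in one nested record.

    `wanted` maps a raw file name to the list of keys to keep from that file.
    Keys that are absent from the raw file are skipped silently.
    """
    record = {}
    for file_name, keys in wanted.items():
        raw = json.loads((run_dir / file_name).read_text())
        section = Path(file_name).stem  # e.g. "parameters.json" -> "parameters"
        record[section] = {key: raw[key] for key in keys if key in raw}
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run_001"
        record_raw_metadata(run_dir, {"dt": 0.1, "seed": 42, "n_steps": 1000})
        record = select_and_structure(
            run_dir,
            {"parameters.json": ["dt", "seed"], "environment.json": ["platform"]},
        )
        print(json.dumps(record, indent=2))
```

Keeping the raw files untouched and deriving the structured record separately means the selection can be changed later without rerunning the simulation, which is one reading of why the abstract separates the two steps.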

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12141434
DOI: http://dx.doi.org/10.1038/s41597-025-05126-1

Similar Publications

OpenML: Insights from 10 years and more than a thousand papers.

Patterns (N Y)

July 2025

L3S Research Center, Leibniz University Hannover, Hannover, Germany.

Bernd Bischl, Giuseppe Casalicchio, Taniya Das, Matthias Feurer, Sebastian Fischer

OpenML is an open-source platform that democratizes machine-learning evaluation by enabling anyone to share datasets in uniform standards, define precise machine-learning tasks, and automatically share detailed workflows and model evaluations. More than just a platform, OpenML fosters a collaborative ecosystem where scientists create new tools, launch initiatives, and establish standards to advance machine learning. Over the past decade, OpenML has inspired over 1,500 publications across diverse fields, from scientists releasing new datasets and benchmarking new models to educators teaching reproducible science.

Open-source models for development of data and metadata standards.

Patterns (N Y)

July 2025

University of Washington, Department of Astronomy, Seattle, WA, USA.

Ariel Rokem, Vani Mandava, Nicoleta Cristea, Anshul Tambay, Kristofer Bouchard

Machine learning and artificial intelligence promise to accelerate research and understanding across many scientific disciplines. Harnessing the power of these techniques requires aggregating scientific data. In tandem, the importance of open data for reproducibility and scientific transparency is gaining recognition, and data are increasingly available through digital repositories.

Shaping the future EHDS: recommendations for implementation of Health Data Access Bodies in the HealthData@EU infrastructure for secondary use of electronic health data.

Eur J Public Health

September 2025

Danish Health Data Authority, Copenhagen, Denmark.

Lise S Svingel, Caroline E Jensen, Gitte F Kjeldsen, Maria H Pedersen, Dipak Kalra

European Union (EU) Member States face challenges in using health data for secondary purposes, constrained by inconsistent digital health systems and limited cross-border sharing. One aim of the European Health Data Space (EHDS) is to facilitate secondary health data use through the HealthData@EU infrastructure and Health Data Access Bodies (HDABs). This article provides recommendations essential for HDAB implementation, informed by the HealthData@EU Pilot project.

User journeys in cross-European secondary use of health data: insights ahead of the European Health Data Space.

Eur J Public Health

September 2025

Copenhagen Health Complexity Center, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.

Rachel B Forster, Eva Garcia Alvarez, Adrian G Zucco, Enrique Bernal-Delgado, Gayo Diallo

The European Health Data Space (EHDS) regulation aims to facilitate cross-border sharing of health data across Europe. However, practical challenges related to data access, interoperability, quality, and interpretive competence remain, particularly when working with health systems across countries. This study aimed to evaluate and report the user journey of researchers accessing and utilizing health data across four European countries for secondary research purposes prior to implementation of EHDS.

The systematic assessment of completeness of public metadata accompanying omics studies in the Gene Expression Omnibus data repository.

Genome Biol

September 2025

Department of Clinical Pharmacy, Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, 90089, USA.

Yu-Ning Huang, Pooja Vinod Jaiswal, Anushka Rajes, Anushka Yadav, Dottie Yu

Background: Recent advances in high-throughput sequencing technologies have enabled the collection and sharing of a massive amount of omics data, along with its associated metadata (descriptive information that contextualizes the data, including phenotypic traits and experimental design). Enhancing metadata availability is critical to ensure data reusability and reproducibility and to facilitate novel biomedical discoveries through effective data reuse. Yet, incomplete metadata accompanying public omics data may hinder reproducibility and reusability and limit secondary analyses.
