Article Abstract

Computer simulations are an essential pillar of knowledge generation in science. Exploring, understanding, reproducing, and sharing the results of simulations rely on tracking and organizing the metadata describing the numerical experiments. The models used to understand real-world systems, and the computational machinery required to simulate them, are typically complex, and produce large amounts of heterogeneous metadata. Here, we present general practices for acquiring and handling metadata that are agnostic to software and hardware, and highly flexible for the user. These consist of two steps: 1) recording and storing raw metadata, and 2) selecting and structuring metadata. As a proof of concept, we develop the Archivist, a Python tool to help with the second step, and use it to apply our practices to distinct high-performance computing use cases from neuroscience and hydrology. Our practices and the Archivist can readily be applied to existing workflows without the need for substantial restructuring. They support sustainable numerical workflows, fostering replicability, reproducibility, data exploration, and data sharing in simulation-based research.
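
The two steps named in the abstract can be illustrated with a minimal sketch using only the Python standard library. This is not the Archivist's API, which the abstract does not describe; the function names `record_raw_metadata` and `select_and_structure`, the file name `raw_metadata.json`, the dotted-path selection scheme, and the example parameters are all assumptions made for illustration.

```python
import json
import platform
import sys
import tempfile
from pathlib import Path


def record_raw_metadata(run_dir: Path, parameters: dict) -> Path:
    """Step 1 (hypothetical helper): store raw metadata unfiltered."""
    raw = {
        "parameters": parameters,
        "python_version": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "argv": sys.argv,
    }
    path = run_dir / "raw_metadata.json"  # assumed file name
    path.write_text(json.dumps(raw, indent=2))
    return path


def select_and_structure(raw_path: Path, wanted: dict) -> dict:
    """Step 2 (hypothetical helper): pick entries by dotted path
    and arrange them under user-chosen keys."""
    raw = json.loads(raw_path.read_text())
    structured = {}
    for new_key, dotted in wanted.items():
        value = raw
        for part in dotted.split("."):
            value = value[part]  # walk one level down per path segment
        structured[new_key] = value
    return structured


# Example run with made-up simulation parameters.
with tempfile.TemporaryDirectory() as tmp:
    raw_file = record_raw_metadata(Path(tmp), {"dt": 0.1, "n_neurons": 1000})
    summary = select_and_structure(
        raw_file,
        {"time_step": "parameters.dt", "system": "platform"},
    )

print(summary)
```

Keeping the raw record separate from the structured selection means the selection can be revised later without rerunning the simulation, which is one plausible reading of why the abstract splits the work into two steps.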

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12141434
DOI: http://dx.doi.org/10.1038/s41597-025-05126-1

Publication Analysis

Top Keywords

metadata: 6
metadata practices: 4
practices simulation: 4
simulation workflows: 4
workflows computer: 4
computer simulations: 4
simulations essential: 4
essential pillar: 4
pillar knowledge: 4
knowledge generation: 4

Similar Publications

OpenML is an open-source platform that democratizes machine-learning evaluation by enabling anyone to share datasets in uniform standards, define precise machine-learning tasks, and automatically share detailed workflows and model evaluations. More than just a platform, OpenML fosters a collaborative ecosystem where scientists create new tools, launch initiatives, and establish standards to advance machine learning. Over the past decade, OpenML has inspired over 1,500 publications across diverse fields, from scientists releasing new datasets and benchmarking new models to educators teaching reproducible science.

Machine learning and artificial intelligence promise to accelerate research and understanding across many scientific disciplines. Harnessing the power of these techniques requires aggregating scientific data. In tandem, the importance of open data for reproducibility and scientific transparency is gaining recognition, and data are increasingly available through digital repositories.

European Union (EU) Member States face challenges in using health data for secondary purposes, constrained by inconsistent digital health systems and limited cross-border sharing. One aim of the European Health Data Space (EHDS) is to facilitate secondary health data use through the HealthData@EU infrastructure and Health Data Access Bodies (HDABs). This article provides recommendations essential for HDAB implementation, informed by the HealthData@EU Pilot project.

The European Health Data Space (EHDS) regulation aims to facilitate cross-border sharing of health data across Europe. However, practical challenges related to data access, interoperability, quality, and interpretive competence remain, particularly when working with health systems across countries. This study aimed to evaluate and report the user journey of researchers accessing and utilizing health data across four European countries for secondary research purposes prior to implementation of EHDS.

Background: Recent advances in high-throughput sequencing technologies have enabled the collection and sharing of a massive amount of omics data, along with its associated metadata, that is, descriptive information that contextualizes the data, including phenotypic traits and experimental design. Enhancing metadata availability is critical to ensure data reusability and reproducibility and to facilitate novel biomedical discoveries through effective data reuse. Yet, incomplete metadata accompanying public omics data may hinder reproducibility and reusability and limit secondary analyses.
