Computer simulations are an essential pillar of knowledge generation in science. Exploring, understanding, reproducing, and sharing the results of simulations relies on tracking and organizing the metadata describing the numerical experiments. The models used to understand real-world systems, and the computational machinery required to simulate them, are typically complex, and produce large amounts of heterogeneous metadata. Here, we present general practices for acquiring and handling metadata that are agnostic to software and hardware, and highly flexible for the user. These consist of two steps: 1) recording and storing raw metadata, and 2) selecting and structuring metadata. As a proof of concept, we develop the Archivist, a Python tool to help with the second step, and use it to apply our practices to distinct high-performance computing use cases from neuroscience and hydrology. Our practices and the Archivist can readily be applied to existing workflows without the need for substantial restructuring. They support sustainable numerical workflows, fostering replicability, reproducibility, data exploration, and data sharing in simulation-based research.
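The two steps named in the abstract can be illustrated with a minimal sketch. This is not the Archivist's API, which the abstract does not describe. The function names, the file name `raw_metadata.json`, and the example parameters `dt` and `n_neurons` are all hypothetical, chosen only to show the separation between recording everything first and selecting and structuring afterwards.

```python
import json
import platform
import sys
import tempfile
from pathlib import Path


def record_raw_metadata(directory, parameters):
    """Step 1 (sketch): store raw metadata as-is, without filtering."""
    raw = {
        "python_version": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "parameters": parameters,  # hypothetical simulation parameters
    }
    path = Path(directory) / "raw_metadata.json"  # hypothetical file name
    path.write_text(json.dumps(raw, indent=2))
    return path


def select_and_structure(raw_path, selection):
    """Step 2 (sketch): keep selected entries under user-chosen keys.

    `selection` maps an output key to the path of nested keys to follow
    in the raw metadata.
    """
    raw = json.loads(Path(raw_path).read_text())
    structured = {}
    for target_key, source_path in selection.items():
        value = raw
        for part in source_path:
            value = value[part]
        structured[target_key] = value
    return structured


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw_file = record_raw_metadata(tmp, {"dt": 0.1, "n_neurons": 1000})
        summary = select_and_structure(
            raw_file,
            {"time_step": ("parameters", "dt"), "os": ("platform",)},
        )
        print(summary)
```

Keeping the raw record untouched means the selection can be revised later without rerunning the simulation, which is one plausible reading of why the two steps are kept separate.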
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12141434 | PMC |
| http://dx.doi.org/10.1038/s41597-025-05126-1 | DOI Listing |
Patterns (N Y)
July 2025
L3S Research Center, Leibniz University Hannover, Hannover, Germany.
OpenML is an open-source platform that democratizes machine-learning evaluation by enabling anyone to share datasets in uniform standards, define precise machine-learning tasks, and automatically share detailed workflows and model evaluations. More than just a platform, OpenML fosters a collaborative ecosystem where scientists create new tools, launch initiatives, and establish standards to advance machine learning. Over the past decade, OpenML has inspired over 1,500 publications across diverse fields, from scientists releasing new datasets and benchmarking new models to educators teaching reproducible science.
Patterns (N Y)
July 2025
University of Washington, Department of Astronomy, Seattle, WA, USA.
Machine learning and artificial intelligence promise to accelerate research and understanding across many scientific disciplines. Harnessing the power of these techniques requires aggregating scientific data. In tandem, the importance of open data for reproducibility and scientific transparency is gaining recognition, and data are increasingly available through digital repositories.
Eur J Public Health
September 2025
Danish Health Data Authority, Copenhagen, Denmark.
European Union (EU) Member States face challenges in using health data for secondary purposes, constrained by inconsistent digital health systems and limited cross-border sharing. One aim of the European Health Data Space (EHDS) is to facilitate secondary health data use through the HealthData@EU infrastructure and Health Data Access Bodies (HDABs). This article provides recommendations essential for HDAB implementation, informed by the HealthData@EU Pilot project.
Eur J Public Health
September 2025
Copenhagen Health Complexity Center, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
The European Health Data Space (EHDS) regulation aims to facilitate cross-border sharing of health data across Europe. However, practical challenges related to data access, interoperability, quality, and interpretive competence remain, particularly when working with health systems across countries. This study aimed to evaluate and report the user journey of researchers accessing and utilizing health data across four European countries for secondary research purposes prior to implementation of EHDS.
Genome Biol
September 2025
Department of Clinical Pharmacy, Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, 90089, USA.
Background: Recent advances in high-throughput sequencing technologies have enabled the collection and sharing of a massive amount of omics data, along with its associated metadata (descriptive information that contextualizes the data, including phenotypic traits and experimental design). Enhancing metadata availability is critical to ensure data reusability and reproducibility and to facilitate novel biomedical discoveries through effective data reuse. Yet, incomplete metadata accompanying public omics data may hinder reproducibility and reusability and limit secondary analyses.