Citations: 20

Article Abstract

Importance: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care.

Objective: To discern what constitutes high-quality and useful data sets for health and biomedical ML research purposes according to subject matter experts.

Design, Setting, And Participants: This qualitative study interviewed data set experts, particularly data set creators and ML researchers. Semistructured interviews were conducted remotely in English through a secure video conferencing platform between August 23, 2022, and January 5, 2023. A total of 93 experts were invited to participate; 20 were enrolled and interviewed. Experts were selected through purposive sampling and were affiliated with a diverse set of 16 health data sets/databases across organizational sectors. Content analysis was used to evaluate survey information, and thematic analysis was used to analyze interview data.

Main Outcomes And Measures: Data set experts' perceptions on what makes data sets AI ready.

Results: Participants included 20 data set experts (11 [55%] men; mean [SD] age, 42 [11] years), all of whom were health data set creators; 18 of the 20 were also ML researchers. Themes (3 main and 11 subthemes) were identified and integrated into an AI-readiness framework to show their association within the health data ecosystem. Participants partially determined the AI readiness of data sets using priority appraisal elements of accuracy, completeness, consistency, and fitness. Ethical acquisition and societal impact emerged as appraisal considerations that, to date, have not been described in prior data quality frameworks. Factors that drive creation of high-quality health data sets and mitigate risks associated with data reuse in ML research were also relevant to AI readiness. The state of data availability, data quality standards, documentation, team science, and incentivization were associated with elements of AI readiness and the overall perception of data set usefulness.

Conclusions And Relevance: In this qualitative study of data set experts, participants contributed to the development of a grounded framework for AI data set quality. Data set AI readiness required the concerted appraisal of many elements and the balancing of transparency and ethical reflection against pragmatic constraints. The movement toward more reliable, relevant, and ethical AI and ML applications for patient care will inevitably require strategic updates to data set creation practices.
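
As an illustration of two of the appraisal elements named in the Results (completeness and consistency), the following is a minimal sketch and not the study's method. The study was qualitative and does not define metrics, so the metric definitions, the toy records, the field names, and the `age_in_range` rule are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Fraction of records with a non-missing (non-None) value, per field.

    Assumed definition for illustration; not taken from the study.
    """
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is not None) / total
        for field in fields
    }


def consistency(
    records: List[Record], rules: Dict[str, Callable[[Record], bool]]
) -> Dict[str, float]:
    """Fraction of records that satisfy each named rule.

    Assumed definition for illustration; not taken from the study.
    """
    total = len(records)
    if total == 0:
        return {name: 0.0 for name in rules}
    return {
        name: sum(1 for r in records if rule(r)) / total
        for name, rule in rules.items()
    }


# Hypothetical toy records; field names and values are invented.
records: List[Record] = [
    {"age": 42, "sex": "M", "systolic_bp": 120},
    {"age": None, "sex": "F", "systolic_bp": 135},
    {"age": 31, "sex": "F", "systolic_bp": None},
    {"age": 230, "sex": "M", "systolic_bp": 110},
]

# Hypothetical rule: a missing age passes here, because missingness is
# already counted by completeness(); only implausible values fail.
rules: Dict[str, Callable[[Record], bool]] = {
    "age_in_range": lambda r: r["age"] is None or 0 <= r["age"] <= 120,
}

if __name__ == "__main__":
    print(completeness(records, ["age", "sex", "systolic_bp"]))
    print(consistency(records, rules))
```

Accuracy and fitness are left out on purpose: both depend on an external reference or an intended use, which a self-contained check of this kind cannot supply.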

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10692863
DOI: http://dx.doi.org/10.1001/jamanetworkopen.2023.45892

Publication Analysis

Top Keywords

data set: 40
data sets: 24
data: 22
health data: 20
set experts: 16
qualitative study: 12
data quality: 12
set: 10
perceptions data: 8
machine learning: 8

Similar Publications

Predicting complex time series with deep echo state networks.

Chaos

September 2025

School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.

Although many real-world time series are complex, developing methods that can learn from their behavior effectively enough to enable reliable forecasting remains challenging. Recently, several machine-learning approaches have shown promise in addressing this problem. In particular, the echo state network (ESN) architecture, a type of recurrent neural network where neurons are randomly connected and only the read-out layer is trained, has been proposed as suitable for many-step-ahead forecasting tasks.

Farm Injury Deaths and Workers' Compensation Claims in Australia and Their Economic Costs.

Aust J Rural Health

October 2025

AgHealth Australia, School of Rural Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia.

Objective: To describe the pattern and estimated direct economic burdens associated with unintentional deaths and injuries on Australian farms over the past 11 years (2013-2023).

Design: Descriptive retrospective epidemiological study of National Coronial Information System (NCIS) data for persons fatally injured on a farm and workers' compensation injuries data from the National Data Set.

Setting: Australia.

Introduction: We compared and measured alignment between the Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR) standard used by electronic health records (EHRs), the Clinical Data Interchange Standards Consortium (CDISC) standards used by industry, and the Uniform Data Set (UDS) used by the Alzheimer's Disease Research Centers (ADRCs).

Methods: The ADRC UDS, consisting of 5959 data elements across eleven packets, was mapped to FHIR and CDISC standards by two independent mappers, with discrepancies adjudicated by experts.

Results: Forty-five percent of the 5959 UDS data elements mapped to the FHIR standard, indicating possible electronic obtainment from EHRs.

Most of the United States (US) population resides in cities, where they are subjected to the urban heat island effect. In this study, we develop a method to estimate hourly air temperatures at resolution, improving exposure assessment of US population when compared to existing gridded products. We use an extensive network of personal weather stations to capture the intra-urban variability.

The transition from traditional animal-based approaches and assessments to New Approach Methodologies (NAMs) marks a scientific revolution in regulatory toxicology, with the potential of enhancing human and environmental protection. However, implementing the effective use of NAMs in regulatory toxicology has proven to be challenging, and so far, efforts to facilitate this change frequently focus on singular technical, psychological or economic inhibitors. This article takes a system-thinking approach to these challenges, a holistic framework for describing interactive relationships between the components of a system of interest.