Objectives: The success of artificial intelligence (AI) and machine learning (ML) approaches in biomedical research depends on the quality of the underlying data. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Data Centric Challenge was designed to address the problem of making raw clinical research data AI-ready, with a focus on type 1 diabetes studies available in the NIDDK Central Repository (NIDDK-CR). This paper presents a structured methodology for enhancing the AI readiness of clinical datasets.
Materials and Methods: We detail a systematic approach to data aggregation and preprocessing, including binning continuous data, processing text features, managing missing values, and encoding categorical variables, while maintaining data integrity and compatibility with ML algorithms.
Results: We applied the proposed methodology to transform raw clinical data from type 1 diabetes studies in the NIDDK-CR into a structured, AI-ready dataset. The evaluation process validated the effectiveness of our AI-readiness enhancement steps and explored the potential use cases in type 1 diabetes research.
Discussion: The methodology discussed in this paper will serve as guidance for preparing data for AI-driven clinical research, and the resulting AI-ready data will serve as a training tool for building AI/ML models and improving their performance.
Conclusion: We present a generalizable framework for preparing clinical research data for AI applications. The resulting datasets lay a strong foundation for downstream AI/ML applications, setting the stage for a new era of data-driven discoveries.
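Three of the preprocessing steps named under Materials and Methods (managing missing values, binning continuous data, and encoding categorical variables) can be illustrated with a minimal sketch. This is not the authors' pipeline: the records, column names (`age`, `hba1c`, `sex`), bin edges, median imputation, and the `unknown` category are all assumptions chosen for illustration, and text-feature processing is left out. It uses only the Python standard library; a real pipeline would more likely rely on a data-frame library.

```python
from statistics import median

# Hypothetical toy records; column names and values are illustrative only
# and are not taken from any NIDDK-CR study.
records = [
    {"age": 12.0, "hba1c": 7.9, "sex": "F"},
    {"age": 34.0, "hba1c": None, "sex": "M"},
    {"age": None, "hba1c": 6.8, "sex": None},
    {"age": 58.0, "hba1c": 9.1, "sex": "F"},
]

# Assumed age bins in years: (lower bound inclusive, upper bound exclusive, label).
AGE_BINS = [(0, 18, "child"), (18, 45, "adult"), (45, 200, "older_adult")]


def impute_median(rows, key):
    """Fill missing numeric values with the median and add a missingness flag."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = median(observed)
    for r in rows:
        r[f"{key}_missing"] = int(r[key] is None)
        if r[key] is None:
            r[key] = fill


def bin_value(value, bins):
    """Map a continuous value to the label of the bin that contains it."""
    for low, high, label in bins:
        if low <= value < high:
            return label
    return "out_of_range"


def one_hot(rows, key, missing_label="unknown"):
    """Replace a categorical column with 0/1 indicator columns."""
    levels = sorted({missing_label if r[key] is None else r[key] for r in rows})
    for r in rows:
        value = r.pop(key)
        value = missing_label if value is None else value
        for level in levels:
            r[f"{key}_{level}"] = int(value == level)


def prepare(rows):
    """Return a numeric-only copy of the rows; the raw rows are not modified."""
    rows = [dict(r) for r in rows]
    for key in ("age", "hba1c"):
        impute_median(rows, key)
    for r in rows:
        r["age_group"] = bin_value(r["age"], AGE_BINS)
    one_hot(rows, "age_group")
    one_hot(rows, "sex")
    return rows


ai_ready = prepare(records)
```

Keeping a `*_missing` flag next to each imputed column is one way to preserve information about the original data, and copying the rows before transforming them leaves the raw records available for auditing.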
DOI: http://dx.doi.org/10.1093/jamia/ocaf114
Reprod Biol
September 2025
Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei 230022, China; Engineering Research Center of Biopreservation and Artificial Organs, Ministry of Education, No 218 Jixi Road, Hefei, Anhui 230022, China; Key Laboratory of Population Health Across
Current research indicates that polyethylene terephthalate microplastics (PET-MPs) may significantly impair male reproductive function. This study aimed to investigate the potential molecular mechanisms underlying this impairment. Potential gene targets of PET-MPs were predicted via the SwissTargetPrediction database.
Eur J Radiol
September 2025
Department of Radiology, Affiliated Hospital of Hebei University, Baoding 071000, China.
Purpose: The present study aimed to develop a noninvasive predictive framework that integrates clinical data, conventional radiomics, habitat imaging, and deep learning for the preoperative stratification of MGMT gene promoter methylation in glioma.
Materials and Methods: This retrospective study included 410 patients from the University of California, San Francisco, USA, and 102 patients from our hospital. Seven models were constructed using preoperative contrast-enhanced T1-weighted MRI with gadobenate dimeglumine as the contrast agent.
JACC Heart Fail
September 2025
Université de Lorraine, Inserm, Centre d'Investigations Cliniques Plurithématique 1433, Centre Hospitalier Régional Universitaire de Nancy, Nancy, France.
Pathol Res Pract
September 2025
Department of Pathology, Xijing Hospital and School of Basic Medicine, Fourth Military Medical University, Xi'an, China.
Background: Dermal clear cell sarcoma (DCCS) is a rare malignant mesenchymal neoplasm. Owing to the overlaps in its morphological and immunophenotypic profiles with a broad spectrum of tumors exhibiting melanocytic differentiation, it is frequently misdiagnosed as other tumor entities in clinical practice. By systematically analyzing the clinicopathological characteristics, immunophenotypic features, and molecular biological properties of DCCS, this study intends to further enhance pathologists' understanding of this disease and provide a valuable reference for its accurate diagnosis.
Int J Epidemiol
August 2025
Department of Biostatistics and Informatics, University of Colorado, Aurora, CO, United States.
Background: Existing longitudinal cohort study data and associated biospecimen libraries provide abundant opportunities to efficiently examine new hypotheses through retrospective specimen testing. Outcome-dependent sampling (ODS) methods offer a powerful alternative to random sampling when testing all available specimens is not feasible or biospecimen preservation is desired. For repeated binary outcomes, a common ODS approach is to extend the case-control framework to the longitudinal setting.