Background: Privacy is of increasing interest in the present big data era, particularly the privacy of medical data. Specifically, differential privacy has emerged as the standard method for preservation of privacy during data analysis and publishing.
Objective: Using machine learning techniques, we applied differential privacy with diverse parameters to medical data, verified the feasibility of our algorithms on synthetic data, and examined the balance between data privacy and utility.
Methods: All data were normalized to a range between -1 and 1, and the bounded Laplacian method was applied to prevent the generation of out-of-bound values after applying the differential privacy algorithm. To preserve the cardinality of the categorical variables, we performed postprocessing via discretization. The algorithm was evaluated using both synthetic and real-world data (from the eICU Collaborative Research Database). We evaluated the difference between the original data and the perturbed data using the misclassification rate for categorical data and the mean squared error for continuous data. Further, we compared the performance of classification models that predict in-hospital mortality using real-world data.
Results: The misclassification rate of categorical variables ranged between 0.49 and 0.85 when the value of ε was 0.1, and it converged to 0 as ε increased. When ε was between 10 and 10, the misclassification rate rapidly dropped to 0. Similarly, the mean squared error of the continuous variables decreased as ε increased. The performance of the model developed from perturbed data converged to that of the model developed from original data as ε increased. In particular, the accuracy of a random forest model developed from the original data was 0.801, and this value ranged from 0.757 to 0.81 when ε was 10 and 10, respectively.
Conclusions: We applied local differential privacy to medical domain data, which are diverse and high dimensional. Higher noise may offer enhanced privacy, but it simultaneously hinders utility. We should choose an appropriate degree of noise for data perturbation to balance privacy and utility depending on specific situations.
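The Methods describe normalizing each variable to [-1, 1], perturbing it with a bounded Laplace mechanism, and discretizing perturbed categorical values back to their original levels. The sketch below illustrates one way such a pipeline could look; the resampling-based bounding, the function names, and the example feature range are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def normalize(x, lo, hi):
    """Scale a feature from [lo, hi] into [-1, 1], as described in Methods."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def bounded_laplace(value, epsilon, sensitivity=2.0, lower=-1.0, upper=1.0, rng=None):
    """Add Laplace noise while keeping the output inside [lower, upper].

    Resampling (rejection) is one common way to realize a bounded Laplace
    mechanism; the paper may use a different bounding strategy.
    Sensitivity defaults to 2.0 because values are normalized to [-1, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    while True:
        noisy = value + rng.laplace(0.0, scale)
        if lower <= noisy <= upper:
            return noisy

def perturb_categorical(code, n_levels, epsilon, rng=None):
    """Perturb an integer-coded categorical variable, then discretize back.

    The category is mapped onto a grid in [-1, 1], perturbed with the bounded
    Laplace mechanism, and snapped to the nearest level so that cardinality is
    preserved (the postprocessing step described in Methods).
    """
    rng = np.random.default_rng() if rng is None else rng
    grid = np.linspace(-1.0, 1.0, n_levels)          # one point per category level
    noisy = bounded_laplace(grid[code], epsilon, rng=rng)
    return int(np.argmin(np.abs(grid - noisy)))      # nearest-level discretization

# Example: perturb a continuous vital sign and a 4-level categorical variable.
rng = np.random.default_rng(0)
heart_rate = normalize(92.0, lo=30.0, hi=200.0)      # hypothetical feature range
print(bounded_laplace(heart_rate, epsilon=1.0, rng=rng))
print(perturb_categorical(code=2, n_levels=4, epsilon=1.0, rng=rng))
```

Comparing original and perturbed columns with the misclassification rate (categorical) or the mean squared error (continuous) then yields the kind of utility-versus-ε curves reported in the Results.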
Full text: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8663640 (PMC) | http://dx.doi.org/10.2196/26914 (DOI)
PLOS Digit Health
September 2025
Department of Dermatology, Stanford University, Stanford, California, United States of America.
Large Language Models (LLMs) are increasingly deployed in clinical settings for tasks ranging from patient communication to decision support. While these models have been shown to exhibit race-based and binary gender biases, anti-LGBTQIA+ bias remains understudied despite documented healthcare disparities affecting these populations. In this work, we evaluated the potential of LLMs to propagate anti-LGBTQIA+ medical bias and misinformation.
Front Digit Health
August 2025
KASTEL Security Research Labs, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany.
In medical environments, time-continuous data such as electrocardiographic records require a distinct approach to anonymization, because preserving their spatio-temporal integrity is essential for utility. These environments generate a wide array of data types that are highly sensitive with respect to the patient's well-being and of substantial interest to researchers. A significant proportion of this data may be useful to researchers beyond the original purposes for which it was collected.
J Med Internet Res
September 2025
Fujian Psychiatric Center, Fujian Clinical Research Center for Mental Disorders, Xianyue Hospital Affiliated to Xiamen Medical College, Xiamen, China.
Background: In the digital health era, telemedicine has become a key driver of health care reform and innovation globally. Understanding the factors influencing residents' choices of telemedicine services is crucial for optimizing service design, enhancing user experience, and developing effective policy measures.
Objective: This study aims to explore the key factors influencing Chinese residents' choices of telemedicine services, including consultation fee, physician qualifications, appointment waiting time, scope of services, privacy protection, and service hours.
Bioinform Adv
August 2025
Department of Anatomy and Cell Biology, Medical School OWL, Bielefeld University, Bielefeld 33615, Germany.
Motivation: The growing use of transcriptomic data from platforms like Nanostring GeoMx DSP demands accessible and flexible tools for differential gene expression analysis and heatmap generation. Current web-based tools often lack transparency, modifiability, and independence from external servers, creating barriers for researchers seeking customizable workflows and raising data privacy and security concerns. Additionally, tools that can be used by individuals with minimal bioinformatics expertise provide an inclusive solution, empowering a broader range of users to analyze complex data effectively.
Artif Intell Med
November 2025
Department of Nuclear Medicine, Huzhou Central Hospital, Fifth School of Clinical Medicine of Zhejiang Chinese Medical University, Huzhou, 313001, China. Electronic address:
Positron Emission Tomography-Computed Tomography (PET-CT) evaluation is critical for liver lesion diagnosis. However, data scarcity, privacy concerns, and cross-institutional imaging heterogeneity impede accurate deep learning model deployment. We propose a Federated Transfer Learning (FTL) framework that integrates federated learning's privacy-preserving collaboration with transfer learning's pre-trained model adaptation, enhancing liver lesion segmentation in PET-CT imaging.
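The snippet describes a setup in which each institution fine-tunes a shared pre-trained model on its own data and only model updates are aggregated centrally. Below is a minimal, generic sketch of that pattern using federated averaging over a linear stand-in model; the client data, update rule, and function names are hypothetical illustrations, not the authors' FTL framework.

```python
import numpy as np

def local_update(global_weights, client_data, lr=0.01, epochs=1):
    """Hypothetical local fine-tuning step: each site adapts the shared
    (pre-trained) weights on its own data, which never leaves the site."""
    w = global_weights.copy()
    X, y = client_data
    for _ in range(epochs):
        preds = X @ w                      # linear stand-in for a segmentation model
        grad = X.T @ (preds - y) / len(y)  # gradient of a squared-error loss
        w -= lr * grad
    return w

def fedavg(client_updates, client_sizes):
    """Federated averaging: aggregate site updates weighted by local data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_updates, client_sizes))

# Toy rounds of federated fine-tuning across three hypothetical hospitals.
rng = np.random.default_rng(0)
pretrained = rng.normal(size=8)            # stands in for pre-trained model weights
clients = [(rng.normal(size=(n, 8)), rng.normal(size=n)) for n in (40, 25, 60)]

weights = pretrained
for _ in range(5):
    updates = [local_update(weights, data) for data in clients]
    weights = fedavg(updates, [len(y) for _, y in clients])
```

In the setting the snippet describes, a segmentation network pre-trained on external data would take the place of the linear stand-in; the privacy benefit comes from exchanging only model weights rather than raw PET-CT images.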