Objective: Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.
Materials And Methods: Starting from 2000 representative clinical documents from each of 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.
Results: Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers.
Discussion And Conclusions: Approximately 70% of leaked PII "hiding" in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario. This is more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.
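As a rough illustration of the idea in the abstract above, the following Python sketch replaces tagged PII spans with random surrogates, leaves untagged (leaked) PII in place, and then scores a reader's guesses with precision and recall. It is not the authors' implementation: the surrogate lists, the function names `hips_resynthesize` and `precision_recall`, and the sample note are all hypothetical, and tagged spans are assumed not to overlap.

```python
import random

# Hypothetical surrogate pools; a real system would need far larger,
# more realistic lists that match the style of the source text.
SURROGATES = {
    "NAME": ["Alice Moreno", "David Kim", "Priya Natarajan"],
    "DATE": ["03/14/2011", "11/02/2009", "07/21/2013"],
}


def hips_resynthesize(text, tagged_spans, rng):
    """Replace each tagged (start, end, pii_type) span with a random surrogate.

    PII that the tagger missed is not in tagged_spans, so it is copied
    through unchanged and ends up sitting among the surrogates.
    Assumes the spans do not overlap.
    """
    pieces = []
    cursor = 0
    for start, end, pii_type in sorted(tagged_spans):
        pieces.append(text[cursor:start])                  # untouched text
        pieces.append(rng.choice(SURROGATES[pii_type]))    # fictitious value
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def precision_recall(guessed, leaked):
    """Score a reader's guesses against the true set of leaked PII strings."""
    guessed, leaked = set(guessed), set(leaked)
    hits = len(guessed & leaked)
    precision = hits / len(guessed) if guessed else 0.0
    recall = hits / len(leaked) if leaked else 0.0
    return precision, recall


def find_span(text, needle, pii_type):
    """Helper for the example: locate the first occurrence of needle."""
    start = text.index(needle)
    return (start, start + len(needle), pii_type)


# Made-up note: the tagger catches both names and the first date,
# but misses the second date, which therefore leaks.
note = "John Smith saw Dr. Mary Jones on 01/02/2010. Return visit 02/03/2010."
tagged = [
    find_span(note, "John Smith", "NAME"),
    find_span(note, "Mary Jones", "NAME"),
    find_span(note, "01/02/2010", "DATE"),
]
released = hips_resynthesize(note, tagged, random.Random(0))

# A reader flags three strings as "real"; only one of them actually leaked.
precision, recall = precision_recall(
    guessed=["02/03/2010", "guess A", "guess B"],
    leaked=["02/03/2010", "another leaked item"],
)
```

In this toy setup the reader's precision is 1/3 and recall is 1/2. The study reports the same two measures, averaged over readers and broken down by PII type.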
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7647331 | PMC |
| http://dx.doi.org/10.1093/jamia/ocaa095 | DOI Listing |
NPJ Digit Med
March 2025
University of California Berkeley, School of Information Science, Berkeley, CA, USA.
Cyber-attacks on healthcare entities and leaks of personally identifiable information (PII) are a growing threat. However, it is now possible to learn sensitive characteristics of an individual without PII, by combining advances in artificial intelligence, analytics, and online repositories. We discuss privacy threats and privacy engineering solutions, emphasizing the selection of privacy-enhancing technologies for various healthcare cases.
Data Brief
February 2025
Kennesaw State University, United States.
With hackers relentlessly disrupting cyberspace and the day-to-day operations of organizations worldwide, there are also concerns related to Personally Identifiable Information (PII). Because breached data is dumped on the clear web or the dark web, there are serious concerns about how threat actors worldwide can misuse it. This also raises the question of how hackers can build a profile of an individual, starting from one data leak and gathering more details with the help of Open Source Intelligence (OSINT).
Patterns (N Y)
June 2021
nference, Cambridge, MA 02142, USA.
The presence of personally identifiable information (PII) in natural language portions of electronic health records (EHRs) constrains their broad reuse. Despite continuous improvements in automated detection of PII, residual identifiers require manual validation and correction. Here, we describe an automated de-identification system that employs an ensemble architecture, incorporating attention-based deep-learning models and rule-based methods, supported by heuristics for detecting PII in EHR data.
J Am Med Inform Assoc
July 2020
Human Language Technology, MITRE Corporation, Bedford, Massachusetts, USA.
J Am Med Inform Assoc
December 2019
The MITRE Corp, Bedford, Massachusetts, USA.
Objective: Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus.