Category Ranking

98%

Total Visits

921

Avg Visit Duration

2 minutes

Citations

20

Article Abstract

Objective: Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus.

Materials And Methods: We modeled a scenario where an institution (the defender) externally shared an 800-note corpus of actual outpatient clinical encounter notes from a large, integrated health care delivery system in Washington State. These notes were deidentified by a machine-learned PII tagger and HIPS resynthesis. A malicious attacker obtained and performed a parrot attack intending to expose leaked PII in this corpus. Specifically, the attacker mimicked the defender's process by manually annotating all PII-like content in half of the released corpus, training a PII tagger on these data, and using the trained model to tag the remaining encounter notes. The attacker hypothesized that untagged identifiers would be leaked PII, discoverable by manual review. We evaluated the attacker's success using measures of leak-detection rate and accuracy.

Results: The attacker correctly hypothesized that 211 (68%) of 310 actual PII leaks in the corpus were leaks, and wrongly hypothesized that 191 resynthesized PII instances were also leaks. One-third of actual leaks remained undetected.

Discussion And Conclusion: A malicious parrot attack to reveal leaked PII in clinical text deidentified by machine-learned HIPS resynthesis can attenuate but not eliminate the protective effect of HIPS deidentification.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857511PMC
http://dx.doi.org/10.1093/jamia/ocz114DOI Listing

Publication Analysis

Top Keywords

leaked pii
20
parrot attack
12
hips resynthesis
12
pii
10
clinical text
8
text deidentified
8
hiding plain
8
plain sight
8
malicious attacker
8
expose leaked
8

Similar Publications

Cyber-attacks on healthcare entities and leaks of personal identifiable information (PII) are a growing threat. However, it is now possible to learn sensitive characteristics of an individual without PII, by combining advances in artificial intelligence, analytics, and online repositories. We discuss privacy threats and privacy engineering solutions, emphasizing the selection of privacy enhancing technologies for various healthcare cases.

View Article and Find Full Text PDF

With hackers relentlessly disrupting cyberspace and the day-to-day operations of organizations worldwide, there are also concerns related to Personally Identifiable Information (PII). Due to the data breaches and the data getting dumped on the clear web or the dark web, there are serious concerns about how the different threat actors worldwide can misuse the data. Also, it raises the question of how hackers can create a profile of an individual starting from one data leak and getting more details on individuals with the help of Open Source Intelligence (OSINT).

View Article and Find Full Text PDF

The presence of personally identifiable information (PII) in natural language portions of electronic health records (EHRs) constrains their broad reuse. Despite continuous improvements in automated detection of PII, residual identifiers require manual validation and correction. Here, we describe an automated de-identification system that employs an ensemble architecture, incorporating attention-based deep-learning models and rule-based methods, supported by heuristics for detecting PII in EHR data.

View Article and Find Full Text PDF

Objective: Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.

View Article and Find Full Text PDF

Objective: Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus.

View Article and Find Full Text PDF