Assessing the Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging.

Nishanth Arun, Nathan Gaw, Praveer Singh, Ken Chang, Mehak Aggarwal, Bryan Chen, Katharina Hoebel, Sharut Gupta, Jay Patel, Mishka Gidwani, Julius Adebayo, Matthew D Li, Jayashree Kalpathy-Cramer

Radiol Artif Intell

Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 149 13th St, Boston, MA 02129 (N.A., P.S., K.C., M.A., B.C., K.H., S.G., J.P., M.G., M.D.L., J.K.C.); Department of Computer Science, Shiv Nadar University, Greater

Published: November 2021

Purpose: To evaluate the trustworthiness of saliency maps for abnormality localization in medical imaging.

Materials and Methods: Using two large publicly available radiology datasets (Society for Imaging Informatics in Medicine-American College of Radiology Pneumothorax Segmentation dataset and Radiological Society of North America Pneumonia Detection Challenge dataset), the performance of eight commonly used saliency map techniques was quantified in regard to localization utility (segmentation and detection), sensitivity to model weight randomization, repeatability, and reproducibility. Their performance was compared with that of baseline methods and localization network architectures, using area under the precision-recall curve (AUPRC) and structural similarity index measure (SSIM) as metrics.
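As an illustration of the localization-utility metric named above, the following is a minimal sketch of a pixel-level AUPRC, computed as average precision of a saliency map against a binary ground-truth mask. The function name `pixel_auprc` and the toy inputs are hypothetical, and the abstract does not specify how the authors pooled pixels or images, so this should not be read as their exact procedure.

```python
def pixel_auprc(saliency, mask):
    """Average precision of per-pixel saliency scores against a binary mask.

    Assumption: pixels of a single image are ranked together; the study's
    own pooling strategy is not described in the abstract.
    """
    scores = [s for row in saliency for s in row]
    labels = [int(m) for row in mask for m in row]
    if len(scores) != len(labels):
        raise ValueError("saliency and mask must have the same shape")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("mask contains no positive pixels")

    # Rank pixels from most to least salient.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    ap, tp, prev_recall, i = 0.0, 0, 0.0, 0
    while i < len(order):
        # Pixels with tied scores share one threshold, so process them together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += labels[order[j]]
            j += 1
        precision = tp / j          # j pixels lie at or above this threshold
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Toy 2x2 example: abnormal pixels sit on the main diagonal.
mask = [[1, 0], [0, 1]]
perfect_map = [[0.9, 0.1], [0.2, 0.8]]   # both abnormal pixels ranked first
partial_map = [[0.9, 0.8], [0.1, 0.2]]   # one normal pixel ranked second
print(pixel_auprc(perfect_map, mask))
print(pixel_auprc(partial_map, mask))
```

A map that assigns the same score everywhere obtains an AUPRC equal to the fraction of abnormal pixels, which is why AUPRC values are best interpreted relative to how small the abnormality is.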

Results: All eight saliency map techniques failed at least one of the criteria and were inferior in performance compared with localization networks. For pneumothorax segmentation, the AUPRC ranged from 0.024 to 0.224, while a U-Net achieved a significantly superior AUPRC of 0.404 (P < .005). For pneumonia detection, the AUPRC ranged from 0.160 to 0.519, while a RetinaNet achieved a significantly superior AUPRC of 0.596 (P < .005). Five and two saliency methods (of eight) failed the model randomization test on the segmentation and detection datasets, respectively, suggesting that these methods are not sensitive to changes in model parameters. The repeatability and reproducibility of the majority of the saliency methods were worse than those of localization networks for both the segmentation and detection datasets.
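The idea behind the model randomization test can be sketched with SSIM: if a saliency map barely changes after the model's weights are randomized, its SSIM with the original map stays high, which indicates insensitivity to model parameters. The sketch below uses a simplified single-window (global) form of the standard SSIM formula with the conventional constants K1 = 0.01 and K2 = 0.03. The function name `global_ssim` and the toy maps are hypothetical, and the abstract does not state the windowing or pass/fail threshold the authors used.

```python
def global_ssim(a, b, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized 2D maps.

    Assumption: statistics are computed over the whole map rather than
    over local sliding windows, which is a simplification of standard SSIM.
    """
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    if not x or len(x) != len(y):
        raise ValueError("maps must be non-empty and have the same shape")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / n

    # Small constants stabilize the ratio when means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical maps: one from the trained model, two from randomized models.
map_trained = [[0.0, 1.0], [1.0, 0.0]]
map_unchanged = [[0.0, 1.0], [1.0, 0.0]]   # identical: insensitive to weights
map_changed = [[1.0, 0.0], [0.0, 1.0]]     # inverted: strongly weight dependent
print(global_ssim(map_trained, map_unchanged))
print(global_ssim(map_trained, map_changed))
```

Under this reading, a method whose maps resemble `map_unchanged` after randomization would be a candidate for failing the test, whereas one resembling `map_changed` depends on the learned weights.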

Conclusion: The use of saliency maps in the high-risk domain of medical imaging warrants additional scrutiny; we recommend that detection or segmentation models be used if localization is the desired output of the network.

Keywords: Technology Assessment, Technical Aspects, Feature Detection, Convolutional Neural Network (CNN)

Supplemental material is available for this article. © RSNA, 2021.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637231
DOI: http://dx.doi.org/10.1148/ryai.2021200267

Similar Publications

Weaponizing cognitive bias in autonomous systems: a framework for black-box inference attacks.

Front Artif Intell

August 2025

Aviation Industry Development Research Center of China, Beijing, China.

Shiyong Chu, Yuwei Chen

Autonomous systems operating in high-dimensional environments increasingly rely on prioritization heuristics to allocate attention and assess risk, yet these mechanisms can introduce cognitive biases such as salience, spatial framing, and temporal familiarity that influence decision-making without altering the input or accessing internal states. This study presents Priority Inversion via Operational Reasoning (PRIOR), a black-box, non-perturbative diagnostic framework that employs structurally biased but semantically neutral scenario cues to probe inference-level vulnerabilities without modifying pixel-level, statistical, or surface semantic properties. Given the limited accessibility of embodied vision-based systems, we evaluate PRIOR using large language models (LLMs) as abstract reasoning proxies to simulate cognitive prioritization in constrained textual surveillance scenarios inspired by Unmanned Aerial Vehicle (UAV) operations.

Towards inclusive explainable artificial intelligence: a thematic analysis and scoping review on tools for persons with disabilities.

Disabil Rehabil Assist Technol

May 2025

Faculty of Business and Information Technology, Ontario Tech University, Oshawa, Ontario, Canada.

Zahra Atf, Peter R Lewis

Objective: Explainable Artificial Intelligence (XAI) offers transparent, trustworthy decision support, yet its implementation in disability contexts remains limited. This scoping review aims to map and evaluate XAI tools developed for individuals with disabilities and identify thematic patterns to inform the design of inclusive rehabilitation technologies.

Methods: A systematic search of literature from January 2018 to June 2024 was conducted across SCOPUS, ACM Digital Library, IEEE Xplore, ProQuest and Google Scholar, guided by Arksey & O'Malley's framework and PRISMA-ScR guidelines.

ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning.

ArXiv

May 2025

Center for Computational Medicine & Clinical AI, Section of Biomedical Data Science, Department of Medicine, University of Chicago, IL, USA.

Sahil Sethi, David Chen, Thomas Statchen, Michael C Burkhart, Nipun Bhandari

Deep learning-based electrocardiogram (ECG) classification has shown impressive performance but clinical adoption has been slowed by the lack of transparent and faithful explanations. Post hoc methods such as saliency maps may fail to reflect a model's true decision process. Prototype-based reasoning offers a more transparent alternative by grounding decisions in similarity to learned representations of real ECG segments-enabling faithful, case-based explanations.

Are You Safe or Should I Go? How Perceived Trustworthiness and Probability of a Sexual Transmittable Infection Impact Activation of the Salience Network.

eNeuro

February 2025

Department of Psychology, University of Konstanz, Konstanz 78457, Germany.

Alexander Wolber, Stephanie N L Schmidt, Brigitte Rockstroh, Daniela Mier

Functional imaging studies indicate that both the assessment of a person as untrustworthy and the assumption that a person has a sexually transmitted infection are associated with activation in regions of the salience network. However, studies are missing that combine these aspects and investigate the perceived trustworthiness of individuals previously assessed with high or low probability of a sexually transmitted infection. During fMRI measurements, 25 participants viewed photographs of people preclassified as having high or low HIV probability and judged their trustworthiness.

Robust explainer recommendation for time series classification.

Data Min Knowl Discov

June 2024

School of Computer Science, University College Dublin, Dublin, Ireland.

Thu Trang Nguyen, Thach Le Nguyen, Georgiana Ifrim

Time series classification is a task that deals with temporal sequences, a data type prevalent in domains such as human activity recognition, sports analytics and general sensing. In this area, interest in explainability has been growing, as explanation is key to understanding the data and the model better. Recently, a great variety of techniques (e.
