Applying deep learning to predict patient prognostic survival outcomes from histological whole-slide images (WSIs) and genomic data is challenging due to the morphological and transcriptomic heterogeneity of the tumor microenvironment. Existing deep learning-enabled methods often exhibit learning biases, primarily because the genomic knowledge used to guide directional feature extraction from WSIs may be irrelevant or incomplete. The result is a suboptimal, sometimes myopic, understanding of the overall pathological landscape that can overlook crucial histological insights. To tackle these challenges, we propose the CounterFactual Bidirectional Co-Attention Transformer framework. By integrating a bidirectional co-attention layer, the framework fosters effective feature interactions between the genomic and histology modalities and ensures consistent identification of prognostic features from WSIs. Through counterfactual reasoning, the model exploits causality to represent unimodal and multimodal knowledge for cancer risk stratification; this directly reduces bias, enables the exploration of 'what-if' scenarios, and offers a deeper understanding of how different features influence survival outcomes. Validated on eight diverse cancer benchmark datasets from The Cancer Genome Atlas (TCGA), the framework represents a major improvement over current histology-genomic learning methods, with an average 2.5% gain in c-index over 18 state-of-the-art models when predicting patient prognosis across the eight cancer types.
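As a rough illustration of the bidirectional co-attention idea (not the authors' implementation), the NumPy sketch below runs single-head scaled dot-product attention in both directions between a bag of WSI patch embeddings and a set of genomic embeddings. The 64-dimensional features, the 1000-patch bag, and the six gene-group embeddings are arbitrary placeholders, and learned projections, multiple heads, and the counterfactual component are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Scaled dot-product attention: each query aggregates the values,
    weighted by its similarity to the corresponding keys."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values, weights

rng = np.random.default_rng(0)
d = 64
wsi_patches = rng.normal(size=(1000, d))  # embeddings of 1000 WSI patches (placeholder)
gene_groups = rng.normal(size=(6, d))     # e.g. six gene-group embeddings (placeholder)

# genomics -> histology: each genomic embedding gathers its most relevant WSI patches
gene_context, g2h_weights = attend(gene_groups, wsi_patches, wsi_patches)

# histology -> genomics: each WSI patch gathers its most relevant genomic context
patch_context, h2g_weights = attend(wsi_patches, gene_groups, gene_groups)

print(gene_context.shape, patch_context.shape)  # (6, 64) (1000, 64)
```

Because both directions are computed, the same attention maps that inform the prognosis can also be inspected to see which patches each genomic embedding emphasized, and vice versa.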
DOI: http://dx.doi.org/10.1109/JBHI.2025.3548048
IEEE J Biomed Health Inform
July 2025
Cardiovascular disease (CVD) remains the leading cause of mortality worldwide, with coronary artery disease (CAD) being its most prevalent form. Improving screening efficiency calls for accurate, non-invasive, and cost-effective CAD detection methods. This study presents the Co-Attention Dual-Modal ViT (CAD-ViT), a novel classification framework based on the Vision Transformer that integrates electrocardiogram (ECG) and phonocardiogram (PCG) signals.
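The snippet does not detail CAD-ViT's internals, so the PyTorch sketch below only illustrates one plausible way to feed paired 1D signals to a ViT-style model: each waveform is split into fixed-length patches and projected to tokens, and the two token streams cross-attend to each other before a small classification head. The patch length, model width, head count, signal lengths, and class count are invented for the example.

```python
import torch
import torch.nn as nn

class SignalPatchEmbed(nn.Module):
    """ViT-style 1D patch embedding: split a waveform into fixed-length
    patches and project each patch to the model dimension."""
    def __init__(self, patch_len=50, d_model=128):
        super().__init__()
        self.proj = nn.Conv1d(1, d_model, kernel_size=patch_len, stride=patch_len)

    def forward(self, x):              # x: (batch, samples)
        x = self.proj(x.unsqueeze(1))  # (batch, d_model, n_patches)
        return x.transpose(1, 2)       # (batch, n_patches, d_model)

class DualModalFusion(nn.Module):
    """Cross-attend ECG tokens to PCG tokens and vice versa, then pool."""
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.ecg_embed = SignalPatchEmbed(d_model=d_model)
        self.pcg_embed = SignalPatchEmbed(d_model=d_model)
        self.ecg_to_pcg = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.pcg_to_ecg = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(2 * d_model, 2)  # toy CAD vs. no-CAD logits

    def forward(self, ecg, pcg):
        e, p = self.ecg_embed(ecg), self.pcg_embed(pcg)
        e_att, _ = self.ecg_to_pcg(e, p, p)  # ECG queries attend over PCG tokens
        p_att, _ = self.pcg_to_ecg(p, e, e)  # PCG queries attend over ECG tokens
        fused = torch.cat([e_att.mean(1), p_att.mean(1)], dim=-1)
        return self.head(fused)

# toy usage with made-up signal lengths (both 5000 samples for simplicity)
ecg, pcg = torch.randn(2, 5000), torch.randn(2, 5000)
print(DualModalFusion()(ecg, pcg).shape)  # torch.Size([2, 2])
```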
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi
June 2024
School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, P. R. China.
Recent studies have introduced attention models for medical visual question answering (MVQA). In medical research, modeling "question attention" is just as crucial as modeling "visual attention." To enable bidirectional reasoning over medical images and questions in the attention process, a new MVQA architecture named MCAN has been proposed.
IEEE J Biomed Health Inform
June 2023
Multimodal magnetic resonance imaging (MRI) provides complementary information in anatomical and functional images that supports accurate diagnosis and treatment evaluation of lung cancers. However, effectively exploiting this complementary information in chest MR images remains challenging because the modalities lack rigorous registration. This paper proposes a novel coco-attention mechanism that exploits the complementary information in weakly paired images for accurate tumor segmentation.
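As a toy illustration of the general idea (not the paper's coco-attention mechanism itself), the PyTorch sketch below encodes two weakly paired MRI modalities separately, flattens their feature maps into tokens, and lets the anatomical tokens attend over the functional tokens, so no voxel-wise alignment is assumed before decoding a mask. The tiny single-layer encoders, channel widths, and 64x64 inputs are placeholders.

```python
import torch
import torch.nn as nn

class CrossModalSeg(nn.Module):
    """Toy segmentation net: encode each modality, flatten the feature maps
    to tokens, cross-attend anatomical queries over functional tokens, then
    decode a mask from the fused anatomical features."""
    def __init__(self, ch=32, heads=4):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU())
        self.enc_f = nn.Sequential(nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU())
        self.cross = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.dec = nn.ConvTranspose2d(ch, 1, 2, stride=2)  # back to input resolution

    def forward(self, anat, func):
        fa, ff = self.enc_a(anat), self.enc_f(func)   # (B, ch, H/2, W/2)
        B, C, H, W = fa.shape
        ta = fa.flatten(2).transpose(1, 2)            # (B, H*W, ch) anatomical tokens
        tf = ff.flatten(2).transpose(1, 2)            # (B, H*W, ch) functional tokens
        fused, _ = self.cross(ta, tf, tf)             # no pixel-wise registration needed
        fused = fused.transpose(1, 2).reshape(B, C, H, W)
        return self.dec(fused)                        # (B, 1, H, W) mask logits

anat = torch.randn(1, 1, 64, 64)  # e.g. an anatomical slice (toy size)
func = torch.randn(1, 1, 64, 64)  # e.g. a functional slice, only weakly aligned
print(CrossModalSeg()(anat, func).shape)  # torch.Size([1, 1, 64, 64])
```

Because the attention operates over flattened spatial tokens, the two feature maps could even have different spatial sizes, which is what makes this kind of fusion tolerant of weak pairing.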
Sensors (Basel)
August 2020
College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
At present, state-of-the-art approaches to Visual Question Answering (VQA) mainly use co-attention models to relate each visual object to the text objects, which captures only coarse interactions between the modalities and ignores dense self-attention within the question modality. To address this problem and improve VQA accuracy, this paper proposes an effective Dense Co-Attention Network (DCAN).
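To make the distinction concrete, the PyTorch sketch below (an interpretation, not the DCAN code) composes one block from dense self-attention within the question tokens followed by co-attention in both directions between question and image-region tokens. The 256-dimensional features, 14 words, and 36 regions are placeholder sizes, and residual connections, normalization, and feed-forward layers are omitted.

```python
import torch
import torch.nn as nn

class DenseCoAttentionBlock(nn.Module):
    """One toy block: the question tokens first self-attend (dense
    intra-question attention), then question and image tokens
    cross-attend to each other."""
    def __init__(self, d=256, heads=8):
        super().__init__()
        self.q_self = nn.MultiheadAttention(d, heads, batch_first=True)
        self.q_from_v = nn.MultiheadAttention(d, heads, batch_first=True)
        self.v_from_q = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, q_tokens, v_tokens):
        # dense self-attention inside the question modality
        q_ctx, _ = self.q_self(q_tokens, q_tokens, q_tokens)
        # co-attention: question attends to image regions and vice versa
        q_out, _ = self.q_from_v(q_ctx, v_tokens, v_tokens)
        v_out, _ = self.v_from_q(v_tokens, q_ctx, q_ctx)
        return q_out, v_out

q = torch.randn(2, 14, 256)  # 14 word embeddings per question (placeholder)
v = torch.randn(2, 36, 256)  # 36 region features per image (placeholder)
q_out, v_out = DenseCoAttentionBlock()(q, v)
print(q_out.shape, v_out.shape)  # torch.Size([2, 14, 256]) torch.Size([2, 36, 256])
```

The self-attention step is what lets each word embedding absorb the full question context before it is matched against image regions, which is the gap the snippet identifies in plain co-attention.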