Convolutional neural networks (CNNs) for brain tumor segmentation are generally developed using complete sets of magnetic resonance imaging (MRI) sequences for both training and inference. As such, these algorithms are not trained for realistic clinical scenarios in which some of the MRI sequences used for training are missing during inference. To increase clinical applicability, we proposed a cross-modal distillation approach that leverages the availability of multi-sequence MRI data during training to generate an enriched CNN model which uses only single-sequence MRI data for inference yet outperforms a conventional single-sequence CNN model. We assessed the performance of the proposed method for whole tumor and tumor core segmentation with multi-sequence MRI data available for training but only T1-weighted (T1) sequence data available for inference, using the BraTS 2018 dataset and an in-house dataset. Results showed that cross-modal distillation significantly improved the Dice score for both whole tumor and tumor core segmentation when only T1 sequence data were available for inference. In the evaluation on the in-house dataset, cross-modal distillation achieved average Dice scores of 79.04% and 69.39% for whole tumor and tumor core segmentation, respectively, whereas a single-sequence U-Net model using T1 sequence data for both training and inference achieved average Dice scores of 73.60% and 62.62%, respectively. These findings confirm cross-modal distillation as an effective way to increase the potential of single-sequence CNN models, such that segmentation performance is less compromised when MRI sequences are missing or only one MRI sequence is available for segmentation.
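As a rough illustration of the distillation idea described above, the sketch below trains a single-sequence "student" segmenter against both the ground-truth labels and the softened predictions of a frozen multi-sequence "teacher". This is a minimal PyTorch sketch under stated assumptions, not the paper's implementation: the tiny 2D networks, the single-channel student input, and hyperparameters such as `temperature` and `kd_weight` are illustrative placeholders.

```python
# Minimal sketch of cross-modal distillation for segmentation (illustrative only).
# A teacher sees all MRI sequences; a student sees a single sequence and is trained
# on ground-truth labels plus the teacher's softened predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def tiny_segmenter(in_channels: int, num_classes: int = 3) -> nn.Module:
    # Stand-in for a U-Net; real models would be 3D and far deeper.
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, num_classes, 1),
    )

teacher = tiny_segmenter(in_channels=4)   # all four MRI sequences stacked as channels
student = tiny_segmenter(in_channels=1)   # single sequence only
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# Dummy batch: 4-sequence input, the first channel playing the role of the single sequence.
multi_seq = torch.randn(2, 4, 64, 64)
single_seq = multi_seq[:, :1]
labels = torch.randint(0, 3, (2, 64, 64))

temperature, kd_weight = 2.0, 0.5         # illustrative hyperparameters

with torch.no_grad():                     # teacher assumed pre-trained and frozen
    teacher_logits = teacher(multi_seq)

student_logits = student(single_seq)
seg_loss = F.cross_entropy(student_logits, labels)
kd_loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=1),
    F.softmax(teacher_logits / temperature, dim=1),
    reduction="batchmean",
) * temperature ** 2
loss = seg_loss + kd_weight * kd_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this reading, the teacher's soft class probabilities carry multi-sequence information that the single-sequence labels alone cannot provide, which is what allows the student to exceed a plain single-sequence baseline.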
DOI: http://dx.doi.org/10.1109/TBME.2021.3137561
IEEE Trans Autom Sci Eng
March 2025
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Early detection of Alzheimer's Disease (AD) is crucial for timely interventions and optimizing treatment outcomes. Integrating multimodal neuroimaging datasets can enhance the early detection of AD. However, models must address the challenge of incomplete modalities, a common issue in real-world scenarios, as not all patients have access to all modalities due to practical constraints such as cost and availability.
IEEE J Biomed Health Inform
August 2025
Epilepsy is a prevalent neurological disorder marked by sudden, brief episodes of excessive neuronal activity caused by abnormal electrical discharges, which may also lead to mental disorders. Most existing deep learning methods for epilepsy detection rely solely on unimodal EEG signals, neglecting the potential benefits of multimodal information. To address this, we propose a novel multimodal model, DistilCLIP-EEG, based on the CLIP framework, which integrates both EEG signals and text descriptions to capture comprehensive features of epileptic seizures.
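The snippet above only names the CLIP-style pairing of EEG signals with text descriptions; the sketch below shows the generic contrastive alignment such a framework builds on. The encoders, tensor shapes, and temperature are assumptions for illustration, not the DistilCLIP-EEG architecture.

```python
# Rough sketch of CLIP-style contrastive alignment between EEG windows and paired
# text descriptions (all encoder choices and shapes here are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGEncoder(nn.Module):
    def __init__(self, channels=19, samples=256, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(channels * samples, dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class TextEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)   # crude bag-of-tokens stand-in
    def forward(self, token_ids):
        return F.normalize(self.emb(token_ids), dim=-1)

eeg_enc, txt_enc = EEGEncoder(), TextEncoder()
eeg = torch.randn(8, 19, 256)                    # batch of EEG windows
text = torch.randint(0, 1000, (8, 12))           # batch of tokenized descriptions

e, t = eeg_enc(eeg), txt_enc(text)
logits = e @ t.T / 0.07                          # cosine similarities / temperature
targets = torch.arange(8)                        # matching EEG/text pairs lie on the diagonal
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
```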
Sci Rep
August 2025
Department of Economics and Management, Suzhou Chien-Shiung Institute of Technology, Suzhou, 215411, China.
Emotion recognition in conversations (ERC), which involves identifying the emotional state of each utterance within a dialogue, plays a vital role in developing empathetic artificial intelligence systems. In practical applications, such as video-based recruitment interviews, customer service, health monitoring, intelligent personal assistants, and online education, ERC can facilitate the analysis of emotional cues, improve decision-making processes, and enhance user interaction and satisfaction. Current multimodal emotion recognition research faces several challenges, such as ineffective emotional information extraction from single modalities, underused complementary features, and inter-modal redundancy.
IEEE J Biomed Health Inform
August 2025
Multimodal emotion recognition has emerged as a promising direction for capturing the complexity of human affective states by integrating physiological and behavioral signals. However, challenges remain in addressing feature redundancy, modality heterogeneity, and insufficient inter-modal supervision. In this paper, we propose a novel Multimodal Disentangled Knowledge Distillation framework that explicitly disentangles modality-shared and modality-specific features and enhances cross-modal knowledge transfer via a graph-based distillation module.
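As a loose illustration of the disentanglement idea mentioned above, the toy sketch below splits each modality's features into shared and specific parts, pulls the shared parts together across modalities, and penalizes overlap between shared and specific parts. The paper's graph-based distillation module is not reproduced; all module names, dimensions, and loss weights are assumptions.

```python
# Toy sketch of modality-shared vs. modality-specific feature disentanglement with a
# simple cross-modal alignment term standing in for distillation (all names assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim, dim=64):
        super().__init__()
        self.shared = nn.Linear(in_dim, dim)     # features meant to be common across modalities
        self.specific = nn.Linear(in_dim, dim)   # features unique to this modality
    def forward(self, x):
        return self.shared(x), self.specific(x)

physio_enc = ModalityEncoder(in_dim=32)          # e.g. physiological signal features
behav_enc = ModalityEncoder(in_dim=48)           # e.g. behavioral (video/audio) features

physio, behav = torch.randn(16, 32), torch.randn(16, 48)
labels = torch.randint(0, 4, (16,))              # emotion classes

p_shared, p_spec = physio_enc(physio)
b_shared, b_spec = behav_enc(behav)

classifier = nn.Linear(64 * 4, 4)
logits = classifier(torch.cat([p_shared, p_spec, b_shared, b_spec], dim=-1))

task_loss = F.cross_entropy(logits, labels)
align_loss = F.mse_loss(p_shared, b_shared)      # pull shared features together across modalities
ortho = (F.normalize(p_shared, dim=-1) * F.normalize(p_spec, dim=-1)).sum(-1).pow(2).mean()
loss = task_loss + 0.1 * align_loss + 0.01 * ortho   # illustrative weights
```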
IEEE Trans Med Imaging
July 2025
Foundation models have significantly revolutionized the field of chest X-ray diagnosis with their ability to transfer across various diseases and tasks. However, previous works have predominantly utilized self-supervised learning from medical image-text pairs, which falls short in dense medical prediction tasks due to their sole reliance on such coarse pair supervision, thereby limiting their applicability to detailed diagnostics. In this paper, we introduce a Dense Chest X-ray Foundation Model (DCXFM), which utilizes mixed supervision types (i.e., ...).