Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection.

Sensors (Basel)

Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China.

Published: October 2023


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Salient object detection (SOD), which is used to identify the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of a CNN-based network limits the performance of CNN-based methods. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD, in which the Swin Transformer is employed as a powerful feature extractor to capture the global context, and an edge-guided cross-modal interaction module is proposed to effectively enhance and fuse features. In particular, we employed the Swin Transformer as the backbone to extract features from RGB images and depth maps. Then, we introduced the edge extraction module (EEM) to extract edge features and the depth enhancement module (DEM) to enhance depth features. Additionally, a cross-modal interaction module (CIM) was used to integrate cross-modal features from global and local contexts. Finally, we employed a cascaded decoder to refine the prediction map in a coarse-to-fine manner. Extensive experiments demonstrated that our SwinEGNet achieved the best performance on the LFSD, NLPR, DES, and NJU2K datasets and comparable performance on the STEREO dataset against 14 state-of-the-art methods. Our model also achieved better performance than SwinNet while using only 88.4% of its parameters and 77.2% of its FLOPs. Our code will be publicly available.
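
To make the data flow described in the abstract concrete, the following is a minimal PyTorch sketch: two backbones extract RGB and depth feature pyramids, an edge extraction module (EEM) predicts an edge map from the shallowest RGB feature, a depth enhancement module (DEM) re-weights depth features, a cross-modal interaction module (CIM) fuses the two modalities under edge guidance, and a cascaded decoder refines the saliency map coarse-to-fine. The module internals, channel sizes, and the small convolutional stand-in for the Swin Transformer backbone are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the Swin Transformer backbone so the sketch runs without
# extra dependencies; three strided conv stages emulate a multi-scale
# feature pyramid (the paper uses Swin features here).
class StandInBackbone(nn.Module):
    def __init__(self, in_ch, dims=(32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for d in dims:
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, d, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            ch = d

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # fine-to-coarse feature pyramid

# EEM (assumed form): predict an edge map from the shallowest RGB feature.
class EdgeExtractionModule(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, f_rgb_low):
        return torch.sigmoid(self.head(f_rgb_low))

# DEM (assumed form): channel attention that re-weights depth features.
class DepthEnhancementModule(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch), nn.Sigmoid())

    def forward(self, f_depth):
        w = self.fc(f_depth.mean(dim=(2, 3)))        # (B, C) channel weights
        return f_depth * w[:, :, None, None]

# CIM (assumed form): concatenate RGB and depth features, then emphasize
# edge regions using the resized edge map.
class CrossModalInteraction(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, f_rgb, f_depth, edge):
        edge = F.interpolate(edge, size=f_rgb.shape[-2:],
                             mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([f_rgb, f_depth], dim=1))
        return fused * (1.0 + edge)

class SwinEGNetSketch(nn.Module):
    def __init__(self, dims=(32, 64, 128)):
        super().__init__()
        self.rgb_backbone = StandInBackbone(3, dims)
        self.depth_backbone = StandInBackbone(1, dims)
        self.eem = EdgeExtractionModule(dims[0])
        self.dems = nn.ModuleList(DepthEnhancementModule(d) for d in dims)
        self.cims = nn.ModuleList(CrossModalInteraction(d) for d in dims)
        self.heads = nn.ModuleList(nn.Conv2d(d, 1, 1) for d in dims)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_backbone(rgb)
        f_dep = self.depth_backbone(depth)
        edge = self.eem(f_rgb[0])
        pred = None
        # Cascaded decoder: start at the coarsest level and refine the
        # saliency map level by level (coarse-to-fine).
        for fr, fd, dem, cim, head in zip(
                reversed(f_rgb), reversed(f_dep),
                reversed(list(self.dems)), reversed(list(self.cims)),
                reversed(list(self.heads))):
            fused = cim(fr, dem(fd), edge)
            p = head(fused)
            if pred is not None:
                p = p + F.interpolate(pred, size=p.shape[-2:],
                                      mode="bilinear", align_corners=False)
            pred = p
        sal = F.interpolate(pred, size=rgb.shape[-2:],
                            mode="bilinear", align_corners=False)
        return torch.sigmoid(sal), edge

# Example: a 224x224 RGB image paired with a single-channel depth map.
net = SwinEGNetSketch()
sal, edge = net(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(sal.shape, edge.shape)   # (1, 1, 224, 224) and (1, 1, 112, 112)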

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10650861 (PMC)
http://dx.doi.org/10.3390/s23218802 (DOI Listing)

Publication Analysis

Top Keywords

swin transformer-based (8); transformer-based edge (8); edge guidance (8); guidance network (8); salient object (8); object detection (8); rgb-d sod (8); cnn-based network (8); backbone extract (8); extract features (8)

Similar Publications

Objective: This study aims to develop a robust, multi-task deep learning framework that integrates vessel segmentation and radiomic analysis for the automated classification of four retinal conditions, namely diabetic retinopathy (DR), hypertensive retinopathy (HR), papilledema, and normal fundus, using fundus images.

Materials and Methods: A total of 2,165 patients from eight medical centers were enrolled.

The precise detection and localization of abnormalities in radiological images are crucial for clinical diagnosis and treatment planning. To build reliable models, large, annotated datasets are required that contain disease labels and abnormality locations. Radiologists often face challenges in identifying and segmenting thoracic diseases such as COVID-19, pneumonia, tuberculosis, and lung cancer due to overlapping visual patterns in X-ray images.

Knee ailments, such as meniscus injuries, affect millions globally, with research showing that more than 14% of the population above 40 years of age lives with meniscus-related conditions. Conventional diagnosis techniques, like manual MRI interpretation, are labour-intensive, error-prone, and dependent on skilled radiologists, making an automatic and more accurate alternative indispensable. Current deep-learning solutions rely heavily on CNNs, which struggle with long-range dependencies and global contextual information.

Quantum integration in swin transformer mitigates overfitting in breast cancer screening.

Sci Rep

August 2025

Origin Quantum Computing Technology (Hefei) Co., Ltd., Hefei, 230088, China.

To explore the potential of quantum computing in advancing transformer-based deep learning models for breast cancer screening, this study introduces the Quantum-Enhanced Swin Transformer (QEST). This model integrates a Variational Quantum Circuit (VQC) to replace the fully connected layer responsible for classification in the Swin Transformer architecture. In simulations, QEST exhibited competitive accuracy and generalization performance compared to the original Swin Transformer, while also demonstrating an effect in mitigating overfitting.
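
As a rough illustration of the idea of swapping the classification head for a variational quantum circuit, the following PennyLane/PyTorch sketch compresses a pooled Swin feature vector to a few qubits, runs a small VQC, and maps the measured expectation values to two screening classes. The 768-dimensional input width (a Swin-T-sized pooled feature), the 4-qubit circuit, and the embedding/entangling templates are illustrative assumptions, not the QEST architecture.

import pennylane as qml
import torch
import torch.nn as nn

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def vqc(inputs, weights):
    # Encode the compressed features as rotation angles, entangle, measure.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weight_shapes = {"weights": (2, n_qubits)}  # 2 variational layers (assumed)

# Drop-in replacement for the transformer's fully connected classifier:
# 768-dim pooled feature -> 4 qubits -> VQC -> 2 logits (benign / malignant).
quantum_head = nn.Sequential(
    nn.Linear(768, n_qubits),
    qml.qnn.TorchLayer(vqc, weight_shapes),
    nn.Linear(n_qubits, 2),
)

logits = quantum_head(torch.randn(8, 768))   # batch of 8 pooled features
print(logits.shape)                          # torch.Size([8, 2])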

Arterial Spin Labeling (ASL) perfusion MRI is the only non-invasive technique for quantifying regional cerebral blood flow (CBF), an important physiological variable. ASL MRI has a relatively low signal-to-noise ratio (SNR), making it challenging to achieve high-quality CBF images from limited data. Promising ASL CBF denoising results have been shown in recent convolutional neural network (CNN)-based methods.
