A Cross-Modal Attention-Driven Multi-Sensor Fusion Method for Semantic Segmentation of Point Clouds.

Huisheng Shi , Xin Wang , Jianghong Zhao , Xinnan Hua

Sensors (Basel)

School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102627, China.

Published: April 2025

Category Ranking

98%

Total Visits

921

Avg Visit Duration

2 minutes

Citations

To bridge the modality gap between camera images and LiDAR point clouds in autonomous driving systems-a critical challenge exacerbated by current fusion methods' inability to effectively integrate cross-modal features-we propose the Cross-Modal Fusion (CMF) framework. This attention-driven architecture enables hierarchical multi-sensor data fusion, achieving state-of-the-art performance in semantic segmentation tasks.The CMF framework first projects point clouds onto the camera coordinates through the use of perspective projection to provide spatio-depth information for RGB images. Then, a two-stream feature extraction network is proposed to extract features from the two modalities separately, and multilevel fusion of the two modalities is realized by a residual fusion module (RCF) with cross-modal attention. Finally, we design a perceptual alignment loss that integrates cross-entropy with feature matching terms, effectively minimizing the semantic discrepancy between camera and LiDAR representations during fusion. The experimental results based on the SemanticKITTI and nuScenes benchmark datasets demonstrate that the CMF method achieves mean intersection over union (mIoU) scores of 64.2% and 79.3%, respectively, outperforming existing state-of-the-art methods in regard to accuracy and exhibiting enhanced robustness in regard to complex scenarios. The results of the ablation studies further validate that enhancing the feature interaction and fusion capabilities in semantic segmentation models through cross-modal attention and perceptually guided cross-entropy loss (Pgce) is effective in regard to improving segmentation accuracy and robustness.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12031233	PMC
http://dx.doi.org/10.3390/s25082474	DOI Listing

Publication Analysis

Top Keywords

semantic segmentation

point clouds

fusion

cmf framework

cross-modal attention

cross-modal

cross-modal attention-driven

attention-driven multi-sensor

multi-sensor fusion

fusion method

Similar Publications

Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations.

IEEE Trans Pattern Anal Mach Intell

September 2025

Cheng Lei , Jie Fan , Xinran Li , Tian-Zhu Xiang , Ao Li

Camouflaged Object Segmentation (COS) faces significant challenges due to the scarcity of annotated data, where meticulous pixel-level annotation is both labor-intensive and costly, primarily due to the intricate object-background boundaries. Addressing the core question, "Can COS be effectively achieved in a zero-shot manner without manual annotations for any camouflaged object?", we propose an affirmative solution. We analyze the learned attention patterns for camouflaged objects and introduce a robust zero-shot COS framework.

View Article and Find Full Text PDF

Similar Publications

Incremental Learning for Defect Segmentation With Efficient Transformer Semantic Complement.

IEEE Trans Neural Netw Learn Syst

September 2025

Xiqi Li , Zhifu Huang , Ge Ma , Yu Liu

In industrial scenarios, semantic segmentation of surface defects is vital for identifying, localizing, and delineating defects. However, new defect types constantly emerge with product iterations or process updates. Existing defect segmentation models lack incremental learning capabilities, and direct fine-tuning (FT) often leads to catastrophic forgetting.

View Article and Find Full Text PDF

Similar Publications

Large-Scale Dermatopathology Dataset for Lesion Segmentation: Model Development and Analysis.

J Korean Med Sci

September 2025

Department of Transdisciplinary Medicine, Seoul National University Hospital, Seoul, Korea.

Yosep Chong , Daseul Park , Youngbin Ahn , Yoonjin Kwak , Seyeon Park

Background: With the increasing incidence of skin cancer, the workload for pathologists has surged. The diagnosis of skin samples, especially for complex lesions such as malignant melanomas and melanocytic lesions, has shown higher diagnostic variability compared to other organ samples. Consequently, artificial intelligence (AI)-based diagnostic assistance programs are increasingly needed to support dermatopathologists in achieving more consistent diagnoses.

View Article and Find Full Text PDF

Similar Publications

Cherry-Net: real-time segmentation algorithm of cherry maturity based on improved PIDNet.

Front Plant Sci

September 2025

College of Big Data, Yunnan Agricultural University, Kunming, China.

Jie Cui , Lilian Zhang , Lutao Gao , Chunhui Bai , Linnan Yang

Introduction: Accurate identification of cherry maturity and precise detection of harvestable cherry contours are essential for the development of cherry-picking robots. However, occlusion, lighting variation, and blurriness in natural orchard environments present significant challenges for real-time semantic segmentation.

Methods: To address these issues, we propose a machine vision approach based on the PIDNet real-time semantic segmentation framework.

View Article and Find Full Text PDF

Similar Publications

GESur_Net: attention-guided network for surgical instrument segmentation in gastrointestinal endoscopy.

Med Biol Eng Comput

September 2025

Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, 300072, China.

Yaru Ma , Yuying Liu , Xin Chen , Zhongqing Zheng , Yufeng Wang

Surgical instrument segmentation plays an important role in robotic autonomous surgical navigation systems as it can accurately locate surgical instruments and estimate their posture, which helps surgeons understand the position and orientation of the instruments. However, there are still some problems affecting segmentation accuracy, like insufficient attention to the edges and center of surgical instruments, insufficient usage of low-level feature details, etc. To address these issues, a lightweight network for surgical instrument segmentation in gastrointestinal (GI) endoscopy (GESur_Net) is proposed.

View Article and Find Full Text PDF

Similar Publications