Cascaded Cross-Modality Fusion Network for 3D Object Detection.

Sensors (Basel)

School of Computer Science, Nanjing University of Posts and Telecommunications, No. 9 Wenyuan Road, Yadong New District, Nanjing 210023, China.

Published: December 2020


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

In this paper, we explore LIDAR-RGB fusion-based 3D object detection. The task remains challenging in two respects: (1) differences in data format and sensor position cause misalignment between the semantic features of images and the geometric features of point clouds; (2) optimizing the traditional IoU is not equivalent to minimizing the bounding-box regression loss, which leads to biased back-propagation for non-overlapping cases. In this work, we propose a cascaded cross-modality fusion network (CCFNet), which includes a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss to resolve these two issues. The CMF module reinforces the discriminative representation of objects by reasoning about the relation between an object's geometric cues from LIDAR and its semantic cues from RGB across the two modalities. Specifically, CMF is inserted in a cascaded way between the RGB and LIDAR streams: it selects salient points and transmits multi-scale point cloud features to each stage of the RGB stream. Moreover, our center 3D IoU loss incorporates the distance between anchor centers to avoid the overly simple optimization of non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark demonstrate that our approach outperforms the compared methods.
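The center 3D IoU loss is described only at a high level above. As a rough illustration, the sketch below implements a DIoU-style reading of it for axis-aligned 3D boxes: the squared distance between box centers, normalized by the diagonal of the smallest enclosing box, is added to the usual 1 - IoU term. This is an assumption-based sketch, not the authors' code; the (cx, cy, cz, w, l, h) parameterization is hypothetical and box orientation is ignored.

# Minimal sketch (not the authors' implementation) of a center-distance-
# penalized 3D IoU loss for axis-aligned boxes given as (cx, cy, cz, w, l, h).
import torch

def center_3d_iou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 6) tensors of (cx, cy, cz, w, l, h)."""
    # Min/max corners of each box.
    pred_min = pred[:, :3] - pred[:, 3:] / 2
    pred_max = pred[:, :3] + pred[:, 3:] / 2
    tgt_min = target[:, :3] - target[:, 3:] / 2
    tgt_max = target[:, :3] + target[:, 3:] / 2

    # Intersection volume (zero when the boxes do not overlap).
    inter = (torch.min(pred_max, tgt_max) - torch.max(pred_min, tgt_min)).clamp(min=0)
    inter_vol = inter.prod(dim=1)
    union = pred[:, 3:].prod(dim=1) + target[:, 3:].prod(dim=1) - inter_vol
    iou = inter_vol / (union + eps)

    # Center-distance penalty, normalized by the squared diagonal of the
    # smallest enclosing box so the term stays bounded and scale-invariant.
    center_dist = (pred[:, :3] - target[:, :3]).pow(2).sum(dim=1)
    enclose = torch.max(pred_max, tgt_max) - torch.min(pred_min, tgt_min)
    diag = enclose.pow(2).sum(dim=1)

    return (1.0 - iou + center_dist / (diag + eps)).mean()

For non-overlapping boxes the IoU term contributes no gradient, so the center-distance term is what keeps optimization moving, which matches the motivation given in the abstract.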


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807
DOI: http://dx.doi.org/10.3390/s20247243

Publication Analysis

Top Keywords

cascaded cross-modality (8)
cross-modality fusion (8)
fusion network (8)
object detection (8)
bounding boxes (8)
center iou (8)
iou loss (8)
cascaded (4)
network object (4)
detection focus (4)

Similar Publications

In this article, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method combines both aspects within a unified framework. MTNet effectively merges appearance and motion features during feature extraction within the encoders, promoting a more complementary representation.


Single-Photon Emission Computed Tomography (SPECT) is widely applied for the diagnosis of coronary artery diseases. Low-dose (LD) SPECT aims to minimize radiation exposure but leads to increased image noise. Limited-view (LV) SPECT, such as the latest GE MyoSPECT ES system, enables accelerated scanning and reduces hardware expenses but degrades reconstruction accuracy.


This paper presents a cross-modality generative learning framework for transitive magnetic resonance imaging (MRI) from electrical impedance tomography (EIT). The proposed framework is aimed at converting low-resolution EIT images to high-resolution wrist MRI images using a cascaded cycle generative adversarial network (CycleGAN) model. This model comprises three main components: the collection of initial EIT from the medical device, the generation of a high-resolution transitive EIT image from the corresponding MRI image for domain adaptation, and the coalescence of two CycleGAN models for cross-modality generation.
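As a loose illustration of the cascading idea in this snippet (not the paper's implementation), two generators can simply be composed: the first maps low-resolution EIT toward a high-resolution "transitive" EIT domain, and the second maps that to MRI. All module names and shapes below are hypothetical stand-ins.

# Hypothetical sketch of composing two CycleGAN-style generators.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    # Stand-in for a CycleGAN generator (real ones use ResNet blocks,
    # paired discriminators, and cycle-consistency losses).
    def __init__(self, ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

g_lr_to_hr = TinyGenerator()   # stage 1: low-res EIT -> high-res transitive EIT
g_hr_to_mri = TinyGenerator()  # stage 2: transitive EIT -> MRI

lr_eit = torch.randn(1, 1, 64, 64)  # dummy low-resolution EIT image
hr_eit = g_lr_to_hr(nn.functional.interpolate(lr_eit, scale_factor=2))
mri = g_hr_to_mri(hr_eit)           # cascaded cross-modality output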


In recent years, various neural network architectures for computer vision have been devised, such as the visual transformer and multilayer perceptron (MLP). A transformer based on an attention mechanism can outperform a traditional convolutional neural network. Compared with the convolutional neural network and transformer, the MLP introduces less inductive bias and achieves stronger generalization.

