Cascaded Cross-Modality Fusion Network for 3D Object Detection.

Sensors (Basel)

School of Computer Science, Nanjing University of Posts and Telecommunications, No. 9 Wenyuan Road, Yadong New District, Nanjing 210023, China.

Published: December 2020


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

In this paper, we explore LIDAR-RGB fusion-based 3D object detection. The task remains challenging in two respects: (1) differences in data format and sensor position cause misalignment between the semantic features of images and the geometric features of point clouds; (2) optimizing the traditional IoU is not equivalent to minimizing the bounding-box regression loss, which leads to biased back-propagation for non-overlapping cases. In this work, we propose a cascaded cross-modality fusion network (CCFNet), which includes a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss to resolve these two issues. The CMF module reinforces the discriminative representation of objects by reasoning about the relation between an object's geometric cues from LIDAR and its semantic cues from RGB across the two modalities. Specifically, CMF is inserted in a cascaded way between the RGB and LIDAR streams: it selects salient points and transmits multi-scale point cloud features to each stage of the RGB stream. Moreover, our center 3D IoU loss incorporates the distance between anchor centers to avoid the overly simple optimization of non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark demonstrate that our approach outperforms the compared methods.
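The center 3D IoU loss is described only at a high level above. As a rough illustration, the sketch below implements a DIoU-style reading of it for axis-aligned 3D boxes: the squared distance between box centers, normalized by the diagonal of the smallest enclosing box, is added to the usual 1 - IoU term. This is an assumption-based sketch, not the authors' code; the (cx, cy, cz, w, l, h) parameterization is hypothetical and box orientation is ignored.

# Minimal sketch (not the authors' implementation) of a center-distance-
# penalized 3D IoU loss for axis-aligned boxes given as (cx, cy, cz, w, l, h).
import torch

def center_3d_iou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 6) tensors of (cx, cy, cz, w, l, h)."""
    # Min/max corners of each box.
    pred_min = pred[:, :3] - pred[:, 3:] / 2
    pred_max = pred[:, :3] + pred[:, 3:] / 2
    tgt_min = target[:, :3] - target[:, 3:] / 2
    tgt_max = target[:, :3] + target[:, 3:] / 2

    # Intersection volume (zero when the boxes do not overlap).
    inter = (torch.min(pred_max, tgt_max) - torch.max(pred_min, tgt_min)).clamp(min=0)
    inter_vol = inter.prod(dim=1)
    union = pred[:, 3:].prod(dim=1) + target[:, 3:].prod(dim=1) - inter_vol
    iou = inter_vol / (union + eps)

    # Center-distance penalty, normalized by the squared diagonal of the
    # smallest enclosing box so the term stays bounded and scale-invariant.
    center_dist = (pred[:, :3] - target[:, :3]).pow(2).sum(dim=1)
    enclose = torch.max(pred_max, tgt_max) - torch.min(pred_min, tgt_min)
    diag = enclose.pow(2).sum(dim=1)

    return (1.0 - iou + center_dist / (diag + eps)).mean()

For non-overlapping boxes the IoU term contributes no gradient, so the center-distance term is what keeps optimization moving, which matches the motivation given in the abstract.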


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807
DOI: http://dx.doi.org/10.3390/s20247243

Publication Analysis

Top Keywords

cascaded cross-modality (8)
cross-modality fusion (8)
fusion network (8)
object detection (8)
bounding boxes (8)
center iou (8)
iou loss (8)
cascaded (4)
network object (4)
detection focus (4)

Similar Publications

In this article, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method combines both aspects within a unified framework. MTNet effectively merges appearance and motion features during feature extraction within the encoders, promoting a more complementary representation.


Single-Photon Emission Computed Tomography (SPECT) is widely applied for the diagnosis of coronary artery diseases. Low-dose (LD) SPECT aims to minimize radiation exposure but leads to increased image noise. Limited-view (LV) SPECT, such as the latest GE MyoSPECT ES system, enables accelerated scanning and reduces hardware expenses but degrades reconstruction accuracy.


This paper presents a cross-modality generative learning framework for transitive magnetic resonance imaging (MRI) from electrical impedance tomography (EIT). The proposed framework is aimed at converting low-resolution EIT images to high-resolution wrist MRI images using a cascaded cycle generative adversarial network (CycleGAN) model. This model comprises three main components: the collection of initial EIT from the medical device, the generation of a high-resolution transitive EIT image from the corresponding MRI image for domain adaptation, and the coalescence of two CycleGAN models for cross-modality generation.
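As a loose illustration of the cascading idea in this snippet (not the paper's implementation), two generators can simply be composed: the first maps low-resolution EIT toward a high-resolution "transitive" EIT domain, and the second maps that to MRI. All module names and shapes below are hypothetical stand-ins.

# Hypothetical sketch of composing two CycleGAN-style generators.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    # Stand-in for a CycleGAN generator (real ones use ResNet blocks,
    # paired discriminators, and cycle-consistency losses).
    def __init__(self, ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

g_lr_to_hr = TinyGenerator()   # stage 1: low-res EIT -> high-res transitive EIT
g_hr_to_mri = TinyGenerator()  # stage 2: transitive EIT -> MRI

lr_eit = torch.randn(1, 1, 64, 64)  # dummy low-resolution EIT image
hr_eit = g_lr_to_hr(nn.functional.interpolate(lr_eit, scale_factor=2))
mri = g_hr_to_mri(hr_eit)           # cascaded cross-modality output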


In recent years, various neural network architectures for computer vision have been devised, such as the visual transformer and multilayer perceptron (MLP). A transformer based on an attention mechanism can outperform a traditional convolutional neural network. Compared with the convolutional neural network and transformer, the MLP introduces less inductive bias and achieves stronger generalization.

