We focus on LIDAR-RGB fusion-based 3D object detection in this paper. This task remains challenging in two respects: (1) differences in data formats and sensor positions lead to misalignment between the semantic features of images and the geometric features of point clouds; (2) optimizing traditional IoU is not equivalent to minimizing the bounding-box regression loss, which yields biased back-propagation for non-overlapping cases. In this work, we propose a cascaded cross-modality fusion network (CCFNet), which includes a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss to address these two issues. The CMF module reinforces the discriminative representation of objects by reasoning about the relation between the LIDAR geometric features and the RGB semantic features of an object across the two modalities. Specifically, CMF is inserted in a cascaded way between the RGB and LIDAR streams; it selects salient points and transmits multi-scale point cloud features to each stage of the RGB stream. Moreover, our center 3D IoU loss incorporates the distance between anchor centers to avoid overly simple optimization for non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark demonstrate that the proposed approach outperforms the compared methods.
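The abstract does not give the exact formula for the center 3D IoU loss, but the stated idea (penalizing IoU by the distance between box centers so that non-overlapping boxes still receive a useful gradient) can be illustrated with a minimal 2D sketch in the spirit of a distance-aware IoU loss. The function name and the normalization by the enclosing-box diagonal are assumptions for illustration, not the paper's definition:

```python
def center_iou_loss(box_a, box_b):
    """Sketch of a center-distance-aware IoU loss for axis-aligned 2D boxes
    given as (x1, y1, x2, y2). For disjoint boxes, plain IoU is 0 and carries
    no signal; the normalized center-distance term keeps the loss informative."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero if the boxes do not overlap)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih

    # Union area
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Squared distance between the two box centers
    acx, acy = (ax1 + ax2) / 2.0, (ay1 + ay2) / 2.0
    bcx, bcy = (bx1 + bx2) / 2.0, (by1 + by2) / 2.0
    center_dist2 = (acx - bcx) ** 2 + (acy - bcy) ** 2

    # Squared diagonal of the smallest enclosing box, used for normalization
    diag2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
          + (max(ay2, by2) - min(ay1, by1)) ** 2

    return 1.0 - iou + center_dist2 / diag2
```

Identical boxes give a loss of 0, while disjoint boxes give a loss greater than 1 that shrinks as the centers move closer, which is the behavior the abstract motivates for non-overlapping cases. Extending the same idea to 3D means using volumes instead of areas and three center coordinates instead of two.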
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807
DOI: http://dx.doi.org/10.3390/s20247243
IEEE Trans Neural Netw Learn Syst, May 2025
In this article, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or solely on modeling temporal relations, our method combines both aspects within a unified framework. MTNet effectively merges appearance and motion features during feature extraction within the encoders, promoting a more complementary representation.
Single-Photon Emission Computed Tomography (SPECT) is widely applied for the diagnosis of coronary artery diseases. Low-dose (LD) SPECT aims to minimize radiation exposure but leads to increased image noise. Limited-view (LV) SPECT, such as the latest GE MyoSPECT ES system, enables accelerated scanning and reduces hardware expenses but degrades reconstruction accuracy.
Comput Med Imaging Graph, September 2023
Department of Diagnostic Radiology, The University of Hong Kong, Hong Kong.
This paper presents a cross-modality generative learning framework for transitive magnetic resonance imaging (MRI) from electrical impedance tomography (EIT). The proposed framework converts low-resolution EIT images to high-resolution wrist MRI images using a cascaded cycle generative adversarial network (CycleGAN) model. This model comprises three main components: the collection of initial EIT measurements from the medical device, the generation of a high-resolution transitive EIT image from the corresponding MRI image for domain adaptation, and the coalescence of two CycleGAN models for cross-modality generation.
IEEE Trans Image Process, May 2023
In recent years, various neural network architectures for computer vision have been devised, such as the visual transformer and the multilayer perceptron (MLP). A transformer based on an attention mechanism can outperform a traditional convolutional neural network. Compared with the convolutional neural network and the transformer, the MLP introduces less inductive bias and achieves stronger generalization.
Sensors (Basel), December 2020
School of Computer Science, Nanjing University of Posts and Telecommunications, No. 9 Wenyuan Road, Yadong New District, Nanjing 210023, China.