DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization.

Sensors (Basel)

Robotics and Autonomous Driving Laboratory, Baidu Research, Beijing 100085, China.

Published: April 2022



Article Abstract

Recently, generating dense maps in real time has become a hot research topic in the mobile robotics community, since dense maps provide more informative and continuous features than sparse maps. Implicit depth representations (e.g., the depth code) derived from deep neural networks have been employed in visual-only and visual-inertial simultaneous localization and mapping (SLAM) systems, which achieve promising performance in both camera motion and local dense geometry estimation from monocular images. However, existing visual-inertial SLAM systems combined with depth codes are either built on a filter-based SLAM framework, which can only update poses and maps in a relatively small local time window, or based on a loosely-coupled framework in which the prior geometric constraints from the depth estimation network are not employed to boost state estimation. To address these drawbacks, we propose DiT-SLAM, a novel real-time Dense visual-inertial SLAM with Implicit depth representation and Tightly-coupled graph optimization. Most importantly, the poses, sparse maps, and low-dimensional depth codes are optimized in a tightly-coupled graph that considers the visual, inertial, and depth residuals simultaneously. Meanwhile, we propose a lightweight monocular depth estimation and completion network, combined with attention mechanisms and a conditional variational auto-encoder (CVAE), to predict uncertainty-aware dense depth maps from lower-dimensional codes. Furthermore, a robust point sampling strategy that exploits the spatial distribution of 2D feature points is proposed to provide geometric constraints in the tightly-coupled optimization, especially for textureless or featureless indoor environments. We evaluate our system on open benchmarks. The proposed methods achieve better performance on both dense depth estimation and trajectory estimation compared to the baseline and other systems.
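The tightly-coupled graph described in the abstract jointly optimizes poses, sparse landmarks, and a low-dimensional depth code by summing visual, inertial, and depth residuals. The toy sketch below illustrates that cost structure only; all function names, the pinhole model with identity intrinsics, the constant-depth decoder, and the weights are illustrative assumptions, not the paper's actual formulation or API.

```python
import numpy as np

# Illustrative sketch of a tightly-coupled cost over pose, landmarks, and a
# low-dimensional depth code. Everything here (names, models, weights) is a
# hypothetical simplification of the objective the abstract describes.

def visual_residual(pose, landmarks, observations):
    # Reprojection error: transform 3D landmarks by the 4x4 pose, project
    # with a toy pinhole model (identity intrinsics), compare to 2D features.
    cam_pts = landmarks @ pose[:3, :3].T + pose[:3, 3]
    projected = cam_pts[:, :2] / cam_pts[:, 2:3]
    return (projected - observations).ravel()

def inertial_residual(pose, imu_predicted_pose):
    # Discrepancy between the optimized pose and the IMU-preintegrated
    # prediction (translation part only, for brevity).
    return pose[:3, 3] - imu_predicted_pose[:3, 3]

def depth_residual(depth_code, decoder, sparse_depths, sample_pixels):
    # Decode the low-dimensional code into a dense depth map, then compare
    # it against triangulated sparse depths at sampled feature pixels.
    dense = decoder(depth_code)
    return dense[sample_pixels[:, 1], sample_pixels[:, 0]] - sparse_depths

def total_cost(pose, landmarks, depth_code, data, decoder, w=(1.0, 1.0, 1.0)):
    # Tightly-coupled objective: weighted sum of squared visual, inertial,
    # and depth residuals, all evaluated on the same set of variables.
    rv = visual_residual(pose, landmarks, data["obs"])
    ri = inertial_residual(pose, data["imu_pose"])
    rd = depth_residual(depth_code, decoder, data["sparse_d"], data["pix"])
    return w[0] * rv @ rv + w[1] * ri @ ri + w[2] * rd @ rd
```

In a real system this scalar cost would be minimized with a nonlinear least-squares solver (e.g., Gauss-Newton or Levenberg-Marquardt over the factor graph); the point of the sketch is only that the three residual types share variables, which is what makes the coupling "tight".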


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9102487
DOI: http://dx.doi.org/10.3390/s22093389

Publication Analysis

Top Keywords

visual-inertial slam (12)
depth representation (12)
depth estimation (12)
depth (11)
implicit depth (8)
tightly-coupled graph (8)
graph optimization (8)
dense maps (8)
sparse maps (8)
slam systems (8)

Similar Publications

Visual-Inertial Odometry (VIO) systems often suffer from degraded performance in environments with low texture. Although some previous works have combined line features with point features to mitigate this problem, the line features still degrade under more challenging conditions, such as varying illumination. To tackle this, we propose DeepLine-VIO, a robust VIO framework that integrates learned line features extracted via an attraction-field-based deep network.


Simultaneous Localization and Mapping (SLAM) remains challenging in dynamic environments. Recent approaches that combine deep learning with SLAM algorithms for dynamic scenes fall into two types: faster but less accurate object-detection-based methods, and highly accurate but computationally costly instance-segmentation-based methods. In addition, maps lacking semantic information hinder robots from understanding their environment and performing complex tasks.


This paper presents SE2-LET-VINS, an enhanced Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) system built upon the classic Visual-Inertial Navigation System for Monocular Cameras (VINS-Mono) framework, designed to improve localization accuracy and robustness in complex environments. By integrating Lightweight Neural Network (LET-NET) for high-quality feature extraction and Special Euclidean Group in 2D (SE2) optical flow tracking, the system achieves superior performance in challenging scenarios such as low lighting and rapid motion. The proposed method processes Inertial Measurement Unit (IMU) data and camera data, utilizing pre-integration and RANdom SAmple Consensus (RANSAC) for precise feature matching.


Monocular visual-inertial odometry based on the MSCKF algorithm has demonstrated computational efficiency even with limited resources. The MSCKF-VIO is primarily designed for localization tasks, where environmental features such as points, lines, and planes are tracked across consecutive images. These tracked features are subsequently triangulated using the historical IMU/camera poses in the state vector to perform measurement updates.


A Review of Simultaneous Localization and Mapping for the Robotic-Based Nondestructive Evaluation of Infrastructures.

Sensors (Basel)

January 2025

Department of Mechanical Engineering and Mechanics (MEM), Drexel University, 3141 Chestnut St., Philadelphia, PA 19104, USA.

The maturity of simultaneous localization and mapping (SLAM) methods has now reached a significant level that motivates in-depth and problem-specific reviews. The focus of this study is to investigate the evolution of vision-based, LiDAR-based, and a combination of these methods and evaluate their performance in enclosed and GPS-denied (EGD) conditions for infrastructure inspection. This paper categorizes and analyzes the SLAM methods in detail, considering the sensor fusion type and chronological order.
