DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization.

Sensors (Basel)

Robotics and Autonomous Driving Laboratory, Baidu Research, Beijing 100085, China.

Published: April 2022



Article Abstract

Recently, generating dense maps in real time has become a hot research topic in the mobile robotics community, since dense maps provide more informative and continuous features than sparse maps. Implicit depth representations (e.g., the depth code) derived from deep neural networks have been employed in visual-only and visual-inertial simultaneous localization and mapping (SLAM) systems, which achieve promising performance in both camera motion and local dense geometry estimation from monocular images. However, existing visual-inertial SLAM systems combined with depth codes are either built on a filter-based SLAM framework, which can only update poses and maps in a relatively small local time window, or based on a loosely-coupled framework in which the prior geometric constraints from the depth estimation network are not employed to boost state estimation. To address these drawbacks, we propose DiT-SLAM, a novel real-time Dense visual-inertial SLAM with Implicit depth representation and Tightly-coupled graph optimization. Most importantly, the poses, sparse maps, and low-dimensional depth codes are optimized in a tightly-coupled graph that considers the visual, inertial, and depth residuals simultaneously. Meanwhile, we propose a lightweight monocular depth estimation and completion network, combined with attention mechanisms and a conditional variational auto-encoder (CVAE), to predict uncertainty-aware dense depth maps from lower-dimensional codes. Furthermore, a robust point sampling strategy that exploits the spatial distribution of 2D feature points is proposed to provide geometric constraints in the tightly-coupled optimization, especially for textureless or featureless indoor environments. We evaluate our system on open benchmarks. The proposed methods achieve better performance on both dense depth estimation and trajectory estimation compared to the baseline and other systems.
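The tightly-coupled graph described in the abstract jointly optimizes poses, sparse landmarks, and a low-dimensional depth code by summing visual, inertial, and depth residuals. The toy sketch below illustrates that cost structure only; all function names, the pinhole model with identity intrinsics, the constant-depth decoder, and the weights are illustrative assumptions, not the paper's actual formulation or API.

```python
import numpy as np

# Illustrative sketch of a tightly-coupled cost over pose, landmarks, and a
# low-dimensional depth code. Everything here (names, models, weights) is a
# hypothetical simplification of the objective the abstract describes.

def visual_residual(pose, landmarks, observations):
    # Reprojection error: transform 3D landmarks by the 4x4 pose, project
    # with a toy pinhole model (identity intrinsics), compare to 2D features.
    cam_pts = landmarks @ pose[:3, :3].T + pose[:3, 3]
    projected = cam_pts[:, :2] / cam_pts[:, 2:3]
    return (projected - observations).ravel()

def inertial_residual(pose, imu_predicted_pose):
    # Discrepancy between the optimized pose and the IMU-preintegrated
    # prediction (translation part only, for brevity).
    return pose[:3, 3] - imu_predicted_pose[:3, 3]

def depth_residual(depth_code, decoder, sparse_depths, sample_pixels):
    # Decode the low-dimensional code into a dense depth map, then compare
    # it against triangulated sparse depths at sampled feature pixels.
    dense = decoder(depth_code)
    return dense[sample_pixels[:, 1], sample_pixels[:, 0]] - sparse_depths

def total_cost(pose, landmarks, depth_code, data, decoder, w=(1.0, 1.0, 1.0)):
    # Tightly-coupled objective: weighted sum of squared visual, inertial,
    # and depth residuals, all evaluated on the same set of variables.
    rv = visual_residual(pose, landmarks, data["obs"])
    ri = inertial_residual(pose, data["imu_pose"])
    rd = depth_residual(depth_code, decoder, data["sparse_d"], data["pix"])
    return w[0] * rv @ rv + w[1] * ri @ ri + w[2] * rd @ rd
```

In a real system this scalar cost would be minimized with a nonlinear least-squares solver (e.g., Gauss-Newton or Levenberg-Marquardt over the factor graph); the point of the sketch is only that the three residual types share variables, which is what makes the coupling "tight".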


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9102487
DOI: http://dx.doi.org/10.3390/s22093389

Publication Analysis

Top Keywords

visual-inertial slam (12)
depth representation (12)
depth estimation (12)
depth (11)
implicit depth (8)
tightly-coupled graph (8)
graph optimization (8)
dense maps (8)
sparse maps (8)
slam systems (8)

Similar Publications

Visual-Inertial Odometry (VIO) systems often suffer from degraded performance in environments with low texture. Although some previous works have combined line features with point features to mitigate this problem, the line features still degrade under more challenging conditions, such as varying illumination. To tackle this, we propose DeepLine-VIO, a robust VIO framework that integrates learned line features extracted via an attraction-field-based deep network.


Simultaneous Localization and Mapping (SLAM) remains challenging in dynamic environments. Recent approaches that combine deep learning with SLAM algorithms for dynamic scenes fall into two types: faster but less accurate object-detection-based methods, and highly accurate but computationally costly instance-segmentation-based methods. In addition, maps lacking semantic information hinder robots from understanding their environment and performing complex tasks.


This paper presents SE2-LET-VINS, an enhanced Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) system built upon the classic Visual-Inertial Navigation System for Monocular Cameras (VINS-Mono) framework, designed to improve localization accuracy and robustness in complex environments. By integrating Lightweight Neural Network (LET-NET) for high-quality feature extraction and Special Euclidean Group in 2D (SE2) optical flow tracking, the system achieves superior performance in challenging scenarios such as low lighting and rapid motion. The proposed method processes Inertial Measurement Unit (IMU) data and camera data, utilizing pre-integration and RANdom SAmple Consensus (RANSAC) for precise feature matching.


Monocular visual-inertial odometry based on the MSCKF algorithm has demonstrated computational efficiency even with limited resources. The MSCKF-VIO is primarily designed for localization tasks, where environmental features such as points, lines, and planes are tracked across consecutive images. These tracked features are subsequently triangulated using the historical IMU/camera poses in the state vector to perform measurement updates.


A Review of Simultaneous Localization and Mapping for the Robotic-Based Nondestructive Evaluation of Infrastructures.

Sensors (Basel)

January 2025

Department of Mechanical Engineering and Mechanics (MEM), Drexel University, 3141 Chestnut St., Philadelphia, PA 19104, USA.

The maturity of simultaneous localization and mapping (SLAM) methods has now reached a significant level that motivates in-depth and problem-specific reviews. The focus of this study is to investigate the evolution of vision-based, LiDAR-based, and a combination of these methods and evaluate their performance in enclosed and GPS-denied (EGD) conditions for infrastructure inspection. This paper categorizes and analyzes the SLAM methods in detail, considering the sensor fusion type and chronological order.
