Purpose: Monocular SLAM algorithms are a key enabling technology for image-based surgical navigation in endoscopic procedures. Because of the scarcity of visual features and the unusual lighting conditions encountered in endoscopy, classical SLAM approaches perform inconsistently. Many recent approaches to endoscopic SLAM rely on deep learning models; they show promising results when optimized for a single domain such as arthroscopy, sinus endoscopy, colonoscopy, or laparoscopy, but cannot generalize to other domains without retraining.
Methods: To address this generalization issue, we propose OneSLAM, a monocular SLAM algorithm for surgical endoscopy that works out of the box across several endoscopic domains, including sinus endoscopy, colonoscopy, arthroscopy, and laparoscopy. Our pipeline builds on robust tracking-any-point (TAP) foundation models to reliably track sparse correspondences across multiple frames and runs local bundle adjustment to jointly optimize camera poses and a sparse 3D reconstruction of the anatomy.
Results: We compare our method against three strong baselines previously proposed for monocular SLAM in endoscopy and general scenes. In all four tested domains, OneSLAM performs better than or comparably to existing approaches targeted at that specific data, generalizing across domains without the need for retraining.
Conclusion: OneSLAM benefits from the convincing performance of TAP foundation models and generalizes to endoscopic sequences of different anatomies, all while performing better than or comparably to domain-specific SLAM approaches. Future research on global loop closure will investigate how to reliably detect loops in endoscopic scenes to reduce accumulated drift and enhance long-term navigation capabilities.
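The core of such a pipeline — jointly refining camera poses and sparse structure against 2D correspondences tracked across frames — can be sketched as a reprojection-error least-squares problem. This is a minimal illustration under simplified assumptions, not the published OneSLAM implementation: the TAP tracker, keyframe windowing, and robust losses of the real system are omitted, and all function names here are our own.

```python
import numpy as np
from scipy.optimize import least_squares

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def reprojection_residuals(params, n_cams, n_pts, observations, K):
    """Stacked pixel residuals; observations is a list of (cam_idx, pt_idx, u, v)."""
    poses = params[:n_cams * 6].reshape(n_cams, 6)    # per camera: 3 rot + 3 trans
    points = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for ci, pi, u, v in observations:
        R = rodrigues(poses[ci, :3])
        p_cam = R @ points[pi] + poses[ci, 3:]        # world -> camera
        proj = K @ p_cam                              # pinhole projection
        res.append(proj[0] / proj[2] - u)
        res.append(proj[1] / proj[2] - v)
    return np.array(res)

def local_bundle_adjustment(init_poses, init_points, observations, K):
    """Jointly refine all poses and landmarks by Levenberg-Marquardt-style least squares."""
    x0 = np.hstack([init_poses.ravel(), init_points.ravel()])
    sol = least_squares(reprojection_residuals, x0,
                        args=(len(init_poses), len(init_points), observations, K))
    n = len(init_poses) * 6
    return sol.x[:n].reshape(-1, 6), sol.x[n:].reshape(-1, 3)
```

In a real system this solve would run over a sliding window of keyframes, with the TAP model supplying the `observations` list.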
DOI: http://dx.doi.org/10.1007/s11548-024-03171-6
Sensors (Basel)
August 2025
Engineering Faculty, Transport and Telecommunication Institute, Lauvas Iela 2, LV-1019 Riga, Latvia.
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such helmets enabled by Visual Simultaneous Localization and Mapping (VSLAM). We survey the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors.
Sensors (Basel)
June 2025
School of Mechanical and Electrical Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China.
This paper presents SE2-LET-VINS, an enhanced Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) system built upon the classic Visual-Inertial Navigation System for Monocular Cameras (VINS-Mono) framework and designed to improve localization accuracy and robustness in complex environments. By integrating a Lightweight Neural Network (LET-NET) for high-quality feature extraction with optical flow tracking on the Special Euclidean Group in 2D (SE2), the system achieves superior performance in challenging scenarios such as low lighting and rapid motion. The proposed method processes Inertial Measurement Unit (IMU) and camera data, using pre-integration and RANdom SAmple Consensus (RANSAC) for precise feature matching.
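The RANSAC-based matching step mentioned above can be illustrated with a small, dependency-free sketch that robustly fits a 2D rigid (SE(2)) transform between two matched point sets. This is our own simplified stand-in, not the LET-NET/SE2 tracking code; function names and parameters are assumptions.

```python
import numpy as np

def estimate_se2(src, dst):
    """Least-squares SE(2) fit (2D Kabsch) mapping src points onto dst points."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)                 # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def ransac_se2(src, dst, n_iters=200, thresh=2.0, rng=None):
    """RANSAC over minimal 2-point samples; returns refit model and inlier mask."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(src), bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), 2, replace=False)
        R, t = estimate_se2(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)  # pixel residuals
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    R, t = estimate_se2(src[best_inliers], dst[best_inliers])  # refit on all inliers
    return R, t, best_inliers
```

A minimal sample of two correspondences suffices because SE(2) has only three degrees of freedom.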
Background: The development of technology and the rapid increase in computing power have enabled the wide application of simultaneous localization and mapping (SLAM) in smart devices. Nevertheless, visual odometry based on the direct method yields inaccurate pose estimates in structured environments, because it ignores line segment information, constraints between associated points, and estimated position information.
Objective: This study aimed to address the issue of inaccurate pose estimation in structured environments for direct method-based visual odometry by proposing a direct monocular vision algorithm based on deep constraints of point and line features (DMVA-PLF), with the goal of improving pose estimation accuracy.
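Unlike feature-based methods, a direct method minimizes photometric rather than geometric error: a reference pixel is back-projected with its depth, reprojected into the current image, and the intensity difference is the residual. The single-pixel sketch below illustrates only that point-feature half, with nearest-neighbour image lookup; the paper's DMVA-PLF additionally exploits line features and deep constraints, and the names here are our own.

```python
import numpy as np

def photometric_residual(I_ref, I_cur, p_ref, depth, K, R, t):
    """Direct-method residual for one pixel: intensity at the reference pixel
    minus intensity at its reprojection in the current image.

    I_ref, I_cur: grayscale images as 2D arrays; p_ref: (u, v) integer pixel;
    depth: depth of the pixel in the reference frame; (R, t): relative pose."""
    u, v = p_ref
    X = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project to 3D
    x = K @ (R @ X + t)                                    # reproject into current frame
    u2, v2 = x[0] / x[2], x[1] / x[2]
    # nearest-neighbour lookup keeps the sketch dependency-free;
    # real systems use sub-pixel bilinear interpolation
    return float(I_ref[int(v), int(u)] - I_cur[int(round(v2)), int(round(u2))])
```

Summing this residual over many pixels (and analogous terms for line segments) and minimizing over (R, t) yields the direct pose estimate.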
Sensors (Basel)
April 2025
School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China.
Monocular visual-inertial odometry based on the MSCKF algorithm has demonstrated computational efficiency even under limited resources. The MSCKF-VIO is primarily designed for localization tasks, in which environmental features such as points, lines, and planes are tracked across consecutive images. These tracked features are then triangulated using the historical IMU/camera poses in the state vector to perform measurement updates.
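The triangulation step described above — recovering a landmark from its observations under several historical camera poses — is classically done with a linear direct linear transform (DLT) solve. The sketch below shows that textbook construction with an assumed interface; it is not the MSCKF-VIO code.

```python
import numpy as np

def triangulate(poses, obs, K):
    """Linear (DLT) triangulation of one landmark from n views.

    poses: list of (R, t) world->camera transforms; obs: n (u, v) pixel
    observations; K: 3x3 intrinsics. Returns the 3D point in world frame."""
    A = []
    for (R, t), (u, v) in zip(poses, obs):
        P = K @ np.hstack([R, t.reshape(3, 1)])   # 3x4 projection matrix
        # each view contributes two rows of the homogeneous system A X = 0
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(A))
    X = Vt[-1]                                    # null-space direction
    return X[:3] / X[3]                           # de-homogenize
```

With exact observations from two or more distinct poses, the SVD null vector recovers the landmark exactly; in the filter, the result feeds the measurement update.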
Sensors (Basel)
April 2025
Space Control and Inertial Technology Research Center, School of Astronautics, Harbin Institute of Technology, Harbin 150001, China.
Two-view epipolar initialization for feature-based monocular SLAM with the RANSAC approach is challenging in dynamic environments. This paper presents a universal and practical method for improving the automatic estimation of initial poses and landmarks across multiple frames in real time. Image features corresponding to the same spatial points are matched and tracked across consecutive frames, and those that belong to stationary points are identified using ST-RANSAC, an algorithm designed to detect inliers based on both spatial and temporal consistency.
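The spatial-plus-temporal inlier idea can be illustrated with a toy filter: a tracked feature counts as stationary only if it agrees with the dominant frame-to-frame motion model in every consecutive frame pair (spatial consistency per pair, temporal consistency across pairs). This is a deliberately simplified translation-only stand-in for ST-RANSAC, with hypothetical names; the real algorithm works with epipolar geometry.

```python
import numpy as np

def stationary_track_mask(tracks, n_iters=100, thresh=2.0, rng=None):
    """tracks: (n_feats, n_frames, 2) pixel positions of tracked features.

    Returns a boolean mask of features that are inliers to the dominant
    frame-to-frame translation in *every* consecutive frame pair."""
    rng = rng or np.random.default_rng(0)
    n_feats, n_frames, _ = tracks.shape
    mask = np.ones(n_feats, bool)
    for f in range(n_frames - 1):
        flow = tracks[:, f + 1] - tracks[:, f]        # per-feature displacement
        best = np.zeros(n_feats, bool)
        for _ in range(n_iters):
            model = flow[rng.integers(n_feats)]       # 1-point translation hypothesis
            inliers = np.linalg.norm(flow - model, axis=1) < thresh
            if inliers.sum() > best.sum():
                best = inliers                        # spatial consensus in this pair
        mask &= best                                  # temporal consistency across pairs
    return mask
```

Features on independently moving objects fail the consensus in at least one pair and are excluded from initialization.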