Purpose: Monocular SLAM algorithms are a key enabling technology for image-based surgical navigation in endoscopic procedures. Because of the scarcity of visual features and the unusual lighting conditions encountered in endoscopy, classical SLAM approaches perform inconsistently. Many recent approaches to endoscopic SLAM rely on deep learning models; they show promising results when optimized for a single domain such as arthroscopy, sinus endoscopy, colonoscopy, or laparoscopy, but cannot generalize to other domains without retraining.
Methods: To address this generalization issue, we propose OneSLAM, a monocular SLAM algorithm for surgical endoscopy that works out of the box across several endoscopic domains, including sinus endoscopy, colonoscopy, arthroscopy, and laparoscopy. Our pipeline builds on robust tracking-any-point (TAP) foundation models to reliably track sparse correspondences across multiple frames and runs local bundle adjustment to jointly optimize camera poses and a sparse 3D reconstruction of the anatomy.
Results: We compare our method against three strong baselines previously proposed for monocular SLAM in endoscopy and general scenes. In all four tested domains, OneSLAM performs better than or comparably to existing approaches targeted at that specific data, generalizing across domains without the need for retraining.
Conclusion: OneSLAM benefits from the convincing performance of TAP foundation models and generalizes to endoscopic sequences of different anatomies, all while performing better than or comparably to domain-specific SLAM approaches. Future research on global loop closure will investigate how to reliably detect loops in endoscopic scenes to reduce accumulated drift and enhance long-term navigation capabilities.
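The core of such a pipeline — jointly refining camera poses and sparse structure against 2D correspondences tracked across frames — can be sketched as a reprojection-error least-squares problem. This is a minimal illustration under simplified assumptions, not the published OneSLAM implementation: the TAP tracker, keyframe windowing, and robust losses of the real system are omitted, and all function names here are our own.

```python
import numpy as np
from scipy.optimize import least_squares

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def reprojection_residuals(params, n_cams, n_pts, observations, K):
    """Stacked pixel residuals; observations is a list of (cam_idx, pt_idx, u, v)."""
    poses = params[:n_cams * 6].reshape(n_cams, 6)    # per camera: 3 rot + 3 trans
    points = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for ci, pi, u, v in observations:
        R = rodrigues(poses[ci, :3])
        p_cam = R @ points[pi] + poses[ci, 3:]        # world -> camera
        proj = K @ p_cam                              # pinhole projection
        res.append(proj[0] / proj[2] - u)
        res.append(proj[1] / proj[2] - v)
    return np.array(res)

def local_bundle_adjustment(init_poses, init_points, observations, K):
    """Jointly refine all poses and landmarks by Levenberg-Marquardt-style least squares."""
    x0 = np.hstack([init_poses.ravel(), init_points.ravel()])
    sol = least_squares(reprojection_residuals, x0,
                        args=(len(init_poses), len(init_points), observations, K))
    n = len(init_poses) * 6
    return sol.x[:n].reshape(-1, 6), sol.x[n:].reshape(-1, 3)
```

In a real system this solve would run over a sliding window of keyframes, with the TAP model supplying the `observations` list.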
DOI: http://dx.doi.org/10.1007/s11548-024-03171-6
Sensors (Basel)
August 2025
Engineering Faculty, Transport and Telecommunication Institute, Lauvas Iela 2, LV-1019 Riga, Latvia.
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such helmets enabled by Visual Simultaneous Localization and Mapping (VSLAM). We survey the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors.
Sensors (Basel)
June 2025
School of Mechanical and Electrical Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China.
This paper presents SE2-LET-VINS, an enhanced Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) system built upon the classic Visual-Inertial Navigation System for Monocular Cameras (VINS-Mono) framework and designed to improve localization accuracy and robustness in complex environments. By integrating a Lightweight Neural Network (LET-NET) for high-quality feature extraction with optical flow tracking on the Special Euclidean Group in 2D (SE2), the system achieves superior performance in challenging scenarios such as low lighting and rapid motion. The proposed method processes Inertial Measurement Unit (IMU) and camera data, using pre-integration and RANdom SAmple Consensus (RANSAC) for precise feature matching.
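The RANSAC-based matching step mentioned above can be illustrated with a small, dependency-free sketch that robustly fits a 2D rigid (SE(2)) transform between two matched point sets. This is our own simplified stand-in, not the LET-NET/SE2 tracking code; function names and parameters are assumptions.

```python
import numpy as np

def estimate_se2(src, dst):
    """Least-squares SE(2) fit (2D Kabsch) mapping src points onto dst points."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)                 # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def ransac_se2(src, dst, n_iters=200, thresh=2.0, rng=None):
    """RANSAC over minimal 2-point samples; returns refit model and inlier mask."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(src), bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), 2, replace=False)
        R, t = estimate_se2(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)  # pixel residuals
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    R, t = estimate_se2(src[best_inliers], dst[best_inliers])  # refit on all inliers
    return R, t, best_inliers
```

A minimal sample of two correspondences suffices because SE(2) has only three degrees of freedom.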
Background: The development of technology and the rapid increase in computing power have enabled the wide application of simultaneous localization and mapping (SLAM) in smart devices. Nevertheless, visual odometry based on the direct method yields inaccurate pose estimates in structured environments, because it ignores line segment information, constraints between associated points, and estimated position information.
Objective: This study aimed to address the issue of inaccurate pose estimation in structured environments for direct method-based visual odometry by proposing a direct monocular vision algorithm based on deep constraints of point and line features (DMVA-PLF), with the goal of improving pose estimation accuracy.
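Unlike feature-based methods, a direct method minimizes photometric rather than geometric error: a reference pixel is back-projected with its depth, reprojected into the current image, and the intensity difference is the residual. The single-pixel sketch below illustrates only that point-feature half, with nearest-neighbour image lookup; the paper's DMVA-PLF additionally exploits line features and deep constraints, and the names here are our own.

```python
import numpy as np

def photometric_residual(I_ref, I_cur, p_ref, depth, K, R, t):
    """Direct-method residual for one pixel: intensity at the reference pixel
    minus intensity at its reprojection in the current image.

    I_ref, I_cur: grayscale images as 2D arrays; p_ref: (u, v) integer pixel;
    depth: depth of the pixel in the reference frame; (R, t): relative pose."""
    u, v = p_ref
    X = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project to 3D
    x = K @ (R @ X + t)                                    # reproject into current frame
    u2, v2 = x[0] / x[2], x[1] / x[2]
    # nearest-neighbour lookup keeps the sketch dependency-free;
    # real systems use sub-pixel bilinear interpolation
    return float(I_ref[int(v), int(u)] - I_cur[int(round(v2)), int(round(u2))])
```

Summing this residual over many pixels (and analogous terms for line segments) and minimizing over (R, t) yields the direct pose estimate.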
Sensors (Basel)
April 2025
School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China.
Monocular visual-inertial odometry based on the MSCKF algorithm has demonstrated computational efficiency even under limited resources. The MSCKF-VIO is primarily designed for localization tasks, in which environmental features such as points, lines, and planes are tracked across consecutive images. These tracked features are then triangulated using the historical IMU/camera poses in the state vector to perform measurement updates.
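The triangulation step described above — recovering a landmark from its observations under several historical camera poses — is classically done with a linear direct linear transform (DLT) solve. The sketch below shows that textbook construction with an assumed interface; it is not the MSCKF-VIO code.

```python
import numpy as np

def triangulate(poses, obs, K):
    """Linear (DLT) triangulation of one landmark from n views.

    poses: list of (R, t) world->camera transforms; obs: n (u, v) pixel
    observations; K: 3x3 intrinsics. Returns the 3D point in world frame."""
    A = []
    for (R, t), (u, v) in zip(poses, obs):
        P = K @ np.hstack([R, t.reshape(3, 1)])   # 3x4 projection matrix
        # each view contributes two rows of the homogeneous system A X = 0
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(A))
    X = Vt[-1]                                    # null-space direction
    return X[:3] / X[3]                           # de-homogenize
```

With exact observations from two or more distinct poses, the SVD null vector recovers the landmark exactly; in the filter, the result feeds the measurement update.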
Sensors (Basel)
April 2025
Space Control and Inertial Technology Research Center, School of Astronautics, Harbin Institute of Technology, Harbin 150001, China.
Two-view epipolar initialization for feature-based monocular SLAM with the RANSAC approach is challenging in dynamic environments. This paper presents a universal and practical method for improving the automatic estimation of initial poses and landmarks across multiple frames in real time. Image features corresponding to the same spatial points are matched and tracked across consecutive frames, and those that belong to stationary points are identified using ST-RANSAC, an algorithm designed to detect inliers based on both spatial and temporal consistency.
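The spatial-plus-temporal inlier idea can be illustrated with a toy filter: a tracked feature counts as stationary only if it agrees with the dominant frame-to-frame motion model in every consecutive frame pair (spatial consistency per pair, temporal consistency across pairs). This is a deliberately simplified translation-only stand-in for ST-RANSAC, with hypothetical names; the real algorithm works with epipolar geometry.

```python
import numpy as np

def stationary_track_mask(tracks, n_iters=100, thresh=2.0, rng=None):
    """tracks: (n_feats, n_frames, 2) pixel positions of tracked features.

    Returns a boolean mask of features that are inliers to the dominant
    frame-to-frame translation in *every* consecutive frame pair."""
    rng = rng or np.random.default_rng(0)
    n_feats, n_frames, _ = tracks.shape
    mask = np.ones(n_feats, bool)
    for f in range(n_frames - 1):
        flow = tracks[:, f + 1] - tracks[:, f]        # per-feature displacement
        best = np.zeros(n_feats, bool)
        for _ in range(n_iters):
            model = flow[rng.integers(n_feats)]       # 1-point translation hypothesis
            inliers = np.linalg.norm(flow - model, axis=1) < thresh
            if inliers.sum() > best.sum():
                best = inliers                        # spatial consensus in this pair
        mask &= best                                  # temporal consistency across pairs
    return mask
```

Features on independently moving objects fail the consensus in at least one pair and are excluded from initialization.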