SSIFNet: Spatial-temporal stereo information fusion network for self-supervised surgical video inpainting.

Comput Med Imaging Graph

Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China. Electronic address:

Published: August 2025


Article Abstract

During minimally invasive robot-assisted surgical procedures, surgeons rely on stereo endoscopes for image guidance. However, the field of view is typically restricted by the limited size of the endoscope and the constrained workspace. This visualization challenge becomes more severe when surgical instruments are inserted into the already restricted field of view, where they may occlude important anatomical landmarks and clinically relevant content. To address this challenge, we propose a novel end-to-end trainable spatial-temporal stereo information fusion network, referred to as SSIFNet, for inpainting surgical scenes occluded by instruments in robot-assisted endoscopic surgery videos. SSIFNet comprises three essential modules: a novel optical flow-guided deformable feature propagation (OFDFP) module, a novel spatial-temporal stereo focal transformer (SFT)-based information fusion module, and a novel stereo-consistency enforcement (SE) module. These three modules work together to inpaint occluded regions in the surgical scene. More importantly, SSIFNet is trained in a self-supervised manner on simulated occlusions with a novel loss function that combines flow completion, disparity matching, cross-warping consistency, warping consistency, image, and adversarial loss terms to produce high-fidelity, accurate reconstructions of occluded regions in both views. After training, the model can be applied directly to surgical videos with real instrument occlusions, generating results that are spatially and temporally consistent as well as stereo-consistent. Comprehensive quantitative and qualitative experimental results demonstrate that SSIFNet outperforms state-of-the-art (SOTA) video inpainting methods. The source code of this study will be released at https://github.com/SHAUNZXY/SSIFNet.
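
The self-supervised idea described in the abstract (hide a region with a simulated occlusion, reconstruct it, and penalize errors in both the image and the stereo geometry) can be illustrated with a toy sketch. This is not the SSIFNet implementation: every function name below is hypothetical, frames are plain grayscale lists, the stereo pair is assumed rectified with integer disparities, and the weighted sum only stands in for the paper's six-term loss, whose actual terms and weights are not given here.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale frame stored as rows of pixel values


def simulate_occlusion(frame: Image, top: int, left: int,
                       height: int, width: int) -> Tuple[Image, Image]:
    """Hide a rectangle of the frame; return (occluded frame, mask).

    The mask is 1.0 where a pixel was hidden and 0.0 elsewhere, so the
    original frame can serve as ground truth for the hidden region.
    """
    occluded = [row[:] for row in frame]
    mask = [[0.0] * len(row) for row in frame]
    for y in range(top, min(top + height, len(frame))):
        for x in range(left, min(left + width, len(frame[y]))):
            occluded[y][x] = 0.0
            mask[y][x] = 1.0
    return occluded, mask


def masked_l1(pred: Image, target: Image, mask: Image) -> float:
    """Mean absolute error computed over the masked (hidden) pixels only."""
    total, count = 0.0, 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            total += m * abs(p - t)
            count += m
    return total / count if count else 0.0


def warp_with_disparity(right: Image, disparity: List[List[int]]) -> Image:
    """Resample the right view into the left view: out[y][x] = right[y][x - d].

    Assumes rectified stereo and integer disparities (an illustrative
    simplification); out-of-range samples are clamped to the image border.
    """
    out = []
    for y, row in enumerate(right):
        last = len(row) - 1
        out.append([row[min(max(x - disparity[y][x], 0), last)]
                    for x in range(len(row))])
    return out


def total_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of named loss terms (weights are placeholder values)."""
    return sum(weights[name] * value for name, value in terms.items())
```

In this sketch, an image term would be `masked_l1(prediction, original_left, mask)`, and a stereo-consistency term would compare `warp_with_disparity(predicted_right, disparity)` against the left view over the same mask; `total_loss` then combines them with whatever weights are chosen.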


Source
http://dx.doi.org/10.1016/j.compmedimag.2025.102622

Publication Analysis

Top Keywords

spatial-temporal stereo: 12
stereo fusion: 8
fusion network: 8
video inpainting: 8
inpainting surgical: 8
surgical videos: 8
surgical scene: 8
instrument occlusions: 8
module novel: 8
surgical: 7

Similar Publications

Spatiotemporal transcriptomic landscape of rice embryonic cells during seed germination.

Dev Cell

September 2024

Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China; Hainan Institute, Zhejiang University, Sanya 572025, China. Electronic address:

Characterizing cellular features during seed germination is crucial for understanding the complex biological functions of different embryonic cells in regulating seed vigor and seedling establishment. We performed spatially enhanced resolution omics sequencing (Stereo-seq) and single-cell RNA sequencing (scRNA-seq) to capture spatially resolved single-cell transcriptomes of germinating rice embryos. An automated cell-segmentation model, employing deep learning, was developed to accommodate the analysis requirements.

Event-based cameras are commonly leveraged to mitigate issues such as motion blur, low dynamic range, and limited time sampling, which plague conventional cameras. However, a lack of dedicated event-based datasets for benchmarking segmentation algorithms, especially those offering critical depth information for occluded scenes, has been observed. In response, this paper introduces a novel Event-based Segmentation Dataset (ESD), a high-quality event 3D spatial-temporal dataset designed for indoor object segmentation within cluttered environments.

Article Synopsis
  • Event-based structured light (SL) systems utilize bio-inspired event cameras for high-speed applications but often overlook spatio-temporal consistency in depth measurement.
  • This study introduces a novel SL system combining a laser point projector and event camera, implementing a spatial-temporal coding strategy for simultaneous depth encoding.
  • The proposed Spatio-Temporal Enhanced Matching (STEM) approach improves 3D reconstruction through dual-domain information integration and a tailored stereo matching algorithm, achieving impressive performance with a reconstruction rate of 16 fps and minimal error.

National dance is an important symbol of national spiritual culture, as it embodies each nation's unique history, living habits, ideology, and culture. Many Chinese ethnic groups have developed their own dance forms and styles, each with their own set of charms. Strengthening our understanding of its style and characteristics is critical to our understanding of the national dance.
