SSIFNet: Spatial-temporal stereo information fusion network for self-supervised surgical video inpainting.

Xiaoyang Zou, Zhuyuan Zhang, Derong Yu, Wenyuan Sun, Wenyong Liu, Donghua Hang, Wei Bao, Guoyan Zheng

Comput Med Imaging Graph

Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China. Electronic address:

Published: August 2025


During minimally invasive robot-assisted surgical procedures, surgeons rely on stereo endoscopes for image guidance. However, the field-of-view is typically restricted owing to the limited size of the endoscope and the constrained workspace. This visualization challenge becomes even more severe when surgical instruments are inserted into the already restricted field-of-view, where they may occlude important anatomical landmarks and relevant clinical content. To address this challenge, we propose a novel end-to-end trainable spatial-temporal stereo information fusion network, referred to as SSIFNet, for inpainting surgical videos of scenes occluded by instruments in robot-assisted endoscopic surgery. SSIFNet features three essential modules: a novel optical flow-guided deformable feature propagation (OFDFP) module, a novel spatial-temporal stereo focal transformer (SFT)-based information fusion module, and a novel stereo-consistency enforcement (SE) module. These three modules work synergistically to inpaint occluded regions in the surgical scene. More importantly, SSIFNet is trained in a self-supervised manner with simulated occlusions, using a novel loss function that combines flow completion, disparity matching, cross-warping consistency, warping consistency, image, and adversarial loss terms to generate high-fidelity, accurate occlusion reconstructions in both views. After training, the model can be applied directly to surgical videos with true instrument occlusions, producing results that are not only spatially and temporally consistent but also stereo-consistent. Comprehensive quantitative and qualitative experiments demonstrate that SSIFNet outperforms state-of-the-art (SOTA) video inpainting methods. The source code of this study will be released at https://github.com/SHAUNZXY/SSIFNet.
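The abstract describes self-supervised training on frames with simulated occlusions, where the hidden pixels serve as ground truth, together with loss terms that warp content across time (optical flow) and across the two stereo views (disparity). The Python/NumPy sketch below illustrates only these general ideas under stated assumptions: a random rectangle stands in for an instrument mask, a simple disparity-based copy from the right view stands in for the network, and all names (`backward_warp`, `random_box_mask`, `masked_l1`, `weighted_total`) and loss weights are hypothetical. It is not SSIFNet's architecture or its actual loss function.

```python
import numpy as np


def backward_warp(src, flow):
    """Bilinearly sample src at (x + u, y + v) for each target pixel (x, y).

    src: (H, W) float image. flow: (H, W, 2), flow[..., 0] = u (horizontal),
    flow[..., 1] = v (vertical). Sample positions are clamped to the border.
    """
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * src[y0, x0] + wx * src[y0, x1]
    bottom = (1 - wx) * src[y1, x0] + wx * src[y1, x1]
    return (1 - wy) * top + wy * bottom


def disparity_to_flow(disparity):
    """Rectified stereo: left pixel (x, y) matches right pixel (x - d, y)."""
    return np.stack([-disparity, np.zeros_like(disparity)], axis=-1)


def random_box_mask(h, w, box_h, box_w, rng):
    """Hypothetical simulated occlusion: a random rectangle for an instrument."""
    mask = np.zeros((h, w), dtype=bool)
    top = rng.integers(0, h - box_h + 1)
    left = rng.integers(0, w - box_w + 1)
    mask[top:top + box_h, left:left + box_w] = True
    return mask


def masked_l1(pred, target, mask):
    """Mean absolute error restricted to the masked pixels."""
    return float(np.abs(pred - target)[mask].mean()) if mask.any() else 0.0


def weighted_total(terms, weights):
    """Combine named loss terms; the weights used below are hypothetical."""
    return sum(weights[name] * value for name, value in terms.items())


# Toy stereo pair with a constant disparity of d pixels.
rng = np.random.default_rng(0)
h, w, d = 32, 48, 2
left = rng.random((h, w))
right = np.roll(left, -d, axis=1)      # right[y, x] = left[y, x + d]
disp = np.full((h, w), float(d))

# Simulate an occlusion on a frame whose ground truth is known.
mask = random_box_mask(h, w, 8, 12, rng)
mask[:, :d] = False                    # columns x < d have no match in the right view
occluded = np.where(mask, 0.0, left)

# Placeholder "inpainter": fill masked pixels from the right view warped to the left.
left_from_right = backward_warp(right, disparity_to_flow(disp))
inpainted = np.where(mask, left_from_right, occluded)

terms = {
    "image": masked_l1(inpainted, left, mask),
    "cross_warp": masked_l1(left_from_right[:, d:], left[:, d:],
                            np.ones((h, w - d), dtype=bool)),
}
loss = weighted_total(terms, {"image": 1.0, "cross_warp": 0.5})
```

Because the occlusion is simulated, the original pixels under the mask remain available as a target, which is what makes the reconstruction loss computable without manual labels. In this idealized toy pair the disparity is exact, so both terms evaluate to zero; on real endoscopic frames the fill would come from a learned model and the terms would be nonzero.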

DOI: http://dx.doi.org/10.1016/j.compmedimag.2025.102622


Similar Publications

Spatiotemporal transcriptomic landscape of rice embryonic cells during seed germination.

Dev Cell

September 2024

Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China; Hainan Institute, Zhejiang University, Sanya 572025, China. Electronic address:

Jie Yao, Qinjie Chu, Xing Guo, Wenwen Shao, Nianmin Shang

Characterizing cellular features during seed germination is crucial for understanding the complex biological functions of different embryonic cells in regulating seed vigor and seedling establishment. We performed spatially enhanced resolution omics sequencing (Stereo-seq) and single-cell RNA sequencing (scRNA-seq) to capture spatially resolved single-cell transcriptomes of germinating rice embryos. An automated cell-segmentation model, employing deep learning, was developed to accommodate the analysis requirements.

A neuromorphic dataset for tabletop object segmentation in indoor cluttered environment.

Sci Data

January 2024

Advanced Research and Innovation Center (ARIC), Khalifa University, Abu Dhabi, UAE.

Xiaoqian Huang, Sanket Kachole, Abdulla Ayyad, Fariborz Baghaei Naeini, Dimitrios Makris

Event-based cameras are commonly leveraged to mitigate issues such as motion blur, low dynamic range, and limited time sampling, which plague conventional cameras. However, dedicated event-based datasets for benchmarking segmentation algorithms are lacking, especially datasets that offer critical depth information for occluded scenes. In response, this paper introduces a novel Event-based Segmentation Dataset (ESD), a high-quality 3D spatial-temporal event dataset designed for indoor object segmentation in cluttered environments.

Fast 3D reconstruction via event-based structured light with spatio-temporal coding.

Opt Express

December 2023

Jiacheng Fu, Yueyi Zhang, Yue Li, Jiacheng Li, Zhiwei Xiong

Article Synopsis

Event-based structured light (SL) systems utilize bio-inspired event cameras for high-speed applications but often overlook spatio-temporal consistency in depth measurement.
This study introduces a novel SL system combining a laser point projector and event camera, implementing a spatial-temporal coding strategy for simultaneous depth encoding.
The proposed Spatio-Temporal Enhanced Matching (STEM) approach improves 3D reconstruction through dual-domain information integration and a tailored stereo matching algorithm, achieving a reconstruction rate of 16 fps with low error.

Analysis of the Style Characteristics of National Dance Based on 3D Reconstruction.

Comput Intell Neurosci

July 2022

Yangtze University, Jingzhou 434020, China.

Dongqi Zhang

National dance is an important symbol of national spiritual culture, as it embodies each nation's unique history, living habits, ideology, and culture. Many Chinese ethnic groups have developed their own dance forms and styles, each with its own distinctive charm. A deeper understanding of these stylistic characteristics is therefore critical to understanding national dance.