S²Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR.

Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin, Pheng-Ann Heng

IEEE Trans Med Imaging

Published: January 2025

Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR). However, previous works have relied primarily on multi-stage learning, in which the generated semantic scene graphs depend on intermediate pose estimation and object detection steps. Such a pipeline may compromise the flexibility of learning multimodal representations and thereby constrain overall effectiveness. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S²Former-OR, which aims to leverage multi-view 2D scenes and 3D point clouds in a complementary, end-to-end manner. Concretely, our model adopts a View-Sync Transfusion scheme to encourage interaction among multi-view visual information. Concurrently, a Geometry-Visual Cohesion operation integrates the synergistic 2D semantic features into the 3D point cloud features. Moreover, based on the augmented features, we propose a novel relation-sensitive transformer decoder that embeds dynamic entity-pair queries and relational trait priors, enabling direct prediction of entity-pair relations for graph generation without intermediate steps. Extensive experiments on the 4D-OR benchmark validate the superior SGG performance and lower computational cost of S²Former-OR compared with current OR-SGG methods, e.g., a 3 percentage point increase in Precision and a 24.2M reduction in model parameters. We further compare our method with generic single-stage SGG methods on broader metrics for a comprehensive evaluation, and it achieves consistently better performance. Our source code can be made available at: https://github.com/PJLallen/S2Former-OR.

DOI: http://dx.doi.org/10.1109/TMI.2024.3444279

Similar Publications

The energy landscape of folding in n-C14H30 described by a machine-learned potential.

J Chem Phys

September 2025

Yusuf Hamied Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom.

Thomas C Allison, Joel M Bowman, Paul L Houston, Yuthika Pillai, Chen Qu

Folding and unfolding in molecules as simple as short hydrocarbons and as complicated as large proteins continue to be an active research field. Here, we investigate folding in n-C14H30 using both density functional theory (DFT)/B3LYP calculations of 27 772 local minima and a kinetic transition network calculated for a previously reported potential energy surface (PES) obtained by fitting roughly 250 000 B3LYP energies. In addition to generating a database of minima and the transition states that connect them, these calculations and the PES based on them have been used to develop a simple and accurate model for the energy landscape.

Integrative gene and isoform co-expression networks reveal regulatory rewiring in stress-related psychiatric disorders.

iScience

September 2025

Max Planck Institute of Psychiatry, 80804 Munich, Germany.

Ghalia Rehawi , Jonas Hagenberg , , , Philipp G Sämann

Isoform-specific expression patterns have been linked to stress-related psychiatric disorders such as major depressive disorder (MDD). To further explore their involvement, we constructed co-expression networks using total gene expression (TE) and isoform ratio (IR) data from affected (n = 210, 81% with depressive symptoms) and unaffected (n = 95) individuals. Networks were validated using advanced graph generation methods.

Hubs, influencers, and communities of executive functions: a task-based fMRI graph analysis.

Front Hum Neurosci

August 2025

Baptist Medical Center, Department of Behavioral Health, Jacksonville, FL, United States.

Alexandra T Davis

Introduction: This study investigates four subdomains of executive functioning (initiation, cognitive inhibition, mental shifting, and working memory) using task-based functional magnetic resonance imaging (fMRI) data and graph analysis.

Methods: We used healthy adults' functional magnetic resonance imaging (fMRI) data to construct brain connectomes and network graphs for each task and analyzed global and node-level graph metrics.

Results: The bilateral precuneus and right medial prefrontal cortex emerged as pivotal hubs and influencers, emphasizing their crucial regulatory role in all four subdomains of executive function.

A Pure Transformer Pretraining Framework on Text-attributed Graphs.

Proc Mach Learn Res

November 2024

Michigan State University.

Yu Song, Haitao Mao, Jiachen Xiao, Jingzhe Liu, Zhikai Chen

Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges represented by feature heterogeneity and structural heterogeneity. Recent efforts have been made to address feature heterogeneity via Large Language Models (LLMs) on text-attributed graphs (TAGs) by generating fixed-length text representations as node features.

SAGERank: inductive learning of protein-protein interaction from antibody-antigen recognition.

Chem Sci

August 2025

Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China

Chuance Sun, Xiangyi Li, Honglin Xu, Yike Tang, Ganggang Bai

Predicting Antibody-Antigen (Ab-Ag) docking and structure-based design represent significant long-term and therapeutically important challenges in computational biology. We present SAGERank, a general, configurable deep learning framework for antibody design using Graph Sample and Aggregate Networks. SAGERank successfully predicted the majority of epitopes in a cancer target dataset.
