Citations

20

Article Abstract

Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR). However, previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate pose estimation and object detection steps. This pipeline may compromise the flexibility of learning multimodal representations and, consequently, constrain overall effectiveness. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR, which aims to leverage multi-view 2D scenes and 3D point clouds in a complementary way for end-to-end SGG. Concretely, our model adopts a View-Sync Transfusion scheme to encourage interaction among multi-view visual information. Concurrently, a Geometry-Visual Cohesion operation is designed to integrate the synergistic 2D semantic features into 3D point cloud features. Moreover, based on the augmented features, we propose a novel relation-sensitive transformer decoder that embeds dynamic entity-pair queries and relational trait priors, which enables the direct prediction of entity-pair relations for graph generation without intermediate steps. Extensive experiments have validated the superior SGG performance and lower computational cost of S2Former-OR on the 4D-OR benchmark compared with current OR-SGG methods, e.g., a 3 percentage point increase in Precision and a 24.2M reduction in model parameters. We further compared our method with generic single-stage SGG methods using broader metrics for a comprehensive evaluation, and it consistently achieved better performance. Our source code can be made available at: https://github.com/PJLallen/S2Former-OR.
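
The abstract describes integrating 2D semantic features into 3D point cloud features but gives no implementation detail. As a rough illustration of the general idea only, the sketch below uses plain scaled dot-product cross-attention, with 3D point features as queries and 2D image tokens as keys and values. This is a generic technique swapped in for illustration, not the paper's Geometry-Visual Cohesion operation; the function names, the residual addition, the absence of learned projections, and the toy dimensions are all assumptions.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_weights(queries, keys):
    """Scaled dot-product attention weights: one row per query, one column per key."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys])
        for q in queries
    ]


def fuse_2d_into_3d(point_feats, image_feats):
    """Hypothetical fusion step (an assumption, not the paper's method):
    each 3D point feature attends over the 2D image tokens, and the
    attended 2D summary is added back to the point feature as a residual."""
    weights = cross_attention_weights(point_feats, image_feats)
    dim = len(image_feats[0])
    fused = []
    for p, w in zip(point_feats, weights):
        attended = [
            sum(w_j * tok[d] for w_j, tok in zip(w, image_feats)) for d in range(dim)
        ]
        fused.append([p_d + a_d for p_d, a_d in zip(p, attended)])
    return fused, weights


# Toy inputs: 3 point features and 5 image tokens, all of dimension 4.
random.seed(0)
DIM = 4
point_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
image_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
fused, weights = fuse_2d_into_3d(point_feats, image_feats)
```

Here `fused` keeps the shape of the point features (3 x 4), and each row of `weights` is a distribution over the 5 image tokens. A real model would add learned query, key, and value projections and multiple attention heads.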

Source
http://dx.doi.org/10.1109/TMI.2024.3444279

Publication Analysis

Top Keywords

graph generation: 12
single-stage bi-modal: 8
bi-modal transformer: 8
scene graph: 8
sgg: 5
s²former-or single-stage: 4
transformer scene: 4
generation scene: 4
generation sgg: 4
sgg surgical: 4

Similar Publications

The energy landscape of folding in n-C14H30 described by a machine-learned potential.

J Chem Phys

September 2025

Yusuf Hamied Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom.

Folding and unfolding in molecules as simple as short hydrocarbons and as complicated as large proteins continue to be an active research field. Here, we investigate folding in n-C14H30 using both density functional theory (DFT)/B3LYP calculations of 27 772 local minima and a kinetic transition network calculated for a previously reported potential energy surface (PES) obtained by fitting roughly 250 000 B3LYP energies. In addition to generating a database of minima and the transition states that connect them, these calculations and the PES based on them have been used to develop a simple and accurate model for the energy landscape.

Isoform-specific expression patterns have been linked to stress-related psychiatric disorders such as major depressive disorder (MDD). To further explore their involvement, we constructed co-expression networks using total gene expression (TE) and isoform ratio (IR) data from affected (n = 210, 81% with depressive symptoms) and unaffected (n = 95) individuals. Networks were validated using advanced graph generation methods.

Hubs, influencers, and communities of executive functions: a task-based fMRI graph analysis.

Front Hum Neurosci

August 2025

Baptist Medical Center, Department of Behavioral Health, Jacksonville, FL, United States.

Introduction: This study investigates four subdomains of executive functioning (initiation, cognitive inhibition, mental shifting, and working memory) using task-based functional magnetic resonance imaging (fMRI) data and graph analysis.

Methods: We used healthy adults' functional magnetic resonance imaging (fMRI) data to construct brain connectomes and network graphs for each task and analyzed global and node-level graph metrics.

Results: The bilateral precuneus and right medial prefrontal cortex emerged as pivotal hubs and influencers, emphasizing their crucial regulatory role in all four subdomains of executive function.

Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges represented by feature heterogeneity and structural heterogeneity. Recent efforts have been made to address feature heterogeneity via Large Language Models (LLMs) on text-attributed graphs (TAGs) by generating fixed-length text representations as node features.

Predicting Antibody-Antigen (Ab-Ag) docking and structure-based design represent significant long-term and therapeutically important challenges in computational biology. We present SAGERank, a general, configurable deep learning framework for antibody design using Graph Sample and Aggregate Networks. SAGERank successfully predicted the majority of epitopes in a cancer target dataset.
