Dual modality prompt learning for visual question-grounded answering in robotic surgery.

Yue Zhang, Wanshu Fan, Peixi Peng, Xin Yang, Dongsheng Zhou, Xiaopeng Wei

Vis Comput Ind Biomed Art

School of Computer Science and Technology, Dalian University of Technology, Dalian, 116081, Liaoning, China.

Published: April 2024

With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the interpretative capacity of the VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information interactions. Specifically, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate localization. The textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the answer. Additionally, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.
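The abstract does not give architectural details, so the following is only a minimal sketch of the general idea it describes: prompt tokens are mixed into both the visual and textual streams, the two streams are fused over several rounds, and the model emits both a textual answer and a localized region. Everything in it is an assumption made for illustration and not the authors' implementation: the function names, the parameter-free residual attention, the default of three fusion rounds, and the two simple linear heads are all hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Residual scaled dot-product attention with no learned weights (assumption)."""
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        mixed = [sum(w * c[i] for w, c in zip(weights, context))
                 for i in range(len(q))]
        attended.append([qi + mi for qi, mi in zip(q, mixed)])
    return attended


def mean_pool(tokens):
    return [sum(t[i] for t in tokens) / len(tokens)
            for i in range(len(tokens[0]))]


def linear(weights, x):
    return [dot(row, x) for row in weights]


def random_tokens(rng, count, dim):
    """Hypothetical stand-in for learned features, prompts, or weight rows."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


def grounded_vqa(visual, text, visual_prompts, text_prompts,
                 w_answer, w_box, rounds=3):
    """Return (answer_index, box); box holds four values in [0, 1]."""
    for _ in range(rounds):  # illustrative iterative fusion
        # Visual stream: image tokens attend to visual prompts and themselves.
        visual = cross_attend(visual, visual_prompts + visual)
        # Textual stream: question tokens attend to textual prompts,
        # themselves, and the prompted visual tokens.
        text = cross_attend(text, text_prompts + text + visual)
    logits = linear(w_answer, mean_pool(text))
    answer = max(range(len(logits)), key=logits.__getitem__)
    box = [1.0 / (1.0 + math.exp(-z)) for z in linear(w_box, mean_pool(visual))]
    return answer, box


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    answer, box = grounded_vqa(
        visual=random_tokens(rng, 6, dim),         # 6 image-patch tokens
        text=random_tokens(rng, 4, dim),           # 4 question tokens
        visual_prompts=random_tokens(rng, 2, dim),
        text_prompts=random_tokens(rng, 2, dim),
        w_answer=random_tokens(rng, 5, dim),       # 5 candidate answers
        w_box=random_tokens(rng, 4, dim),          # 4 box coordinates
    )
    print("answer index:", answer)
    print("normalized box:", box)
```

With random inputs the outputs are meaningless; the sketch only shows how prompts, fusion rounds, and the two output heads could fit together in one forward pass.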

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11551084
DOI: http://dx.doi.org/10.1186/s42492-024-00160-z

Publication Analysis

Top Keywords

robotic surgery

prompt learning

visual textual

complementary prompter

prompt knowledge

visual

textual

prompt

dual modality

modality prompt

Similar Publications

Specimen extraction techniques utilized in minimally invasive surgery for uterine cancer and an enlarged uterus: a quality assurance study.

J Robot Surg

September 2025

Department of Gynecologic Oncology, Moffitt Cancer Center, 12902 USF Magnolia Drive, Tampa, FL, 33612, USA.

Anna Quian, Ann Marie Mercier, Clarissa Lam, Robert M Wenham, Hye Sook Chon

This study was conducted to investigate the techniques and complications of enlarged uterine extraction during minimally invasive surgery for uterine malignancy. The electronic medical record was queried for patients with uterine malignancy and enlarged uterus (≥ 250 g) who underwent primary hysterectomy with laparoscopic or robotic approach. Statistical analysis was performed using Fisher's exact test for categorical variables and Kruskal-Wallis test for continuous variables.

Systematic review and meta-analysis of the learning curve and mentoring in robotic pancreatectomy: transitioning from novice to expert robotic surgeon.

J Robot Surg

September 2025

Department of General Surgery, Giglio Hospital Foundation, Cefalu', Italy.

Danilo Coco, Silvana Leanza

The adoption of robotic pancreatectomy has grown significantly in recent years, driven by its potential advantages in precision, minimally invasive access, and improved patient recovery. However, mastering these complex procedures requires overcoming a substantial learning curve, and the role of structured mentoring in facilitating this transition remains underexplored. This systematic review and meta-analysis aimed to comprehensively evaluate the number of cases required to achieve surgical proficiency, assess the impact of mentoring on skill acquisition, and analyze how outcomes evolve throughout the learning process.

Pancreatic anastomosis in minimally invasive pancreaticoduodenectomy: a systematic review of detailed techniques.

Updates Surg

September 2025

Surgical Department, HPB Unit Pederzoli Hospital, Peschiera del Garda, Verona, Italy.

Marzia Tripepi, Isabella Frigerio, Roberto Girelli, Alessandro Giardino, Giovanni Butturini

Minimally invasive pancreaticoduodenectomy is gaining acceptance among surgeons, partly because of the increasing use of the robotic approach. Ideal candidates are patients with a small, confined tumor and a dilated Wirsung duct, which is a rather rare clinical condition: in fact, most minimally invasive pancreaticoduodenectomies are performed for periampullary cancer, which is easy to remove but leaves a soft pancreatic remnant and a tiny Wirsung duct. The result is a technically challenging pancreatico-enteric reconstruction.

Critique on "Hysterectomy for oncological and non-oncological reasons in patients over 70 years of age: comparison of robot-assisted, laparoscopic, and open approaches".

J Robot Surg

September 2025

D.G Khan Medical College, Dera Ghazi Khan, Pakistan.

Kamran Hussain, Abida Nawab, Abdul Rehman

"Comment on: Impact of extended reality on robot-assisted surgery training: methodological insights and future directions".

J Robot Surg

September 2025

Jinnah Postgraduate Medical Centre (JPMC), Karachi, Pakistan.

Bhoomeeka Jayramdass, Manohar Lal
