Fine-Grained Video Retrieval With Scene Sketches.

Ran Zuo, Xiaoming Deng, Keqi Chen, Zhengming Zhang, Yu-Kun Lai, Fang Liu, Cuixia Ma, Hao Wang, Yong-Jin Liu, Hongan Wang

IEEE Trans Image Process

Published: June 2023

Benefiting from the intuitiveness and naturalness of sketch interaction, sketch-based video retrieval (SBVR) has received considerable attention in video retrieval research. However, most existing SBVR methods still cannot accurately retrieve videos with fine-grained scene content. To address this problem, we investigate a new task, which we call "fine-grained scene-level SBVR": retrieving the target video with a fine-grained storyboard sketch that depicts the scene layout and the visual characteristics (e.g., appearance, size, and pose) of the video's major foreground instances. The most challenging issue in this task is how to perform scene-level cross-modal alignment between sketch and video. Our solution consists of two parts. First, we construct a scene-level sketch-video dataset called SketchVideo, in which sketch-video pairs are provided and each pair contains a clip-level storyboard sketch and several keyframe sketches (corresponding to video frames). Second, we propose a novel deep learning architecture called Sketch Query Graph Convolutional Network (SQ-GCN). In SQ-GCN, we first adaptively sample the video frames to improve video encoding efficiency, and then construct appearance and category graphs to jointly model visual and semantic alignment between sketch and video. Experiments show that our fine-grained scene-level SBVR framework with the SQ-GCN architecture outperforms state-of-the-art fine-grained retrieval methods. The SketchVideo dataset and SQ-GCN code are available on the project webpage: https://iscas-mmsketch.github.io/FG-SL-SBVR/.
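
The pipeline described in the abstract (sample frames, relate sketch and video instances in a graph, score the match) can be illustrated with a toy sketch. This is not the authors' SQ-GCN: the function names, the cosine-distance sampling heuristic, and the parameter-free neighbor averaging that stands in for a learned graph convolution are all assumptions made for illustration, and only an appearance-style graph is shown (no category graph).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def adaptive_sample(frame_feats, min_change=0.1):
    """Hypothetical heuristic: keep a frame only if its cosine distance
    to the last kept frame is at least `min_change`."""
    if not frame_feats:
        return []
    kept = [0]
    for i in range(1, len(frame_feats)):
        if 1.0 - cosine(frame_feats[kept[-1]], frame_feats[i]) >= min_change:
            kept.append(i)
    return kept

def graph_propagate(nodes):
    """One propagation step over a cosine-similarity graph.
    Each node becomes the similarity-weighted mean of all nodes
    (a stand-in for a learned graph convolution layer)."""
    n = len(nodes)
    adj = [[max(cosine(nodes[i], nodes[j]), 0.0) for j in range(n)]
           for i in range(n)]
    out = []
    for i in range(n):
        total = sum(adj[i])
        if total == 0:
            out.append(list(nodes[i]))  # isolated node: leave unchanged
            continue
        dim = len(nodes[i])
        out.append([sum(adj[i][j] * nodes[j][d] for j in range(n)) / total
                    for d in range(dim)])
    return out

def mean_pool(vectors):
    """Average a non-empty list of vectors into one vector."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def match_score(sketch_nodes, video_nodes):
    """Propagate over the joint sketch+video graph, then compare pooled sides."""
    joint = graph_propagate(sketch_nodes + video_nodes)
    s = mean_pool(joint[:len(sketch_nodes)])
    v = mean_pool(joint[len(sketch_nodes):])
    return cosine(s, v)

def rank_videos(sketch_nodes, videos):
    """Return (video name, score) pairs, best match first."""
    scored = [(name, match_score(sketch_nodes, nodes))
              for name, nodes in videos.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy data: 2-D "instance features" for a sketch and two candidate videos.
sketch = [[1.0, 0.0], [0.0, 1.0]]
videos = {
    "A": [[1.0, 0.0], [0.0, 1.0]],  # same two instance types as the sketch
    "B": [[1.0, 0.0], [1.0, 0.0]],  # only one instance type
}
ranking = rank_videos(sketch, videos)
kept_frames = adaptive_sample(
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
```

In this toy setup, video "A" ranks above "B" because its instances cover both sketch instances, and the sampler drops the two frames that repeat their predecessor. A real system would replace the hand-made vectors with learned instance features and the averaging step with trained graph-convolution weights.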

DOI: http://dx.doi.org/10.1109/TIP.2023.3278474

Top Keywords: video retrieval; video; storyboard sketch; alignment sketch; sketch video; video frames; sketch; fine-grained; retrieval; fine-grained video

Similar Publications

Posttraumatic Stress Disorder Content on TikTok: Cross-Sectional Analysis of Popular #PTSD Posts.

Online J Public Health Inform

September 2025

Clinical and Health Psychology, College of Public Health and Health Professions, University of Florida, 1225 Center Drive, Gainesville, FL, 32610, United States, 1 (352) 273-6617.

Brittany Rohl, Laura Carolyn Jones, Rachel Nattis, Robert Dale Claar, Xavier Velez

Background: TikTok became an increasingly popular platform for mental health discussions during a major global stressor (COVID-19 pandemic). On TikTok, content assumed to promote user engagement is delivered in a hyperindividually curated manner through a proprietary algorithm. Mental health providers have raised concerns about TikTok's potential role in promoting inaccurate self-diagnoses, pathologizing normal behaviors, and fostering new-onset symptoms after exposure to illness-related content, such as tic-like movements linked to conversion or factitious disorders.

Temporal Modeling With Frozen Vision-Language Foundation Models for Parameter-Efficient Text-Video Retrieval.

IEEE Trans Neural Netw Learn Syst

September 2025

Leqi Shen, Tianxiang Hao, Tao He, Yifeng Zhang, Pengzhang Liu

Temporal modeling plays an important role in effectively adapting powerful pretrained text-image foundation models to text-video retrieval. However, existing methods often rely on additional heavy trainable modules, such as transformers or BiLSTMs, which are inefficient. In contrast, we avoid introducing such heavy components by leveraging frozen foundation models.

Quality and Reliability of Adolescent Sexuality Education on Chinese Video Platforms: Sentiment-Topic Analysis and Cross-Sectional Study.

JMIR Form Res

September 2025

Department of Orthopedics, The First People's Hospital of Guannan: Lianyun, Lianyungang, China.

Lan Wang, Xiantao Shu, Jianmei Huang, Weiqian Yan, Duo Zhao

Background: Adolescence is a critical period for lifelong health, which makes access to accurate and comprehensive sexuality education essential. As video platforms become a primary source of information for adolescents, the quality of their content significantly impacts their physical and mental health.

Objective: This study aimed to evaluate the quality, reliability, understandability, and actionability of adolescent sexuality education videos on major Chinese platforms (Bilibili, TikTok or Douyin, and Kwai), analyze associated user comment sentiment and topics, identify predictors of quality and reliability, and provide recommendations.

Mind Over Scalpel: Effectiveness of Preoperative Surgical Education.

J Am Coll Surg

September 2025

Division of Plastic and Reconstructive Surgery, Department of Surgery, University of California Los Angeles, David Geffen School of Medicine and the Greater Los Angeles VA Healthcare System, Los Angeles, CA.

Anne E Hall, Amanda T Perrotta, Alexander A Argame, Kaavian Shariati, Meghan N Miller

Multimodal preoperative educational interventions, delivered in various formats including written materials, videos, websites, and more, have shown potential in improving postoperative outcomes. Given the evolving landscape of surgical education, the effectiveness of these diverse strategies requires further assessment. This systematic review, meta-analysis and network meta-analysis evaluated multimodal preoperative educational interventions and their impact on surgical outcomes.

Is YouTube a reliable source for learning pre-endodontic build-up? A cross-sectional study.

Restor Dent Endod

August 2025

Department of Endodontics, Faculty of Dentistry, Marmara University, Istanbul, Türkiye.

Merve Gökyar, İdil Özden, Hesna Sazak Öveçoğlu

Objectives: The aim of this study is to comprehensively analyze the quality, educational value, and demographic characteristics of pre-endodontic build-up videos published on the YouTube™ platform (Google LLC).

Methods: The study was conducted on YouTube™ using the keyword "pre-endodontic build-up." The first 100 videos retrieved from the search results were reviewed, and 61 videos meeting the inclusion criteria were analyzed.
