RetinaViT: Efficient Visual Backbone for Online Video Streams.

Sensors (Basel)

Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan.

Published: August 2024


Article Abstract

In online video understanding, which has a wide range of real-world applications, inference speed is crucial. Many approaches rely on frame-level visual feature extraction, which is often the biggest bottleneck. We propose RetinaViT, an efficient method for extracting frame-level visual features from an online video stream, aiming to fundamentally improve the efficiency of online video understanding tasks. RetinaViT is composed of efficiently approximated Transformer blocks that take only changed tokens (event tokens) as queries and, for all other tokens, reuse the already processed tokens from the previous timestep. To improve efficiency further, we restrict keys and values to the spatial neighborhoods of event tokens. RetinaViT involves tuning multiple parameters, which we determine through a multi-step process: during model training, we randomly vary these parameters, and then we perform black-box optimization on the pre-trained model to maximize accuracy and efficiency. We conducted extensive experiments on various online video recognition tasks, including action recognition, pose estimation, and object segmentation, validating the effectiveness of each component of RetinaViT and demonstrating improvements in the speed/accuracy trade-off over baselines. In particular, for action recognition, RetinaViT built on ViT-B16 reduces inference time by approximately 61.9% on the CPU and 50.8% on the GPU, while slightly improving accuracy rather than degrading it.
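
The event-token idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how changed tokens are detected, so the L2-norm threshold, the square window of a given radius, the single attention head, and all function names below are assumptions made for illustration. The sketch recomputes attention only at tokens that changed since the previous timestep, draws keys and values from each event token's spatial neighborhood, and reuses cached outputs everywhere else.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def event_mask(curr, prev, threshold):
    """Flag tokens whose L2 change since the previous timestep exceeds
    `threshold` (an assumed change criterion, not taken from the paper)."""
    return np.linalg.norm(curr - prev, axis=-1) > threshold  # (H, W) bool

def sparse_local_attention(curr, prev_out, mask, Wq, Wk, Wv, radius):
    """Recompute single-head attention only at event tokens.

    curr:     (H, W, D) tokens of the current frame
    prev_out: (H, W, D) cached outputs from the previous timestep
    mask:     (H, W) bool, True where a token is an event token
    """
    H, W, D = curr.shape
    out = prev_out.copy()  # non-event tokens reuse the cached outputs
    # For brevity keys/values are projected for every token here; a real
    # implementation would presumably cache these as well.
    keys = curr @ Wk
    values = curr @ Wv
    for i, j in zip(*np.nonzero(mask)):
        # Restrict keys/values to a (2*radius+1)^2 window, clipped at borders.
        i0, i1 = max(0, i - radius), min(H, i + radius + 1)
        j0, j1 = max(0, j - radius), min(W, j + radius + 1)
        k = keys[i0:i1, j0:j1].reshape(-1, D)
        v = values[i0:i1, j0:j1].reshape(-1, D)
        q = curr[i, j] @ Wq
        attn = softmax(q @ k.T / np.sqrt(D))
        out[i, j] = attn @ v
    return out

# Toy usage: a 4x4 token grid in which a single token changes between frames.
rng = np.random.default_rng(0)
H, W, D = 4, 4, 8
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))
prev = rng.standard_normal((H, W, D))
all_tokens = np.ones((H, W), dtype=bool)
prev_out = sparse_local_attention(prev, np.zeros((H, W, D)), all_tokens,
                                  Wq, Wk, Wv, radius=1)
curr = prev.copy()
curr[1, 2] += 1.0  # simulate a local change in the scene
mask = event_mask(curr, prev, threshold=0.5)
out = sparse_local_attention(curr, prev_out, mask, Wq, Wk, Wv, radius=1)
```

In this toy setup only one of the sixteen tokens is recomputed, and the outputs of its unchanged neighbors are left as cached even though one of their keys has changed. That is the kind of approximation the abstract describes, and the threshold and radius stand in for the tunable parameters it mentions.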

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11397805
DOI: http://dx.doi.org/10.3390/s24175457
