OM-VST: A video action recognition model based on optimized downsampling module combined with multi-scale feature fusion.

Xiaozhong Geng , Cheng Chen , Ping Yu , Baijin Liu , Weixin Hu , Qipeng Liang , Xintong Zhang

PLoS One

Jilin University of Finance and Economics, Changchun, Jilin, China.

Published: May 2025

Video classification, an essential task in computer vision, aims to automatically identify and label video content. However, current mainstream video classification models face two significant challenges in practice. First, classification accuracy is limited, mainly because of the complexity and diversity of video data, including subtle differences between categories, background interference, and illumination variations. Second, the number of trainable parameters is large, which lengthens training time and increases energy consumption. To address these problems, we propose the OM-Video Swin Transformer (OM-VST) model. Built on the Video Swin Transformer (VST), it adds a multi-scale feature fusion module and an optimized downsampling module to improve the model's ability to perceive and represent feature information. To verify its performance, we compared OM-VST with mainstream video classification models, such as VST, SlowFast, and TSM, on a public dataset. The results show that OM-VST improves accuracy by 2.81% while reducing the number of parameters by 54.7%.
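The abstract reports an accuracy improvement of 2.81% and a 54.7% reduction in parameters, but it does not say whether the accuracy figure is an absolute gain in percentage points or a relative gain. The following minimal Python sketch shows how each reading would be computed. The function names and all input numbers are hypothetical, chosen only so that the ratios match the reported values; they are not the paper's actual accuracies or parameter counts.

```python
def param_reduction_pct(baseline_params: int, new_params: int) -> float:
    """Relative reduction in parameter count, in percent."""
    return 100.0 * (baseline_params - new_params) / baseline_params


def accuracy_gain_points(baseline_acc: float, new_acc: float) -> float:
    """Absolute accuracy gain in percentage points (inputs given in percent)."""
    return new_acc - baseline_acc


def accuracy_gain_relative_pct(baseline_acc: float, new_acc: float) -> float:
    """Relative accuracy gain, in percent of the baseline accuracy."""
    return 100.0 * (new_acc - baseline_acc) / baseline_acc


if __name__ == "__main__":
    # Hypothetical values (NOT from the paper): a baseline with 1,000,000
    # parameters and 80.00% accuracy, and a variant with 453,000 parameters
    # and 82.81% accuracy.
    print(round(param_reduction_pct(1_000_000, 453_000), 1))
    print(round(accuracy_gain_points(80.0, 82.81), 2))
    print(round(accuracy_gain_relative_pct(80.0, 82.81), 2))
```

With these made-up inputs, the same pair of accuracies gives a gain of 2.81 percentage points but a relative gain of about 3.5%, which is why it helps to state which convention a reported figure uses.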

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11884693	PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0318884	PLOS

Publication Analysis

Top Keywords: video classification; om-vst model; optimized downsampling; downsampling module; multi-scale feature; feature fusion; mainstream video; classification models; model training; swin transformer

Similar Publications

AI Model Based on Diaphragm Ultrasound to Improve the Predictive Performance of Invasive Mechanical Ventilation Weaning: Prospective Cohort Study.

JMIR Form Res

September 2025

Department of Critical Care Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangdong Provincial Geriatrics Institute, No. 106, Zhongshaner Rd, Guangzhou, 510080, China, 86 15920151904.

Feier Song , Huazhang Liu , Huan Ma , Xuanhui Chen , Shouhong Wang

Background: Point-of-care ultrasonography has become a valuable tool for assessing diaphragmatic function in critically ill patients receiving invasive mechanical ventilation. However, conventional diaphragm ultrasound assessment remains highly operator-dependent and subjective. Previous research introduced automatic measurement of diaphragmatic excursion and velocity using 2D speckle-tracking technology.

Rapid label-free identification of seven bacterial species using microfluidics, single-cell time-lapse phase-contrast microscopy, and deep learning-based image and video classification.

PLoS One

September 2025

Department of Information Technology, Uppsala University, Uppsala, Sweden.

Erik Hallström , Vinodh Kandavalli , Carolina Wählby , Anders Hast

For effective treatment of bacterial infections, it is essential to identify the species causing the infection as early as possible. Current methods typically require hours of overnight culturing of a bacterial sample and a larger quantity of cells to function effectively. This study uses one-hour phase-contrast time-lapses of single-cell bacterial growth collected from microfluidic chip traps, also known as a "mother machine".

Machine Learning and Lexical Rule-Based Cost-Efficient Emotion Annotation of Hinglish Utterances.

J Vis Exp

August 2025

Chitkara University Institute of Engineering & Technology, Chitkara University.

Pratibha Verma , Amandeep Kaur , Meenu Khurana , Deepali Gupta

Emotion annotation in code-mixed languages like Hinglish (Hindi-English) presents unique challenges due to linguistic complexity and resource constraints. This study introduces a hybrid active learning framework that combines lexical rules, machine learning, and iterative expert feedback to achieve cost-efficient, high-accuracy emotion annotation. Grounded in psychological theories of emotion, including Discrete Emotions Theory and Cognitive Appraisal Theory, the framework employs bilingual emotion dictionaries (e.

Analysis of Influence of Clinical Nursing Pathway Construction and Implementation on Patient Outcomes in Anesthesia Recovery.

J Vis Exp

August 2025

Department of Anesthesiology, Affiliated Hospital, Gansu University of Chinese Medicine.

Hongxia Wang , Yushan Shang , Xinhua Yang , Aiqian Zhan , Yangnan Li

The application of the clinical nursing pathway in the anesthesia recovery room is of great significance for improving nursing quality and reducing the incidence of complications. However, the influence of the clinical nursing pathway construction scheme and implementation path on patient outcomes in the anesthesia recovery room is not clear. In this study, 200 patients in the surgical anesthesia recovery room, aged 50 to 70 years old and graded as American Society of Anesthesiologists Physical Status Classification System (ASA) II-III, were randomly divided into the control group (n=100) and the interventional group (n=100).

Ultrasound Video-Based Radiomics Analysis for Differentiating Benign and Malignant Breast Lesions.

Technol Cancer Res Treat

September 2025

Department of Nephrology, Dongyang People's Hospital, Dongyang, China.

Jiangfeng Wu , Lijing Ge , Yun Jin , Xiaoyun Wang

Objective: To evaluate the diagnostic performance of a combined model incorporating ultrasound video-based radiomics features and clinical variables for distinguishing between benign and malignant breast lesions. Methods: A total of 346 patients (173 benign and 173 malignant) were retrospectively enrolled. Breast ultrasound videos were acquired and processed using semi-automatic segmentation in 3D Slicer.
