FSDM: An efficient video super-resolution method based on Frames-Shift Diffusion Model.

Neural Networks

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, Jiangsu, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, Jiangsu, China.

Published: August 2025


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Video super-resolution (VSR) is a fundamental task aimed at enhancing video quality through detailed spatial and temporal modeling. Recent advances in diffusion models have significantly improved image super-resolution, but their integration into video super-resolution workflows remains constrained by the computational complexity of temporal fusion modules, which demand far more resources than their image counterparts. To address this challenge, we propose a novel approach: a Frames-Shift Diffusion Model (FSDM) built on image diffusion models. Compared to directly training a diffusion-based video super-resolution model, redesigning the diffusion process of an image model without introducing complex temporal modules requires minimal training cost. We incorporate temporal information into the image super-resolution diffusion model using optical flow and perform multi-frame fusion. The model adapts the diffusion process to transition smoothly from image super-resolution to video super-resolution without additional weight parameters. As a result, the Frames-Shift Diffusion Model efficiently processes videos frame by frame while maintaining computational efficiency: it enhances perceptual quality and achieves performance comparable to other state-of-the-art diffusion-based VSR methods in PSNR and SSIM. By simplifying the integration of temporal information, this approach addresses a key challenge in video super-resolution.
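The abstract outlines the mechanism (optical-flow-based temporal conditioning grafted onto a pretrained image-SR diffusion model) without implementation detail. The sketch below is a hypothetical PyTorch rendering of that general idea, not the paper's actual FSDM procedure: a deterministic DDIM-style sampler that, at every denoising step, warps the previous frame's super-resolved output into the current frame's coordinates and blends it into the clean-image estimate. The noise predictor eps_model, the precomputed flows, the fuse weight, and the assumption that low-resolution frames are pre-upsampled to the target size (as in SR3-style conditioning) are all illustrative choices.

    import torch
    import torch.nn.functional as F

    def warp(x, flow):
        # Backward-warp x (B, C, H, W) by a dense flow field (B, 2, H, W)
        # given in pixel units: flow[:, 0] horizontal, flow[:, 1] vertical.
        B, _, H, W = x.shape
        ys, xs = torch.meshgrid(
            torch.arange(H, device=x.device, dtype=x.dtype),
            torch.arange(W, device=x.device, dtype=x.dtype),
            indexing="ij",
        )
        grid_x = 2.0 * (xs + flow[:, 0]) / (W - 1) - 1.0
        grid_y = 2.0 * (ys + flow[:, 1]) / (H - 1) - 1.0
        grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2), in [-1, 1]
        return F.grid_sample(x, grid, align_corners=True, padding_mode="border")

    @torch.no_grad()
    def frames_shift_sample(eps_model, lr_frames, flows, alphas_cumprod, fuse=0.5):
        # eps_model(x_t, t, lr) -> predicted noise; lr_frames are assumed to be
        # pre-upsampled to the output resolution; flows[i] maps frame i-1 onto
        # frame i. This interface is an assumption for illustration.
        T = alphas_cumprod.shape[0]
        prev_sr, outputs = None, []
        for i, lr in enumerate(lr_frames):
            x = torch.randn_like(lr)  # each frame starts from pure noise
            for t in reversed(range(T)):
                a_t = alphas_cumprod[t]
                eps = eps_model(x, t, lr)
                # Standard DDPM estimate of the clean image x0.
                x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
                if prev_sr is not None:
                    # "Frames shift": pull the previous frame's result into
                    # this frame's coordinates and fuse it into x0.
                    x0 = (1 - fuse) * x0 + fuse * warp(prev_sr, flows[i])
                if t > 0:  # deterministic DDIM step toward t-1
                    a_prev = alphas_cumprod[t - 1]
                    x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
                else:
                    x = x0
            prev_sr = x
            outputs.append(x)
        return outputs

Because the temporal fusion happens inside the existing sampling loop, this kind of scheme adds no new weight parameters, which matches the efficiency argument the abstract makes.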


Source
http://dx.doi.org/10.1016/j.neunet.2025.107435

Publication Analysis

Top Keywords

video super-resolution: 24
diffusion model: 16
frames-shift diffusion: 12
image super-resolution: 12
super-resolution: 9
diffusion: 9
diffusion models: 8
diffusion process: 8
super-resolution diffusion: 8
video: 7

Similar Publications

Deep learning-based super-resolution method for projection image compression in radiotherapy.

Quantitative Imaging in Medicine and Surgery

September 2025

Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Background: Cone-beam computed tomography (CBCT) is a three-dimensional (3D) imaging method designed for routine target verification of cancer patients during radiotherapy. The images are reconstructed from a sequence of projection images obtained by the on-board imager attached to a radiotherapy machine. CBCT images are usually stored in a health information system, but the projection images are mostly discarded due to their massive volume.


Efficiently compressing HD/UHD content has long been challenging due to high bitrate costs. Instance-adaptive enhancement methods address this issue by compressing a video at reduced resolution and enhancing it with a neural model overfitted specifically to that video. However, existing methods focus solely on spatial super-resolution (SR) and underutilize the videos' temporal redundancy.
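The snippet describes instance-adaptive enhancement only at a high level. As a purely illustrative sketch of the core idea, the PyTorch code below overfits a deliberately tiny SR network to a single clip, so that its few weights can be shipped alongside the low-resolution bitstream; TinySR, overfit_to_video, and every hyperparameter here are assumptions, not the cited paper's design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinySR(nn.Module):
        # Deliberately small SR net intended to be overfitted to one video.
        def __init__(self, scale=2, ch=32):
            super().__init__()
            self.scale = scale
            self.body = nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            )
            self.shuffle = nn.PixelShuffle(scale)

        def forward(self, lr):
            # Predict a residual on top of bicubic upsampling; the residual
            # formulation keeps the learnable part small.
            up = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                               align_corners=False)
            return up + self.shuffle(self.body(lr))

    def overfit_to_video(lr_frames, hr_frames, steps=2000):
        # Overfit to one clip; the resulting weights act as video-specific
        # side information transmitted with the compressed stream.
        model = TinySR()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for step in range(steps):
            i = step % len(lr_frames)
            loss = F.l1_loss(model(lr_frames[i]), hr_frames[i])
            opt.zero_grad()
            loss.backward()
            opt.step()
        return model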


Enhancing the resolution of Magnetic Resonance Imaging (MRI) through super-resolution (SR) reconstruction is crucial for boosting diagnostic precision. However, current SR methods rely primarily on single low-resolution (LR) images or multi-contrast features, limiting detail restoration. Inspired by video frame interpolation, this work exploits the spatiotemporal correlations between adjacent slices to reformulate SR of anisotropic 3D MRI as the generation of new high-resolution (HR) slices between adjacent 2D slices.


There is an ongoing effort in the machine learning community to enable machines to understand the world symbolically, facilitating human interaction with learned representations of complex scenes. A prerequisite to achieving this is the ability to identify the dynamics of interacting objects from time traces of relevant features. In this paper, we introduce GrODID (GRaph-based Object-Centric Dynamic Mode Decomposition), a framework based on graph neural networks that enables Dynamic Mode Decomposition (DMD) for systems involving interacting objects.
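GrODID itself is graph-based and object-centric, but the Dynamic Mode Decomposition it builds on is a standard algorithm. For context only, here is a minimal NumPy implementation of exact DMD on a plain snapshot matrix, with no graph structure; it is background, not the paper's method.

    import numpy as np

    def dmd(X, rank=None):
        # Exact DMD of a snapshot matrix X (n_features, n_timesteps): find
        # eigenvalues/modes of the best-fit linear map A with X2 ~= A @ X1.
        X1, X2 = X[:, :-1], X[:, 1:]
        U, s, Vh = np.linalg.svd(X1, full_matrices=False)
        if rank is not None:  # optional truncation for noisy data
            U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
        s_inv = np.diag(1.0 / s)
        # Project A onto the leading POD modes: A_tilde = U^H A U.
        A_tilde = U.conj().T @ X2 @ Vh.conj().T @ s_inv
        eigvals, W = np.linalg.eig(A_tilde)
        modes = X2 @ Vh.conj().T @ s_inv @ W  # exact DMD modes
        return eigvals, modes

    # Usage: eigenvalues near the unit circle indicate persistent or
    # oscillatory dynamics in the feature traces.
    # eigvals, modes = dmd(feature_traces, rank=10)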


Single-photon avalanche diodes (SPADs) are advanced sensors capable of detecting individual photons and recording their arrival times with picosecond resolution using time-correlated single-photon counting (TCSPC) detection techniques. They are used in various applications, such as LiDAR and low-light imaging. These single-photon cameras can capture high-speed sequences of binary single-photon images, offering great potential for reconstructing 3D environments with high motion dynamics.
