Temporal Features-Fused Vision Retentive Network for Echocardiography Image Segmentation.

Zhicheng Lin, Rongpu Cui, Limiao Ning, Jian Peng

Sensors (Basel)

College of Computer Science, Sichuan University, Chengdu 610065, China.

Published: March 2025


Echocardiography is a widely used cardiac imaging modality in clinical practice. Physicians use echocardiography images to measure left ventricular volumes at the end-diastole (ED) and end-systole (ES) frames, which are pivotal for calculating the ejection fraction and thus for quantitatively assessing cardiac function. However, most existing approaches focus only on features from the ED and ES frames and neglect the inter-frame correlations in the unlabeled frames. To address this, we propose a model based on an encoder-decoder architecture that consists of two modules: the Temporal Feature Fusion Module (TFFA) and the Vision Retentive Network (Vision RetNet) encoder. The TFFA uses self-attention to learn inter-frame correlations across multiple consecutive frames and aggregates features along the temporal-channel dimension through channel aggregation to highlight ambiguous regions. The Vision RetNet encoder introduces explicit spatial priors by constructing a spatial decay matrix based on the Manhattan distance. We conducted experiments on the EchoNet-Dynamic and CAMUS datasets, on which the proposed model demonstrates competitive performance. The experimental results indicate that spatial prior information and inter-frame correlations in echocardiography images can improve the accuracy of semantic segmentation, and that inter-frame correlations become even more effective when spatial priors are provided.
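The ejection fraction referred to above is the standard ratio EF = (EDV - ESV) / EDV x 100%, where EDV and ESV are the left ventricular volumes at end-diastole and end-systole. The minimal Python sketch below computes it; the function name and the input checks are illustrative choices rather than anything from the paper, and the volumes in the example are made up.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return the left ventricular ejection fraction in percent.

    EF = (EDV - ESV) / EDV * 100, where EDV and ESV are the left
    ventricular volumes (e.g. in mL) at end-diastole and end-systole.
    """
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("ESV must lie between 0 and EDV")
    return (edv_ml - esv_ml) / edv_ml * 100.0


# Illustrative volumes only, not data from the paper.
print(round(ejection_fraction(120.0, 50.0), 1))  # 58.3
```

Because EF depends on both volumes, segmentation errors in either the ED or the ES frame propagate directly into the estimate.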
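As a rough illustration of learning inter-frame correlations with self-attention, the sketch below applies single-head scaled dot-product attention over T per-frame feature vectors and then averages the fused features over time. It is not the TFFA itself. The identity query/key/value projections, the single head, and the per-channel temporal mean (standing in for the paper's "channel aggregation", whose exact form the abstract does not specify) are all assumptions made for brevity, and `temporal_mean` is a hypothetical helper.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention across the temporal axis.

    frames[t] is the feature vector of frame t (all of length d).
    Assumption: queries, keys and values are the raw features (identity
    projections); a learned module would use trainable linear layers.
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for query in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in frames]
        weights = softmax(scores)
        # Each output is a convex combination of all frames' features.
        fused.append([sum(w * value[c] for w, value in zip(weights, frames)) for c in range(d)])
    return fused


def temporal_mean(frames: List[Vector]) -> Vector:
    """Hypothetical aggregation step: average each channel over time."""
    t = len(frames)
    return [sum(frame[c] for frame in frames) / t for c in range(len(frames[0]))]


clip = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # three toy frames, two channels
fused = temporal_self_attention(clip)
print(temporal_mean(fused))
```

Since every fused vector mixes information from all frames in the clip, features of a frame can borrow evidence from its neighbours, which is the intuition behind using unlabeled frames at all.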
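A spatial decay matrix built from the Manhattan distance can be sketched for a small grid of image tokens. The sketch assumes the RetNet-style form D[n][m] = gamma^(|x_n - x_m| + |y_n - y_m|) with a scalar decay rate 0 < gamma < 1; the abstract gives neither the exact form nor the value of gamma. Under this form, neighbouring tokens receive weights close to 1 and distant tokens are progressively suppressed. In retention-style vision encoders such a matrix is typically multiplied elementwise with token-to-token attention scores, but that usage is an assumption here, not a description of the paper's encoder.

```python
from typing import List


def manhattan_decay_matrix(height: int, width: int, gamma: float) -> List[List[float]]:
    """Spatial decay matrix for an H x W token grid flattened in row-major order.

    Entry [n][m] is gamma ** (|row_n - row_m| + |col_n - col_m|), i.e. the
    decay rate raised to the Manhattan distance between tokens n and m.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma should lie in (0, 1)")
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


decay = manhattan_decay_matrix(2, 2, gamma=0.5)
for row in decay:
    print(row)
# [1.0, 0.5, 0.5, 0.25]
# [0.5, 1.0, 0.25, 0.5]
# [0.5, 0.25, 1.0, 0.5]
# [0.25, 0.5, 0.5, 1.0]
```

The matrix is symmetric with ones on the diagonal, so it encodes only relative position. That is what makes it an explicit spatial prior rather than a learned positional embedding.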

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11946786
DOI: http://dx.doi.org/10.3390/s25061909


Similar Publications

Echo Flow-Induced Temporal Correlation Learning for Ultrasound Video Object Segmentation.

IEEE Trans Biomed Eng

August 2025

Dongfang Wang, Tao Zhou, Shangbing Gao, Jian Yang

Objective: The segmentation of ultrasound video objects aims to delineate specific anatomical structures or areas of injury in sequential ultrasound imaging data. Current methods exhibit promising results, but struggle with key aspects of ultrasound video analysis. They insufficiently capture inter-frame object motion, resulting in unsatisfactory segmentation for dynamic or low-contrast scenarios.


Deformable image registration of dark-field chest radiographs for functional lung assessment.

Med Phys

August 2025

Chair of Biomedical Physics, Department of Physics, School of Natural Sciences, Technical University of Munich, Garching, Germany.

Fabian Drexel, Vasiliki Sideri-Lampretsa, Henriette Bast, Alexander W Marka, Thomas Koehler

Background: Dark-field radiography of the human chest has been demonstrated to have promising potential for the analysis of the lung microstructure and the diagnosis of respiratory diseases. However, most previous studies of dark-field chest radiographs evaluated the lung signal only in the inspiratory breathing state.

Purpose: Our work aims to add a new perspective to these previous assessments by locally comparing dark-field lung information between different respiratory states to explore new ways of functional lung imaging based on dark-field chest radiography.


A new dataset and versatile multi-task surgical workflow analysis framework for thoracoscopic mitral valvuloplasty.

Med Image Anal

October 2025

Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong Special Administrative Region of China. Electronic address:

Meng Lan, Weixin Si, Xinjian Yan, Xiaomeng Li

Surgical Workflow Analysis (SWA) on videos is critical for AI-assisted intelligent surgery. Existing SWA methods primarily focus on laparoscopic surgeries, while research on complex thoracoscopy-assisted cardiac surgery remains largely unexplored. In this paper, we introduce TMVP-SurgVideo, the first SWA video dataset for thoracoscopic cardiac mitral valvuloplasty (TMVP).


CrossModalSync: joint temporal-spatial fusion for semantic scene segmentation in large-scale scenes.

Sci Rep

July 2025

The Department of Electrical and Computer Engineering, Inha University, Incheon, 22212, Korea.

Shuyi Tan, Yi Zhang, Yan Li, Byeong-Seok Shin

Owing to its ability to enable precise perception of dynamic and complex environments, point cloud semantic segmentation has become a critical task for autonomous vehicles in recent years. However, in complex, dynamic scenes, cumulative errors and the "many-to-one" mapping problem pose challenges for existing semantic segmentation methods, further limiting their accuracy and efficiency. To address these challenges, this paper introduces a new framework that balances accuracy and computational efficiency by utilizing temporal alignment (TA), projection multi-scale convolution (PMC), and priority point retention (PPR).


Data-driven assessment of optimal spatiotemporal resolutions for information extraction in noisy time series data.

J Chem Phys

June 2025

Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy.

Domiziano Doria, Simone Martino, Matteo Becchi, Giovanni M Pavan

In general, comprehension of any type of complex system depends on the resolution used to examine the phenomena occurring within it. However, identifying a priori the best time frequencies/scales at which to study a certain system over time, or the spatial distances at which to examine its correlations, symmetries, and fluctuations, is most often non-trivial. Here, we describe an unsupervised approach that, starting solely from the data of a system, allows learning the characteristic length scales of the dominant key events/processes and the optimal spatiotemporal resolutions to characterize them.
