Article Abstract

Integrating video foundation models with large language models to build video understanding systems can overcome the limitations of specific vision tasks. Yet existing methods either employ complex spatial-temporal modules or rely heavily on additional perception models to extract temporal features, and they perform well only on short videos. For long videos, the computational complexity and memory cost of long-term temporal connections increase significantly, posing additional challenges. Drawing on the hierarchical memory structure of the Atkinson-Shiffrin memory model, and using Transformer tokens as the carriers of memory, we propose MovieChat, which addresses these challenges with a training-free memory consolidation mechanism that transfers dense frames in short-term memory into sparse tokens in long-term memory by temporally merging adjacent frames. This lifts pre-trained large multi-modal models to long video understanding in a zero-shot manner, without additional trainable modules. In our new version, MovieChat+, we further design an enhanced training-free memory consolidation mechanism based on vision-question matching to better anchor predictions to relevant visual content. MovieChat achieves state-of-the-art performance in long video understanding. We also release the MovieChat-1K benchmark, with 1K long videos, 2K temporal grounding labels, and 14K manual annotations. Resources are available at: https://github.com/rese1f/MovieChat.
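
The consolidation step described in the abstract, merging adjacent frames so that a dense short-term buffer becomes a sparse long-term one, can be illustrated with a minimal sketch. The abstract does not state the merge criterion, so the code below assumes a greedy rule: repeatedly average the pair of neighbouring frames with the highest cosine similarity. The names `consolidate`, `cosine`, and `target_len`, and the use of one flat feature vector per frame, are hypothetical and chosen only for illustration; this is not the released implementation.

```python
import math
from typing import List

Frame = List[float]  # assumption: each frame is a single flat feature vector


def cosine(a: Frame, b: Frame) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def consolidate(short_term: List[Frame], target_len: int) -> List[Frame]:
    """Greedily merge the most similar adjacent frames until target_len remain.

    Assumed rule (not taken from the abstract): a merged frame is the
    element-wise mean of the two neighbours it replaces.
    """
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    frames = [list(f) for f in short_term]  # copy so the input is untouched
    while len(frames) > target_len:
        # Similarity of every adjacent pair (i, i + 1).
        sims = [cosine(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)
        merged = [(x + y) / 2 for x, y in zip(frames[i], frames[i + 1])]
        frames[i:i + 2] = [merged]  # replace the pair with its mean
    return frames


if __name__ == "__main__":
    dense = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(consolidate(dense, target_len=2))
```

Because only neighbouring frames are ever merged, temporal order is preserved, and no parameters are learned, which is consistent with the training-free claim. The actual similarity measure, merge rule, and token layout should be taken from the linked repository.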

Source
http://dx.doi.org/10.1109/TPAMI.2025.3604614

Publication Analysis

Top Keywords

long video: 12
video understanding: 12
memory: 9
long videos: 8
memory consolidation: 8
consolidation mechanism: 8
video: 6
long: 5
moviechat+ question-aware: 4
question-aware sparse: 4
