Video-language pre-training and text-video retrieval have recently attracted significant attention, driven by the explosion of multimedia data on the Internet. However, existing video-language pre-training approaches make limited use of the hierarchical semantic information in videos, such as frame-level and global video-level semantics. In this work, we present HMMC, an end-to-end pre-training network with Hierarchical Matching and Momentum Contrast. The key idea is to exploit the hierarchical semantic information in videos through multilevel semantic matching between videos and texts. This design is motivated by the observation that if a video semantically matches a text (a title, tag, or caption), the frames of that video usually have semantic connections with the text and show higher similarity to it than frames from other videos. Hierarchical matching is realized mainly by two proxy tasks: Video-Text Matching (VTM) and Frame-Text Matching (FTM). A third proxy task, Frame Adjacency Matching (FAM), is proposed to enhance single-modality visual representations when training from scratch. Furthermore, we introduce a momentum contrast framework into HMMC to form a multimodal momentum contrast framework, which lets HMMC incorporate more negative samples for contrastive learning and thereby improves the generalization of the learned representations. We also collected a large-scale Chinese video-language dataset, CHVTT (over 763k unique videos), to explore the multilevel semantic connections between videos and texts. Experimental results on two major text-video retrieval benchmark datasets demonstrate the advantages of our method. We release our code at https://github.com/cheetah003/HMMC.
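The momentum-contrast idea above can be illustrated with a minimal, generic sketch in plain Python. It assumes a MoCo-style setup: a key encoder updated as an exponential moving average of the query encoder, a FIFO queue of past key embeddings reused as extra negatives, and an InfoNCE loss. It is not taken from the HMMC repository, and every name here (`momentum_update`, `NegativeQueue`, `info_nce`, `mean_pool`), as well as the momentum 0.995 and temperature 0.07, is a hypothetical choice for illustration. Mean-pooling frame embeddings into a video embedding is likewise a simplification, used only to show how a video-level (VTM-style) score and a frame-level (FTM-style) score can share one contrastive loss.

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def momentum_update(query_params, key_params, m=0.995):
    """EMA update (assumed m): the key encoder slowly tracks the query encoder."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE: the positive pair competes against every negative key.

    Returns -log softmax(logits)[0], where logit 0 belongs to the positive pair.
    """
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return lse - logits[0]


class NegativeQueue:
    """Hypothetical FIFO queue of key embeddings from earlier batches.

    Oldest keys are dropped automatically once `size` is reached, so the
    number of negatives is decoupled from the current batch size.
    """

    def __init__(self, size):
        self.items = deque(maxlen=size)

    def enqueue(self, keys):
        self.items.extend(keys)

    def negatives(self):
        return list(self.items)


def mean_pool(frames):
    """Video-level embedding as the mean of its frame embeddings (an assumption)."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


if __name__ == "__main__":
    queue = NegativeQueue(size=4096)
    queue.enqueue([[0.0, 1.0], [-1.0, 0.0]])  # keys left over from earlier batches
    text = [1.0, 0.1]
    frames = [[1.0, 0.0], [0.9, 0.2], [0.8, 0.1]]
    vtm_loss = info_nce(text, mean_pool(frames), queue.negatives())
    ftm_loss = sum(info_nce(text, f, queue.negatives()) for f in frames) / len(frames)
    print(vtm_loss + ftm_loss)
```

The queue is the design choice that matters for the claim about "more negative samples": negatives come from many past batches rather than only the current one, and the slowly moving key encoder keeps those queued keys roughly consistent with each other.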
DOI: http://dx.doi.org/10.1109/TIP.2023.3275071
PLoS One
September 2025
Department of Economics, Cornell University, Ithaca, United States of America.
In this paper, we study the impact of momentum, volume and investor sentiment on U.S. tech sector stock returns using Principal Component Analysis-Hidden Markov Model (PCA-HMM) methodology.
Comp Polit Stud
October 2025
University of Zurich, Zurich, Switzerland.
Extensive research explores the relationship between deepening conflict over socio-cultural issues and stagnating social mobility, typically focusing on men. Upwardly mobile women are routinely mentioned as belonging to the progressive "winners" of the knowledge-based society, yet their experiences and politics have received far less attention. This paper theorizes and investigates how women view their individual and collective trajectories and how these views relate to perceptions of future opportunities and political attitudes.
Nat Commun
August 2025
Institute of Acoustics, Tongji University, Shanghai, China.
Chiral vortex beams with tunable topological charges (TCs) hold promise for high-capacity and multi-channel information transmission. However, asymmetric vortex transport, a crucial feature for enhancing robustness and security, often disrupts channel independence by altering TCs, causing signal distortion. Here, we exploit the radial mode degree of freedom in chiral space to achieve extremely asymmetric transmission with high energy contrast, while preserving chirality and TCs.
Light Sci Appl
August 2025
State Key Laboratory of Quantum Optics Technologies and Devices, Institute of Laser Spectroscopy, Shanxi University, Taiyuan, 030006, China.
Exploring the interplay between topology and nonlinearity leads to an emerging field of nonlinear topological physics, which extends the study of fascinating properties of topological states to a regime where interactions between the particles cannot be neglected. For ultracold atomic systems, although many exotic topological states have been recently observed, the nonlinear effect remains elusive. Here, based on the laser-driven couplings of discrete atomic momentum states, we synthesize a topological trimer array, where the atomic interactions give rise to tunable nonlinearities.
Neural Netw
August 2025
National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, 610064, PR China; College of Computer Science, Sichuan University, Chengdu, 610065, PR China. Electronic address:
The primary goal of change captioning is to identify subtle visual differences between two similar images and express them in natural language. Existing research has been significantly influenced by the task of vision change detection and has mainly concentrated on the identification and description of visual changes. However, we contend that an effective change captioner should go beyond mere detection and description of what has changed.