Article Abstract

In this paper, we study the cross-view geo-localization problem of matching images taken from different viewpoints. The key motivation underpinning this task is to learn a discriminative, viewpoint-invariant visual representation. Inspired by the human visual system's mining of local patterns, we propose a new framework called RK-Net to jointly learn the discriminative Representation and detect salient Keypoints with a single Network. Specifically, we introduce a Unit Subtraction Attention Module (USAM) that can automatically discover representative keypoints from feature maps and draw attention to the salient regions. USAM contains very few learnable parameters but yields a significant performance improvement, and it can be easily plugged into different networks. We demonstrate through extensive experiments that (1) by incorporating USAM, RK-Net facilitates end-to-end joint learning without requiring extra annotations. Representation learning and keypoint detection are two highly related tasks: representation learning aids keypoint detection, while keypoint detection, in turn, strengthens the model's robustness to the large appearance changes caused by viewpoint variation. (2) USAM is easy to implement and can be integrated with existing methods, further improving the state-of-the-art performance. We achieve competitive geo-localization accuracy on three challenging datasets, i.e., University-1652, CVUSA and CVACT. Our code is available at https://github.com/AggMan96/RK-Net.
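The abstract does not spell out USAM's exact formulation. As an illustration only, here is a toy sketch of one plausible "unit subtraction" attention step: each unit's contrast against its 4-neighbours gates its own activation. The function name, the neighbour-difference rule, and the sigmoid gating are all assumptions for exposition, not the paper's actual method (see the linked repository for that).

```python
import math

def usam_attention(fmap):
    """Gate a 2-D feature map by local contrast (illustrative sketch only).

    For each unit, take the mean absolute difference to its 4-neighbours
    (the "unit subtraction"), squash it to (0, 1) with a sigmoid, and use
    the result as an attention weight on the original activation.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            diffs = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    diffs.append(abs(fmap[i][j] - fmap[ni][nj]))
            contrast = sum(diffs) / len(diffs)   # high at local peaks/edges
            attn = 1.0 / (1.0 + math.exp(-contrast))  # sigmoid gate in (0, 1)
            out[i][j] = fmap[i][j] * attn
    return out
```

On a flat region the contrast is near zero, so the gate stays near 0.5 and suppresses the activation; a unit that stands out from its neighbours keeps almost all of its value, which is the intuition behind attending to salient keypoints.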

Source
http://dx.doi.org/10.1109/TIP.2022.3175601


Similar Publications

Precision livestock farming increasingly relies on non-invasive, high-fidelity systems capable of monitoring cattle with minimal disruption to behavior or welfare. Conventional identification methods, such as ear tags and wearable sensors, often compromise animal comfort and produce inconsistent data under real-world farm conditions. This study introduces Dairy DigiD, a deep learning-based biometric classification framework that categorizes dairy cattle into four physiologically defined groups (young, mature milking, pregnant, and dry cows) using high-resolution facial images.

Recent breakthroughs in marker-less pose estimation have driven a significant transformation in computer-vision approaches. Despite the emergence of state-of-the-art keypoint-detection algorithms, the extent to which these tools are employed, and the nature of their application in scientific research, have yet to be systematically documented. We systematically reviewed the literature to assess how pose-estimation techniques are currently applied in rodent (rat and mouse) models.

Lameness in dairy cattle is a prevalent issue that significantly impacts both animal welfare and farm productivity. Traditional lameness detection methods often rely on subjective visual assessment, focusing on changes in locomotion and back curvature. However, these methods can lack consistency and accuracy, particularly for early-stage detection.

Feature-based image matching has extensive applications in computer vision. Keypoints detected in images can be naturally represented as graph structures, and Graph Neural Networks (GNNs) have been shown to outperform traditional deep learning techniques. Consequently, the paradigm of image matching via GNNs has gained significant prominence in recent academic research.

Feasibility of Real-Time Automated Vocal Fold Motion Tracking for In-Office Laryngoscopy.

Laryngoscope

September 2025

Department of Otolaryngology-Head and Neck Surgery, Massachusetts Eye & Ear, Boston, Massachusetts, USA.

Objectives: Major advancements have been made in applying artificial intelligence and computer vision to analyze videolaryngoscopy data. These models are limited to post hoc analysis and are aimed at research settings. In this work, we assess the feasibility of a real-time solution for automated vocal fold tracking during in-office laryngoscopy.
