Article Abstract

In this paper, we study the cross-view geo-localization problem of matching images taken from different viewpoints. The key motivation underpinning this task is to learn a discriminative, viewpoint-invariant visual representation. Inspired by the human visual system's mining of local patterns, we propose a new framework called RK-Net to jointly learn the discriminative Representation and detect salient Keypoints with a single Network. Specifically, we introduce a Unit Subtraction Attention Module (USAM) that can automatically discover representative keypoints from feature maps and draw attention to the salient regions. USAM contains very few learnable parameters but yields a significant performance improvement, and it can be easily plugged into different networks. We demonstrate through extensive experiments that (1) by incorporating USAM, RK-Net facilitates end-to-end joint learning without requiring extra annotations. Representation learning and keypoint detection are two highly related tasks: representation learning aids keypoint detection, while keypoint detection, in turn, strengthens the model's robustness to the large appearance changes caused by viewpoint variation. (2) USAM is easy to implement and can be integrated with existing methods, further improving the state-of-the-art performance. We achieve competitive geo-localization accuracy on three challenging datasets, i.e., University-1652, CVUSA and CVACT. Our code is available at https://github.com/AggMan96/RK-Net.
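The abstract does not spell out USAM's exact formulation. As an illustration only, here is a toy sketch of one plausible "unit subtraction" attention step: each unit's contrast against its 4-neighbours gates its own activation. The function name, the neighbour-difference rule, and the sigmoid gating are all assumptions for exposition, not the paper's actual method (see the linked repository for that).

```python
import math

def usam_attention(fmap):
    """Gate a 2-D feature map by local contrast (illustrative sketch only).

    For each unit, take the mean absolute difference to its 4-neighbours
    (the "unit subtraction"), squash it to (0, 1) with a sigmoid, and use
    the result as an attention weight on the original activation.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            diffs = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    diffs.append(abs(fmap[i][j] - fmap[ni][nj]))
            contrast = sum(diffs) / len(diffs)   # high at local peaks/edges
            attn = 1.0 / (1.0 + math.exp(-contrast))  # sigmoid gate in (0, 1)
            out[i][j] = fmap[i][j] * attn
    return out
```

On a flat region the contrast is near zero, so the gate stays near 0.5 and suppresses the activation; a unit that stands out from its neighbours keeps almost all of its value, which is the intuition behind attending to salient keypoints.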

Source
http://dx.doi.org/10.1109/TIP.2022.3175601


Similar Publications

Precision livestock farming increasingly relies on non-invasive, high-fidelity systems capable of monitoring cattle with minimal disruption to behavior or welfare. Conventional identification methods, such as ear tags and wearable sensors, often compromise animal comfort and produce inconsistent data under real-world farm conditions. This study introduces Dairy DigiD, a deep learning-based biometric classification framework that categorizes dairy cattle into four physiologically defined groups (young, mature milking, pregnant, and dry cows) using high-resolution facial images.

Recent breakthroughs in marker-less pose estimation have driven a significant transformation in computer-vision approaches. Despite the emergence of state-of-the-art keypoint-detection algorithms, the extent to which these tools are employed, and the nature of their application in scientific research, have yet to be systematically documented. We systematically reviewed the literature to assess how pose-estimation techniques are currently applied in rodent (rat and mouse) models.

Lameness in dairy cattle is a prevalent issue that significantly impacts both animal welfare and farm productivity. Traditional lameness detection methods often rely on subjective visual assessment, focusing on changes in locomotion and back curvature. However, these methods can lack consistency and accuracy, particularly for early-stage detection.

Feature-based image matching has extensive applications in computer vision. Keypoints detected in images can be naturally represented as graph structures, and Graph Neural Networks (GNNs) have been shown to outperform traditional deep learning techniques. Consequently, the paradigm of image matching via GNNs has gained significant prominence in recent academic research.

Feasibility of Real-Time Automated Vocal Fold Motion Tracking for In-Office Laryngoscopy.

Laryngoscope

September 2025

Department of Otolaryngology-Head and Neck Surgery, Massachusetts Eye & Ear, Boston, Massachusetts, USA.

Objectives: Major advancements have been made in applying artificial intelligence and computer vision to analyze videolaryngoscopy data. These models are limited to post hoc analysis and are aimed at research settings. In this work, we assess the feasibility of a real-time solution for automated vocal fold tracking during in-office laryngoscopy.
