Citations: 20

Article Abstract

Unsupervised methods have received increasing attention in homography learning due to their promising performance and label-free training. However, existing methods do not explicitly consider plane-induced parallax, which compromises predictions in scenes with multiple planes. In this work, we propose a novel method, HomoGAN, to guide unsupervised homography estimation to focus on the dominant plane. First, a multi-scale transformer is designed to predict the homography from the feature pyramids of the input images in a coarse-to-fine fashion. Moreover, we propose an unsupervised GAN to impose a coplanarity constraint on the predicted homography, realized by using a generator to predict a mask of aligned regions and a discriminator to check whether two masked feature maps are induced by a single homography. Based on the global homography framework, we extend it to local mesh-grid homography estimation, namely MeshHomoGAN, where plane constraints can be enforced on each mesh cell to go beyond a single dominant plane, such that scenes with multiple depth planes can be better aligned. To validate the effectiveness of our method and its components, we conduct extensive experiments on large-scale datasets. Results show that our matching error is 22% lower than that of previous state-of-the-art methods. Code is available at https://github.com/megvii-research/HomoGAN.
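
To make the role of the mask concrete, the following minimal Python sketch shows how a single homography maps points and how a mask can restrict a matching error to the dominant plane. It is an illustration only, not the HomoGAN implementation: the function names `warp_point` and `masked_matching_error`, the hand-set mask values, and the toy points are all assumptions made for this example, whereas in the paper the mask is predicted by a generator and the homography by a transformer.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def warp_point(H: Matrix3, p: Point) -> Point:
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under H")
    return (u / w, v / w)


def masked_matching_error(
    H: Matrix3,
    src: Sequence[Point],
    dst: Sequence[Point],
    mask: Sequence[float],
) -> float:
    """Mask-weighted mean Euclidean distance between warped src and dst points.

    mask[i] in [0, 1] is a hypothetical per-point weight: 1 means the point
    is treated as lying on the dominant plane, 0 means it is ignored.
    """
    total = 0.0
    weight_sum = 0.0
    for p, q, m in zip(src, dst, mask):
        u, v = warp_point(H, p)
        total += m * math.hypot(u - q[0], v - q[1])
        weight_sum += m
    if weight_sum == 0:
        raise ValueError("mask selects no points")
    return total / weight_sum


# Toy data (assumed): a pure translation by (2, 1) explains the first three
# correspondences; the fourth point lies off the plane and moves differently.
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0]]
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
dst = [(2.0, 1.0), (3.0, 1.0), (2.0, 2.0), (10.0, 9.0)]

error_all = masked_matching_error(H, src, dst, [1, 1, 1, 1])
error_plane = masked_matching_error(H, src, dst, [1, 1, 1, 0])
print(error_all, error_plane)
```

With the off-plane point masked out, the single homography explains the remaining correspondences exactly, which is the intuition behind restricting the alignment to coplanar regions.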

Source
http://dx.doi.org/10.1109/TPAMI.2024.3509614

Publication Analysis

Top Keywords

homography estimation: 12
homography: 8
dominant plane: 8
unsupervised: 4
unsupervised global: 4
global local: 4
local homography: 4
estimation coplanarity-aware: 4
coplanarity-aware gan: 4
gan unsupervised: 4

Similar Publications

The 3D (three-dimensional) reconstruction of the anterior segment obtained from AS-OCT scanning devices is essential for diagnosing and monitoring the cornea and iris, as well as for localizing and quantifying keratitis. However, this process faces two significant challenges: (1) the consecutive images acquired through rotational scanning are difficult to align and register; (2) existing medical image segmentation technology cannot effectively segment the cornea. Both registration and segmentation are critical preprocessing steps for effective 3D visualization of the anterior segment. To tackle these challenges, an encoder-shared visual state space network for the 3D reconstruction of the anterior segment is proposed.

Markerless video analysis represents an opportunity for conducting efficient in-situ motion analysis of athletes during competitions. From monocular video data, we propose a robust end-to-end method to automatically capture the 2D trajectory of athletes' position on planar ground, even in highly occluded contexts. A tracking-by-detection algorithm is first applied on a short sequence to build a specific contextual dataset ('self-supervision').

Image matching is a critical task in computer vision research, focusing on aligning two or more images with similar features. Feature detection and description constitute the core of image matching. Handcrafted detectors can obtain distinctive points, but these points may not be repeatable across image pairs, especially those with dramatic appearance changes.

In this article, we present two fast and interpretable decomposition methods for 2D homography, which are named Similarity-Kernel-Similarity (SKS) and Affine-Core-Affine (ACA) transformations respectively. Under the minimal 4-point configuration, two similarity transformations in SKS are computed by two anchor points on source and target planes, respectively. Then, the other two point correspondences can be exploited to compute the middle kernel transformation with only four parameters.

Geometry-Constrained Learning-Based Visual Servoing with Projective Homography-Derived Error Vector.

Sensors (Basel)

April 2025

Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.

We propose a novel geometry-constrained learning-based method for camera-in-hand visual servoing systems that eliminates the need for camera intrinsic parameters, depth information, and the robot's kinematic model. Our method uses a cerebellar model articulation controller (CMAC) to execute online Jacobian estimation within the control framework. Specifically, we introduce a fixed-dimension, uniform-magnitude error function based on the projective homography matrix.
