Unsupervised Global and Local Homography Estimation with Coplanarity-Aware GAN.

Shuaicheng Liu, Mingbo Hong, Yuhang Lu, Nianjin Ye, Chunyu Lin, Bing Zeng

IEEE Trans Pattern Anal Mach Intell

Published: December 2024

Unsupervised methods have received increasing attention in homography learning because of their promising performance and label-free training. However, existing methods do not explicitly account for plane-induced parallax, which compromises predictions in scenes with multiple planes. In this work, we propose a novel method, HomoGAN, that guides unsupervised homography estimation to focus on the dominant plane. First, a multi-scale transformer predicts the homography from the feature pyramids of the input images in a coarse-to-fine fashion. Second, an unsupervised GAN imposes a coplanarity constraint on the predicted homography: a generator predicts a mask of aligned regions, and a discriminator checks whether the two masked feature maps are induced by a single homography. Building on this global homography framework, we extend it to local mesh-grid homography estimation, namely MeshHomoGAN, in which plane constraints are enforced on each mesh cell to go beyond a single dominant plane, so that scenes with multiple depth planes can be better aligned. To validate the effectiveness of our method and its components, we conduct extensive experiments on large-scale datasets. Results show that our matching error is 22% lower than that of previous SOTA methods. Code is available at https://github.com/megvii-research/HomoGAN.
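
To make the role of the dominant-plane mask concrete, here is a minimal NumPy sketch. It is not the HomoGAN implementation: the paper learns the mask with a generator and compares masked feature maps, whereas this illustration assumes hand-picked point correspondences and a hand-set binary mask. The function names `homography_from_4pts`, `warp_points`, and `masked_alignment_error` are hypothetical and chosen only for this example.

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Solve the 3x3 homography (with H[2, 2] fixed to 1) mapping src to dst.

    Uses the standard 8x8 linear system built from four point
    correspondences; this is an illustrative stand-in for a learned estimator.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # back to Cartesian

def masked_alignment_error(H, src, dst, mask):
    """Mean distance between warped src and dst over points where mask is 1."""
    err = np.linalg.norm(warp_points(H, src) - dst, axis=1)
    return float((mask * err).sum() / max(mask.sum(), 1e-8))

# Four corners of a 100x100 patch, shifted by (5, -3) on the dominant plane.
corners_src = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=float)
corners_dst = corners_src + np.array([5.0, -3.0])
H = homography_from_4pts(corners_src, corners_dst)

# Three test points lie on the dominant plane; the last one is off-plane and
# carries an extra 8-pixel horizontal parallax shift (assumed for the demo).
pts_src = np.array([[10, 10], [80, 20], [30, 70], [50, 50]], dtype=float)
pts_dst = pts_src + np.array([5.0, -3.0])
pts_dst[3, 0] += 8.0

plane_mask = np.array([1.0, 1.0, 1.0, 0.0])  # keep only dominant-plane points
err_masked = masked_alignment_error(H, pts_src, pts_dst, plane_mask)
err_unmasked = masked_alignment_error(H, pts_src, pts_dst, np.ones(4))
print(f"masked error: {err_masked:.4f}, unmasked error: {err_unmasked:.4f}")
```

With the mask applied, the off-plane point no longer contributes to the error, which is the intuition behind restricting the alignment objective to regions explained by a single homography.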

DOI: http://dx.doi.org/10.1109/TPAMI.2024.3509614

Similar Publications

Encoder-shared visual state space network for anterior segment reconstruction.

Comput Med Imaging Graph

September 2025

School of Media Engineering, Communication University of Zhejiang, Hangzhou, 310018, China.

Guiping Qian, Huaqiong Wang, Shan Luo, Yiming Sun, Dingguo Yu

The 3D (three-dimensional) reconstruction of the anterior segment from AS-OCT scans is essential for diagnosing and monitoring the cornea and iris, as well as for localizing and quantifying keratitis. However, this process faces two significant challenges: (1) the consecutive images acquired through rotational scanning are difficult to align and register; (2) existing medical image segmentation techniques cannot effectively segment the cornea. Both registration and segmentation are critical preprocessing steps for effective 3D visualization of the anterior segment. To tackle these dual challenges, an encoder-shared visual state space network for the 3D reconstruction of the anterior segment is proposed.

Monocular markerless position tracking of elite amateur boxing fighters in real combat situation.

J Sports Sci

June 2025

INRIA Université Grenoble Alpes, LJK UMR 5224, Grenoble, France.

Alexandre Schortgen, Thibault Goyallon, Guillaume Saulière, Antoine Muller, Lionel Revéret

Markerless video analysis represents an opportunity for conducting efficient in-situ motion analysis of athletes during competitions. From monocular video data, we propose a robust end-to-end method to automatically capture the 2D trajectories of athletes' positions on planar ground, even in highly occluded contexts. A tracking-by-detection algorithm is first applied to a short sequence to build a specific contextual dataset ('self-supervision').

HLDD: Hierarchically Learned Detector and Descriptor for Robust Image Matching.

IEEE Trans Image Process

January 2025

Maoqing Hu, Bin Sun, Fuhua Zhang, Shutao Li

Image matching is a critical task in computer vision research, focusing on aligning two or more images with similar features. Feature detection and description constitute the core of image matching. Handcrafted detectors can obtain distinctive points, but these points may not be repeatable across image pairs, especially those with dramatic appearance changes.

Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations.

IEEE Trans Pattern Anal Mach Intell

September 2025

Shen Cai, Zhanhao Wu, Lingxi Guo, Jiachun Wang, Siyu Zhang

In this article, we present two fast and interpretable decomposition methods for 2D homography, which are named Similarity-Kernel-Similarity (SKS) and Affine-Core-Affine (ACA) transformations respectively. Under the minimal 4-point configuration, two similarity transformations in SKS are computed by two anchor points on source and target planes, respectively. Then, the other two point correspondences can be exploited to compute the middle kernel transformation with only four parameters.

Geometry-Constrained Learning-Based Visual Servoing with Projective Homography-Derived Error Vector.

Sensors (Basel)

April 2025

Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.

Yueyuan Zhang, Arpan Ghosh, Yechan An, Kyeongjin Joo, SangMin Kim

We propose a novel geometry-constrained learning-based method for camera-in-hand visual servoing systems that eliminates the need for camera intrinsic parameters, depth information, and the robot's kinematic model. Our method uses a cerebellar model articulation controller (CMAC) to execute online Jacobian estimation within the control framework. Specifically, we introduce a fixed-dimension, uniform-magnitude error function based on the projective homography matrix.
