Laplacian Pyramid Neural Network for Dense Continuous-Value Regression for Complex Scenes.

Xuejin Chen, Xiaotian Chen, Yiteng Zhang, Xueyang Fu, Zheng-Jun Zha

IEEE Trans Neural Netw Learn Syst

Published: November 2021

Many computer vision tasks, such as monocular depth estimation and height estimation from a satellite orthophoto, share a common underlying goal: regressing dense continuous values for the pixels of a single image. We define them as dense continuous-value regression (DCR) tasks. Recent approaches based on deep convolutional neural networks have significantly improved performance on DCR tasks, particularly in pixelwise regression accuracy. However, it remains challenging to simultaneously preserve the global structure and fine object details in complex scenes. In this article, we take advantage of the efficiency of the Laplacian pyramid in representing multiscale content to reconstruct high-quality signals for complex scenes. We design a Laplacian pyramid neural network (LAPNet), which consists of a Laplacian pyramid decoder (LPD) for signal reconstruction and an adaptive dense feature fusion (ADFF) module to fuse features from the input image. More specifically, we build an LPD to effectively express both global and local scene structures. In our LPD, the upper and lower levels represent scene layouts and shape details, respectively. We introduce a residual refinement module to progressively complement high-frequency details for signal prediction at each level. To recover the signals at each individual level in the pyramid, an ADFF module is proposed to adaptively fuse multiscale image features for accurate prediction. We conduct comprehensive experiments to evaluate a number of variants of our model on three important DCR tasks, i.e., monocular depth estimation, single-image height estimation, and density map estimation for crowd counting. Experiments demonstrate that our method achieves new state-of-the-art performance in both qualitative and quantitative evaluation on NYU-D V2 and KITTI for monocular depth estimation, the challenging Urban Semantic 3D (US3D) for satellite height estimation, and four challenging benchmarks for crowd counting. These results demonstrate that the proposed LAPNet is a universal and effective architecture for DCR problems.
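
The Laplacian pyramid idea behind the decoder can be illustrated with a minimal NumPy sketch. This is the classical decomposition only, not the authors' LAPNet: per the abstract, the network predicts the content of each level, whereas here the levels are computed directly from a known signal to show how a coarse layout plus per-scale residuals reconstructs the full-resolution map. The function names, the 2x2 average-pool downsampling, and the nearest-neighbour upsampling are assumptions chosen for brevity (the classical pyramid uses Gaussian filtering), and the input height and width are assumed to be divisible by 2**levels.

```python
import numpy as np

def downsample(x):
    """Halve resolution by averaging each 2x2 block (assumed stand-in for Gaussian blur + subsample)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Double resolution by nearest-neighbour repetition (assumed, for simplicity)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(signal, levels):
    """Return [residual_0 (finest), ..., residual_{levels-1}, coarse_base]."""
    pyramid = []
    current = signal.astype(np.float64)
    for _ in range(levels):
        coarse = downsample(current)
        # High-frequency detail that the coarser level cannot represent.
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)  # low-frequency base: the global layout
    return pyramid

def reconstruct(pyramid):
    """Coarse-to-fine: upsample the running estimate, then add that level's residual."""
    estimate = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        estimate = upsample(estimate) + residual
    return estimate

# Hypothetical dense target (e.g. a depth or height map) with random values.
rng = np.random.default_rng(0)
depth_map = rng.random((32, 32))
pyr = build_laplacian_pyramid(depth_map, levels=3)
restored = reconstruct(pyr)
```

Because each residual is defined as the difference between a level and the upsampled coarser level, the coarse-to-fine sum recovers the original map regardless of which downsampling and upsampling filters are used. The coarsest entry carries the global structure and the finer residuals carry the details.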

DOI: http://dx.doi.org/10.1109/TNNLS.2020.3026669

Publication Analysis

Top Keywords

laplacian pyramid

complex scenes

monocular depth

depth estimation

height estimation

dcr tasks

pyramid neural

neural network

dense continuous-value

continuous-value regression

Similar Publications

Distinguishing symptomatic and asymptomatic trigeminal nerves through radiomics and deep learning: A microstructural study in idiopathic TN patients and asymptomatic control group.

Neuroradiology

July 2025

Topkapi University, İstanbul, Turkey.

Ferhat Cüce, Gokalp Tulum, Ömer Karadaş, Muhammet İkbal Işik, Merve Dur İnce

Purpose: The relationship between mild neurovascular conflict (NVC) and trigeminal neuralgia (TN) remains ill-defined, especially as mild NVC is often seen in the asymptomatic population without any facial pain. We aim to analyze the trigeminal nerve microstructure using artificial intelligence (AI) to distinguish symptomatic and asymptomatic nerves between idiopathic TN (iTN) patients and an asymptomatic control group with incidental grade-1 NVC.

Methods: Seventy-eight symptomatic trigeminal nerves with grade-1 NVC in iTN patients, and an asymptomatic control group consisting of Bell's palsy patients free from facial pain (91 grade-1 NVC and 91 grade-0 NVC), were included in the study.

Enhanced MRI-PET fusion using Laplacian pyramid and empirical mode decomposition for improved oncology imaging.

PLoS One

May 2025

Department of Mathematics, College of Science, Jazan University, Jazan, Kingdom of Saudi Arabia.

Gunnam Suryanarayana, Satyanarayana Murthy Nimmagadda, Sabbavarapu Nageswara Rao, Ali Mohammed Y Mahnashi, Shri Ramtej Kondamuri

In the field of oncology imaging, the fusion of magnetic resonance imaging (MRI) and positron emission tomography (PET) modalities is crucial for enhancing diagnostic capabilities. This article introduces a novel fusion method that leverages the strengths of both modalities to overcome limitations associated with functional information in MRI and the spatial resolution in PET scans. Our approach integrates the Laplacian pyramid for extracting high and low-frequency components, along with empirical mode decomposition and phase congruency to preserve crucial structural details in the fused image.

Multi-Scale Fusion Underwater Image Enhancement Based on HSV Color Space Equalization.

Sensors (Basel)

April 2025

National Key Laboratory of Optical Field Manipulation Science and Technology, Chinese Academy of Sciences, Chengdu 610209, China.

Jialiang Zhang, Haibing Su, Tao Zhang, Hu Tian, Bin Fan

Meeting the escalating demand for high-quality underwater imagery poses a significant challenge due to light absorption and scattering in water, resulting in color distortion and reduced contrast. This study presents an innovative approach for enhancing underwater images, combining color correction, HSV color space equalization, and multi-scale fusion techniques. Initially, automatic contrast adjustment and improved white balance corrected color bias; this was followed by saturation and value equalization in the HSV space to enhance brightness and saturation.

A Perceptually Optimized and Self-Calibrated Tone Mapping Operator.

IEEE Trans Vis Comput Graph

October 2025

Peibei Cao, Chenyang Le, Yuming Fang, Kede Ma

With the increasing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are in high practical demand. In this paper, we develop a two-stage neural network-based TMO that is self-calibrated and perceptually optimized. In stage one, motivated by the physiology of the early stages of the human visual system, we first decompose an HDR image into a normalized Laplacian pyramid.

MRI2PET: Realistic PET Image Synthesis from MRI for Automated Inference of Brain Atrophy and Alzheimer's.

medRxiv

April 2025

Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA.

Brandon Theodorou, Anant Dadu, Brian Avants, Mike Nalls, Jimeng Sun

Background: Positron Emission Tomography (PET) scans are a crucial tool in diagnosing and monitoring a number of complex conditions, including cancer, heart health, and especially cognitive brain function. However, they are also often much more expensive than comparable imaging modalities such as X-ray and magnetic resonance imaging (MRI), which can limit their availability and the impact of their use in both medical and machine learning settings. We propose to address this problem by using generative models to simulate PET scan results based on prior MRI.
