Article Abstract

Many computer vision tasks, such as monocular depth estimation and height estimation from a satellite orthophoto, share a common underlying goal: regressing dense continuous values for the pixels of a single input image. We define them as dense continuous-value regression (DCR) tasks. Recent approaches based on deep convolutional neural networks have significantly improved performance on DCR tasks, particularly pixelwise regression accuracy. However, it remains challenging to preserve both the global structure and the fine object details of complex scenes. In this article, we take advantage of the efficiency of the Laplacian pyramid in representing multiscale content to reconstruct high-quality signals for complex scenes. We design a Laplacian pyramid neural network (LAPNet), which consists of a Laplacian pyramid decoder (LPD) for signal reconstruction and an adaptive dense feature fusion (ADFF) module to fuse features from the input image. More specifically, we build the LPD to effectively express both global and local scene structures: its upper and lower levels represent scene layouts and shape details, respectively. We introduce a residual refinement module to progressively complement high-frequency details for signal prediction at each level. To recover the signal at each individual level of the pyramid, the ADFF module adaptively fuses multiscale image features for accurate prediction. We conduct comprehensive experiments to evaluate a number of variants of our model on three important DCR tasks, i.e., monocular depth estimation, single-image height estimation, and density map estimation for crowd counting. Experiments demonstrate that our method achieves new state-of-the-art performance in both qualitative and quantitative evaluations on NYU-D V2 and KITTI for monocular depth estimation, on the challenging Urban Semantic 3D (US3D) dataset for satellite height estimation, and on four challenging benchmarks for crowd counting. These results demonstrate that the proposed LAPNet is a universal and effective architecture for DCR problems.
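The decoder described above builds on the classic Laplacian pyramid: the coarsest level holds the global layout, and each finer level stores only the high-frequency residual that the upsampled coarser level misses, so a signal can be rebuilt coarse to fine by repeatedly upsampling and adding detail. The following is a minimal NumPy sketch of that decomposition and reconstruction for a single-channel map such as a depth map. It assumes 2x2 average pooling for downsampling and nearest-neighbour repetition for upsampling, choices made here for brevity rather than taken from the paper, and the function names `build_laplacian_pyramid` and `reconstruct` are hypothetical. It is not LAPNet's decoder, which predicts each level with a network and refines it with learned residual modules.

```python
import numpy as np


def downsample(x: np.ndarray) -> np.ndarray:
    """Halve the resolution with 2x2 average pooling (assumes even height/width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """Double the resolution with nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def build_laplacian_pyramid(signal: np.ndarray, levels: int) -> list:
    """Hypothetical helper: return [finest detail, ..., coarsest detail, coarse layout]."""
    h, w = signal.shape
    if h % (2 ** levels) or w % (2 ** levels):
        raise ValueError("height and width must be divisible by 2**levels")
    pyramid = []
    current = signal.astype(np.float64)
    for _ in range(levels):
        low = downsample(current)
        # High-frequency residual that the upsampled coarse level cannot express.
        pyramid.append(current - upsample(low))
        current = low
    # The last entry is the low-resolution global layout.
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid: list) -> np.ndarray:
    """Coarse to fine: upsample the running estimate and add the next detail band."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current
```

For example, a 64x64 map decomposed with `levels=3` yields bands of shape 64x64, 32x32, and 16x16 plus an 8x8 coarse layout, and `reconstruct` recovers the original map up to floating-point rounding. Because each residual is defined with the same `upsample` used during reconstruction, the round trip is exact regardless of which pooling and interpolation filters are chosen.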

Source: http://dx.doi.org/10.1109/TNNLS.2020.3026669

Similar Publications

Purpose: The relationship between mild neurovascular conflict (NVC) and trigeminal neuralgia (TN) remains ill-defined, especially as mild NVC is often seen in asymptomatic individuals without any facial pain. We aim to analyze the trigeminal nerve microstructure using artificial intelligence (AI) to distinguish symptomatic nerves in idiopathic TN (iTN) from asymptomatic nerves in a control group with incidental grade‑1 NVC.

Methods: Seventy-eight symptomatic trigeminal nerves with grade-1 NVC in iTN patients, and an asymptomatic control group consisting of Bell's palsy patients free from facial pain (91 grade-1 NVC and 91 grade-0 NVC), were included in the study.

In the field of oncology imaging, the fusion of magnetic resonance imaging (MRI) and positron emission tomography (PET) modalities is crucial for enhancing diagnostic capabilities. This article introduces a novel fusion method that leverages the strengths of both modalities to overcome the limited functional information of MRI and the limited spatial resolution of PET scans. Our approach integrates the Laplacian pyramid, used to extract high- and low-frequency components, with empirical mode decomposition and phase congruency to preserve crucial structural details in the fused image.

Multi-Scale Fusion Underwater Image Enhancement Based on HSV Color Space Equalization.

Sensors (Basel)

April 2025

National Key Laboratory of Optical Field Manipulation Science and Technology, Chinese Academy of Sciences, Chengdu 610209, China.

Meeting the escalating demand for high-quality underwater imagery poses a significant challenge due to light absorption and scattering in water, resulting in color distortion and reduced contrast. This study presents an innovative approach for enhancing underwater images, combining color correction, HSV color space equalization, and multi-scale fusion techniques. Initially, automatic contrast adjustment and improved white balance corrected color bias; this was followed by saturation and value equalization in the HSV space to enhance brightness and saturation.

With the increasing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are in high practical demand. In this paper, we develop a two-stage neural network-based TMO that is self-calibrated and perceptually optimized. In stage one, motivated by the physiology of the early stages of the human visual system, we first decompose an HDR image into a normalized Laplacian pyramid.

Background: Positron Emission Tomography (PET) scans are a crucial tool in diagnosing and monitoring a number of complex conditions, including cancer, heart health, and especially cognitive brain function. However, they are also often much more expensive than comparable imaging modalities such as X-ray and magnetic resonance imaging (MRI), which can limit their availability and the impact of their use in both medical and machine learning settings. We propose to address this problem by using generative models to simulate PET scan results from prior MRI.