Many computer vision tasks, such as monocular depth estimation and height estimation from a satellite orthophoto, share a common underlying goal: regressing dense continuous values for the pixels of a single image. We define them as dense continuous-value regression (DCR) tasks. Recent approaches based on deep convolutional neural networks significantly improve the performance of DCR tasks, particularly in pixelwise regression accuracy. However, it remains challenging to preserve the global structure and fine object details simultaneously in complex scenes. In this article, we take advantage of the efficiency of the Laplacian pyramid in representing multiscale content to reconstruct high-quality signals for complex scenes. We design a Laplacian pyramid neural network (LAPNet), which consists of a Laplacian pyramid decoder (LPD) for signal reconstruction and an adaptive dense feature fusion (ADFF) module to fuse features from the input image. More specifically, we build an LPD to effectively express both global and local scene structures. In our LPD, the upper and lower levels represent scene layouts and shape details, respectively. We introduce a residual refinement module to progressively complement high-frequency details for signal prediction at each level. To recover the signals at each individual level in the pyramid, an ADFF module is proposed to adaptively fuse multiscale image features for accurate prediction. We conduct comprehensive experiments to evaluate a number of variants of our model on three important DCR tasks, i.e., monocular depth estimation, single-image height estimation, and density map estimation for crowd counting. Experiments demonstrate that our method achieves new state-of-the-art performance in both qualitative and quantitative evaluation on NYU-D V2 and KITTI for monocular depth estimation, on the challenging Urban Semantic 3D (US3D) dataset for satellite height estimation, and on four challenging benchmarks for crowd counting. These results demonstrate that the proposed LAPNet is a universal and effective architecture for DCR problems.
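The multiscale representation the abstract relies on can be illustrated with a classical Laplacian pyramid. The sketch below is not LAPNet or its decoder: it only decomposes a 2-D map into high-frequency residuals plus a coarse base and then rebuilds it coarse-to-fine, which is the order in which a pyramid decoder would add detail. It assumes a 2x2 box average in place of the usual Gaussian filter, nearest-neighbour upsampling, and side lengths divisible by 2 for every level; all function names are hypothetical.

```python
import numpy as np

def downsample(x):
    """Halve resolution by averaging 2x2 blocks (assumed stand-in for blur + decimate)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Double resolution by nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(signal, levels):
    """Return [residual_0, ..., residual_{levels-1}, coarse_base], finest level first."""
    pyramid = []
    current = signal.astype(np.float64)
    for _ in range(levels):
        coarse = downsample(current)
        # High-frequency detail lost when moving to the coarser scale.
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)  # low-frequency base carrying the global layout
    return pyramid

def reconstruct(pyramid):
    """Rebuild the signal coarse-to-fine, adding one residual per level."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current

# Toy 16x16 "depth map" with random values, used only to exercise the functions.
rng = np.random.default_rng(0)
depth = rng.random((16, 16))
pyr = build_laplacian_pyramid(depth, levels=3)
restored = reconstruct(pyr)
```

With these choices the decomposition is exactly invertible, so `restored` matches `depth` up to floating-point error. In a learned decoder each level would be predicted from image features rather than computed from a known signal.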
DOI: http://dx.doi.org/10.1109/TNNLS.2020.3026669
Purpose: The relationship between mild neurovascular conflict (NVC) and trigeminal neuralgia (TN) remains ill-defined, especially as mild NVC is often seen in the asymptomatic population without any facial pain. We aim to analyze the trigeminal nerve microstructure using artificial intelligence (AI) to distinguish symptomatic nerves in idiopathic TN (iTN) patients from asymptomatic nerves in a control group with incidental grade-1 NVC.
Methods: Seventy-eight symptomatic trigeminal nerves with grade-1 NVC in iTN patients, and an asymptomatic control group consisting of Bell's palsy patients free from facial pain (91 grade-1 NVC and 91 grade-0 NVC), were included in the study.
PLoS One
May 2025
Department of Mathematics, College of Science, Jazan University, Jazan, Kingdom of Saudi Arabia.
In the field of oncology imaging, the fusion of magnetic resonance imaging (MRI) and positron emission tomography (PET) modalities is crucial for enhancing diagnostic capabilities. This article introduces a novel fusion method that leverages the strengths of both modalities to overcome limitations associated with functional information in MRI and the spatial resolution in PET scans. Our approach integrates the Laplacian pyramid for extracting high and low-frequency components, along with empirical mode decomposition and phase congruency to preserve crucial structural details in the fused image.
Sensors (Basel)
April 2025
National Key Laboratory of Optical Field Manipulation Science and Technology, Chinese Academy of Sciences, Chengdu 610209, China.
Meeting the escalating demand for high-quality underwater imagery poses a significant challenge due to light absorption and scattering in water, resulting in color distortion and reduced contrast. This study presents an innovative approach for enhancing underwater images, combining color correction, HSV color space equalization, and multi-scale fusion techniques. Initially, automatic contrast adjustment and improved white balance corrected color bias; this was followed by saturation and value equalization in the HSV space to enhance brightness and saturation.
IEEE Trans Vis Comput Graph
October 2025
With the increasing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are in practical demand. In this paper, we develop a two-stage neural network-based TMO that is self-calibrated and perceptually optimized. In Stage one, motivated by the physiology of the early stages of the human visual system, we first decompose an HDR image into a normalized Laplacian pyramid.
medRxiv
April 2025
Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD, USA.
Background: Positron Emission Tomography (PET) scans are a crucial tool in diagnosing and monitoring a number of complex conditions, including cancer, heart health, and especially cognitive brain function. However, they are also often much more expensive than comparable imaging modalities such as X-ray and magnetic resonance imaging (MRI), which can limit their availability and the impact of their use in both medical and machine learning settings. We propose to address this problem by using generative models to simulate PET scan results from prior MRI.