Existing publicly available datasets with pixel-level labels contain limited categories, making it difficult to generalize to the real world, which contains thousands of categories. In this paper, we propose an approach to automatically generate object masks with detailed pixel-level structures/boundaries, enabling semantic image segmentation of thousands of targets in the real world without manual labelling. A Guided Filter Network (GFN) is first developed to learn segmentation knowledge from an existing dataset, and this GFN then transfers the learned segmentation knowledge to generate initial coarse object masks for the target images. These coarse object masks are treated as pseudo labels to iteratively self-optimize the GFN on the target images. Our experiments on six image sets demonstrate that our proposed approach can generate object masks with detailed pixel-level structures/boundaries whose quality is comparable to that of manually-labelled masks. Our proposed approach also achieves better performance on semantic image segmentation than most existing weakly-supervised, semi-supervised, and domain-adaptation approaches under the same experimental conditions.
DOI: http://dx.doi.org/10.1109/TIP.2022.3160399
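The iterative pseudo-label scheme described in the abstract (coarse masks produced by the GFN on target images, then used as labels to re-train it) can be illustrated with a short training loop. This is only a minimal sketch, not the authors' implementation: `gfn`, `target_loader`, the optimizer, and the confidence-filtering heuristic are all assumptions added here for illustration.

```python
# Minimal sketch of iterative pseudo-label self-optimization (assumptions:
# `gfn` is any PyTorch segmentation network producing per-class logits,
# `target_loader` yields batches of unlabelled target images).
import torch
import torch.nn.functional as F

def self_optimize(gfn, target_loader, optimizer, rounds=3, conf_thresh=0.9):
    for _ in range(rounds):
        # 1) Generate coarse object masks on the target images; keep only confident pixels.
        pseudo = []
        gfn.eval()
        with torch.no_grad():
            for imgs in target_loader:
                probs = torch.softmax(gfn(imgs), dim=1)   # (B, C, H, W) class probabilities
                conf, labels = probs.max(dim=1)           # per-pixel confidence and label
                labels[conf < conf_thresh] = 255          # mark uncertain pixels as "ignore"
                pseudo.append((imgs, labels))
        # 2) Treat the masks as pseudo labels and re-train the network on them.
        gfn.train()
        for imgs, labels in pseudo:
            loss = F.cross_entropy(gfn(imgs), labels, ignore_index=255)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return gfn
```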
IEEE Trans Pattern Anal Mach Intell
September 2025
Generalized visual grounding tasks, including Generalized Referring Expression Comprehension (GREC) and Segmentation (GRES), extend the classical visual grounding paradigm by accommodating multi-target and non-target scenarios. Specifically, GREC focuses on accurately identifying all referential objects at the coarse bounding-box level, while GRES aims to achieve fine-grained pixel-level perception. However, existing approaches typically treat these tasks independently, overlooking the benefits of jointly training GREC and GRES to ensure consistent multi-granularity predictions and to streamline the overall process.
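A joint-training objective of the kind this abstract argues for can be sketched as a single step that supervises a box head and a mask head together; the model interface, loss choices, and weighting below are purely illustrative assumptions, not the paper's design.

```python
# Illustrative joint GREC/GRES training step (hypothetical `model` returning
# box predictions and mask logits from an image-text pair).
import torch.nn.functional as F

def joint_step(model, images, text, gt_boxes, gt_masks, optimizer, lam=1.0):
    pred_boxes, mask_logits = model(images, text)
    box_loss = F.l1_loss(pred_boxes, gt_boxes)                              # coarse, box-level (GREC) term
    mask_loss = F.binary_cross_entropy_with_logits(mask_logits, gt_masks)   # pixel-level (GRES) term
    loss = box_loss + lam * mask_loss                                       # one shared objective -> consistent predictions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```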
Sensors (Basel)
August 2025
School of Transportation, Southeast University, Nanjing 211189, China.
Foreign Object Debris (FOD) on airport pavements poses a serious threat to aviation safety, making accurate detection and interpretable scene understanding crucial for operational risk management. This paper presents an integrated multi-modal framework that combines an enhanced YOLOv7-X detector, a cascaded YOLO-SAM segmentation module, and a structured prompt engineering mechanism to generate detailed semantic descriptions of detected FOD. Detection performance is improved through the integration of Coordinate Attention, Spatial-Depth Conversion (SPD-Conv), and a Gaussian Similarity IoU (GSIoU) loss, leading to a 3.
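The cascaded detect-then-segment idea (detector boxes used as prompts for a segmentation model) reduces to a short loop over detections. The helpers below are hypothetical stand-ins for the enhanced YOLOv7-X detector and the SAM module, so this sketch shows only the data flow, not the paper's components.

```python
# Sketch of the detection -> prompted-segmentation cascade. The detector and
# segmenter are passed in as callables; their internals are not shown here.
import numpy as np

def cascade_fod(image: np.ndarray, detect_fod, segment_with_box_prompt):
    """detect_fod(image) -> iterable of (box, score, label);
    segment_with_box_prompt(image, box) -> binary mask for that box."""
    results = []
    for box, score, label in detect_fod(image):          # coarse bounding boxes from the detector
        mask = segment_with_box_prompt(image, box)        # box prompt refined into a pixel-level mask
        results.append({"label": label, "score": score, "box": box, "mask": mask})
    return results
```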
Sensors (Basel)
August 2025
College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an 710021, China.
In GNSS-deprived settings, such as indoor and underground environments, research on simultaneous localization and mapping (SLAM) technology remains a focal point. Mitigating the influence of dynamic objects on positional precision and constructing a persistent map comprising solely static elements are pivotal objectives in visual SLAM for dynamic scenes. This paper introduces optical-flow motion-segmentation-based SLAM (OS-SLAM), a dynamic-environment SLAM system that incorporates optical flow motion segmentation for enhanced robustness.
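As a rough illustration of flow-based motion segmentation in a SLAM front end, the snippet below flags pixels whose optical-flow magnitude deviates strongly from the frame-wide statistics; using the median as a stand-in for ego-motion compensation is a simplification assumed here, not the OS-SLAM method.

```python
# Toy flow-based motion mask using OpenCV's Farneback optical flow.
import cv2
import numpy as np

def dynamic_mask(prev_gray: np.ndarray, curr_gray: np.ndarray, k: float = 3.0) -> np.ndarray:
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)          # per-pixel flow magnitude
    baseline = np.median(mag)                   # crude stand-in for camera ego-motion
    return mag > baseline + k * mag.std()       # True where motion deviates strongly (likely dynamic)
```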
Sensors (Basel)
August 2025
College of Electronics and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China.
To address the challenging problem of multi-scale inshore/offshore ship detection in synthetic aperture radar (SAR) remote sensing images, we propose a novel deep-learning-based automatic ship detection method within the framework of compositional learning. The proposed method rests on three pillars: context-guided region proposal, prototype-based model pretraining, and multi-model ensemble learning. To reduce false alarms induced by discrete ground clutter, prior knowledge of the harbour's layout is exploited to generate land masks for terrain delimitation.
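Terrain delimitation with a prior land mask amounts to discarding detections that fall on land. The snippet below assumes a hypothetical detection format (x1, y1, x2, y2, score) and a binary land mask, and only illustrates the idea of suppressing clutter-induced false alarms.

```python
# Suppress detections whose centre lies on land, given a binary land mask
# (1 = land, 0 = water); detection tuples are an assumed format.
import numpy as np

def suppress_land_alarms(detections, land_mask: np.ndarray):
    kept = []
    for x1, y1, x2, y2, score in detections:
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        if land_mask[cy, cx] == 0:               # keep only detections centred on water
            kept.append((x1, y1, x2, y2, score))
    return kept
```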
Entropy (Basel)
July 2025
School of Artificial Intelligence, Xidian University, Xi'an 710071, China.
Leveraging the ability of Vision Transformers (ViTs) to model contextual information across spatial patches, Masked Image Modeling (MIM) has emerged as a successful pre-training paradigm for visual representation learning by masking parts of the input and reconstructing the original image. However, this characteristic of ViTs has led many existing MIM methods to focus primarily on spatial patch reconstruction, overlooking the importance of semantic continuity in the channel dimension. Therefore, we propose a novel Masked Channel Modeling (MCM) pre-training paradigm, which reconstructs masked channel features using the contextual information from unmasked channels, thereby enhancing the model's understanding of images from the perspective of channel semantic continuity.
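The channel-masking objective can be illustrated with a toy loss: mask a random subset of feature channels, reconstruct the full channel stack from the visible ones, and penalize the error only on the masked channels. The encoder/decoder interface and the mask ratio below are assumptions for illustration, not the paper's MCM architecture.

```python
# Toy masked-channel-modeling loss (hypothetical encoder/decoder that map a
# feature map back to its full channel stack).
import torch
import torch.nn.functional as F

def mcm_loss(encoder, decoder, feats: torch.Tensor, mask_ratio: float = 0.5):
    B, C, H, W = feats.shape
    keep = torch.rand(C, device=feats.device) > mask_ratio     # channels left visible
    visible = feats * keep.view(1, C, 1, 1).to(feats.dtype)    # zero out the masked channels
    recon = decoder(encoder(visible))                          # reconstruct all channels from the visible ones
    masked = ~keep
    return F.mse_loss(recon[:, masked], feats[:, masked])      # supervise only the masked channels
```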