To improve semantic segmentation of complex urban remote sensing images, which suffer from multi-scale object distribution, inter-class similarity, and omission of small objects, this paper proposes MFPI-Net, an encoder-decoder semantic segmentation network. It comprises four core modules: a Swin Transformer backbone encoder, a diverse dilation rates attention shuffle decoder (DDRASD), a multi-scale convolutional feature enhancement module (MCFEM), and a cross-path residual fusion module (CPRFM). The Swin Transformer extracts multi-level global semantic features through its hierarchical structure and window attention mechanism. Within the DDRASD, the diverse dilation rates attention (DDRA) block combines convolutions with diverse dilation rates and channel-coordinate attention to enhance multi-scale contextual awareness, while the Shuffle Block raises resolution via pixel rearrangement and avoids checkerboard artifacts. The MCFEM strengthens local feature modeling through parallel multi-kernel convolutions, complementing the Swin Transformer's global perception capability. The CPRFM employs multi-branch convolutions and a residual multiplication-addition fusion mechanism to enhance interactions among multi-source features, thereby improving the recognition of small objects and similar categories. Experiments on the ISPRS Vaihingen and Potsdam datasets show that MFPI-Net outperforms mainstream methods, achieving 82.57% and 88.49% mIoU, respectively, which validates its segmentation performance on urban remote sensing imagery.
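The abstract does not give the internals of the Shuffle Block, so the following is only a minimal sketch of the generic "pixel rearrangement" idea it mentions, not the authors' implementation. It assumes the common sub-pixel layout, with the same index convention as PyTorch's `nn.PixelShuffle`, in which `C*r*r` low-resolution channels are interleaved into `C` channels of size `(H*r, W*r)`. The function name `pixel_shuffle` and the toy input are hypothetical.

```python
def pixel_shuffle(x, r):
    """Rearrange a nested list of shape (C*r*r, H, W) into (C, H*r, W*r).

    Assumed index convention (same as PyTorch's nn.PixelShuffle):
        out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w]
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(x[0]), len(x[0][0])
    channels_out = channels_in // (r * r)

    # Allocate the upsampled output, filled with zeros.
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]

    for c in range(channels_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Toy input: four channels of size 1x2 become one channel of size 2x4 (r = 2).
features = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
upsampled = pixel_shuffle(features, 2)
print(upsampled)  # [[[1, 3, 2, 4], [5, 7, 6, 8]]]
```

In this rearrangement every output pixel is written exactly once from a distinct input value. Strided transposed convolutions, by contrast, can overlap kernel footprints unevenly, and that uneven overlap is a commonly cited source of checkerboard patterns.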
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349348 | PMC |
http://dx.doi.org/10.3390/s25154660 | DOI Listing |
IEEE Trans Neural Netw Learn Syst
September 2025
In industrial scenarios, semantic segmentation of surface defects is vital for identifying, localizing, and delineating defects. However, new defect types constantly emerge with product iterations or process updates. Existing defect segmentation models lack incremental learning capabilities, and direct fine-tuning (FT) often leads to catastrophic forgetting.
J Korean Med Sci
September 2025
Department of Transdisciplinary Medicine, Seoul National University Hospital, Seoul, Korea.
Background: With the increasing incidence of skin cancer, the workload for pathologists has surged. The diagnosis of skin samples, especially for complex lesions such as malignant melanomas and melanocytic lesions, has shown higher diagnostic variability compared to other organ samples. Consequently, artificial intelligence (AI)-based diagnostic assistance programs are increasingly needed to support dermatopathologists in achieving more consistent diagnoses.
Front Plant Sci
September 2025
College of Big Data, Yunnan Agricultural University, Kunming, China.
Introduction: Accurate identification of cherry maturity and precise detection of harvestable cherry contours are essential for the development of cherry-picking robots. However, occlusion, lighting variation, and blurriness in natural orchard environments present significant challenges for real-time semantic segmentation.
Methods: To address these issues, we propose a machine vision approach based on the PIDNet real-time semantic segmentation framework.
Med Biol Eng Comput
September 2025
Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, 300072, China.
Surgical instrument segmentation plays an important role in robotic autonomous surgical navigation systems: by accurately locating surgical instruments and estimating their posture, it helps surgeons understand the position and orientation of the instruments. However, several problems still limit segmentation accuracy, such as insufficient attention to the edges and centers of surgical instruments and insufficient use of low-level feature details. To address these issues, a lightweight network for surgical instrument segmentation in gastrointestinal (GI) endoscopy (GESur_Net) is proposed.
IEEE Trans Pattern Anal Mach Intell
September 2025
Generalized visual grounding tasks, including Generalized Referring Expression Comprehension (GREC) and Segmentation (GRES), extend the classical visual grounding paradigm by accommodating multi-target and non-target scenarios. Specifically, GREC focuses on accurately identifying all referential objects at the coarse bounding-box level, while GRES aims to achieve fine-grained pixel-level perception. However, existing approaches typically treat these tasks independently, overlooking the benefits of jointly training GREC and GRES to ensure consistent multi-granularity predictions and streamline the overall process.