Category Ranking

98%

Total Visits

921

Avg Visit Duration

2 minutes

Citations

20

Article Abstract

Existing salient object detection methods often adopt deeper and wider networks for better performance, resulting in heavy computational burden and slow inference speed. This inspires us to rethink saliency detection to achieve a favorable balance between efficiency and accuracy. To this end, we design a lightweight framework while maintaining satisfying competitive accuracy. Specifically, we propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches, which are devised to confront the dilution of semantic context, loss of spatial structure and absence of boundary detail, respectively. Along with the fusion of three branches, the coarse segmentation results are gradually refined in structure details and boundary quality. Without adding additional learnable parameters, we further propose Scale-Adaptive Pooling Module to obtain multi-scale receptive field. In particular, on the premise of inheriting this framework, we rethink the relationship among accuracy, parameters and speed via network depth-width tradeoff. With these insightful considerations, we comprehensively design shallower and narrower models to explore the maximum potential of lightweight SOD. Our models are proposed for different application environments: 1) a tiny version CTD-S (1.7M, 125FPS) for resource constrained devices, 2) a fast version CTD-M (12.6M, 158FPS) for speed-demanding scenarios, 3) a standard version CTD-L (26.5M, 84FPS) for high-performance platforms. Extensive experiments validate the superiority of our method, which achieves better efficiency-accuracy balance across five benchmarks.

Download full-text PDF

Source
http://dx.doi.org/10.1109/TIP.2023.3318959DOI Listing

Publication Analysis

Top Keywords

salient object
8
object detection
8
network depth-width
8
depth-width tradeoff
8
rethinking lightweight
4
lightweight salient
4
detection network
4
tradeoff existing
4
existing salient
4
detection methods
4

Similar Publications

The impact of scene inversion on early scene-selective activity.

Biol Psychol

September 2025

Department of Psychology, Wright State University, Dayton OH. Electronic address:

Category-selectivity is a ubiquitous property of high-level visual cortex manifested in distinct cortical responses to faces, objects, and scenes. These signatures emerge early during visual processing, with each category sensitive to specific types of visual information at different time points. However, it is still not clear what information is extracted during early scene-selective processing, as scenes are rich, complex, and multidimensional stimuli.

View Article and Find Full Text PDF

Computational saliency map models have facilitated quantitative investigations into how bottom-up visual salience influences attention. Two primary approaches to modeling salience computation exist: one focuses on functional approximation, while the other explores neurobiological implementation. The former provides sufficient performance for applying saliency map models to eye-movement data analysis, whereas the latter offers hypotheses on how neuronal abnormalities affect visual salience.

View Article and Find Full Text PDF

When planning reach-to-grasp movements, individuals frequently face a tradeoff between biomechanical comfort (i.e., avoiding effortful actions) and "socio-emotional comfort" (i.

View Article and Find Full Text PDF

In blind individuals, language processing activates not only classic language networks, but also the "visual" cortex. What is represented in visual areas when blind individuals process language? Here, we show that area V5/MT in blind individuals, but not other visual areas, responds differently to spoken nouns and verbs. We further show that this effect is present for concrete nouns and verbs, but not abstract or pseudo nouns and verbs.

View Article and Find Full Text PDF

Due to the limited output categories, semi-supervised salient object detection faces challenges in adapting conventional semi-supervised strategies. To address this limitation, we propose a multi-branch architecture that extracts complementary features from labeled data. Specifically, we introduce TripleNet, a three-branch network architecture designed for contour, content, and holistic saliency prediction.

View Article and Find Full Text PDF