Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

This paper introduces an explanatory graph representation to reveal object parts encoded inside the convolutional layers of a CNN. Given a pre-trained CNN, each filter in a conv-layer usually represents a mixture of object parts. We develop a simple yet effective method to learn an explanatory graph, which automatically disentangles object parts from each filter without any part annotations. Specifically, given the feature map of a filter, we mine neural activations from the feature map that correspond to different object parts. The explanatory graph organizes each mined part as a graph node; each edge connects two nodes whose corresponding object parts usually co-activate and keep a stable spatial relationship. Experiments show that each graph node consistently represents the same object part across different images, which boosts the transferability of CNN features. The explanatory graph transfers features of object parts to the task of part localization, and our method significantly outperforms other approaches.
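The pipeline the abstract describes (mine peak activations per filter, then link two parts when they co-activate with a stable spatial offset) can be sketched as a toy. This is an illustration only, not the authors' learning algorithm: the top-k peak picker, the one-node-per-filter simplification, and the offset-variance threshold are all assumptions made for the sketch.

```python
import numpy as np

def peak_activations(fmap, top_k=3):
    """Return (row, col) coordinates of the top-k activations in a 2-D feature map."""
    flat = np.argsort(fmap.ravel())[::-1][:top_k]
    return [np.unravel_index(i, fmap.shape) for i in flat]

def build_graph(feature_maps, offset_tol=1.0):
    """Toy explanatory graph: one node per filter (its strongest peak);
    an edge links two nodes whose peaks keep a near-constant spatial
    offset across images, i.e. a "stable spatial relationship".

    feature_maps: array of shape (n_images, n_filters, H, W)
    Returns a list of (filter_a, filter_b) edges.
    """
    n_images, n_filters, _, _ = feature_maps.shape
    # Strongest peak of every filter in every image: (n_images, n_filters, 2)
    peaks = np.array([[peak_activations(feature_maps[i, f], top_k=1)[0]
                       for f in range(n_filters)]
                      for i in range(n_images)], dtype=float)
    edges = []
    for a in range(n_filters):
        for b in range(a + 1, n_filters):
            offsets = peaks[:, a] - peaks[:, b]          # per-image offset
            if offsets.std(axis=0).max() < offset_tol:   # stable across images
                edges.append((a, b))
    return edges
```

On synthetic feature maps where two filters always fire at a fixed relative offset and a third fires at random positions, only the first pair is connected.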


Source
http://dx.doi.org/10.1109/TPAMI.2020.2992207

Publication Analysis

Top Keywords

object parts: 24
explanatory graph: 20
feature map: 8
graph node: 8
graph: 7
object: 7
parts: 6
extraction explanatory: 4
graph interpret: 4
cnn: 4

Similar Publications

Assembly theory (AT) quantifies selection using the assembly equation, identifying complex objects through the assembly index, the minimal steps required to build an object from basic parts, and copy number, the observed instances of the object. These measure a quantity called Assembly, capturing causation necessary to produce abundant objects, distinguishing selection-driven complexity from random generation. Unlike computational complexity theory, which often emphasizes minimal description length via compressibility, AT explicitly focuses on the causation captured by selection as the mechanism behind complexity.
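The assembly equation this snippet refers to is commonly written as A = sum_i e^{a_i} (n_i - 1) / N_T, where a_i is an object's assembly index, n_i its copy number, and N_T the total number of objects observed. A minimal sketch, treating that exact normalization as an assumption since the snippet does not state the formula:

```python
import math

def assembly(assembly_indices, copy_numbers):
    """Assembly A of an observed ensemble:
        A = sum_i exp(a_i) * (n_i - 1) / N_T
    A unique object (copy number 1) contributes nothing, so abundance
    produced by selection is what drives A up."""
    n_total = sum(copy_numbers)
    return sum(math.exp(a) * (n - 1) / n_total
               for a, n in zip(assembly_indices, copy_numbers))
```

For example, a single unobserved-elsewhere object yields A = 0, while repeated copies of a high-assembly-index object raise A exponentially in its index.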

View Article and Find Full Text PDF

Leveraging the ability of Vision Transformers (ViTs) to model contextual information across spatial patches, Masked Image Modeling (MIM) has emerged as a successful pre-training paradigm for visual representation learning by masking parts of the input and reconstructing the original image. However, this characteristic of ViTs has led many existing MIM methods to focus primarily on spatial patch reconstruction, overlooking the importance of semantic continuity in the channel dimension. Therefore, we propose a novel Masked Channel Modeling (MCM) pre-training paradigm, which reconstructs masked channel features using the contextual information from unmasked channels, thereby enhancing the model's understanding of images from the perspective of channel semantic continuity.
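The channel-masking idea can be sketched in miniature: hide some channels of a patch-feature matrix and reconstruct them from the unmasked channels. Here ordinary least squares stands in for the ViT decoder, and the function name and indices are invented for illustration; this is not the MCM authors' implementation.

```python
import numpy as np

def masked_channel_reconstruction(features, masked_idx):
    """Toy masked-channel modeling: predict masked channels as a linear
    combination of the unmasked (context) channels via least squares.

    features: (n_tokens, n_channels) patch features
    masked_idx: indices of channels to hide and reconstruct
    Returns the reconstructed values for the masked channels.
    """
    n_channels = features.shape[1]
    unmasked_idx = [c for c in range(n_channels) if c not in set(masked_idx)]
    X = features[:, unmasked_idx]            # visible context channels
    Y = features[:, masked_idx]              # targets, hidden from the model
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return X @ W                             # reconstruction from context only
```

When a masked channel really is a linear function of the visible ones, the reconstruction is exact, which is the "semantic continuity in the channel dimension" the training objective exploits.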


JA Signaling Inhibitor JAZ Is Involved in Regulation of AM Symbiosis with Cassava, Including Symbiosis Establishment and Cassava Growth.

J Fungi (Basel)

August 2025

State Key Laboratory of Tropical Crop Breeding, Sanya Institute of Breeding and Multiplication, Hainan University, Sanya 570025, China.

Mutualism between plants and arbuscular mycorrhizal fungi (AMF) is imperative for sustainable agricultural production. Jasmonic acid (JA) signal transduction has been demonstrated to play an important role in AMF symbiosis with the host. In this study, SC9 cassava was selected as the research material to investigate the effect of the jasmonic acid signaling pathway on symbiosis establishment and cassava growth during AMF-cassava symbiosis.


Grasping objects with a high degree of anthropomorphism is a critical component in the field of highly anthropomorphic robotic grasping. However, inaccurate contact maps and implausible grasping gestures remain challenges for grasp generation. In this paper, we propose an improved generation scheme, called Diffangle-Grasp, consisting of two parts: contact map generation based on a conditional variational autoencoder (CVAE) that shares its latent space with a diffusion model, and optimized grasp generation that conforms to physical laws and natural poses.


Visual category-selective representations in human ventral occipital temporal cortex (VOTC) seem to emerge early in infancy. Surprisingly, the VOTC of congenitally blind humans features category-selectivity for auditory and haptic objects. Yet it has been unknown whether VOTC would show category-selective visual responses if sight were restored in congenitally blind humans.
