Visual reasoning in object-centric deep neural networks: A comparative cognition approach.

Guillermo Puebla, Jeffrey S Bowers

Neural Netw

School of Psychological Science, University of Bristol, 12a Priory Road, Bristol BS8 1TU, UK.

Published: September 2025

Achieving visual reasoning is a long-term goal of artificial intelligence. In the last decade, several studies have applied deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of generalization of the relations learned. However, in recent years, object-centric representation learning has been put forward as a way to achieve visual reasoning within the deep learning framework. Object-centric models attempt to model input scenes as compositions of objects and relations between them. To this end, these models use several kinds of attention mechanisms to segregate the individual objects in a scene from the background and from other objects. In this work we tested relation learning and generalization in several object-centric models, as well as a ResNet-50 baseline. In contrast to previous research, which has focused heavily on the same-different task in order to assess relational reasoning in DNNs, we used a set of tasks, with varying degrees of complexity, derived from the comparative cognition literature. Our results show that object-centric models are able to segregate the different objects in a scene, even in many out-of-distribution cases. In our simpler tasks, this improves their capacity to learn and generalize visual relations in comparison to the ResNet-50 baseline. However, object-centric models still struggle in our more difficult tasks and conditions. We conclude that abstract visual reasoning remains an open challenge for DNNs, including object-centric models.

DOI: http://dx.doi.org/10.1016/j.neunet.2025.107582

Similar Publications

TransGI: Real-Time Dynamic Global Illumination with Object-Centric Neural Transfer Model.

IEEE Trans Vis Comput Graph

August 2025

Yijie Deng, Lei Han, Lu Fang

Neural rendering algorithms have revolutionized computer graphics, yet their impact on real-time rendering under arbitrary lighting conditions remains limited due to strict latency constraints in practical applications. The key challenge lies in formulating a compact yet expressive material representation. To address this, we propose TransGI, a novel neural rendering method for real-time, high-fidelity global illumination.

Identifying the dynamics of interacting objects with applications to scene understanding and video temporal manipulation.

IFAC Pap OnLine

September 2024

ECE Dept., Northeastern University, Boston, MA 02115 USA.

Armand Comas, Christian Fernandez, Sandesh Ghimire, Haolin Li, Octavia Camps

There is an ongoing effort in the machine learning community to enable machines to understand the world symbolically, facilitating human interaction with learned representations of complex scenes. A prerequisite to achieving this is the ability to identify the dynamics of interacting objects from time traces of relevant features. In this paper, we introduce GrODID (GRaph-based Object-Centric Dynamic Mode Decomposition), a framework based on graph neural networks that enables Dynamic Mode Decomposition for systems involving interacting objects.

UrbanGen: Urban Generation with Compositional and Controllable Neural Fields.

IEEE Trans Pattern Anal Mach Intell

August 2025

Yuanbo Yang, Yujun Shen, Yue Wang, Andreas Geiger, Yiyi Liao

Despite the rapid progress in generative radiance fields, most existing methods focus on object-centric applications and are not able to generate complex urban scenes. In this paper, we propose UrbanGen, a solution for the challenging task of generating urban radiance fields with photorealistic rendering, accurate geometry, high controllability, and diverse city styles. Our key idea is to leverage a coarse 3D panoptic prior, represented by a semantic voxel grid for stuff and bounding boxes for countable objects, to condition a compositional generative radiance field.

TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology.

Sensors (Basel)

July 2025

Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.

Jun-Hyeon Choi, Jeong-Won Pyo, Ye-Chan An, Tae-Yong Kuc

This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The goal of this method is to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical representation that is explicitly designed for multi-level reasoning.

Compositional Physical Reasoning of Objects and Events From Videos.

IEEE Trans Pattern Anal Mach Intell

September 2025

Zhenfang Chen, Shilong Dong, Kexin Yi, Yunzhu Li, Mingyu Ding

Understanding and reasoning about objects' physical properties in the natural world is a fundamental challenge in artificial intelligence. While some properties like colors and shapes can be directly observed, others, such as mass and electric charge, are hidden from the objects' visual appearance. This paper addresses the unique challenge of inferring these hidden physical properties from objects' motion and interactions and predicting corresponding dynamics based on the inferred physical properties.
