Related papers: Learning to Compose Visual Relations

Discovering objects and their relations from entangled scene representations

Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by…

Machine Learning · Computer Science 2017-02-17 David Raposo , Adam Santoro , David Barrett , Razvan Pascanu , Timothy Lillicrap , Peter Battaglia

Interpretable Diffusion via Information Decomposition

Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque making it difficult to understand precisely what…

Machine Learning · Computer Science 2024-05-21 Xianghao Kong , Ollie Liu , Han Li , Dani Yogatama , Greg Ver Steeg

Scenes and Surroundings: Scene Graph Generation using Relation Transformer

Identifying objects in an image and their mutual relationships as a scene graph leads to a deep understanding of image content. Despite the recent advancement in deep learning, the detection and labeling of visual object relationships…

Computer Vision and Pattern Recognition · Computer Science 2021-07-13 Rajat Koner , Poulami Sinhamahapatra , Volker Tresp

Compositional Image Decomposition with Diffusion Models

Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Jocelin Su , Nan Liu , Yanbo Wang , Joshua B. Tenenbaum , Yilun Du

Learning Structured Representations of Visual Scenes

As the intermediate-level representations bridging the two levels, structured representations of visual scenes, such as visual relationships between pairwise objects, have been shown to not only benefit compositional models in learning to…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Meng-Jiun Chiou

Flexible Compositional Learning of Structured Visual Concepts

Humans are highly efficient learners, with the ability to grasp the meaning of a new concept from just a few examples. Unlike popular computer vision systems, humans can flexibly leverage the compositional structure of the visual world,…

Computer Vision and Pattern Recognition · Computer Science 2021-05-21 Yanli Zhou , Brenden M. Lake

Expressing Visual Relationships via Language

Describing images with text is a fundamental problem in vision-language research. Current studies in this domain mostly focus on single image captioning. However, in various real applications (e.g., image editing, difference interpretation,…

Computation and Language · Computer Science 2019-06-20 Hao Tan , Franck Dernoncourt , Zhe Lin , Trung Bui , Mohit Bansal

Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition

Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real-world. Concurrently, image completion has aimed to create plausible appearance for…

Computer Vision and Pattern Recognition · Computer Science 2021-04-13 Chuanxia Zheng , Duy-Son Dao , Guoxian Song , Tat-Jen Cham , Jianfei Cai

Detecting Visual Relationships with Deep Relational Networks

Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about the relationships among objects remains a challenging task.…

Computer Vision and Pattern Recognition · Computer Science 2017-04-13 Bo Dai , Yuqi Zhang , Dahua Lin

Interpreting the structure of multi-object representations in vision encoders

In this work, we interpret the representations of multi-object scenes in vision encoders through the lens of structured representations. Structured representations allow modeling of individual objects distinctly and their flexible use based…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Tarun Khajuria , Braian Olmiro Dias , Marharyta Domnich , Jaan Aru

LinkNet: Relational Embedding for Scene Graph

Objects and their relationships are critical contents for image understanding. A scene graph provides a structured description that captures these properties of an image. However, reasoning about the relationships between objects is very…

Computer Vision and Pattern Recognition · Computer Science 2018-11-16 Sanghyun Woo , Dahun Kim , Donghyeon Cho , In So Kweon

Hierarchical Relational Inference

Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms…

Machine Learning · Computer Science 2020-12-16 Aleksandar Stanić , Sjoerd van Steenkiste , Jürgen Schmidhuber

Interpretable Transformations with Encoder-Decoder Networks

Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the…

Computer Vision and Pattern Recognition · Computer Science 2017-10-23 Daniel E. Worrall , Stephan J. Garbin , Daniyar Turmukhambetov , Gabriel J. Brostow

MIC: Mining Interclass Characteristics for Improved Metric Learning

Metric learning seeks to embed images of objects suchthat class-defined relations are captured by the embeddingspace. However, variability in images is not just due to different depicted object classes, but also depends on other latent…

Computer Vision and Pattern Recognition · Computer Science 2019-09-26 Karsten Roth , Biagio Brattoli , Björn Ommer

Visual Reasoning with Natural Language

Natural language provides a widely accessible and expressive interface for robotic agents. To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world.…

Computation and Language · Computer Science 2017-10-03 Stephanie Zhou , Alane Suhr , Yoav Artzi

Disentangling Patterns and Transformations from One Sequence of Images with Shape-invariant Lie Group Transformer

An effective way to model the complex real world is to view the world as a composition of basic components of objects and transformations. Although humans through development understand the compositionality of the real world, it is…

Computer Vision and Pattern Recognition · Computer Science 2022-03-23 T. Takada , W. Shimaya , Y. Ohmura , Y. Kuniyoshi

Understanding and Exploiting Object Interaction Landscapes

Interactions play a key role in understanding objects and scenes, for both virtual and real world agents. We introduce a new general representation for proximal interactions among physical objects that is agnostic to the type of objects or…

Graphics · Computer Science 2016-11-09 Sören Pirk , Vojtech Krs , Kaimo Hu , Suren Deepak Rajasekaran , Hao Kang , Bedrich Benes , Yusuke Yoshiyasu , Leonidas J. Guibas

Learning and generalization of compositional representations of visual scenes

Complex visual scenes that are composed of multiple objects, each with attributes, such as object name, location, pose, color, etc., are challenging to describe in order to train neural networks. Usually,deep learning networks are trained…

Neural and Evolutionary Computing · Computer Science 2023-03-27 E. Paxon Frady , Spencer Kent , Quinn Tran , Pentti Kanerva , Bruno A. Olshausen , Friedrich T. Sommer

Learning to Represent Image and Text with Denotation Graph

Learning to fuse vision and language information and representing them is an important research problem with many applications. Recent progresses have leveraged the ideas of pre-training (from language modeling) and attention layers in…

Computer Vision and Pattern Recognition · Computer Science 2020-10-08 Bowen Zhang , Hexiang Hu , Vihan Jain , Eugene Ie , Fei Sha

Visual Relationship Detection with Language Priors

Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large and it is difficult to…

Computer Vision and Pattern Recognition · Computer Science 2016-08-02 Cewu Lu , Ranjay Krishna , Michael Bernstein , Li Fei-Fei