English
Related papers

Related papers: Learning to Compose Visual Relations

200 papers

Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by…

Machine Learning · Computer Science 2017-02-17 David Raposo , Adam Santoro , David Barrett , Razvan Pascanu , Timothy Lillicrap , Peter Battaglia

Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque making it difficult to understand precisely what…

Machine Learning · Computer Science 2024-05-21 Xianghao Kong , Ollie Liu , Han Li , Dani Yogatama , Greg Ver Steeg

Identifying objects in an image and their mutual relationships as a scene graph leads to a deep understanding of image content. Despite the recent advancement in deep learning, the detection and labeling of visual object relationships…

Computer Vision and Pattern Recognition · Computer Science 2021-07-13 Rajat Koner , Poulami Sinhamahapatra , Volker Tresp

Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Jocelin Su , Nan Liu , Yanbo Wang , Joshua B. Tenenbaum , Yilun Du

As the intermediate-level representations bridging the two levels, structured representations of visual scenes, such as visual relationships between pairwise objects, have been shown to not only benefit compositional models in learning to…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Meng-Jiun Chiou

Humans are highly efficient learners, with the ability to grasp the meaning of a new concept from just a few examples. Unlike popular computer vision systems, humans can flexibly leverage the compositional structure of the visual world,…

Computer Vision and Pattern Recognition · Computer Science 2021-05-21 Yanli Zhou , Brenden M. Lake

Describing images with text is a fundamental problem in vision-language research. Current studies in this domain mostly focus on single image captioning. However, in various real applications (e.g., image editing, difference interpretation,…

Computation and Language · Computer Science 2019-06-20 Hao Tan , Franck Dernoncourt , Zhe Lin , Trung Bui , Mohit Bansal

Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real-world. Concurrently, image completion has aimed to create plausible appearance for…

Computer Vision and Pattern Recognition · Computer Science 2021-04-13 Chuanxia Zheng , Duy-Son Dao , Guoxian Song , Tat-Jen Cham , Jianfei Cai

Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about the relationships among objects remains a challenging task.…

Computer Vision and Pattern Recognition · Computer Science 2017-04-13 Bo Dai , Yuqi Zhang , Dahua Lin

In this work, we interpret the representations of multi-object scenes in vision encoders through the lens of structured representations. Structured representations allow modeling of individual objects distinctly and their flexible use based…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Tarun Khajuria , Braian Olmiro Dias , Marharyta Domnich , Jaan Aru

Objects and their relationships are critical contents for image understanding. A scene graph provides a structured description that captures these properties of an image. However, reasoning about the relationships between objects is very…

Computer Vision and Pattern Recognition · Computer Science 2018-11-16 Sanghyun Woo , Dahun Kim , Donghyeon Cho , In So Kweon

Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms…

Machine Learning · Computer Science 2020-12-16 Aleksandar Stanić , Sjoerd van Steenkiste , Jürgen Schmidhuber

Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the…

Computer Vision and Pattern Recognition · Computer Science 2017-10-23 Daniel E. Worrall , Stephan J. Garbin , Daniyar Turmukhambetov , Gabriel J. Brostow

Metric learning seeks to embed images of objects suchthat class-defined relations are captured by the embeddingspace. However, variability in images is not just due to different depicted object classes, but also depends on other latent…

Computer Vision and Pattern Recognition · Computer Science 2019-09-26 Karsten Roth , Biagio Brattoli , Björn Ommer

Natural language provides a widely accessible and expressive interface for robotic agents. To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world.…

Computation and Language · Computer Science 2017-10-03 Stephanie Zhou , Alane Suhr , Yoav Artzi

An effective way to model the complex real world is to view the world as a composition of basic components of objects and transformations. Although humans through development understand the compositionality of the real world, it is…

Computer Vision and Pattern Recognition · Computer Science 2022-03-23 T. Takada , W. Shimaya , Y. Ohmura , Y. Kuniyoshi

Interactions play a key role in understanding objects and scenes, for both virtual and real world agents. We introduce a new general representation for proximal interactions among physical objects that is agnostic to the type of objects or…

Complex visual scenes that are composed of multiple objects, each with attributes, such as object name, location, pose, color, etc., are challenging to describe in order to train neural networks. Usually,deep learning networks are trained…

Neural and Evolutionary Computing · Computer Science 2023-03-27 E. Paxon Frady , Spencer Kent , Quinn Tran , Pentti Kanerva , Bruno A. Olshausen , Friedrich T. Sommer

Learning to fuse vision and language information and representing them is an important research problem with many applications. Recent progresses have leveraged the ideas of pre-training (from language modeling) and attention layers in…

Computer Vision and Pattern Recognition · Computer Science 2020-10-08 Bowen Zhang , Hexiang Hu , Vihan Jain , Eugene Ie , Fei Sha

Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large and it is difficult to…

Computer Vision and Pattern Recognition · Computer Science 2016-08-02 Cewu Lu , Ranjay Krishna , Michael Bernstein , Li Fei-Fei
‹ Prev 1 2 3 10 Next ›