Related papers: Learning to Compose Dynamic Tree Structures for Vi…

200 papers

Understanding realistic visual scene images together with language descriptions is a fundamental task towards generic visual understanding. Previous works have shown compelling comprehensive results by building hierarchical structures for…

Computer Vision and Pattern Recognition · Computer Science 2022-06-02 Chao Lou, Wenjuan Han, Yuhuan Lin, Zilong Zheng

Reasoning about images/objects and their hierarchical interactions is a key concept for the next generation of computer vision approaches. Here we present a new framework to deal with it through a visual hierarchical context-based…

Computer Vision and Pattern Recognition · Computer Science 2019-09-04 Pedro H. Bugatti, Priscila T. M. Saito, Larry S. Davis

A Concept Tree is a structure for storing knowledge where the trees are stored in a database called a Concept Base. It sits between the highly distributed neural architectures and the distributed information systems, with the intention of…

Artificial Intelligence · Computer Science 2020-04-07 Kieran Greer

State-of-the-art vision-language models (VLMs) still have limited performance in structural knowledge extraction, such as relations between objects. In this work, we present ViStruct, a training framework to learn VLMs for effective visual…

Computer Vision and Pattern Recognition · Computer Science 2023-11-23 Yangyi Chen, Xingyao Wang, Manling Li, Derek Hoiem, Heng Ji

Relations amongst entities play a central role in image understanding. Due to the complexity of modeling (subject, predicate, object) relation triplets, it is crucial to develop a method that can not only recognize seen relations, but also…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Zih-Siou Hung, Arun Mallya, Svetlana Lazebnik

As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to…

Computer Vision and Pattern Recognition · Computer Science 2024-01-31 Danning Lao, Qi Liu, Jiazi Bu, Junchi Yan, Wei Shen

In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks and adapt to new tasks during test time without re-training. Unlike previous approaches, our focus is on training…

Computer Vision and Pattern Recognition · Computer Science 2025-07-03 Simon Reiß, Zdravko Marinov, Alexander Jaus, Constantin Seibold, M. Saquib Sarfraz, Erik Rodner, Rainer Stiefelhagen

Visual Commonsense Reasoning (VCR) predicts an answer with corresponding rationale, given a question-image input. VCR is a recently introduced visual scene understanding task with a wide range of applications, including visual question…

Computer Vision and Pattern Recognition · Computer Science 2023-12-11 Xuejiao Tang, Xin Huang, Wenbin Zhang, Travers B. Child, Qiong Hu, Zhen Liu, Ji Zhang

While much research has been done in text-to-image synthesis, little work has been done to explore the usage of linguistic structure of the input text. Such information is even more important for story visualization since its inputs have an…

Computation and Language · Computer Science 2021-10-22 Adyasha Maharana, Mohit Bansal

The visual world around us can be described as a structured set of objects and their associated relations. An image of a room may be conjured given only the description of the underlying objects and their associated relations. While there…

Computer Vision and Pattern Recognition · Computer Science 2021-11-18 Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba

As the intermediate-level representations bridging the two levels, structured representations of visual scenes, such as visual relationships between pairwise objects, have been shown to not only benefit compositional models in learning to…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Meng-Jiun Chiou

Vision language models (VLMs) excel at zero-shot visual classification, but their performance on fine-grained tasks and large hierarchical label spaces is understudied. This paper investigates whether structured, tree-based reasoning can…

Computer Vision and Pattern Recognition · Computer Science 2025-09-15 Sary Elmansoury, Islam Mesabah, Gerrit Großmann, Peter Neigel, Raj Bhalwankar, Daniel Kondermann, Sebastian J. Vollmer

The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages…

Robotics · Computer Science 2025-01-13 Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Kazuhiro Sasabuchi, Katsushi Ikeuchi

Understanding a visual scene incorporates objects, relationships, and context. Traditional methods working on an image mostly focus on object detection and fail to capture the relationship between the objects. Relationships can give rich…

Computer Vision and Pattern Recognition · Computer Science 2019-12-03 Himangi Mittal, Ajith Abraham, Anuja Arora

This thesis tackles the problem of learning efficient representations of complex, structured data with a natural application to web page and element classification. We hypothesise that the context around the element inside the web page is…

Machine Learning · Computer Science 2021-11-09 Cedric Cook

This work explores enabling Chain-of-Thought (CoT) reasoning to link visual cues across multiple images. A straightforward solution is to adapt rule-based reinforcement learning for Vision-Language Models (VLMs). However, such methods…

Computer Vision and Pattern Recognition · Computer Science 2025-06-30 Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a…

Computer Vision and Pattern Recognition · Computer Science 2019-01-21 Qiuyuan Huang, Zhe Gan, Asli Celikyilmaz, Dapeng Wu, Jianfeng Wang, Xiaodong He

Context plays an important role in visual recognition. Recent studies have shown that visual recognition networks can be fooled by placing objects in inconsistent contexts (e.g., a cow in the ocean). To model the role of contextual…

Computer Vision and Pattern Recognition · Computer Science 2020-03-27 Mengmi Zhang, Claire Tseng, Gabriel Kreiman

Objects in a scene are not always related. The execution efficiency of one-stage scene graph generation approaches is quite high, as they infer the effective relations between entity pairs using sparse proposal sets and a few queries.…

Computer Vision and Pattern Recognition · Computer Science 2022-12-20 Yuxiang Zhang, Zhenbo Liu, Shuai Wang

Exposing latent knowledge in geospatial trajectories has the potential to provide a better understanding of the movements of individuals and groups. Motivated by such a desire, this work presents the context tree, a new hierarchical data…

Data Structures and Algorithms · Computer Science 2016-10-12 Alasdair Thomason, Nathan Griffiths, Victor Sanchez