Related papers: Learning to Compose Dynamic Tree Structures for Vi…

Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships

Understanding realistic visual scene images together with language descriptions is a fundamental task towards generic visual understanding. Previous works have shown compelling comprehensive results by building hierarchical structures for…

Computer Vision and Pattern Recognition · Computer Science 2022-06-02 Chao Lou, Wenjuan Han, Yuhuan Lin, Zilong Zheng

HiCoRe: Visual Hierarchical Context-Reasoning

Reasoning about images/objects and their hierarchical interactions is a key concept for the next generation of computer vision approaches. Here we present a new framework to deal with it through a visual hierarchical context-based…

Computer Vision and Pattern Recognition · Computer Science 2019-09-04 Pedro H. Bugatti, Priscila T. M. Saito, Larry S. Davis

Adding Context to Concept Trees

A Concept Tree is a structure for storing knowledge where the trees are stored in a database called a Concept Base. It sits between the highly distributed neural architectures and the distributed information systems, with the intention of…

Artificial Intelligence · Computer Science 2020-04-07 Kieran Greer

ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation

State-of-the-art vision-language models (VLMs) still have limited performance in structural knowledge extraction, such as relations between objects. In this work, we present ViStruct, a training framework to learn VLMs for effective visual…

Computer Vision and Pattern Recognition · Computer Science 2023-11-23 Yangyi Chen, Xingyao Wang, Manling Li, Derek Hoiem, Heng Ji

Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

Relations amongst entities play a central role in image understanding. Due to the complexity of modeling (subject, predicate, object) relation triplets, it is crucial to develop a method that can not only recognize seen relations, but also…

Computer Vision and Pattern Recognition · Computer Science 2020-11-19 Zih-Siou Hung, Arun Mallya, Svetlana Lazebnik

ViTree: Single-path Neural Tree for Step-wise Interpretable Fine-grained Visual Categorization

As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to…

Computer Vision and Pattern Recognition · Computer Science 2024-01-31 Danning Lao, Qi Liu, Jiazi Bu, Junchi Yan, Wei Shen

Is Visual in-Context Learning for Compositional Medical Tasks within Reach?

In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks and adapt to new tasks during test time without re-training. Unlike previous approaches, our focus is on training…

Computer Vision and Pattern Recognition · Computer Science 2025-07-03 Simon Reiß, Zdravko Marinov, Alexander Jaus, Constantin Seibold, M. Saquib Sarfraz, Erik Rodner, Rainer Stiefelhagen

Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory

Visual Commonsense Reasoning (VCR) predicts an answer with corresponding rationale, given a question-image input. VCR is a recently introduced visual scene understanding task with a wide range of applications, including visual question…

Computer Vision and Pattern Recognition · Computer Science 2023-12-11 Xuejiao Tang, Xin Huang, Wenbin Zhang, Travers B. Child, Qiong Hu, Zhen Liu, Ji Zhang

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

While much research has been done in text-to-image synthesis, little work has been done to explore the usage of linguistic structure of the input text. Such information is even more important for story visualization since its inputs have an…

Computation and Language · Computer Science 2021-10-22 Adyasha Maharana, Mohit Bansal

Learning to Compose Visual Relations

The visual world around us can be described as a structured set of objects and their associated relations. An image of a room may be conjured given only the description of the underlying objects and their associated relations. While there…

Computer Vision and Pattern Recognition · Computer Science 2021-11-18 Nan Liu, Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba

Learning Structured Representations of Visual Scenes

As the intermediate-level representations bridging the two levels, structured representations of visual scenes, such as visual relationships between pairwise objects, have been shown to not only benefit compositional models in learning to…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Meng-Jiun Chiou

Decomposing Visual Classification: Assessing Tree-Based Reasoning in VLMs

Vision language models (VLMs) excel at zero-shot visual classification, but their performance on fine-grained tasks and large hierarchical label spaces is understudied. This paper investigates whether structured, tree-based reasoning can…

Computer Vision and Pattern Recognition · Computer Science 2025-09-15 Sary Elmansoury, Islam Mesabah, Gerrit Großmann, Peter Neigel, Raj Bhalwankar, Daniel Kondermann, Sebastian J. Vollmer

VLM-driven Behavior Tree for Context-aware Task Planning

The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages…

Robotics · Computer Science 2025-01-13 Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Kazuhiro Sasabuchi, Katsushi Ikeuchi

Interpreting Context of Images using Scene Graphs

Understanding a visual scene incorporates objects, relationships, and context. Traditional methods working on an image mostly focus on object detection and fail to capture the relationship between the objects. Relationships can give rich…

Computer Vision and Pattern Recognition · Computer Science 2019-12-03 Himangi Mittal, Ajith Abraham, Anuja Arora

Learning Context-Aware Representations of Subtrees

This thesis tackles the problem of learning efficient representations of complex, structured data with a natural application to web page and element classification. We hypothesise that the context around the element inside the web page is…

Machine Learning · Computer Science 2021-11-09 Cedric Cook

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

This work explores enabling Chain-of-Thought (CoT) reasoning to link visual cues across multiple images. A straightforward solution is to adapt rule-based reinforcement learning for Vision-Language Models (VLMs). However, such methods…

Computer Vision and Pattern Recognition · Computer Science 2025-06-30 Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a…

Computer Vision and Pattern Recognition · Computer Science 2019-01-21 Qiuyuan Huang, Zhe Gan, Asli Celikyilmaz, Dapeng Wu, Jianfeng Wang, Xiaodong He

Putting visual object recognition in context

Context plays an important role in visual recognition. Recent studies have shown that visual recognition networks can be fooled by placing objects in inconsistent contexts (e.g., a cow in the ocean). To model the role of contextual…

Computer Vision and Pattern Recognition · Computer Science 2020-03-27 Mengmi Zhang, Claire Tseng, Gabriel Kreiman

SrTR: Self-reasoning Transformer with Visual-linguistic Knowledge for Scene Graph Generation

Objects in a scene are not always related. The execution efficiency of one-stage scene graph generation approaches is quite high; they infer the effective relations between entity pairs using sparse proposal sets and a few queries.…

Computer Vision and Pattern Recognition · Computer Science 2022-12-20 Yuxiang Zhang, Zhenbo Liu, Shuai Wang

Context Trees: Augmenting Geospatial Trajectories with Context

Exposing latent knowledge in geospatial trajectories has the potential to provide a better understanding of the movements of individuals and groups. Motivated by such a desire, this work presents the context tree, a new hierarchical data…

Data Structures and Algorithms · Computer Science 2016-10-12 Alasdair Thomason, Nathan Griffiths, Victor Sanchez