Related papers: Grounding Predicates through Actions

Explainable Video Action Reasoning via Prior Knowledge and State Transitions

Human action analysis and understanding in videos is an important and challenging task. Although substantial progress has been made in past years, the explainability of existing methods is still limited. In this work, we propose a novel…

Computer Vision and Pattern Recognition · Computer Science 2019-08-29 Tao Zhuo , Zhiyong Cheng , Peng Zhang , Yongkang Wong , Mohan Kankanhalli

Predicate Invention for Bilevel Planning

Efficient planning in continuous state and action spaces is fundamentally hard, even when the transition model is deterministic and known. One way to alleviate this challenge is to perform bilevel planning with abstractions, where a…

Artificial Intelligence · Computer Science 2025-05-28 Tom Silver , Rohan Chitnis , Nishanth Kumar , Willie McClinton , Tomas Lozano-Perez , Leslie Pack Kaelbling , Joshua Tenenbaum

From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models

Our aim is to learn to solve long-horizon decision-making problems in complex robotics domains given low-level skills and a handful of short-horizon demonstrations containing sequences of images. To this end, we focus on learning abstract…

Robotics · Computer Science 2026-03-10 Ashay Athalye , Nishanth Kumar , Tom Silver , Yichao Liang , Jiuguang Wang , Tomás Lozano-Pérez , Leslie Pack Kaelbling

Grounding Symbols in Multi-Modal Instructions

As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability, for instance, learning to ground symbols in the physical world. Realistically, this task must…

Artificial Intelligence · Computer Science 2017-06-02 Yordan Hristov , Svetlin Penkov , Alex Lascarides , Subramanian Ramamoorthy

Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning

Learning to infer labels in an open world, i.e., in an environment where the target "labels" are unknown, is an important characteristic for achieving autonomy. Foundation models, pre-trained on enormous amounts of data, have shown…

Computer Vision and Pattern Recognition · Computer Science 2024-05-06 Sanjoy Kundu , Shubham Trehan , Sathyanarayanan N. Aakur

Learning To Recognize Procedural Activities with Distant Supervision

In this paper we consider the problem of classifying fine-grained, multi-step activities (e.g., cooking different recipes, making disparate home improvements, creating various forms of arts and crafts) from long videos spanning up to…

Computer Vision and Pattern Recognition · Computer Science 2022-06-20 Xudong Lin , Fabio Petroni , Gedas Bertasius , Marcus Rohrbach , Shih-Fu Chang , Lorenzo Torresani

Localizing Actions from Video Labels and Pseudo-Annotations

The goal of this paper is to determine the spatio-temporal location of actions in video. Where training from hard to obtain box annotations is the norm, we propose an intuitive and effective algorithm that localizes actions from their class…

Computer Vision and Pattern Recognition · Computer Science 2017-12-14 Pascal Mettes , Cees G. M. Snoek , Shih-Fu Chang

Symbolic State Estimation with Predicates for Contact-Rich Manipulation Tasks

Manipulation tasks often require a robot to adjust its sensorimotor skills based on the state it finds itself in. Taking peg-in-hole as an example: once the peg is aligned with the hole, the robot should push the peg downwards. While high…

Robotics · Computer Science 2022-03-07 Toki Migimatsu , Wenzhao Lian , Jeannette Bohg , Stefan Schaal

GoalNet: Inferring Conjunctive Goal Predicates from Human Plan Demonstrations for Robot Instruction Following

Our goal is to enable a robot to learn how to sequence its actions to perform tasks specified as natural language instructions, given successful demonstrations from a human partner. The ability to plan high-level tasks can be factored as…

Robotics · Computer Science 2022-05-17 Shreya Sharma , Jigyasa Gupta , Shreshth Tuli , Rohan Paul , Mausam

Active Learning for Video Classification with Frame Level Queries

Deep learning algorithms have pushed the boundaries of computer vision research and have depicted commendable performance in a variety of applications. However, training a robust deep neural network necessitates a large amount of labeled…

Computer Vision and Pattern Recognition · Computer Science 2023-07-13 Debanjan Goswami , Shayok Chakraborty

Visual Semantic Role Labeling

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction. Classical approaches to action recognition either study the task of…

Computer Vision and Pattern Recognition · Computer Science 2015-05-19 Saurabh Gupta , Jitendra Malik

Embodied Active Learning of Relational State Abstractions for Bilevel Planning

State abstraction is an effective technique for planning in robotics environments with continuous states and actions, long task horizons, and sparse feedback. In object-oriented environments, predicates are a particularly useful form of…

Robotics · Computer Science 2023-06-21 Amber Li , Tom Silver

Prediction and Description of Near-Future Activities in Video

Most of the existing works on human activity analysis focus on recognition or early recognition of the activity labels from complete or partial observations. Similarly, almost all of the existing video captioning approaches focus on the…

Computer Vision and Pattern Recognition · Computer Science 2021-05-28 Tahmida Mahmud , Mohammad Billah , Mahmudul Hasan , Amit K. Roy-Chowdhury

ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition

Learning to infer labels in an open world, i.e., in an environment where the target "labels" are unknown, is an important characteristic for achieving autonomy. Foundation models pre-trained on enormous amounts of data have shown remarkable…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Sanjoy Kundu , Shubham Trehan , Sathyanarayanan N. Aakur

Anticipating Visual Representations from Unlabeled Video

Anticipating actions and objects before they start or appear is a difficult problem in computer vision with several real-world applications. This task is challenging partly because it requires leveraging extensive knowledge of the world…

Computer Vision and Pattern Recognition · Computer Science 2016-12-01 Carl Vondrick , Hamed Pirsiavash , Antonio Torralba

Action Recognition: From Static Datasets to Moving Robots

Deep learning models have achieved state-of-the-art performance in recognizing human activities, but often rely on utilizing background cues present in typical computer vision datasets that predominantly have a stationary camera. If these…

Robotics · Computer Science 2017-09-20 Fahimeh Rezazadegan , Sareh Shirazi , Ben Upcroft , Michael Milford

Active Learning for Structured Prediction from Partially Labeled Data

We propose a general purpose active learning algorithm for structured prediction, gathering labeled data for training a model that outputs a set of related labels for an image or video. Active learning starts with a limited initial training…

Computer Vision and Pattern Recognition · Computer Science 2017-06-16 Mehran Khodabandeh , Zhiwei Deng , Mostafa S. Ibrahim , Shinichi Satoh , Greg Mori

VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning

Broadly intelligent agents should form task-specific abstractions that selectively expose the essential elements of a task, while abstracting away the complexity of the raw sensorimotor space. In this work, we present Neuro-Symbolic…

Artificial Intelligence · Computer Science 2025-03-04 Yichao Liang , Nishanth Kumar , Hao Tang , Adrian Weller , Joshua B. Tenenbaum , Tom Silver , João F. Henriques , Kevin Ellis

Objects2action: Classifying and localizing actions without any video example

The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to…

Computer Vision and Pattern Recognition · Computer Science 2015-10-26 Mihir Jain , Jan C. van Gemert , Thomas Mensink , Cees G. M. Snoek

Joint Discovery of Object States and Manipulation Actions

Many human activities involve object manipulations aiming to modify the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically…

Computer Vision and Pattern Recognition · Computer Science 2017-08-29 Jean-Baptiste Alayrac , Josef Sivic , Ivan Laptev , Simon Lacoste-Julien