Related papers: Program Generation from Diverse Video Demonstratio…

From explanation to synthesis: Compositional program induction for learning from demonstration

Hybrid systems are a compact and natural mechanism with which to address problems in robotics. This work introduces an approach to learning hybrid systems from demonstrations, with an emphasis on extracting models that are explicitly…

Robotics · Computer Science 2019-09-12 Michael Burke, Svetlin Penkov, Subramanian Ramamoorthy

Enhancing Robot Program Synthesis Through Environmental Context

Program synthesis aims to automatically generate an executable program that conforms to the given specification. Recent advancements have demonstrated that deep neural methodologies and large-scale pretrained language models are highly…

Robotics · Computer Science 2023-12-14 Tianyi Chen, Qidi Wang, Zhen Dong, Liwei Shen, Xin Peng

Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator.…

Machine Learning · Computer Science 2018-06-20 YuXuan Liu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

SummDiff: Generative Modeling of Video Summarization with Diffusion

Video summarization is a task of shortening a video by choosing a subset of frames while preserving its essential moments. Despite the innate subjectivity of the task, previous works have deterministically regressed to an averaged frame…

Machine Learning · Computer Science 2025-10-10 Kwanseok Kim, Jaehoon Hahm, Sumin Kim, Jinhwan Sul, Byunghak Kim, Joonseok Lee

Sample-efficient Linguistic Generalizations through Program Synthesis: Experiments with Phonology Problems

Neural models excel at extracting statistical patterns from large amounts of data, but struggle to learn patterns or reason about language from only a few examples. In this paper, we ask: Can we learn explicit rules that generalize well…

Computation and Language · Computer Science 2021-06-15 Saujas Vaduguru, Aalok Sathe, Monojit Choudhury, Dipti Misra Sharma

Human Demonstrations are Generalizable Knowledge for Robots

Learning from human demonstrations is an emerging trend for designing intelligent robotic systems. However, previous methods typically regard videos as instructions, simply dividing them into action sequences for robotic repetition, which…

Robotics · Computer Science 2025-07-18 Te Cui, Tianxing Zhou, Zicai Peng, Mengxiao Hu, Haoyang Lu, Haizhou Li, Guangyan Chen, Meiling Wang, Yufeng Yue

We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

Identifying common patterns among events is a key ability in human and machine perception, as it underlies intelligent decision making. We propose an approach for learning semantic relational set abstractions on videos, inspired by human…

Computer Vision and Pattern Recognition · Computer Science 2020-08-14 Alex Andonian, Camilo Fosco, Mathew Monfort, Allen Lee, Rogerio Feris, Carl Vondrick, Aude Oliva

Generalizing Robot Trajectories from Single-Context Human Demonstrations: A Probabilistic Approach

Generalizing robot trajectories from human demonstrations to new contexts remains a key challenge in Learning from Demonstration (LfD), particularly when only single-context demonstrations are available. We present a novel Gaussian Mixture…

Robotics · Computer Science 2025-11-10 Qian Ying Lee, Suhas Raghavendra Kulkarni, Kenzhi Iskandar Wong, Lin Yang, Bernardo Noronha, Yongjun Wee, Domenico Campolo

Diverse Demonstrations Improve In-context Compositional Generalization

In-context learning has shown great success in i.i.d semantic parsing splits, where the training and test sets are drawn from the same distribution. In this setup, models are typically prompted with demonstrations that are similar to the…

Computation and Language · Computer Science 2023-06-27 Itay Levy, Ben Bogin, Jonathan Berant

Invariance Co-training for Robot Visual Generalization

Reasoning from diverse observations is a fundamental capability for generalist robot policies to operate in a wide range of environments. Despite recent advancements, many large-scale robotic policies still remain sensitive to key sources…

Robotics · Computer Science 2025-12-08 Jonathan Yang, Chelsea Finn, Dorsa Sadigh

Video Generators are Robot Policies

Despite tremendous progress in dexterous manipulation, current visuomotor policies remain fundamentally limited by two challenges: they struggle to generalize under perceptual or behavioral distribution shifts, and their performance is…

Robotics · Computer Science 2025-08-04 Junbang Liang, Pavel Tokmakov, Ruoshi Liu, Sruthi Sudhakar, Paarth Shah, Rares Ambrus, Carl Vondrick

GSum: A General Framework for Guided Neural Abstractive Summarization

Neural abstractive summarization models are flexible and can produce coherent summaries, but they are sometimes unfaithful and can be difficult to control. While previous studies attempt to provide different types of guidance to control the…

Computation and Language · Computer Science 2021-04-20 Zi-Yi Dou, Pengfei Liu, Hiroaki Hayashi, Zhengbao Jiang, Graham Neubig

Understandable Controller Extraction from Video Observations of Swarms

Swarm behavior emerges from the local interaction of agents and their environment often encoded as simple rules. Extracting the rules by watching a video of the overall swarm behavior could help us study and control swarm behavior in…

Robotics · Computer Science 2022-09-05 Khulud Alharthi, Zahraa S Abdallah, Sabine Hauert

Video Summarization using Denoising Diffusion Probabilistic Model

Video summarization aims to eliminate visual redundancy while retaining key parts of video to construct concise and comprehensive synopses. Most existing methods use discriminative models to predict the importance scores of video frames.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Zirui Shang, Yubo Zhu, Hongxi Li, Shuo Yang, Xinxiao Wu

Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the…

Computation and Language · Computer Science 2023-05-09 Zenan Xu, Xiaojun Meng, Yasheng Wang, Qinliang Su, Zexuan Qiu, Xin Jiang, Qun Liu

Dreamitate: Real-World Visuomotor Policy Learning via Video Generation

A key challenge in manipulation is learning a policy that can robustly generalize to diverse visual environments. A promising mechanism for learning robust policies is to leverage video generative models, which are pretrained on large-scale…

Robotics · Computer Science 2024-06-25 Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick

A Unified Multi-Faceted Video Summarization System

This paper addresses automatic summarization and search in visual data comprising videos, live streams and image collections in a unified manner. In particular, we propose a framework for multi-faceted summarization which extracts…

Computer Vision and Pattern Recognition · Computer Science 2017-04-06 Anurag Sahoo, Vishal Kaushal, Khoshrav Doctor, Suyash Shetty, Rishabh Iyer, Ganesh Ramakrishnan

ViMo: Generating Motions from Casual Videos

Although humans have the innate ability to imagine multiple possible actions from videos, it remains an extraordinary challenge for computers due to the intricate camera movements and montages. Most existing motion generation methods…

Computer Vision and Pattern Recognition · Computer Science 2024-08-14 Liangdong Qiu, Chengxing Yu, Yanran Li, Zhao Wang, Haibin Huang, Chongyang Ma, Di Zhang, Pengfei Wan, Xiaoguang Han

From Demonstrations to Task-Space Specifications: Using Causal Analysis to Extract Rule Parameterization from Demonstrations

Learning models of user behaviour is an important problem that is broadly applicable across many application domains requiring human-robot interaction. In this work, we show that it is possible to learn generative models for distinct user…

Artificial Intelligence · Computer Science 2020-06-23 Daniel Angelov, Yordan Hristov, Subramanian Ramamoorthy

LuciBot: Automated Robot Policy Learning from Generated Videos

Automatically generating training supervision for embodied tasks is crucial, as manual designing is tedious and not scalable. While prior works use large language models (LLMs) or vision-language models (VLMs) to generate rewards, these…

Computer Vision and Pattern Recognition · Computer Science 2025-03-14 Xiaowen Qiu, Yian Wang, Jiting Cai, Zhehuan Chen, Chunru Lin, Tsun-Hsuan Wang, Chuang Gan