Related papers: Multimedia Generative Script Learning for Task Pla…

200 papers

Automatically generating scripts (i.e., sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial for modern AI virtual assistants that guide humans through everyday tasks,…

Computation and Language · Computer Science · 2024-01-22 · Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang

The knowledge of scripts, common chains of events in stereotypical scenarios, is a valuable asset for task-oriented natural language understanding systems. We propose the Goal-Oriented Script Construction task, where a model produces a…

Computation and Language · Computer Science · 2021-09-01 · Qing Lyu, Li Zhang, Chris Callison-Burch

Goal-oriented Script Generation is a new task of generating a list of steps that can fulfill the given goal. In this paper, we propose to extend the task from the perspective of cognitive theory. Instead of a simple flat structure, the…

Computation and Language · Computer Science · 2023-05-19 · Xinze Li, Yixin Cao, Muhao Chen, Aixin Sun

Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images. In this paper, motivated by multi-task learning of shareable feature representations, we consider…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-28 · Zhipeng Bao, Martial Hebert, Yu-Xiong Wang

The goal of this work is to generate step-by-step visual instructions in the form of a sequence of images, given an input image that provides the scene context and the sequence of textual instructions. This is a challenging problem as it…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-26 · Tomáš Souček, Prajwal Gatti, Michael Wray, Ivan Laptev, Dima Damen, Josef Sivic

The objective of this work is to manipulate visual timelines (e.g. a video) through natural language instructions, making complex timeline editing tasks accessible to non-expert or potentially even disabled users. We call this task…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-20 · Alejandro Pardo, Jui-Hsien Wang, Bernard Ghanem, Josef Sivic, Bryan Russell, Fabian Caba Heilbron

Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-13 · Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar, Chris Callison-Burch

Generating video stories from text prompts is a complex task. In addition to having high visual quality, videos need to realistically adhere to a sequence of text prompts whilst being consistent throughout the frames. Creating a benchmark…

Online resources such as WikiHow compile a wide range of scripts for performing everyday tasks, which can assist models in learning to reason about procedures. However, the scripts are always presented in a linear manner, which does not…

Computation and Language · Computer Science · 2023-05-30 · Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal, Heng Ji

In this paper, we present Tetris, a new task of Goal-Oriented Script Completion. Unlike previous work, it considers a more realistic and general setting, where the input includes not only the goal but also additional user context, including…

Computation and Language · Computer Science · 2023-04-25 · Chenkai Sun, Tie Xu, ChengXiang Zhai, Heng Ji

Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement,…

Robotics · Computer Science · 2025-08-27 · Nicholas Pfaff, Hongkai Dai, Sergey Zakharov, Shun Iwase, Russ Tedrake

Generating long-form narratives such as stories and procedures from multiple modalities has been a long-standing dream for artificial intelligence. In this regard, there is often crucial subtext that is derived from the surrounding…

Computation and Language · Computer Science · 2020-10-28 · Khyathi Raghavi Chandu, Ruo-Ping Dong, Alan Black

Visual storytelling aims to generate a narrative based on a sequence of images, necessitating both vision-language alignment and coherent story generation. Most existing solutions predominantly depend on paired image-text training data,…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-21 · Yuechen Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

The recent advancements in generative language models have demonstrated their ability to memorize knowledge from documents and recall knowledge to respond to user queries effectively. Building upon this capability, we propose to enable…

Multimedia · Computer Science · 2024-02-19 · Yongqi Li, Wenjie Wang, Leigang Qu, Liqiang Nie, Wenjie Li, Tat-Seng Chua

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering. While generative models provide a consistent network architecture between…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-19 · Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

Creating engaging narratives from visual data is crucial for automated digital media consumption, assistive technologies, and interactive entertainment. This survey covers methodologies used in the generation of these narratives, focusing…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-03 · Daniel A. P. Oliveira, Eugénio Ribeiro, David Martins de Matos

Despite significant progress in the field, it is still challenging to create personalized visual representations that align closely with the desires and preferences of individual users. This process requires users to articulate their ideas…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-03 · Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan

The advent of large pre-trained generative language models has provided a common framework for AI story generation via sampling the model to create sequences that continue the story. However, sampling alone is insufficient for story…

Computation and Language · Computer Science · 2021-12-17 · Amal Alabdulkarim, Winston Li, Lara J. Martin, Mark O. Riedl

In this work, we study the problem of generating novel images from complex multimodal prompt sequences. While existing methods achieve promising results for text-to-image generation, they often struggle to capture fine-grained details from…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-29 · Amandeep Kumar, Muzammal Naseer, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal

Video storytelling is engaging multimedia content that utilizes video and its accompanying narration to attract the audience, where a key challenge is creating narrations for recorded visual scenes. Previous studies on dense video…

Multimedia · Computer Science · 2024-12-31 · Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin