Related papers: Towards Data-Driven Automatic Video Editing

Cut-and-Paste: Subject-Driven Video Editing with Attention Control

This paper presents a novel framework termed Cut-and-Paste for real-world semantic video editing under the guidance of a text prompt and an additional reference image. While text-driven video editing has demonstrated remarkable ability to…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-21 · Zhichao Zuo, Zhao Zhang, Yan Luo, Yang Zhao, Haijun Zhang, Yi Yang, Meng Wang

Automatic Non-Linear Video Editing Transfer

We propose an automatic approach that extracts editing styles in a source video and applies the edits to matched footage for video creation. Our computer-vision-based techniques consider framing, content type, playback speed, and lighting…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-17 · Nathan Frey, Peggy Chi, Weilong Yang, Irfan Essa

Learning to Cut by Watching Movies

Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate primarily due…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-30 · Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem

AI video editing tools. What editors want and how far is AI from delivering?

Video editing can be a very tedious task, so unsurprisingly Artificial Intelligence has been increasingly used to streamline the workflow or automate away tedious tasks. However, it is very difficult to get an overview of what intelligent…

Human-Computer Interaction · Computer Science · 2021-09-17 · Than Htut Soe

The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing

Machine learning is transforming the video editing industry. Recent advances in computer vision have leveled-up video editing tasks such as intelligent reframing, rotoscoping, color grading, or applying digital makeups. However, most of the…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-22 · Dawit Mureja Argaw, Fabian Caba Heilbron, Joon-Young Lee, Markus Woodson, In So Kweon

Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features

This paper presents a novel retrieval pipeline for video collections, which aims to retrieve the most significant parts of an edited video for a given query, and represent them with thumbnails which are at the same time semantically…

Computer Vision and Pattern Recognition · Computer Science · 2016-04-12 · Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

Agent-based Video Trimming

As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights. This trend underscores the need for an algorithm to extract key video…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-13 · Lingfeng Yang, Zhenyuan Chen, Xiang Li, Peiyang Jia, Liangqu Long, Jian Yang

EditDuet: A Multi-Agent System for Video Non-Linear Editing

Automated tools for video editing and assembly have applications ranging from filmmaking and advertisement to content creation for social media. Previous video editing work has mainly focused on either retrieval or user interfaces, leaving…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-16 · Marcelo Sandoval-Castaneda, Bryan Russell, Josef Sivic, Gregory Shakhnarovich, Fabian Caba Heilbron

Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media

Creators struggle to edit long-form, narrative-rich videos not because of UI complexity, but due to the cognitive demands of searching, storyboarding, and sequencing hours of footage. Existing transcript- or embedding-based methods fall…

Artificial Intelligence · Computer Science · 2025-09-30 · Zihan Ding, Xinyi Wang, Junlong Chen, Per Ola Kristensson, Junxiao Shen

Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vast, high-fidelity datasets that are difficult to…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-09 · Shihan Cheng, Nilesh Kulkarni, David Hyde, Dmitriy Smirnov

A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

In this era of videos, automatic video editing techniques attract more and more attention from industry and academia since they can reduce workloads and lower the requirements for human editors. Existing automatic editing systems are mainly…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-08 · Panwen Hu, Nan Xiao, Feifei Li, Yongquan Chen, Rui Huang

AutoVFX: Physically Realistic Video Editing from Natural Language Instructions

Modern visual effects (VFX) software has made it possible for skilled artists to create imagery of virtually anything. However, the creation process remains laborious, complex, and largely inaccessible to everyday users. In this work, we…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-05 · Hao-Yu Hsu, Zhi-Hao Lin, Albert Zhai, Hongchi Xia, Shenlong Wang

Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable

Denoising is a crucial step in many video processing pipelines such as in interactive editing, where high quality, speed, and user control are essential. While recent approaches achieve significant improvements in denoising quality by…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-07 · Xin Jin, Simon Niklaus, Zhoutong Zhang, Zhihao Xia, Chunle Guo, Yuting Yang, Jiawen Chen, Chongyi Li

Clarification of Video Retrieval Query Results by the Automated Insertion of Supporting Shots

The output of computational video editing systems generally follows a particular form, e.g. conversation or music videos; in this way, such systems are domain-specific. We describe a recent development in our video annotation and segmentation system…

Multimedia · Computer Science · 2021-02-23 · Sean Butler

Text2LIVE: Text-Driven Layered Image and Video Editing

We present a method for zero-shot, text-driven appearance manipulation in natural images and videos. Given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., object's texture) or…

Computer Vision and Pattern Recognition · Computer Science · 2022-05-26 · Omer Bar-Tal, Dolev Ofri-Amar, Rafail Fridman, Yoni Kasten, Tali Dekel

Consistent Video-to-Video Transfer Using Synthetic Dataset

We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning. At the core of our approach is a synthetic paired video dataset tailored for…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-04 · Jiaxin Cheng, Tianjun Xiao, Tong He

Rewriting Video: Text-Driven Reauthoring of Video Footage

Video is a powerful medium for communication and storytelling, yet reauthoring existing footage remains challenging. Even simple edits often demand expertise, time, and careful planning, constraining how creators envision and shape their…

Human-Computer Interaction · Computer Science · 2026-04-07 · Sitong Wang, Anh Truong, Lydia B. Chilton, Dingzeyu Li

Intuitive Facial Animation Editing Based On A Generative RNN Framework

Over the last decades, producing convincing facial animation has garnered great interest, which has only accelerated with the recent explosion of 3D content in both entertainment and professional activities. The use of…

Graphics · Computer Science · 2020-10-13 · Eloïse Berson, Catherine Soladié, Nicolas Stoiber

Generative Photographic Control for Scene-Consistent Video Cinematic Editing

Cinematic storytelling is profoundly shaped by the artful manipulation of photographic elements such as depth of field and exposure. These effects are crucial in conveying mood and creating aesthetic appeal. However, controlling these…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-18 · Huiqiang Sun, Liao Shen, Zhan Peng, Kun Wang, Size Wu, Yuhang Zang, Tianqi Liu, Zihao Huang, Xingyu Zeng, Zhiguo Cao, Wei Li, Chen Change Loy

Shape-aware Text-driven Layered Video Editing

Temporal consistency is essential for video editing applications. Existing work on layered representation of videos allows propagating edits consistently to each frame. These methods, however, can only edit object appearance rather than…

Computer Vision and Pattern Recognition · Computer Science · 2023-01-31 · Yao-Chih Lee, Ji-Ze Genevieve Jang, Yi-Ting Chen, Elizabeth Qiu, Jia-Bin Huang