Related papers: Sample Efficient Multimodal Semantic Augmentation …

A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video

This paper proposes a practical multimodal video summarization task setting and a dataset to train and evaluate the task. The target task involves summarizing a given video into a predefined number of keyframe-caption pairs and displaying…

Computation and Language · Computer Science 2023-12-05 Keito Kudo, Haruki Nagasawa, Jun Suzuki, Nobuyuki Shimizu

Multi-Modal Summary Generation using Multi-Objective Optimization

Significant development of communication technology over the past few years has motivated research in multi-modal summarization techniques. A majority of the previous works on multi-modal summarization focus on text and images. In this…

Information Retrieval · Computer Science 2020-05-20 Anubhav Jangra, Sriparna Saha, Adam Jatowt, Mohammad Hasanuzzaman

Prompting LLMs with content plans to enhance the summarization of scientific articles

This paper presents novel prompting techniques to improve the performance of automatic summarization systems for scientific articles. Scientific article summarization is highly challenging due to the length and complexity of these…

Computation and Language · Computer Science 2023-12-18 Aldan Creo, Manuel Lama, Juan C. Vidal

Semantic Prompt for Few-Shot Image Recognition

Few-shot learning is a challenging problem since only a few examples are provided to recognize a new class. Several recent studies exploit additional semantic information, e.g. text embeddings of class names, to address the issue of rare…

Computer Vision and Pattern Recognition · Computer Science 2023-03-27 Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan

Progressive Video Summarization via Multimodal Self-supervised Learning

Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, easily leading to over-fitting of the deep…

Computer Vision and Pattern Recognition · Computer Science 2022-10-20 Li Haopeng, Ke Qiuhong, Gong Mingming, Tom Drummond

Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the…

Computation and Language · Computer Science 2023-05-09 Zenan Xu, Xiaojun Meng, Yasheng Wang, Qinliang Su, Zexuan Qiu, Xin Jiang, Qun Liu

Towards an Automated Multimodal Approach for Video Summarization: Building a Bridge Between Text, Audio and Facial Cue-Based Summarization

The increasing volume of video content in educational, professional, and social domains necessitates effective summarization techniques that go beyond traditional unimodal approaches. This paper proposes a behaviour-aware multimodal video…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Md Moinul Islam, Sofoklis Kakouros, Janne Heikkilä, Mourad Oussalah

Enhancing Video Summarization with Context Awareness

Video summarization is a crucial research area that aims to efficiently browse and retrieve relevant information from the vast amount of video content available today. With the exponential growth of multimedia data, the ability to extract…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Hai-Dang Huynh-Lam, Ngoc-Phuong Ho-Thi, Minh-Triet Tran, Trung-Nghia Le

Semantic Prompting with Image-Token for Continual Learning

Continual learning aims to refine model parameters for new tasks while retaining knowledge from previous tasks. Recently, prompt-based learning has emerged to leverage pre-trained models to be prompted to learn subsequent tasks without the…

Computer Vision and Pattern Recognition · Computer Science 2024-03-19 Jisu Han, Jaemin Na, Wonjun Hwang

Comprehensive Video Understanding: Video summarization with content-based video recommender design

Video summarization aims to extract keyframes/shots from a long video. Previous methods mainly take diversity and representativeness of generated summaries as prior knowledge in algorithm design. In this paper, we formulate video…

Computer Vision and Pattern Recognition · Computer Science 2019-10-31 Yudong Jiang, Kaixu Cui, Bo Peng, Changliang Xu

Personalized Video Summarization using Text-Based Queries and Conditional Modeling

The proliferation of video content on platforms like YouTube and Vimeo presents significant challenges in efficiently locating relevant information. Automatic video summarization aims to address this by extracting and presenting key content…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Jia-Hong Huang

GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization

Traditional video summarization methods generate fixed video representations regardless of user interest. Therefore such methods limit users' expectations in content search and exploration scenarios. Multi-modal video summarization is one…

Computer Vision and Pattern Recognition · Computer Science 2021-04-27 Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring

Sentence Embeddings as an intermediate target in end-to-end summarisation

Current neural network-based methods to the problem of document summarisation struggle when applied to datasets containing large inputs. In this paper we propose a new approach to the challenge of content-selection when dealing with…

Computation and Language · Computer Science 2025-05-07 Maciej Zembrzuski, Saad Mahamood

Minimal Clips, Maximum Salience: Long Video Summarization via Key Moment Extraction

Vision-Language Models (VLMs) are able to process increasingly longer videos. Yet, important visual information is easily lost throughout the entire context and missed by VLMs. Also, it is important to design tools that enable…

Computation and Language · Computer Science 2026-01-09 Galann Pennec, Zhengyuan Liu, Nicholas Asher, Philippe Muller, Nancy F. Chen

PSP: Pre-trained Soft Prompts for Few-Shot Abstractive Summarization

Few-shot abstractive summarization has become a challenging task in natural language generation. To support it, we designed a novel soft prompts architecture coupled with a prompt pre-training plus fine-tuning paradigm that is effective and…

Computation and Language · Computer Science 2022-10-05 Xiaochen Liu, Yang Gao, Yu Bai, Jiawei Li, Yinan Hu, Heyan Huang, Boxing Chen

Query-adaptive Video Summarization via Quality-aware Relevance Estimation

Although the problem of automatic video summarization has recently received a lot of attention, the problem of creating a video summary that also highlights elements relevant to a search query has been less studied. We address this problem…

Computer Vision and Pattern Recognition · Computer Science 2017-09-29 Arun Balajee Vasudevan, Michael Gygli, Anna Volokitin, Luc Van Gool

Less is More: Label-Guided Summarization of Procedural and Instructional Videos

Video summarization helps turn long videos into clear, concise representations that are easier to review, document, and analyze, especially in high-stakes domains like surgical training. Prior work has progressed from using basic visual…

Computer Vision and Pattern Recognition · Computer Science 2026-02-02 Shreya Rajpal, Michal Golovanevsky, Carsten Eickhoff

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency

YouTube users looking for instructions for a specific task may spend a long time browsing content trying to find the right video that matches their needs. Creating a visual summary (abridged version of a video) provides viewers with a quick…

Computer Vision and Pattern Recognition · Computer Science 2022-08-16 Medhini Narasimhan, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid

Multi-modal Summarization for Video-containing Documents

Summarization of multimedia data becomes increasingly significant as it is the basis for many real-world applications, such as question answering, Web search, and so forth. Most existing multi-modal summarization works however have used…

Computation and Language · Computer Science 2020-09-18 Xiyan Fu, Jun Wang, Zhenglu Yang

Video Summarization Techniques: A Comprehensive Review

The rapid expansion of video content across a variety of industries, including social media, education, entertainment, and surveillance, has made video summarization an essential field of study. The current work is a survey that explores…

Computer Vision and Pattern Recognition · Computer Science 2024-10-08 Toqa Alaa, Ahmad Mongy, Assem Bakr, Mariam Diab, Walid Gomaa