Related papers: Multi-Modal Summary Generation using Multi-Objecti…

Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization

Multimodal abstractive summarization (MAS) aims to produce a concise summary given the multimodal data (text and vision). Existing studies mainly focus on how to effectively use the visual features from the perspective of an article, having…

Computer Vision and Pattern Recognition · Computer Science 2023-05-05 Yunlong Liang, Fandong Meng, Jinan Xu, Jiaan Wang, Yufeng Chen, Jie Zhou

Multi-modal Summarization for Video-containing Documents

Summarization of multimedia data becomes increasingly significant as it is the basis for many real-world applications, such as question answering, Web search, and so forth. Most existing multi-modal summarization works however have used…

Computation and Language · Computer Science 2020-09-18 Xiyan Fu, Jun Wang, Zhenglu Yang

GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization

Traditional video summarization methods generate fixed video representations regardless of user interest. Therefore such methods limit users' expectations in content search and exploration scenarios. Multi-modal video summarization is one…

Computer Vision and Pattern Recognition · Computer Science 2021-04-27 Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring

Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the…

Computation and Language · Computer Science 2023-05-09 Zenan Xu, Xiaojun Meng, Yasheng Wang, Qinliang Su, Zexuan Qiu, Xin Jiang, Qun Liu

VideoXum: Cross-modal Visual and Textural Summarization of Videos

Video summarization aims to distill the most important information from a source video to produce either an abridged clip or a textual narrative. Traditionally, different methods have been proposed depending on whether the output is a video…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Jingyang Lin, Hang Hua, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo

MHMS: Multimodal Hierarchical Multimedia Summarization

Multimedia summarization with multimodal output can play an essential role in real-world applications, i.e., automatically generating cover images and titles for news articles or providing introductions to online videos. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2022-04-11 Jielin Qiu, Jiacheng Zhu, Mengdi Xu, Franck Dernoncourt, Trung Bui, Zhaowen Wang, Bo Li, Ding Zhao, Hailin Jin

Conditional Modeling Based Automatic Video Summarization

The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story. Video summarization methods mainly rely on visual factors, such as visual consecutiveness and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-22 Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring

Multimodal Abstractive Summarization for How2 Videos

In this paper, we study abstractive summarization for open-domain videos. Unlike the traditional text news summarization, the goal is less to "compress" text information but rather to provide a fluent textual summary of information that has…

Computation and Language · Computer Science 2019-06-20 Shruti Palaskar, Jindrich Libovický, Spandana Gella, Florian Metze

A Survey on Multi-modal Summarization

The new era of technology has brought us to the point where it is convenient for people to share their opinions over an abundance of platforms. These platforms have a provision for the users to express themselves in multiple forms of…

Computation and Language · Computer Science 2023-02-14 Anubhav Jangra, Sourajit Mukherjee, Adam Jatowt, Sriparna Saha, Mohammad Hasanuzzaman

UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation

With the rapid increase of multimedia data, a large body of literature has emerged to work on multimodal summarization, the majority of which target at refining salient information from textual and visual modalities to output a pictorial…

Computation and Language · Computer Science 2022-02-16 Zhengkun Zhang, Xiaojun Meng, Yasheng Wang, Xin Jiang, Qun Liu, Zhenglu Yang

Causal Video Summarizer for Video Exploration

Recently, video summarization has been proposed as a method to help video exploration. However, traditional video summarization models only generate a fixed video summary which is usually independent of user-specific needs and hence limits…

Computer Vision and Pattern Recognition · Computer Science 2023-07-06 Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring

Topic-Guided Abstractive Multi-Document Summarization

A critical point of multi-document summarization (MDS) is to learn the relations among various documents. In this paper, we propose a novel abstractive MDS model, in which we represent multiple documents as a heterogeneous graph, taking…

Computation and Language · Computer Science 2021-10-22 Peng Cui, Le Hu

Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure

Vision-Language Models (VLMs) can process visual and textual information in multiple formats: texts, images, interleaved texts and images, or even hour-long videos. In this work, we conduct fine-grained quantitative and qualitative analyses…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Théo Gigant, Camille Guinaudeau, Frédéric Dufaux

A Unified Multi-Faceted Video Summarization System

This paper addresses automatic summarization and search in visual data comprising of videos, live streams and image collections in a unified manner. In particular, we propose a framework for multi-faceted summarization which extracts…

Computer Vision and Pattern Recognition · Computer Science 2017-04-06 Anurag Sahoo, Vishal Kaushal, Khoshrav Doctor, Suyash Shetty, Rishabh Iyer, Ganesh Ramakrishnan

Progressive Video Summarization via Multimodal Self-supervised Learning

Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, easily leading to over-fitting of the deep…

Computer Vision and Pattern Recognition · Computer Science 2022-10-20 Li Haopeng, Ke Qiuhong, Gong Mingming, Tom Drummond

Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization

In this work, we develop a prompting approach for incremental summarization of task videos. We develop a sample-efficient few-shot approach for extracting semantic concepts as an intermediate step. We leverage an existing model for…

Computation and Language · Computer Science 2023-03-09 Sumanta Bhattacharyya, Ramesh Manuvinakurike, Sahisnu Mazumder, Saurav Sahay

Self-Supervised Multimodal Opinion Summarization

Recently, opinion summarization, which is the generation of a summary from multiple reviews, has been conducted in a self-supervised manner by considering a sampled review as a pseudo summary. However, non-text data such as image and…

Computation and Language · Computer Science 2021-05-28 Jinbae Im, Moonki Kim, Hoyeop Lee, Hyunsouk Cho, Sehee Chung

Cross-Modal State-Space Graph Reasoning for Structured Summarization

The ability to extract compact, meaningful summaries from large-scale and multimodal data is critical for numerous applications, ranging from video analytics to medical reports. Prior methods in cross-modal summarization have often suffered…

Computation and Language · Computer Science 2025-07-31 Hannah Kim, Sofia Martinez, Jason Lee

Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion

Recent advances in large language models (LLMs) have led to new summarization strategies, offering an extensive toolkit for extracting important information. However, these approaches are frequently limited by their reliance on isolated…

Artificial Intelligence · Computer Science 2024-06-21 Pranav Janjani, Mayank Palan, Sarvesh Shirude, Ninad Shegokar, Sunny Kumar, Faruk Kazi

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction. Nonetheless, numerous limitations exist within existing public MSMO datasets, including insufficient maintenance, data inaccessibility,…

Computer Vision and Pattern Recognition · Computer Science 2023-11-21 Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, Bo Li, Lijuan Wang