Related papers: MEEL: Multi-Modal Event Evolution Learning

MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences

This study explores the application of self-supervised learning techniques for event sequences. It is a key modality in various applications such as banking, e-commerce, and healthcare. However, there is limited research on self-supervised…

Machine Learning · Computer Science 2024-07-04 Viktor Moskvoretskii, Dmitry Osin, Egor Shvetsov, Igor Udovichenko, Maxim Zhelnin, Andrey Dukhovny, Anna Zhimerikina, Evgeny Burnaev

RMPL: Relation-aware Multi-task Progressive Learning with Stage-wise Training for Multimedia Event Extraction

Multimedia Event Extraction (MEE) aims to identify events and their arguments from documents that contain both text and images. It requires grounding event semantics across different modalities. Progress in MEE is limited by the lack of…

Computation and Language · Computer Science 2026-05-28 Yongkang Jin, Jianwen Luo, Jingjing Wang, Jianmin Yao, Yu Hong

Multi-level Matching Network for Multimodal Entity Linking

Multimodal entity linking (MEL) aims to link ambiguous mentions within multimodal contexts to corresponding entities in a multimodal knowledge base. Most existing approaches to MEL are based on representation learning or vision-and-language…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Zhiwei Hu, Víctor Gutiérrez-Basulto, Ru Li, Jeff Z. Pan

GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling

Multimodal event argument role labeling (EARL), a task that assigns a role for each event participant (object) in an image, is a complex challenge. It requires reasoning over the entire image, the depicted event, and the interactions between…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Hritik Bansal, Po-Nien Kung, P. Jeffrey Brantingham, Kai-Wei Chang, Nanyun Peng

Multi-Modal Interpretability for Enhanced Localization in Vision-Language Models

Recent advances in vision-language models have significantly expanded the frontiers of automated image analysis. However, applying these models in safety-critical contexts remains challenging due to the complex relationships between…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Muhammad Imran, Yugyung Lee

Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations

The main task of Multimodal Emotion Recognition in Conversations (MERC) is to identify the emotions in modalities, e.g., text, audio, image and video, which is a significant development direction for realizing machine intelligence. However,…

Sound · Computer Science 2023-12-12 Tao Meng, Yuntao Shou, Wei Ai, Nan Yin, Keqin Li

Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition

Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities. Previous methods mainly focus on projecting multiple modalities into a common latent space and…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Yi Zhang, Mingyuan Chen, Jundong Shen, Chongjun Wang

Explainable Multimodal Emotion Recognition

Multimodal emotion recognition is an important research topic in artificial intelligence, whose main goal is to integrate multimodal clues to identify human emotional states. Current works generally assume accurate labels for benchmark…

Multimedia · Computer Science 2024-05-24 Zheng Lian, Haiyang Sun, Licai Sun, Hao Gu, Zhuofan Wen, Siyuan Zhang, Shun Chen, Mingyu Xu, Ke Xu, Kang Chen, Lan Chen, Shan Liang, Ya Li, Jiangyan Yi, Bin Liu, Jianhua Tao

Multimodal Emotion Recognition with Large Language Models

Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both…

Multimedia · Computer Science 2026-05-21 Hongrui Zhang, Daiqing Wu, Yangyang Li, Kuien Liu, Yuhui Wang, Yu Zhou, Sicheng Zhao

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

The development of Multimodal Large Language Models (MLLMs) has seen significant advancements with increasing demands in various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches attempt to enhance MLLMs…

Computation and Language · Computer Science 2025-01-03 Run Luo, Haonan Zhang, Longze Chen, Ting-En Lin, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li

Generative Multimodal Entity Linking

Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to the referent entities from a knowledge base. Existing MEL methods mainly focus on designing complex multimodal interaction mechanisms and require…

Computation and Language · Computer Science 2024-03-21 Senbao Shi, Zhenran Xu, Baotian Hu, Min Zhang

Training Multimedia Event Extraction With Generated Images and Captions

Contemporary news reporting increasingly features multimedia content, motivating research on multimedia event extraction. However, the task lacks annotated multimodal training data and artificially generated training data suffer from…

Multimedia · Computer Science 2023-08-14 Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, Boyang Li

A Comprehensive Evaluation on Event Reasoning of Large Language Models

Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of the inter-event relations and the reasoning paradigms. How…

Computation and Language · Computer Science 2024-08-05 Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Haiyan Zhao, Jia Li, Bing Liang, Chongyang Tao, Qun Liu, Kam-Fai Wong

EventVL: Understand Event Streams via Multimodal Large Language Model

The event-based Vision-Language Model (VLM) has recently made good progress on practical vision tasks. However, most of these works just utilize CLIP to focus on traditional perception tasks, which obstruct model understanding…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Pengteng Li, Yunfan Lu, Pinghao Song, Wuyang Li, Huizai Yao, Hui Xiong

Leveraging CLIP Encoder for Multimodal Emotion Recognition

Multimodal emotion recognition (MER) aims to identify human emotions by combining data from various modalities such as language, audio, and vision. Despite the recent advances of MER approaches, the limitations in obtaining extensive…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Yehun Song, Sunyoung Cho

Masked Event Modeling: Self-Supervised Pretraining for Event Cameras

Event cameras asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range. However, annotation of event data is a costly and laborious process, which limits the use of deep learning methods…

Computer Vision and Pattern Recognition · Computer Science 2023-12-27 Simon Klenk, David Bonello, Lukas Koestler, Nikita Araslanov, Daniel Cremers

Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling

Fine-tuning Large Language Models (LLMs) with multimodal encoders on modality-specific data expands the modalities that LLMs can handle, leading to the formation of Multimodal LLMs (MLLMs). However, this paradigm heavily relies on…

Computation and Language · Computer Science 2025-05-26 Junlin Li, Guodong DU, Jing Li, Sim Kuan Goh, Wenya Wang, Yequan Wang, Fangming Liu, Ho-Kin Tang, Saleh Alharbi, Daojing He, Min Zhang

UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings

The remarkable success of multimodal large language models (MLLMs) has driven advances in multimodal embeddings, yet existing models remain inherently discriminative, limiting their ability to benefit from reasoning-driven generation…

Machine Learning · Computer Science 2026-03-03 Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Jinsong Su

Fairness-aware Multiobjective Evolutionary Learning

Multiobjective evolutionary learning (MOEL) has demonstrated its advantages of training fairer machine learning models considering a predefined set of conflicting objectives, including accuracy and different fairness measures. Recent works…

Machine Learning · Computer Science 2024-09-30 Qingquan Zhang, Jialin Liu, Xin Yao

Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking

Multimodal Entity Linking (MEL) is a fundamental task in data management that maps ambiguous mentions with diverse modalities to the multimodal entities in a knowledge base. However, most existing MEL approaches primarily focus on…

Computation and Language · Computer Science 2026-04-23 Mo Zhou, Jianwei Wang, Kai Wang, Helen Paik, Ying Zhang, Wenjie Zhang