Related papers: MEEL: Multi-Modal Event Evolution Learning

200 papers

This study explores the application of self-supervised learning techniques to event sequences, a key modality in various applications such as banking, e-commerce, and healthcare. However, there is limited research on self-supervised…

Multimedia Event Extraction (MEE) aims to identify events and their arguments from documents that contain both text and images. It requires grounding event semantics across different modalities. Progress in MEE is limited by the lack of…

Computation and Language · Computer Science 2026-05-28 Yongkang Jin, Jianwen Luo, Jingjing Wang, Jianmin Yao, Yu Hong

Multimodal entity linking (MEL) aims to link ambiguous mentions within multimodal contexts to corresponding entities in a multimodal knowledge base. Most existing approaches to MEL are based on representation learning or vision-and-language…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Zhiwei Hu, Víctor Gutiérrez-Basulto, Ru Li, Jeff Z. Pan

Multimodal event argument role labeling (EARL), a task that assigns a role for each event participant (object) in an image, is a complex challenge. It requires reasoning over the entire image, the depicted event, and the interactions between…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Hritik Bansal, Po-Nien Kung, P. Jeffrey Brantingham, Kai-Wei Chang, Nanyun Peng

Recent advances in vision-language models have significantly expanded the frontiers of automated image analysis. However, applying these models in safety-critical contexts remains challenging due to the complex relationships between…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Muhammad Imran, Yugyung Lee

The main task of Multimodal Emotion Recognition in Conversations (MERC) is to identify emotions across modalities such as text, audio, image, and video, which is a significant development direction for realizing machine intelligence. However,…

Sound · Computer Science 2023-12-12 Tao Meng, Yuntao Shou, Wei Ai, Nan Yin, Keqin Li

Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities. Previous methods mainly focus on projecting multiple modalities into a common latent space and…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Yi Zhang, Mingyuan Chen, Jundong Shen, Chongjun Wang

Multimodal emotion recognition is an important research topic in artificial intelligence, whose main goal is to integrate multimodal clues to identify human emotional states. Current works generally assume accurate labels for benchmark…

Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both…

Multimedia · Computer Science 2026-05-21 Hongrui Zhang, Daiqing Wu, Yangyang Li, Kuien Liu, Yuhui Wang, Yu Zhou, Sicheng Zhao

The development of Multimodal Large Language Models (MLLMs) has seen significant advancements with increasing demands in various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches attempt to enhance MLLMs…

Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to the referent entities from a knowledge base. Existing MEL methods mainly focus on designing complex multimodal interaction mechanisms and require…

Computation and Language · Computer Science 2024-03-21 Senbao Shi, Zhenran Xu, Baotian Hu, Min Zhang

Contemporary news reporting increasingly features multimedia content, motivating research on multimedia event extraction. However, the task lacks annotated multimodal training data and artificially generated training data suffer from…

Multimedia · Computer Science 2023-08-14 Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, Boyang Li

Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of the inter-event relations and the reasoning paradigms. How…

Computation and Language · Computer Science 2024-08-05 Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Haiyan Zhao, Jia Li, Bing Liang, Chongyang Tao, Qun Liu, Kam-Fai Wong

Event-based Vision-Language Models (VLMs) have recently made good progress on practical vision tasks. However, most of these works use CLIP only for traditional perception tasks, which obstructs model understanding…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Pengteng Li, Yunfan Lu, Pinghao Song, Wuyang Li, Huizai Yao, Hui Xiong

Multimodal emotion recognition (MER) aims to identify human emotions by combining data from various modalities such as language, audio, and vision. Despite the recent advances of MER approaches, the limitations in obtaining extensive…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Yehun Song, Sunyoung Cho

Event cameras asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range. However, annotation of event data is a costly and laborious process, which limits the use of deep learning methods…

Computer Vision and Pattern Recognition · Computer Science 2023-12-27 Simon Klenk, David Bonello, Lukas Koestler, Nikita Araslanov, Daniel Cremers

Fine-tuning Large Language Models (LLMs) with multimodal encoders on modality-specific data expands the modalities that LLMs can handle, leading to the formation of Multimodal LLMs (MLLMs). However, this paradigm heavily relies on…

Computation and Language · Computer Science 2025-05-26 Junlin Li, Guodong DU, Jing Li, Sim Kuan Goh, Wenya Wang, Yequan Wang, Fangming Liu, Ho-Kin Tang, Saleh Alharbi, Daojing He, Min Zhang

The remarkable success of multimodal large language models (MLLMs) has driven advances in multimodal embeddings, yet existing models remain inherently discriminative, limiting their ability to benefit from reasoning-driven generation…

Machine Learning · Computer Science 2026-03-03 Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Jinsong Su

Multiobjective evolutionary learning (MOEL) has demonstrated its advantages of training fairer machine learning models considering a predefined set of conflicting objectives, including accuracy and different fairness measures. Recent works…

Machine Learning · Computer Science 2024-09-30 Qingquan Zhang, Jialin Liu, Xin Yao

Multimodal Entity Linking (MEL) is a fundamental task in data management that maps ambiguous mentions with diverse modalities to the multimodal entities in a knowledge base. However, most existing MEL approaches primarily focus on…

Computation and Language · Computer Science 2026-04-23 Mo Zhou, Jianwei Wang, Kai Wang, Helen Paik, Ying Zhang, Wenjie Zhang