Computer Science

Unveiling the Visual Counting Bottleneck in Vision-Language Models

While Large Vision-Language Models (VLMs) excel at interpolation, they suffer catastrophic failures in systematic generalization, most notably in visual counting. In this work, we investigate this extrapolation bottleneck by deconstructing…

Multimedia · Computer Science 2026-05-29 Xingzhou Pang, Yifan Hou, Junling Wang, Mrinmaya Sachan

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition

Conversational multimodal emotion recognition (MER) requires reliable prediction when language, acoustic, or visual observations are missing or unreliable. Many missing-modality methods reconstruct absent inputs, yet such recovery can be…

Multimedia · Computer Science 2026-05-29 Zhaoyan Pan, Xiangdong Li, Wenke Wu, Mengting Ma, Ye Lou, Ji Zhou, Jiatong Pan, Wei Zhang

AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues

Emotions conveyed through voice and face shape engagement and context in human-AI interaction. Despite rapid progress in omni-modal large language models, the holistic evaluation of emotional reasoning with audiovisual cues remains limited.…

Multimedia · Computer Science 2026-05-29 Dingkun Zhou, Krish Patel, Ajay Kankipati, Akshaj Gupta, Zeyi Austin Li, Mohul Shukla, Vibhor Narang, Sara Kofman, Zongli Ye, Grace Wang, Xiaoyu Shi, Tingle Li, Guan-Ting Lin, Kan Jen Cheng, Huang-Cheng Chou, Jiachen Lian, Gopala Anumanchipalli

Can We Hear from Events? Generating Speech from Event Camera

Traditional RGB-based speech generation faces Temporal Granularity Mismatch since fixed camera exposure times inevitably blur the high-frequency articulatory transients essential for rendering emotional speech. To break this ceiling, we…

Multimedia · Computer Science 2026-05-27 Jingping Fang, Lin Chen, Chenyang Xu, Tong Zhao, Weidong Cai, Xiaoming Chen

Reproducibility Companion Paper: Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

This companion paper provides artifacts and instructions on replicating the experiments in the ACM Multimedia 2024 paper entitled "Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks." Swarm-based hierarchical,…

Multimedia · Computer Science 2026-05-27 Hamed Alimohammadzadeh, Shahram Ghandeharizadeh, Federico Cunico, Joshua Springer

Computing points in connected components defined by a real inequation: algorithms, complexity and implementations, Part I

We consider the problem of computing sample points in each connected component of a semi-algebraic set defined by the non-vanishing or the positivity of an n-variate polynomial of degree d, with rational coefficients of bit size bounded by…

Symbolic Computation · Computer Science 2026-05-27 Jérémy Berthomieu, Edern Gillot, Mohab Safey El Din

Symbolic-Neural Soft-Logic Reasoning: Towards Robust and Verifiable Thinking Chains via Cooperative Evolution

Large Language Models (LLMs) have demonstrated impressive progress in complex reasoning tasks, largely driven by the Chain-of-Thought (CoT) paradigm, which decomposes difficult problems into intermediate steps. However, CoT reasoning…

Symbolic Computation · Computer Science 2026-05-26 Rui Wang, Zeming Wei, Yihao Zhang, Xiaokun Luan

CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence while remaining temporally synchronized to a silent video. Existing Video&Text-to-Audio (VT2A) models…

Multimedia · Computer Science 2026-05-26 Gyubin Lee, Junwon Lee, Juhan Nam

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

This paper studies the multimedia problem of temporal sentence grounding (TSG), which aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query. Traditional TSG methods mainly follow…

Multimedia · Computer Science 2026-05-26 Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

Swarical, a Swarm-based hierarchical localization technique, enables miniature drones, known as Flying Light Specks (FLSs), to accurately and efficiently localize and illuminate complex 2D and 3D shapes. Its accuracy depends on the physical…

Multimedia · Computer Science 2026-05-25 Hamed Alimohammadzadeh, Shahram Ghandeharizadeh

How Far Are We from Generating Missing Modalities with Foundation Models?

Multimodal foundation models have demonstrated impressive capabilities across diverse tasks. However, their potential as plug-and-play solutions for missing modality reconstruction remains underexplored. To bridge this gap, we identify and…

Multimedia · Computer Science 2026-05-25 Guanzhou Ke, Bo Wang, Guoqing Chao, Weiming Hu, Shengfeng He

A Symbolic Homotopy Algorithm for Solving Composable Polynomial Systems

We study the problem of computing the isolated regular solutions of a system \((f_1,\ldots,f_n)\) of \(n\) polynomial equations in \(n\) variables \((X_1, \dots, X_n)\) over a field of characteristic zero \(k\). We focus on systems with a…

Symbolic Computation · Computer Science 2026-05-22 Thi Xuan Vu

Exploiting the Structure in Tensor Decompositions for Matrix Multiplication

We present a new algorithm for fast matrix multiplication using tensor decompositions which have special features. Thanks to these features we obtain exponents lower than what the rank of the tensor decomposition suggests. In particular for…

Symbolic Computation · Computer Science 2026-05-22 Manuel Kauers, Jakob Moosbauer, Isaac Wood

Symbolic Algorithm for Solving SLAEs with Multi-Diagonal Coefficient Matrices

This paper presents a generalised symbolic algorithm for solving systems of linear algebraic equations with multi-diagonal coefficient matrices. The algorithm is given in pseudocode. A theorem which gives the condition for correctness of…

Symbolic Computation · Computer Science 2026-05-22 Milena Veneva

Multimodal Emotion Recognition with Large Language Models

Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both…

Multimedia · Computer Science 2026-05-21 Hongrui Zhang, Daiqing Wu, Yangyang Li, Kuien Liu, Yuhui Wang, Yu Zhou, Sicheng Zhao

Music of Changing Lines: Toward a Culturally Situated Approach to the I-Ching

The I-Ching is one of the most influential texts in Chinese intellectual history, integrating divination, cosmology, and ethical reflection. While Western experimental music, most notably John Cage, has drawn on the I-Ching as a source of…

Multimedia · Computer Science 2026-05-21 Ling Qi, Aleksandra Teng Ma, Alexandria Smith

Computing Certificates in Archimedean Univariate Saturated Quadratic Modules

A new symbolic algorithm to compute sums of squares multipliers (certificates) to witness the membership of non-negative univariate polynomials in a saturated univariate quadratic module is presented. Certificates are first computed in…

Symbolic Computation · Computer Science 2026-05-20 Jose Abel Castellanos-Joo, Deepak Kapur

Will It Go Viral? Grounding Micro-Video Popularity Prediction on the Open Web

Micro-video popularity prediction (MVPP) forecasts the popularity a newly uploaded short-form video will attract within a fixed number of days after upload. This task supports downstream applications in recommendation, advertising, and…

Multimedia · Computer Science 2026-05-19 Ryang Heo, Dongha Lee

AMS-HD: Hyperdimensional Computing for Real-Time and Energy-Efficient Acute Mountain Sickness Detection

Objective: Acute mountain sickness (AMS) is the most prevalent altitude illness, affecting unacclimatized individuals ascending above 2,500 m and potentially escalating to life-threatening cerebral or pulmonary edema. Conventional machine…

Symbolic Computation · Computer Science 2026-05-19 Abu Masum, Mehran Moghadam, M. Hassan Najafi, Bige Unluturk, Ulkuhan Guler, Beth A. Beidleman, Sercan Aygun

Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification

Multimedia verification requires not only accurate conclusions but also transparent and contestable reasoning. We propose a contestable multi-agent framework that integrates multimodal large language models, external verification tools, and…

Multimedia · Computer Science 2026-05-15 Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Hoang-Loc Cao, Phuc Ho, Van Pham, Hung Cao