Computer Science

Unveiling the Visual Counting Bottleneck in Vision-Language Models

While Large Vision-Language Models (VLMs) excel at interpolation, they suffer catastrophic failures in systematic generalization, most notably in visual counting. In this work, we investigate this extrapolation bottleneck by deconstructing…

Multimedia · Computer Science 2026-05-29 Xingzhou Pang, Yifan Hou, Junling Wang, Mrinmaya Sachan

Demystifying VEINS: A Reality Check Against Living Lab Experiments

Safety applications in vehicle-to-everything communications and Cooperative Intelligent Transport Systems rely on reliable and timely message exchange, which in turn depends on accurate modeling of wireless signal propagation. Simulation…

Performance · Computer Science 2026-05-29 Antonio Solida, Giovanni Gambigliani Zoccoli, Gaetano Orazio Cauchi, Filip Valgimigli, Salvatore Iandolo, Martin Klapez, Maurizio Casoni, Mirco Marchetti, Carlo Augusto Grazia

From Roofline to Ruggedness: Decomposing and Smoothing the GEMM Performance Landscape

Adjacent GEMM problems that differ by a single 128-element step in N can show 30% different throughput on the same GPU. This pervasive performance ruggedness - invisible to roofline analysis and peak-FLOPs intuition, yet dominant for every…

Performance · Computer Science 2026-05-29 Aditya Chatterjee

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition

Conversational multimodal emotion recognition (MER) requires reliable prediction when language, acoustic, or visual observations are missing or unreliable. Many missing-modality methods reconstruct absent inputs, yet such recovery can be…

Multimedia · Computer Science 2026-05-29 Zhaoyan Pan, Xiangdong Li, Wenke Wu, Mengting Ma, Ye Lou, Ji Zhou, Jiatong Pan, Wei Zhang

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Large language models have achieved remarkable capabilities through scaling, and this paper does not challenge that. It instead investigates a different question: once large models already exist, can they become more accessible to…

Performance · Computer Science 2026-05-29 Myeong Jun Jo

AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues

Emotions conveyed through voice and face shape engagement and context in human-AI interaction. Despite rapid progress in omni-modal large language models, the holistic evaluation of emotional reasoning with audiovisual cues remains limited.…

Multimedia · Computer Science 2026-05-29 Dingkun Zhou, Krish Patel, Ajay Kankipati, Akshaj Gupta, Zeyi Austin Li, Mohul Shukla, Vibhor Narang, Sara Kofman, Zongli Ye, Grace Wang, Xiaoyu Shi, Tingle Li, Guan-Ting Lin, Kan Jen Cheng, Huang-Cheng Chou, Jiachen Lian, Gopala Anumanchipalli

Range, Not Precision: Block-Floating-Point Half-Precision FFT and SAR Imaging on Apple Silicon

Half precision (FP16) promises to double FFT throughput on GPUs, but the prevailing view is that its 10-bit mantissa makes it unsuitable for radar-grade signal processing. We show this framing is wrong on Apple Silicon: the binding…

Performance · Computer Science 2026-05-28 Mohamed Amine Bergach

Can We Hear from Events? Generating Speech from Event Camera

Traditional RGB-based speech generation faces Temporal Granularity Mismatch since fixed camera exposure times inevitably blur the high-frequency articulatory transients essential for rendering emotional speech. To break this ceiling, we…

Multimedia · Computer Science 2026-05-27 Jingping Fang, Lin Chen, Chenyang Xu, Tong Zhao, Weidong Cai, Xiaoming Chen

Attributing the System's Overall Effect to its Components

In a computer system, multiple indispensable components (such as the CPU, memory, and others) work together with other essential components to produce an overall effect, which can only be measured on an independently running system. Since the…

Performance · Computer Science 2026-05-27 Chenxi Wang, Lei Wang, Wanling Gao, Fanda Fan, Guoxin Kang, Hongxiao Li, Yuchen Su, Jianfeng Zhan

Reproducibility Companion Paper: Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

This companion paper provides artifacts and instructions on replicating the experiments in the ACM Multimedia 2024 paper entitled "Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks." Swarm-based hierarchical,…

Multimedia · Computer Science 2026-05-27 Hamed Alimohammadzadeh, Shahram Ghandeharizadeh, Federico Cunico, Joshua Springer

CARINA: Carbon-Aware Execution of Recurrent Industrial Analytics

Recurring industrial analytics and machine-learning workflows are becoming a major computational burden in modern engineering practice. Large parametric database generation, scheduled model retraining, repeated evaluation pipelines, and…

Performance · Computer Science 2026-05-26 Muhammad Umar Farooq

CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence while remaining temporally synchronized to a silent video. Existing Video&Text-to-Audio (VT2A) models…

Multimedia · Computer Science 2026-05-26 Gyubin Lee, Junwon Lee, Juhan Nam

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

This paper studies the multimedia problem of temporal sentence grounding (TSG), which aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query. Traditional TSG methods mainly follow…

Multimedia · Computer Science 2026-05-26 Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

Swarical, a Swarm-based hierarchical localization technique, enables miniature drones, known as Flying Light Specks (FLSs), to accurately and efficiently localize and illuminate complex 2D and 3D shapes. Its accuracy depends on the physical…

Multimedia · Computer Science 2026-05-25 Hamed Alimohammadzadeh, Shahram Ghandeharizadeh

How Far Are We from Generating Missing Modalities with Foundation Models?

Multimodal foundation models have demonstrated impressive capabilities across diverse tasks. However, their potential as plug-and-play solutions for missing modality reconstruction remains underexplored. To bridge this gap, we identify and…

Multimedia · Computer Science 2026-05-25 Guanzhou Ke, Bo Wang, Guoqing Chao, Weiming Hu, Shengfeng He

Throughput-Optimal Multiresource-Job Scheduling with Continuous Requirement Distribution

Modern computing systems process jobs with resource requirements such as CPU and memory, which are described by multiresource jobs (MRJ) queueing models. In practice, job resource requirements are spread out over so many values that it is…

Performance · Computer Science 2026-05-22 Heyuan Yao, Willow Kowalik, Izzy Grosof

Multimodal Emotion Recognition with Large Language Models

Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both…

Multimedia · Computer Science 2026-05-21 Hongrui Zhang, Daiqing Wu, Yangyang Li, Kuien Liu, Yuhui Wang, Yu Zhou, Sicheng Zhao

Music of Changing Lines: Toward a Culturally Situated Approach to the I-Ching

The I-Ching is one of the most influential texts in Chinese intellectual history, integrating divination, cosmology, and ethical reflection. While Western experimental music, most notably John Cage, has drawn on the I-Ching as a source of…

Multimedia · Computer Science 2026-05-21 Ling Qi, Aleksandra Teng Ma, Alexandria Smith

Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

JPEG decode is routine ML infrastructure, but Python decoder choices are often justified by single-process, single-thread microbenchmarks. We audit this evaluation assumption with thirteen Python-accessible JPEG decode paths on five matched…

Performance · Computer Science 2026-05-21 Vladimir Iglovikov, Dmitry Kosarevsky

Modeling the Impact of Fiber Latency on Compute-Communication Overlap in Geo-Distributed Multi-Datacenter AI Training

We use discrete-event simulation to quantify the impact of fiber latency on the efficacy of geo-distributed AI model training with data parallelism. We conclude that the optimum distance between two AI clusters is 10-100 km, over which…

Performance · Computer Science 2026-05-20 Ioannis Papavasileiou, Sairam Prabhakar, Indu Kant Deo, Sergejs Makovejs