Computer Science

Unveiling the Visual Counting Bottleneck in Vision-Language Models

While Large Vision-Language Models (VLMs) excel at interpolation, they suffer catastrophic failures in systematic generalization, most notably in visual counting. In this work, we investigate this extrapolation bottleneck by deconstructing…

Multimedia · Computer Science 2026-05-29 Xingzhou Pang, Yifan Hou, Junling Wang, Mrinmaya Sachan

RTP-LLM: High-Performance Alibaba LLM Inference Engine

Large Language Models (LLMs) have revolutionized AI applications, but deploying them at scale presents significant challenges. We present RTP-LLM, a high-performance inference engine for industrial-scale LLM deployment, successfully…

Operating Systems · Computer Science 2026-05-29 Boyu Tan, Jiarui Guo, Zongwei Lv, Hanbo Sun, Tong Yang, Kan Liu, Xinfei Shi, Zetao Hu, Yaxin Yu, Chi Zhang, Jianning Zhang, Xi Yang, Wei Zhang, Bo Cai, Silu Zhou, Xiyu Wang, Na He, Yinghao Yu, Wending Bao, Guiyang Huang, Yuxing Yuan, Juncheng Yin, Nan Wang, Lin Yang, Zechao Zhang, Lu Chen, Guoding Li, Tao Lan, Lin Qu

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition

Conversational multimodal emotion recognition (MER) requires reliable prediction when language, acoustic, or visual observations are missing or unreliable. Many missing-modality methods reconstruct absent inputs, yet such recovery can be…

Multimedia · Computer Science 2026-05-29 Zhaoyan Pan, Xiangdong Li, Wenke Wu, Mengting Ma, Ye Lou, Ji Zhou, Jiatong Pan, Wei Zhang

AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues

Emotions conveyed through voice and face shape engagement and context in human-AI interaction. Despite rapid progress in omni-modal large language models, the holistic evaluation of emotional reasoning with audiovisual cues remains limited.…

Multimedia · Computer Science 2026-05-29 Dingkun Zhou, Krish Patel, Ajay Kankipati, Akshaj Gupta, Zeyi Austin Li, Mohul Shukla, Vibhor Narang, Sara Kofman, Zongli Ye, Grace Wang, Xiaoyu Shi, Tingle Li, Guan-Ting Lin, Kan Jen Cheng, Huang-Cheng Chou, Jiachen Lian, Gopala Anumanchipalli

Bounded Priority-Aware Locking for Real-Time Kernels

A real-time multicore system requires delay bounds on access to shared resources. These resources include the kernel, which has potentially many non-preemptible critical sections guarded by one or more different synchronization primitives.…

Operating Systems · Computer Science 2026-05-28 Shriram Raja, Richard West

Can We Hear from Events? Generating Speech from Event Camera

Traditional RGB-based speech generation faces Temporal Granularity Mismatch since fixed camera exposure times inevitably blur the high-frequency articulatory transients essential for rendering emotional speech. To break this ceiling, we…

Multimedia · Computer Science 2026-05-27 Jingping Fang, Lin Chen, Chenyang Xu, Tong Zhao, Weidong Cai, Xiaoming Chen

Reproducibility Companion Paper: Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

This companion paper provides artifacts and instructions on replicating the experiments in the ACM Multimedia 2024 paper entitled "Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks." Swarm-based hierarchical,…

Multimedia · Computer Science 2026-05-27 Hamed Alimohammadzadeh, Shahram Ghandeharizadeh, Federico Cunico, Joshua Springer

LearnedCache: An eBPF-Integrated Perceptron-Based Eviction Policy for the Linux Page Cache

Linux is the foundation of the digital age, accounting for the majority of the cloud and mobile OS markets. Any device that runs Linux uses the Linux page cache, a central pillar in OS and application performance, serving to reduce…

Operating Systems · Computer Science 2026-05-27 Zejia Qi

Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live

KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting. This policy breaks for agentic workloads, which interleave LLM…

Operating Systems · Computer Science 2026-05-27 Hanchen Li, Runyuan He, Qiuyang Mang, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica

CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence while remaining temporally synchronized to a silent video. Existing Video&Text-to-Audio (VT2A) models…

Multimedia · Computer Science 2026-05-26 Gyubin Lee, Junwon Lee, Juhan Nam

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

This paper studies the multimedia problem of temporal sentence grounding (TSG), which aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query. Traditional TSG methods mainly follow…

Multimedia · Computer Science 2026-05-26 Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

Swarical, a Swarm-based hierarchical localization technique, enables miniature drones, known as Flying Light Specks (FLSs), to accurately and efficiently localize and illuminate complex 2D and 3D shapes. Its accuracy depends on the physical…

Multimedia · Computer Science 2026-05-25 Hamed Alimohammadzadeh, Shahram Ghandeharizadeh

How Far Are We from Generating Missing Modalities with Foundation Models?

Multimodal foundation models have demonstrated impressive capabilities across diverse tasks. However, their potential as plug-and-play solutions for missing modality reconstruction remains underexplored. To bridge this gap, we identify and…

Multimedia · Computer Science 2026-05-25 Guanzhou Ke, Bo Wang, Guoqing Chao, Weiming Hu, Shengfeng He

DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback

LLM-powered AI agents require high-frequency state exploration (e.g., test-time tree search and reinforcement learning), relying on rapid checkpoint and rollback (C/R) of the complete sandbox state, including files and process state (e.g.,…

Operating Systems · Computer Science 2026-05-22 Yunpeng Dong, Jingkai He, Yuze Hou, Dong Du, Zhonghu Xu, Si Yu, Yubin Xia, Haibo Chen

Multimodal Emotion Recognition with Large Language Models

Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both…

Multimedia · Computer Science 2026-05-21 Hongrui Zhang, Daiqing Wu, Yangyang Li, Kuien Liu, Yuhui Wang, Yu Zhou, Sicheng Zhao

ParaCell: Paravirtualized Secure Containers with Lightweight Intra-Container Isolation and Intent-Driven Memory Management

Secure containers isolate each container with its own kernel, mitigating shared-kernel attacks prevalent in traditional container systems. However, existing designs still face a fundamental isolation-performance trade-off. Nested-cloud…

Operating Systems · Computer Science 2026-05-21 Yiyang Wu, Xunjie Wang, Jinyu Gu, Haibo Chen

Music of Changing Lines: Toward a Culturally Situated Approach to the I-Ching

The I-Ching is one of the most influential texts in Chinese intellectual history, integrating divination, cosmology, and ethical reflection. While Western experimental music, most notably John Cage, has drawn on the I-Ching as a source of…

Multimedia · Computer Science 2026-05-21 Ling Qi, Aleksandra Teng Ma, Alexandria Smith

Clove: Object-Level CXL Memory Management in Managed Runtimes

Object-level management of tiered memory has been studied to address the inefficiencies in page-based systems. However, object-level management for CXL-tiered memory remains underexplored due to CXL's tight performance budget and load/store…

Operating Systems · Computer Science 2026-05-21 Sam Son, Zhihong Luo, Wen Zhang, Sylvia Ratnasamy, Scott Shenker

SSV: Sparse Speculative Verification for Efficient LLM Inference

Speculative decoding and dynamic sparse attention are two complementary approaches for accelerating long-context LLM inference: the former amortizes target-model execution across multiple verifier queries, while the latter reduces each…

Operating Systems · Computer Science 2026-05-21 Zhibin Wang, Ziyu Zhong, Nuo Shen, Yuhang Zhou, Rong Gu, Sheng Zhong

Experimental Analysis of FreeRTOS Dependability through Targeted Fault Injection Campaigns

Real-Time Operating Systems (RTOSes) play a crucial role in safety-critical domains, where deterministic and predictable task execution is essential. Yet they are increasingly exposed to ionizing radiation, which can compromise system…

Operating Systems · Computer Science 2026-05-21 Luca Mannella, Stefano Di Carlo, Alessandro Savino