Multimedia - Scifaro

CellPrior-Net: Prior-Guided Nuclei Detection and Classification for H&E Whole-Slide Images

Accurate nuclei detection and classification in hematoxylin and eosin (H&E) whole-slide images (WSIs) is a key task in computational pathology, particularly for quantitative analysis of the tumor microenvironment. However, this task…

Multimedia · Computer Science 2026-07-01 Falah Jabar, Pasquale Lombardi, Aria Torkpour, Masoud Tafavvoghi, Per Niklas Benzler Waaler, Sigve Andersen, Erna-Elise Paulsen, Mette Pøhl, Lill-Tove Rasmussen Busund, Tom Donnem, Elin Richardsen, David J. Pinato, Mehrdad Rakaee

A First Exploration of Neuromorphic OT-CFM for Multi-Speaker VSR

Visual Speech Recognition (VSR) tasks in complex multi-speaker scenarios are severely hindered by rapid head motions, occlusions, and subtle lip articulations. Traditional RGB-based methods struggle here due to low rates and motion blur of…

Multimedia · Computer Science 2026-07-01 Lin Chen, Jingping Fang, Hairui Liu, Chenyang Xu, Junhao Chen, Xiaorui Li, Weidong Cai, Xiaoming Chen

Evidence Triangulation for Multimodal Fact-Checking in the Wild

The proliferation of multimedia content on social platforms has fueled multimodal misinformation, where images are used to reinforce false claims. Consequently, Multimodal Fact-Checking (MFC) has emerged as an increasingly important…

Multimedia · Computer Science 2026-06-30 Stefanos-Iordanis Papadopoulos, Zacharias Chrysidis, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis

Vertigo Vertigo: Reconstructing a Cinematic Ideal through its Predictive AI Double

Vertigo Vertigo is a scene-for-scene AI reconstruction of Hitchcock's Vertigo (1958), generated from only 2.78% of the original film's frames. Using this sparse set of keyframe anchors, we perform first-last frame interpolation via a large…

Multimedia · Computer Science 2026-06-29 Adam Cole, Mick Grierson

From Design Principles to Prototype: A Game for Students with ADHD and Learning Disabilities Transitioning to Post-Secondary Education

Students with Attention Deficit Hyperactivity Disorder (ADHD) and Learning Disabilities (LD) can face significant academic, social, and organizational challenges when transitioning to post-secondary education. This paper presents a…

Multimedia · Computer Science 2026-06-28 Avery Keuben, Talaal Irtija, Joseph Tandyo, Stefanie Ng, Amy Wiebe, Samuel Gaudet, Rebekah Leslie, Meadow Schroeder, Lauren Goegan, Richard Zhao

A Good Talk Does not Look Like a Summary, It Teaches You! Measuring Takeaways from Paper-to-Video Talks

Automatically generated videos from scientific papers are increasingly used for education and research dissemination. However, existing evaluation metrics mainly measure visual quality or whether key points from the paper appear in the…

Multimedia · Computer Science 2026-06-26 Ishani Mondal, Aparna Garimella, Ananya Sai, Pannaga Shivaswamy, Jordan Boyd-Graber

It Lied to a Doctor to Buy Poison Ingredients: Quantifying Real-World Misuse of Phone-use Agents

Phone-use Agents can execute complex tasks end to end across real mobile applications. By operating a real device on the user's behalf, they reach far more functionalities than CLI agents, which amplifies the real-world harm they can cause…

Multimedia · Computer Science 2026-06-26 Yiming Sun, Chen Chen, Zifan Zhou, Mi Zhang

An Evaluation of Decentralized Group Formation Techniques for Flying Light Specks

Group formation is fundamental for 3D displays that use Flying Light Specks (FLSs) to illuminate shapes and provide haptic interactions. An FLS is a drone with light sources that illuminates a shape. Groups of G FLSs may implement…

Multimedia · Computer Science 2026-06-25 Hamed Alimohammadzadeh, Heather Culbertson, Shahram Ghandeharizadeh

Unveiling the Visual Counting Bottleneck in Vision-Language Models

While Large Vision-Language Models (VLMs) excel at interpolation, they suffer catastrophic failures in systematic generalization, most notably in visual counting. In this work, we investigate this extrapolation bottleneck by deconstructing…

Multimedia · Computer Science 2026-05-29 Xingzhou Pang, Yifan Hou, Junling Wang, Mrinmaya Sachan

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition

Conversational multimodal emotion recognition (MER) requires reliable prediction when language, acoustic, or visual observations are missing or unreliable. Many missing-modality methods reconstruct absent inputs, yet such recovery can be…

Multimedia · Computer Science 2026-05-29 Zhaoyan Pan, Xiangdong Li, Wenke Wu, Mengting Ma, Ye Lou, Ji Zhou, Jiatong Pan, Wei Zhang

AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMs with Audio-visual Cues

Emotions conveyed through voice and face shape engagement and context in human-AI interaction. Despite rapid progress in omni-modal large language models, the holistic evaluation of emotional reasoning with audiovisual cues remains limited.…

Multimedia · Computer Science 2026-05-29 Dingkun Zhou, Krish Patel, Ajay Kankipati, Akshaj Gupta, Zeyi Austin Li, Mohul Shukla, Vibhor Narang, Sara Kofman, Zongli Ye, Grace Wang, Xiaoyu Shi, Tingle Li, Guan-Ting Lin, Kan Jen Cheng, Huang-Cheng Chou, Jiachen Lian, Gopala Anumanchipalli

Can We Hear from Events? Generating Speech from Event Camera

Traditional RGB-based speech generation faces Temporal Granularity Mismatch since fixed camera exposure times inevitably blur the high-frequency articulatory transients essential for rendering emotional speech. To break this ceiling, we…

Multimedia · Computer Science 2026-05-27 Jingping Fang, Lin Chen, Chenyang Xu, Tong Zhao, Weidong Cai, Xiaoming Chen

Reproducibility Companion Paper: Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

This companion paper provides artifacts and instructions on replicating the experiments in the ACM Multimedia 2024 paper entitled "Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks." Swarm-based hierarchical,…

Multimedia · Computer Science 2026-05-27 Hamed Alimohammadzadeh, Shahram Ghandeharizadeh, Federico Cunico, Joshua Springer

CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation

We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence while remaining temporally synchronized to a silent video. Existing Video&Text-to-Audio (VT2A) models…

Multimedia · Computer Science 2026-05-26 Gyubin Lee, Junwon Lee, Juhan Nam

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

This paper studies the multimedia problem of temporal sentence grounding (TSG), which aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query. Traditional TSG methods mainly follow…

Multimedia · Computer Science 2026-05-26 Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks

Swarical, a Swarm-based hierarchical localization technique, enables miniature drones, known as Flying Light Specks (FLSs), to accurately and efficiently localize and illuminate complex 2D and 3D shapes. Its accuracy depends on the physical…

Multimedia · Computer Science 2026-05-25 Hamed Alimohammadzadeh, Shahram Ghandeharizadeh

How Far Are We from Generating Missing Modalities with Foundation Models?

Multimodal foundation models have demonstrated impressive capabilities across diverse tasks. However, their potential as plug-and-play solutions for missing modality reconstruction remains underexplored. To bridge this gap, we identify and…

Multimedia · Computer Science 2026-05-25 Guanzhou Ke, Bo Wang, Guoqing Chao, Weiming Hu, Shengfeng He

Multimodal Emotion Recognition with Large Language Models

Multimodal Emotion Recognition (MER) focuses on identifying and interpreting emotions from modality-compound inputs. Closely mirroring human cognitive processes in real-world environments, MER has drawn substantial attention from both…

Multimedia · Computer Science 2026-05-21 Hongrui Zhang, Daiqing Wu, Yangyang Li, Kuien Liu, Yuhui Wang, Yu Zhou, Sicheng Zhao

Music of Changing Lines: Toward a Culturally Situated Approach to the I-Ching

The I-Ching is one of the most influential texts in Chinese intellectual history, integrating divination, cosmology, and ethical reflection. While Western experimental music, most notably John Cage, has drawn on the I-Ching as a source of…

Multimedia · Computer Science 2026-05-21 Ling Qi, Aleksandra Teng Ma, Alexandria Smith

Will It Go Viral? Grounding Micro-Video Popularity Prediction on the Open Web

Micro-video popularity prediction (MVPP) forecasts the popularity a newly uploaded short-form video will attract within a fixed number of days after upload. This task supports downstream applications in recommendation, advertising, and…

Multimedia · Computer Science 2026-05-19 Ryang Heo, Dongha Lee