Multimedia - Scifaro

MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation

The evolution of video generation toward complex, multi-shot narratives has exposed a critical deficit in current evaluation methods. Existing benchmarks remain anchored to single-shot paradigms, lacking the comprehensive story assets and…

Multimedia · Computer Science 2026-03-02 Haoyuan Shi, Yunxin Li, Nanhao Deng, Zhenran Xu, Xinyu Chen, Longyue Wang, Baotian Hu, Min Zhang

M3TR: Temporal Retrieval Enhanced Multi-Modal Micro-video Popularity Prediction

Accurately predicting the popularity of micro-videos is a critical but challenging task, characterized by volatile, 'rollercoaster-like' engagement dynamics. Existing methods often fail to capture these complex temporal patterns, leading to…

Multimedia · Computer Science 2026-03-02 Jiacheng Lu, Weijian Wang, Mingyuan Xiao, Yang Hua, Tao Song, Jiaru Zhang, Bo Peng, Cheng Hua, Haibing Guan

MViR: Multi-View Visual-Semantic Representation for Fake News Detection

With the rise of online social networks, detecting fake news accurately is essential for a healthy online environment. While existing methods have advanced multimodal fake news detection, they often neglect the multi-view visual-semantic…

Multimedia · Computer Science 2026-02-27 Haochen Liang, Xinqi Su, Jun Wang, Chaomeng Chen, Zitong Yu

Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads

Video-based ads are a vital medium for brands to engage consumers, with social media platforms leveraging user data to optimize ad delivery and boost engagement. A crucial but under-explored aspect is the 'hooking period', the first three…

Multimedia · Computer Science 2026-02-27 Kunpeng Zhang, Poppy Zhang, Shawndra Hill, Amel Awadelkarim

Structured Image-based Coding for Efficient Gaussian Splatting Compression

Gaussian Splatting (GS) has recently emerged as a state-of-the-art representation for radiance fields, combining real-time rendering with high visual fidelity. However, GS models require storing millions of parameters, leading to large file…

Multimedia · Computer Science 2026-02-27 Pedro Martin, Antonio Rodrigues, Joao Ascenso, Maria Paula Queluz

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models

In this paper, we propose a novel parameter and computation efficient tuning method for Multi-modal Large Language Models (MLLMs), termed Efficient Attention Skipping (EAS). Concretely, we first reveal that multi-head attentions (MHAs), the…

Multimedia · Computer Science 2026-02-27 Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

A Generic Web Component for WebRTC Pub-Sub

We present video-io, a generic web component to publish or subscribe to a media stream in WebRTC (web real-time communication) applications. Unlike a call or conference room abstraction of existing video conferencing services, it uses a…

Multimedia · Computer Science 2026-02-26 Kundan Singh

A 3D-Cascading Crossing Coupling Framework for Hyperchaotic Map Construction and Its Application to Color Image Encryption

This paper focuses on hyperchaotic-map construction and proposes a 3D-Cascading Crossing Coupling framework (3D-CCC), which cascades, crosses, and couples three one-dimensional chaotic maps to form a three-dimensional hyperchaotic system.…

Multimedia · Computer Science 2026-02-26 Jilei Sun, Dianhong Wu

SPP-SCL: Semi-Push-Pull Supervised Contrastive Learning for Image-Text Sentiment Analysis and Beyond

Existing Image-Text Sentiment Analysis (ITSA) methods may suffer from inconsistent intra-modal and inter-modal sentiment relationships. Therefore, we develop a method that balances before fusing to solve the issue of vision-language…

Multimedia · Computer Science 2026-02-25 Jiesheng Wu, Shengrong Li

Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis

Multimodal Sentiment Analysis (MSA) integrates language, visual, and acoustic modalities to infer human sentiment. Most existing methods either focus on globally shared representations or modality-specific features, while overlooking…

Multimedia · Computer Science 2026-02-24 Chunlei Meng, Jiabin Luo, Zhenglin Yan, Zhenyu Yu, Rong Fu, Zhongxue Gan, Chun Ouyang

Health+: Empowering Individuals via Unifying Health Data

Managing personal health data is a challenge in today's fragmented and institution-centric healthcare ecosystem. Individuals often lack meaningful control over their medical records, which are scattered across incompatible systems and…

Multimedia · Computer Science 2026-02-24 Sujaya Maiyya, Shantanu Sharma, Avinash Kumar

Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution

For real-world BCI applications, lightweight Electroencephalography (EEG) systems offer the best cost-deployment balance. However, such spatial sparsity of EEG limits spatial fidelity, hurting learning and introducing bias. EEG spatial…

Multimedia · Computer Science 2026-02-24 Hongjun Liu, Leyu Zhou, Zijianghao Yang, Chao Yao

A Survey on Cross-Modal Interaction Between Music and Multimodal Data

Multimodal learning has driven innovation across various industries, particularly in the field of music. By enabling more intuitive interaction experiences and enhancing immersion, it not only lowers the entry barriers to music but also…

Multimedia · Computer Science 2026-02-24 Sifei Li, Mining Tan, Feier Shen, Minyan Luo, Zijiao Yin, Fan Tang, Weiming Dong, Changsheng Xu

MusicSem: A Semantically Rich Language-Audio Dataset of Natural Music Descriptions

Music representation learning is central to music information retrieval and generation. While recent advances in multimodal learning have improved alignment between text and audio for tasks such as cross-modal music retrieval, text-to-music…

Multimedia · Computer Science 2026-02-23 Rebecca Salganik, Teng Tu, Fei-Yueh Chen, Xiaohao Liu, Keifeng Lu, Ethan Luvisia, Zhiyao Duan, Guillaume Salha-Galvan, Anson Kahng, Yunshan Ma, Jian Kang

CAFE: Channel-Autoregressive Factorized Encoding for Robust Biosignal Spatial Super-Resolution

High-density biosignal recordings are critical for neural decoding and clinical monitoring, yet real-world deployments often rely on low-density (LD) montages due to hardware and operational constraints. This motivates spatial…

Multimedia · Computer Science 2026-02-20 Hongjun Liu, Leyu Zhou, Zijianghao Yang, Rujun Han, Shitong Duan, Kuanjian Tang, Chao Yao

Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU

Real-time conversational assistants for procedural tasks often depend on video input, which can be computationally expensive and compromise user privacy. For the first time, we propose a real-time conversational assistant that provides…

Multimedia · Computer Science 2026-02-18 Rehana Mahfuz, Yinyi Guo, Erik Visser, Phanidhar Chinchili

"The Intangible Victory", Interactive Audiovisual Installation

"Intangible Victory" is an audiovisual installation in the form of the intangible being of the Victory of Samothrace that uses interactive digital media. Specifically, through this installation, we redefine the visual symbolism of the…

Multimedia · Computer Science 2026-02-18 Konstantinos Tsioutas, Panagiotis Pangalos, Konstantinos Tiligadis, Andreas Sitorengo

SRA: Semantic Relation-Aware Flowchart Question Answering

Flowchart Question Answering (FlowchartQA) is a multi-modal task that automatically answers questions conditioned on graphic flowcharts. Current studies convert flowcharts into interlanguages (e.g., Graphviz) for Question Answering (QA),…

Multimedia · Computer Science 2026-02-17 Xinyu Li, Bowei Zou, Yuchong Chen, Yifan Fan, Yu Hong

TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Traceability

Diffusion-based speech generation has achieved remarkable fidelity, increasing the risk of misuse and unauthorized redistribution. However, most existing generative speech watermarking methods are developed for GAN-based pipelines, and…

Multimedia · Computer Science 2026-02-17 Yue Li, Weizhi Liu, Kaiqing Lin, Dongdong Lin, Kassem Kallas

Rethinking Security of Diffusion-based Generative Steganography

Generative image steganography is a technique that conceals secret messages within generated images, without relying on pre-existing cover images. Recently, a number of diffusion model-based generative image steganography (DM-GIS) methods…

Multimedia · Computer Science 2026-02-12 Jihao Zhu, Zixuan Chen, Jiali Liu, Lingxiao Yang, Yi Zhou, Weiqi Luo, Xiaohua Xie