Computer Science
Deciding periodicity of infinite words generated by morphisms is a classical result in combinatorics on words from 80's by Harju, Linna and Pansiot. In this paper, we are interested in this question in the abelian setting. Two words are…
While Large Vision-Language Models (VLMs) excel at interpolation, they suffer catastrophic failures in systematic generalization, most notably in visual counting. In this work, we investigate this extrapolation bottleneck by deconstructing…
Conversational multimodal emotion recognition (MER) requires reliable prediction when language, acoustic, or visual observations are missing or unreliable. Many missing-modality methods reconstruct absent inputs, yet such recovery can be…
Given a connected graph $G$ and a terminal set $R \subseteq V(G)$, the minimum Steiner tree problem (ST) asks for a tree that spans all of $R$ with at most $r$ vertices from $V(G)\backslash R$, for some integer $r\geq 0$. A \emph{split…
Emotions conveyed through voice and face shape engagement and context in human AI interaction. Despite rapid progress in omni modal large language models, the holistic evaluation of emotional reasoning with audiovisual cues remains limited.…
The application of program transformation and algebraic methods to the development of efficient combinatorial optimization (CO) algorithms relies on an exhaustive combinatorial generator for the problem specification, followed by the fusion…
Given a finite and non-empty set $X$ and randomly selected specific functions and relations on $X$, we investigate the existence and non-existence of fixed points and reflexive points, respectively. First, we consider the class of…
We consider $d$-dimensional configurations, that is, colorings of the $d$-dimensional integer grid $\mathbb{Z}^d$ with finitely many colors. Moreover, we interpret the colors as integers so that configurations are functions $\mathbb{Z}^d…
The class of intersection bigraphs of unit intervals of the real line whose ends may be open or closed is called a class of mixed unit interval bigraphs. This class of bigraphs is a strict superclass of the class of unit interval bigraphs.…
Traditional RGB-based speech generation faces Temporal Granularity Mismatch since fixed camera exposure times inevitably blur the high-frequency articulatory transients essential for rendering emotional speech. To break this ceiling, we…
This companion paper provides artifacts and instructions on replicating the experiments in the ACM Multimedia 2024 paper entitled "Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks." Swarm-based hierarchical,…
We study various aspects of Dyck words appearing in binary sequences, where $0$ is treated as a left parenthesis and $1$ as a right parenthesis. We show that binary words that are $7/3$-power-free have bounded nesting level, but this no…
The ferromagnetic Ising model on an $n\times n$ square lattice region $\Lambda$ with mixed boundary conditions can exhibit a phase transition as temperature varies. For this spin system, if we fix the spins on the top and bottom sides of…
We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence while remaining temporally synchronized to a silent video. Existing Video&Text-to-Audio (VT2A) models…
A class of graphs $\mathcal{C}$ is closed under powers if for every graph $G\in\mathcal{C}$ and every $k\in\mathbb{N}$, $G^k\in\mathcal{C}$. Also $\mathcal{C}$ is strongly closed under powers if for every $k\in\mathbb{N}$, if…
This paper studies the multimedia problem of temporal sentence grounding (TSG), which aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query. Traditional TSG methods mainly follow…
Swarical, a Swarm-based hierarchical localization technique, enables miniature drones, known as Flying Light Specks (FLSs), to accurately and efficiently localize and illuminate complex 2D and 3D shapes. Its accuracy depends on the physical…
We study the problem of recovering the relative positions of objects moving along the real line based only on pairwise collision data. While interaction-based sensing systems arise naturally in a variety of practical settings, a systematic…
A dominating set $S$ of a graph $G(V,E)$ is called a \textit{secure dominating set} if each vertex $u \in V(G) \setminus S$ is adjacent to a vertex $v \in S$ such that $(S \setminus \{v\}) \cup \{u\}$ is a dominating set of $G$. The…
Multimodal foundation models have demonstrated impressive capabilities across diverse tasks. However, their potential as plug-and-play solutions for missing modality reconstruction remains underexplored. To bridge this gap, we identify and…