Related papers: Synchformer: Efficient Synchronization from Sparse…

200 papers

The objective of this paper is audio-visual synchronisation of general videos 'in the wild'. For such videos, the events that may be harnessed for synchronisation cues may be spatially small and may occur only infrequently during a many…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-14 · Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

In this paper, we consider the problem of audio-visual synchronisation applied to videos 'in-the-wild' (i.e., of general classes beyond speech). As a new task, we identify and curate a test set with high audio-visual correlation, namely…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-09 · Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Text-to-video and image-to-video generation have made rapid progress in visual quality, but they remain limited in controlling the precise timing of motion. In contrast, audio provides temporal cues aligned with video motion, making it a…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-29 · Jibin Song, Mingi Kwon, Jaeseok Jeong, Youngjung Uh

Recent advances in audio-synchronized visual animation enable control of video content using audio from specific classes. However, existing methods rely heavily on expensive manual curation of high-quality, class-specific training videos,…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-07 · Lin Zhang, Zefan Cai, Yufan Zhou, Shentong Mo, Jinhong Lin, Cheng-En Wu, Yibing Wei, Yijing Zhang, Ruiyi Zhang, Wen Xiao, Tong Sun, Junjie Hu, Pedro Morgado

Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment. Existing methods assume access to paired training data, where the audio is observed in both source and target…

Multimedia · Computer Science · 2023-11-27 · Arjun Somayazulu, Changan Chen, Kristen Grauman

We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment. Given an image of the target environment and a waveform for the source audio, the goal is to…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-15 · Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman

There is a natural correlation between the visual and auditive elements of a video. In this work we leverage this connection to learn general and effective models for both audio and video analysis from self-supervised temporal…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-13 · Bruno Korbar, Du Tran, Lorenzo Torresani

We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify limitations of previous…

Sound · Computer Science · 2021-10-15 · Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey

Representing wild sounds as images is an important but challenging task due to the lack of paired datasets between sound and images and the significant differences in the characteristics of these two modalities. Previous studies have…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-06 · Taegyeong Lee, Jeonghun Kang, Hyeonyu Kim, Taehwan Kim

Audio-visual feature synchronization for real-time speech enhancement in hearing aids represents a progressive approach to improving speech intelligibility and user experience, particularly against strong background noise. This approach…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-28 · Nasir Saleem, Mandar Gogate, Kia Dashtipour, Adeel Hussain, Usman Anwar, Adewale Adetomi, Tughrul Arslan, Amir Hussain

We present a method for joint alignment of sparse in-the-wild image collections of an object category. Most prior works assume either ground-truth keypoint annotations or a large dataset of images of a single object category. However,…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-29 · Kamal Gupta, Varun Jampani, Carlos Esteves, Abhinav Shrivastava, Ameesh Makadia, Noah Snavely, Abhishek Kar

Synchronizing audio with visuals is crucial in many applications, such as creating graphic animations for films or games, translating movie audio into different languages, and developing metaverse applications. This review explores…

Performance-score synchronization is an integral task in signal processing, which entails generating an accurate mapping between an audio recording of a performance and the corresponding musical score. Traditional synchronization methods…

Sound · Computer Science · 2022-04-20 · Ruchit Agrawal, Daniel Wolff, Simon Dixon

Recent progress in deep learning has enabled many advances in sound separation and visual scene understanding. However, extracting sound sources which are apparent in natural videos remains an open problem. In this work, we present…

Multi-view capture systems have been an important research tool for recording human motion under controlled conditions. Most existing systems are designed around video streams and provide little or no support for audio acquisition and…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-23 · Xiangwei Shi, Gara Dorta, Ruud de Jong, Ojas Shirekar, Chirag Raman

The synthesis of synchronized audio-visual content is a key challenge in generative AI, and open-source models struggle to achieve robust audio-video alignment. Our analysis reveals that this issue is rooted in three fundamental…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-01 · Teng Hu, Zhentao Yu, Guozhen Zhang, Zihan Su, Zhengguang Zhou, Youliang Zhang, Yuan Zhou, Qinglin Lu, Ran Yi

Speech sounds convey a great deal of information about the scenes, resulting in a variety of effects ranging from reverberation to additional ambient sounds. In this paper, we manipulate input speech to sound as though it was recorded…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-24 · Tingle Li, Renhao Wang, Po-Yao Huang, Andrew Owens, Gopala Anumanchipalli

Video synchronization (aligning multiple video streams that capture the same event from different angles) is crucial for applications such as reality TV show production, sports analysis, surveillance, and autonomous systems. Prior work has…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-23 · Yosub Shin, Igor Molybog

Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks. However, the growth is not attributed solely to models and benchmarks. Universally…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-12 · Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu J. Han

Humans have the ability to utilize visual cues, such as lip movements and visual scenes, to enhance auditory perception, particularly in noisy environments. However, current Automatic Speech Recognition (ASR) or Audio-Visual Speech…

Computation and Language · Computer Science · 2025-04-11 · Lakshmipathi Balaji, Karan Singla