Related papers: Modality Dropout for Improved Performance-driven T…

200 papers

Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames, performing even worse than single-modality models. While applying the dropout technique to the video modality enhances…

Sound · Computer Science 2024-03-08 Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

In recent years, audio-driven 3D facial animation has gained significant attention, particularly in applications such as virtual reality, gaming, and video conferencing. However, accurately modeling the intricate and subtle dynamics of…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Guinan Su, Yanwu Yang, Zhifeng Li

Although significant progress has been made to audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects. In this paper, we propose the Emotion-Aware Motion Model (EAMM)…

Computer Vision and Pattern Recognition · Computer Science 2022-09-26 Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun Cao

It is in high demand to generate facial animation with high realism, but it remains a challenging task. Existing approaches of speech-driven facial animation can produce satisfactory mouth movement and lip synchronization, but show weakness…

Computer Vision and Pattern Recognition · Computer Science 2025-05-08 Yutong Chen, Junhong Zhao, Wei-Qiang Zhang

Automatic audio-visual expression recognition can play an important role in communication services such as tele-health, VOIP calls and human-machine interaction. Accuracy of audio-visual expression recognition could benefit from the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-12-02 Srinivas Parthasarathy, Shiva Sundaram

3D facial animation is often produced by manipulating facial deformation models (or rigs), that are traditionally parameterized by expression controls. A key component that is usually overlooked is expression 'style', as in, how a…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Lingchen Yang, Gaspard Zoss, Prashanth Chandran, Paulo Gotardo, Markus Gross, Barbara Solenthaler, Eftychios Sifakis, Derek Bradley

Audio-driven 3D facial animation has several virtual humans applications for content creation and editing. While several existing methods provide solutions for speech-driven animation, precise control over content (what) and style (how) of…

Sound · Computer Science 2024-08-15 Qingju Liu, Hyeongwoo Kim, Gaurav Bharaj

Given an arbitrary audio clip, audio-driven 3D facial animation aims to generate lifelike lip motions and facial expressions for a 3D head. Existing methods typically rely on training their models using limited public 3D datasets that…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Liying Lu, Tianke Zhang, Yunfei Liu, Xuangeng Chu, Yu Li

The study of human emotions, traditionally a cornerstone in fields like psychology and neuroscience, has been profoundly impacted by the advent of artificial intelligence (AI). Multiple channels, such as speech (voice) and facial…

We present a deep learning framework for real-time speech-driven 3D facial animation from just raw waveforms. Our deep neural network directly maps an input sequence of speech audio to a series of micro facial action unit activations and…

Computer Vision and Pattern Recognition · Computer Science 2017-12-11 Hai X. Pham, Yuting Wang, Vladimir Pavlovic

Speech-driven 3D facial animation has recently garnered attention due to its cost-effective usability in multimedia production. However, most current advances overlook the intelligibility of lip movements, limiting the realism of facial…

Computer Vision and Pattern Recognition · Computer Science 2024-07-02 Han EunGi, Oh Hyun-Bin, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Joo, Tae-Hyun Oh

All previous methods for audio-driven talking head generation assume the input audio to be clean with a neutral tone. As we show empirically, one can easily break these systems by simply adding certain background noise to the utterance or…

Computer Vision and Pattern Recognition · Computer Science 2019-10-03 Gaurav Mittal, Baoyuan Wang

Emotional expressions are the behaviors that communicate our emotional state or attitude to others. They are expressed through verbal and non-verbal communication. Complex human behavior can be understood by studying physical features from…

Computer Vision and Pattern Recognition · Computer Science 2021-09-15 Liam Schoneveld, Alice Othmani, Hazem Abdelkawy

We present a method that generates expressive talking heads from a single facial image with audio as the only input. In contrast to previous approaches that attempt to learn direct mappings from audio to raw pixels or points for creating…

Computer Vision and Pattern Recognition · Computer Science 2021-02-26 Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, Dingzeyu Li

Imitation learning by behavioral cloning is a prevalent method that has achieved some success in vision-based autonomous driving. The basic idea behind behavioral cloning is to have the neural network learn from observing a human expert's…

Computer Vision and Pattern Recognition · Computer Science 2019-08-19 Yuying Chen, Congcong Liu, Lei Tai, Ming Liu, Bertram E. Shi

With the assumption that a video dataset is multimodality annotated in which auditory and visual modalities both are labeled or class-relevant, current multimodal methods apply modality fusion or cross-modality attention. However,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Saghir Alfasly, Jian Lu, Chen Xu, Yuru Zou

Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g…

To be widely adopted, 3D facial avatars must be animated easily, realistically, and directly from speech signals. While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Radek Daněček, Kiran Chhatre, Shashank Tripathi, Yandong Wen, Michael J. Black, Timo Bolkart

Visual emotion expression plays an important role in audiovisual speech communication. In this work, we propose a novel approach to rendering visual emotion expression in speech-driven talking face generation. Specifically, we design an…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-23 Sefik Emre Eskimez, You Zhang, Zhiyao Duan

In this paper, we consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition. We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality…

Computer Vision and Pattern Recognition · Computer Science 2022-01-27 Kateryna Chumachenko, Alexandros Iosifidis, Moncef Gabbouj