Related papers: Modality Dropout for Improved Performance-driven T…

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames, performing even worse than single-modality models. While applying the dropout technique to the video modality enhances…

Sound · Computer Science 2024-03-08 Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation

In recent years, audio-driven 3D facial animation has gained significant attention, particularly in applications such as virtual reality, gaming, and video conferencing. However, accurately modeling the intricate and subtle dynamics of…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Guinan Su, Yanwu Yang, Zhifeng Li

EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model

Although significant progress has been made to audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects. In this paper, we propose the Emotion-Aware Motion Model (EAMM)…

Computer Vision and Pattern Recognition · Computer Science 2022-09-26 Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun Cao

Expressive Speech-driven Facial Animation with controllable emotions

It is in high demand to generate facial animation with high realism, but it remains a challenging task. Existing approaches of speech-driven facial animation can produce satisfactory mouth movement and lip synchronization, but show weakness…

Computer Vision and Pattern Recognition · Computer Science 2025-05-08 Yutong Chen, Junhong Zhao, Wei-Qiang Zhang

Training Strategies to Handle Missing Modalities for Audio-Visual Expression Recognition

Automatic audio-visual expression recognition can play an important role in communication services such as tele-health, VOIP calls and human-machine interaction. Accuracy of audio-visual expression recognition could benefit from the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-12-02 Srinivas Parthasarathy, Shiva Sundaram

An Implicit Physical Face Model Driven by Expression and Style

3D facial animation is often produced by manipulating facial deformation models (or rigs), that are traditionally parameterized by expression controls. A key component that is usually overlooked is expression 'style', as in, how a…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Lingchen Yang, Gaspard Zoss, Prashanth Chandran, Paulo Gotardo, Markus Gross, Barbara Solenthaler, Eftychios Sifakis, Derek Bradley

Content and Style Aware Audio-Driven Facial Animation

Audio-driven 3D facial animation has several virtual humans applications for content creation and editing. While several existing methods provide solutions for speech-driven animation, precise control over content (what) and style (how) of…

Sound · Computer Science 2024-08-15 Qingju Liu, Hyeongwoo Kim, Gaurav Bharaj

Audio-Driven 3D Facial Animation from In-the-Wild Videos

Given an arbitrary audio clip, audio-driven 3D facial animation aims to generate lifelike lip motions and facial expressions for a 3D head. Existing methods typically rely on training their models using limited public 3D datasets that…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Liying Lu, Tianke Zhang, Yunfei Liu, Xuangeng Chu, Yu Li

Dynamic Modality and View Selection for Multimodal Emotion Recognition with Missing Modalities

The study of human emotions, traditionally a cornerstone in fields like psychology and neuroscience, has been profoundly impacted by the advent of artificial intelligence (AI). Multiple channels, such as speech (voice) and facial…

Machine Learning · Computer Science 2024-04-19 Luciana Trinkaus Menon, Luiz Carlos Ribeiro Neduziak, Jean Paul Barddal, Alessandro Lameiras Koerich, Alceu de Souza Britto

End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech

We present a deep learning framework for real-time speech-driven 3D facial animation from just raw waveforms. Our deep neural network directly maps an input sequence of speech audio to a series of micro facial action unit activations and…

Computer Vision and Pattern Recognition · Computer Science 2017-12-11 Hai X. Pham, Yuting Wang, Vladimir Pavlovic

Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert

Speech-driven 3D facial animation has recently garnered attention due to its cost-effective usability in multimedia production. However, most current advances overlook the intelligibility of lip movements, limiting the realism of facial…

Computer Vision and Pattern Recognition · Computer Science 2024-07-02 Han EunGi, Oh Hyun-Bin, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Joo, Tae-Hyun Oh

Animating Face using Disentangled Audio Representations

All previous methods for audio-driven talking head generation assume the input audio to be clean with a neutral tone. As we show empirically, one can easily break these systems by simply adding certain background noise to the utterance or…

Computer Vision and Pattern Recognition · Computer Science 2019-10-03 Gaurav Mittal, Baoyuan Wang

Leveraging Recent Advances in Deep Learning for Audio-Visual Emotion Recognition

Emotional expressions are the behaviors that communicate our emotional state or attitude to others. They are expressed through verbal and non-verbal communication. Complex human behavior can be understood by studying physical features from…

Computer Vision and Pattern Recognition · Computer Science 2021-09-15 Liam Schoneveld, Alice Othmani, Hazem Abdelkawy

MakeItTalk: Speaker-Aware Talking-Head Animation

We present a method that generates expressive talking heads from a single facial image with audio as the only input. In contrast to previous approaches that attempt to learn direct mappings from audio to raw pixels or points for creating…

Computer Vision and Pattern Recognition · Computer Science 2021-02-26 Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, Dingzeyu Li

Gaze Training by Modulated Dropout Improves Imitation Learning

Imitation learning by behavioral cloning is a prevalent method that has achieved some success in vision-based autonomous driving. The basic idea behind behavioral cloning is to have the neural network learn from observing a human expert's…

Computer Vision and Pattern Recognition · Computer Science 2019-08-19 Yuying Chen, Congcong Liu, Lei Tai, Ming Liu, Bertram E. Shi

Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos

With the assumption that a video dataset is multimodality annotated in which auditory and visual modalities both are labeled or class-relevant, current multimodal methods apply modality fusion or cross-modality attention. However,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Saghir Alfasly, Jian Lu, Chen Xu, Yuru Zou

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g…

Sound · Computer Science 2023-10-25 Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

Emotional Speech-Driven Animation with Content-Emotion Disentanglement

To be widely adopted, 3D facial avatars must be animated easily, realistically, and directly from speech signals. While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Radek Daněček, Kiran Chhatre, Shashank Tripathi, Yandong Wen, Michael J. Black, Timo Bolkart

Speech Driven Talking Face Generation from a Single Image and an Emotion Condition

Visual emotion expression plays an important role in audiovisual speech communication. In this work, we propose a novel approach to rendering visual emotion expression in speech-driven talking face generation. Specifically, we design an…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-23 Sefik Emre Eskimez, You Zhang, Zhiyao Duan

Self-attention fusion for audiovisual emotion recognition with incomplete data

In this paper, we consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition. We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality…

Computer Vision and Pattern Recognition · Computer Science 2022-01-27 Kateryna Chumachenko, Alexandros Iosifidis, Moncef Gabbouj