Related papers: Deep Audio-Visual Speech Recognition

Lip Reading Sentences in the Wild

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an…

Computer Vision and Pattern Recognition · Computer Science 2020-11-05 Joon Son Chung, Andrew Senior, Oriol Vinyals, Andrew Zisserman

Learn an Effective Lip Reading Model without Pains

Lip reading, also known as visual speech recognition, aims to recognize the speech content from videos by analyzing the lip dynamics. There has been appealing progress in recent years, benefiting much from the rapidly developed…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Dalu Feng, Shuang Yang, Shiguang Shan, Xilin Chen

Sub-word Level Lip Reading With Visual Attention

The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. Most prior works deal with the open-set visual speech recognition problem by adapting existing automatic speech recognition techniques…

Computer Vision and Pattern Recognition · Computer Science 2021-12-06 K R Prajwal, Triantafyllos Afouras, Andrew Zisserman

Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language

Human lip-reading is a challenging task. It requires not only knowledge of underlying language but also visual clues to predict spoken words. Experts need certain level of experience and understanding of visual expressions learning to…

Computer Vision and Pattern Recognition · Computer Science 2018-02-16 M Faisal, Sanaullah Manzoor

Deep Lip Reading: a comparison of models and an online application

The goal of this paper is to develop state-of-the-art models for lip reading -- visual speech recognition. We develop three architectures and compare their accuracy and training times: (i) a recurrent model using LSTMs; (ii) a fully…

Computer Vision and Pattern Recognition · Computer Science 2018-06-18 Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

Visual Speech Recognition

Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and…

Computer Vision and Pattern Recognition · Computer Science 2014-09-05 Ahmad B. A. Hassanat

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading, unfortunately, remains…

Computer Vision and Pattern Recognition · Computer Science 2019-11-27 Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, Mingli Song

An Empirical Analysis of Deep Audio-Visual Models for Speech Recognition

In this project, we worked on speech recognition, specifically predicting individual words based on both the video frames and audio. Empowered by convolutional neural networks, the recent speech recognition and lip reading models are…

Computer Vision and Pattern Recognition · Computer Science 2018-12-27 Devesh Walawalkar, Yihui He, Rohit Pillai

Lip Reading Using Convolutional Auto Encoders as Feature Extractor

Visual recognition of speech using the lip movement is called Lip-reading. Recent developments in this nascent field use different neural networks as feature extractors which serve as input to a model which can map the temporal…

Computer Vision and Pattern Recognition · Computer Science 2018-06-01 Dharin Parekh, Ankitesh Gupta, Shharrnam Chhatpar, Anmol Yash Kumar, Manasi Kulkarni

Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models

In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training. Impressive…

Multimedia · Computer Science 2022-07-13 Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr, Hesham M. Eraqi

Visual Words for Automatic Lip-Reading

Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and…

Computer Vision and Pattern Recognition · Computer Science 2014-09-24 Ahmad Basheer Hassanat

Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization

Lip reading, the process of interpreting silent speech from visual lip movements, has gained rising attention for its wide range of realistic applications. Deep learning approaches greatly improve current lip reading systems. However, lip…

Artificial Intelligence · Computer Science 2024-05-03 Linzhi Wu, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Tiejun Liu, Liang Xie, Ye Yan, Erwei Yin

Visual Speech Recognition for Multiple Languages in the Wild

Visual speech recognition (VSR) aims to recognize the content of speech based on lip movements, without relying on the audio stream. Advances in deep learning and the availability of large audio-visual datasets have led to the development…

Computer Vision and Pattern Recognition · Computer Science 2022-11-01 Pingchuan Ma, Stavros Petridis, Maja Pantic

LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild

Speech is considered a multi-modal process where hearing and vision are two fundamental pillars. In fact, several studies have demonstrated that the robustness of Automatic Speech Recognition systems can be improved when audio and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-22 David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

LipNet: End-to-End Sentence-level Lipreading

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are…

Machine Learning · Computer Science 2016-12-19 Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas

Resource aware design of a deep convolutional-recurrent neural network for speech recognition through audio-visual sensor fusion

Today's Automatic Speech Recognition systems only rely on acoustic signals and often don't perform well under noisy conditions. Performing multi-modal speech recognition - processing acoustic speech signals and lip-reading video…

Computer Vision and Pattern Recognition · Computer Science 2018-03-14 Matthijs Van keirsbilck, Bert Moons, Marian Verhelst

Advances and Challenges in Deep Lip Reading

Driven by deep learning techniques and large-scale datasets, recent years have witnessed a paradigm shift in automatic lip reading. While the main thrust of Visual Speech Recognition (VSR) was improving accuracy of Audio Speech Recognition…

Computer Vision and Pattern Recognition · Computer Science 2021-10-18 Marzieh Oghbaie, Arian Sabaghi, Kooshan Hashemifard, Mohammad Akbari

Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions

Visual speech recognition remains an open research problem where different challenges must be considered by dispensing with the auditory sense, such as visual ambiguities, the inter-personal variability among speakers, and the complex…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition

Visual speech recognition (VSR), commonly known as lip reading, has garnered significant attention due to its wide-ranging practical applications. The advent of deep learning techniques and advancements in hardware capabilities have…

Computer Vision and Pattern Recognition · Computer Science 2025-01-09 Bowen Hao, Dongliang Zhou, Xiaojie Li, Xingyu Zhang, Liang Xie, Jianlong Wu, Erwei Yin

Analysis of Visual Features for Continuous Lipreading in Spanish

During a conversation, our brain is responsible for combining information obtained from multiple senses in order to improve our ability to understand the message we are perceiving. Different studies have shown the importance of presenting…

Computer Vision and Pattern Recognition · Computer Science 2023-11-22 David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos