Related papers: Lipreading using Temporal Convolutional Networks

Training Strategies for Improved Lip-reading

Several training strategies and temporal models have been recently proposed for isolated word lip-reading in a series of independent works. However, the potential of combining the best strategies and investigating the impact of each of them…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-30 · Pingchuan Ma, Yujiang Wang, Stavros Petridis, Jie Shen, Maja Pantic

Lip-reading with Densely Connected Temporal Convolutional Networks

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words. Although Temporal Convolutional Networks (TCN) have recently demonstrated great potential in many vision tasks, its…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-30 · Pingchuan Ma, Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic

Lip Reading Using Convolutional Auto Encoders as Feature Extractor

Visual recognition of speech from lip movement is called lip-reading. Recent developments in this nascent field use different neural networks as feature extractors which serve as input to a model which can map the temporal…

Computer Vision and Pattern Recognition · Computer Science · 2018-06-01 · Dharin Parekh, Ankitesh Gupta, Shharrnam Chhatpar, Anmol Yash Kumar, Manasi Kulkarni

Deep Lip Reading: a comparison of models and an online application

The goal of this paper is to develop state-of-the-art models for lip reading -- visual speech recognition. We develop three architectures and compare their accuracy and training times: (i) a recurrent model using LSTMs; (ii) a fully…

Computer Vision and Pattern Recognition · Computer Science · 2018-06-18 · Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

Can DNNs Learn to Lipread Full Sentences?

Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on…

Image and Video Processing · Electrical Eng. & Systems · 2018-05-31 · George Sterpu, Christian Saam, Naomi Harte

Combining Residual Networks with LSTMs for Lipreading

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We train and evaluate it…

Computer Vision and Pattern Recognition · Computer Science · 2017-09-11 · Themos Stafylakis, Georgios Tzimiropoulos

Lipreading with Long Short-Term Memory

Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be achieved with a processing pipeline based solely on neural networks, yielding significantly better accuracy than conventional methods. Feed-forward…

Computer Vision and Pattern Recognition · Computer Science · 2016-02-01 · Michael Wand, Jan Koutník, Jürgen Schmidhuber

Towards Practical Lipreading with Distilled and Efficient Models

Lipreading has witnessed a lot of progress due to the resurgence of neural networks. Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization. However, there…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-03 · Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic

Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention

In this paper, we propose a novel deep learning architecture to improve word-level lip-reading. On the one hand, we first introduce multi-scale processing into the spatial feature extraction for lip-reading. Specifically, we proposed…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-29 · Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Chin-Hui Lee, Bao-Cai Yin

LipNet: End-to-End Sentence-level Lipreading

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are…

Machine Learning · Computer Science · 2016-12-19 · Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas

Learn an Effective Lip Reading Model without Pains

Lip reading, also known as visual speech recognition, aims to recognize the speech content from videos by analyzing the lip dynamics. There has been appealing progress in recent years, benefiting much from the rapidly developed…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-17 · Dalu Feng, Shuang Yang, Shiguang Shan, Xilin Chen

Word-level Lexical Normalisation using Context-Dependent Embeddings

Lexical normalisation (LN) is the process of correcting each word in a dataset to its canonical form so that it may be more easily and more accurately analysed. Most lexical normalisation systems operate at the character-level, while…

Computation and Language · Computer Science · 2019-11-15 · Michael Stewart, Wei Liu, Rachel Cardell-Oliver

Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks

In recent decades, neural-network-based methods have significantly improved the performance of speech enhancement. Most of them estimate a time-frequency (T-F) representation of the target speech directly or indirectly, then resynthesize waveform…

Sound · Computer Science · 2020-02-06 · Jingdong Li, Hui Zhang, Xueliang Zhang, Changliang Li

Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs

Visual and audiovisual speech recognition are witnessing a renaissance which is largely due to the advent of deep learning methods. In this paper, we present a deep learning architecture for lipreading and audiovisual word recognition,…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-06 · Themos Stafylakis, Muhammad Haris Khan, Georgios Tzimiropoulos

Deep Audio-Visual Speech Recognition

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an…

Computer Vision and Pattern Recognition · Computer Science · 2018-12-27 · Triantafyllos Afouras, Joon Son Chung, Andrew Senior, Oriol Vinyals, Andrew Zisserman

Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language

Human lip-reading is a challenging task. It requires not only knowledge of the underlying language but also visual clues to predict spoken words. Experts need a certain level of experience and understanding of visual expressions learning to…

Computer Vision and Pattern Recognition · Computer Science · 2018-02-16 · M Faisal, Sanaullah Manzoor

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Large-scale datasets have successively proven their fundamental importance in several research fields, especially for early progress in some emerging topics. In this paper, we focus on the problem of visual speech recognition, also known as…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-25 · Shuang Yang, Yuanhang Zhang, Dalu Feng, Mingmin Yang, Chenhao Wang, Jingyun Xiao, Keyu Long, Shiguang Shan, Xilin Chen

Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading

Despite the advancement in the domain of audio and audio-visual speech recognition, visual speech recognition systems are still quite under-explored due to the visual ambiguity of some phonemes. In this work, we propose a new lip-reading…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-10 · Shahd Elashmawy, Marian Ramsis, Hesham M. Eraqi, Farah Eldeshnawy, Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr

End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition

Lip reading or visual speech recognition has gained significant attention in recent years, particularly because of hardware development and innovations in computer vision. While considerable progress has been obtained, most models have only…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-10 · Emilian-Claudiu Mănescu, Răzvan-Alexandru Smădu, Andrei-Marius Avram, Dumitru-Clementin Cercel, Florin Pop

TD3Net: A temporal densely connected multi-dilated convolutional network for lipreading

The word-level lipreading approach typically employs a two-stage framework with separate frontend and backend architectures to model dynamic lip movements. Each component has been extensively studied, and in the backend architecture,…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-06 · Byung Hoon Lee, Wooseok Shin, Sung Won Han