Related papers: Visual Passwords Using Automatic Lip Reading

Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition

In low-resource computing contexts, such as smartphones and other tiny devices, both deep learning and machine learning are being used in a lot of identification systems as authentication techniques. The transparent, contactless, and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-02 Pangoth Santhosh Kumar , Garika Akshay

Visual Speech Recognition

Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and…

Computer Vision and Pattern Recognition · Computer Science 2014-09-05 Ahmad B. A. Hassanat

VALLR: Visual ASR Language Model for Lip Reading

Lip Reading, or Visual Automatic Speech Recognition (V-ASR), is a complex task requiring the interpretation of spoken language exclusively from visual cues, primarily lip movements and facial expressions. This task is especially challenging…

Computer Vision and Pattern Recognition · Computer Science 2026-01-06 Marshall Thomas , Edward Fish , Richard Bowden

Visual Words for Automatic Lip-Reading

Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and…

Computer Vision and Pattern Recognition · Computer Science 2014-09-24 Ahmad Basheer Hassanat

Sub-word Level Lip Reading With Visual Attention

The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. Most prior works deal with the open-set visual speech recognition problem by adapting existing automatic speech recognition techniques…

Computer Vision and Pattern Recognition · Computer Science 2021-12-06 K R Prajwal , Triantafyllos Afouras , Andrew Zisserman

Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database

Speech is the most used communication method between humans and it involves the perception of auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information…

Computer Vision and Pattern Recognition · Computer Science 2017-04-27 Adriana Fernandez-Lopez , Oriol Martinez , Federico M. Sukno

Large-Scale Visual Speech Recognition

This work presents a scalable solution to open-vocabulary visual speech recognition. To achieve this, we constructed the largest existing visual speech recognition dataset, consisting of pairs of text and video clips of faces speaking…

Computer Vision and Pattern Recognition · Computer Science 2018-10-02 Brendan Shillingford , Yannis Assael , Matthew W. Hoffman , Thomas Paine , Cían Hughes , Utsav Prabhu , Hank Liao , Hasim Sak , Kanishka Rao , Lorrayne Bennett , Marie Mulville , Ben Coppin , Ben Laurie , Andrew Senior , Nando de Freitas

Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction

Visual Automatic Speech Recognition (V-ASR) is a challenging task that involves interpreting spoken language solely from visual information, such as lip movements and facial expressions. This task is notably challenging due to the absence…

Computer Vision and Pattern Recognition · Computer Science 2025-07-28 Matthew Kit Khinn Teng , Haibo Zhang , Takeshi Saitoh

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

Lip reading is a challenging task that has many potential applications in speech recognition, human-computer interaction, and security systems. However, existing lip reading systems often suffer from low accuracy due to the limitations of…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Javad Peymanfard , Vahid Saeedi , Mohammad Reza Mohammadi , Hossein Zeinali , Nasser Mozayani

Advances and Challenges in Deep Lip Reading

Driven by deep learning techniques and large-scale datasets, recent years have witnessed a paradigm shift in automatic lip reading. While the main thrust of Visual Speech Recognition (VSR) was improving accuracy of Audio Speech Recognition…

Computer Vision and Pattern Recognition · Computer Science 2021-10-18 Marzieh Oghbaie , Arian Sabaghi , Kooshan Hashemifard , Mohammad Akbari

Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey

Speaker-independent VSR is a complex task that involves identifying spoken words or phrases from video recordings of a speaker's facial movements. Over the years, there has been a considerable amount of research in the field of VSR…

Computer Vision and Pattern Recognition · Computer Science 2023-12-13 Praneeth Nemani , G. Sai Krishna , Supriya Kundrapu

Two-step Authentication: Multi-biometric System Using Voice and Facial Recognition

We present a cost-effective two-step authentication system that integrates face identification and speaker verification using only a camera and microphone available on common devices. The pipeline first performs face recognition to identify…

Computer Vision and Pattern Recognition · Computer Science 2026-01-13 Kuan Wei Chen , Ting Yi Lin , Wen Ren Yang , Aryan Kesarwani , Riya Singh

VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis

Realistic, high-fidelity 3D facial animations are crucial for expressive avatar systems in human-computer interaction and accessibility. Although prior methods show promising quality, their reliance on the mesh domain limits their ability…

Computer Vision and Pattern Recognition · Computer Science 2025-07-22 Alexandre Symeonidis-Herzig , Özge Mercanoğlu Sincan , Richard Bowden

Learn an Effective Lip Reading Model without Pains

Lip reading, also known as visual speech recognition, aims to recognize the speech content from videos by analyzing the lip dynamics. There has been appealing progress in recent years, benefiting much from the rapidly developed…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Dalu Feng , Shuang Yang , Shiguang Shan , Xilin Chen

Active Voice Authentication

Active authentication refers to a new mode of identity verification in which biometric indicators are continuously tested to provide real-time or near real-time monitoring of an authorized access to a service or use of a device. This is in…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-28 Zhong Meng , M Umair Bin Altaf , Biing-Hwang Juang

Multi-Temporal Lip-Audio Memory for Visual Speech Recognition

Visual Speech Recognition (VSR) is a task to predict a sentence or word from lip movements. Some works have been recently presented which use audio signals to supplement visual information. However, existing methods utilize only limited…

Computer Vision and Pattern Recognition · Computer Science 2023-05-09 Jeong Hun Yeo , Minsu Kim , Yong Man Ro

Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to…

Computer Vision and Pattern Recognition · Computer Science 2019-07-19 Kalin Stefanov , Jonas Beskow , Giampiero Salvi

AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements

Biometric systems based on Machine learning and Deep learning are being extensively used as authentication mechanisms in resource-constrained environments like smartphones and other small computing devices. These AI-powered facial…

Computer Vision and Pattern Recognition · Computer Science 2020-12-22 Mohit Raghavendra , Pravan Omprakash , B R Mukesh , Sowmya Kamath

Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have a limited number of labeled data. Different from previous methods that tried to improve the VSR…

Computer Vision and Pattern Recognition · Computer Science 2024-01-15 Jeong Hun Yeo , Minsu Kim , Shinji Watanabe , Yong Man Ro

Speaker-independent machine lip-reading with speaker-dependent viseme classifiers

In machine lip-reading, which is identification of speech from visual-only information, there is evidence to show that visual speech is highly dependent upon the speaker [1]. Here, we use a phoneme-clustering method to form new…

Computer Vision and Pattern Recognition · Computer Science 2018-04-26 Helen L. Bear , Stephen J. Cox , Richard W. Harvey