Related papers: Large-Scale Visual Speech Recognition

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

Lip reading is a challenging task that has many potential applications in speech recognition, human-computer interaction, and security systems. However, existing lip reading systems often suffer from low accuracy due to the limitations of…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Javad Peymanfard, Vahid Saeedi, Mohammad Reza Mohammadi, Hossein Zeinali, Nasser Mozayani

LRWR: Large-Scale Benchmark for Lip Reading in Russian language

Lipreading, also known as visual speech recognition, aims to identify the speech content from videos by analyzing the visual deformations of lips and nearby areas. One of the significant obstacles for research in this field is the lack of…

Computer Vision and Pattern Recognition · Computer Science 2021-09-15 Evgeniy Egorov, Vasily Kostyumov, Mikhail Konyk, Sergey Kolesnikov

Learn an Effective Lip Reading Model without Pains

Lip reading, also known as visual speech recognition, aims to recognize the speech content from videos by analyzing the lip dynamics. There has been appealing progress in recent years, benefiting much from the rapidly developed…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Dalu Feng, Shuang Yang, Shiguang Shan, Xilin Chen

Sub-word Level Lip Reading With Visual Attention

The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. Most prior works deal with the open-set visual speech recognition problem by adapting existing automatic speech recognition techniques…

Computer Vision and Pattern Recognition · Computer Science 2021-12-06 K R Prajwal, Triantafyllos Afouras, Andrew Zisserman

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on increasingly large amounts of video data, while the publicly available transcribed video datasets are limited in size. In this paper, for the first…

Computer Vision and Pattern Recognition · Computer Science 2023-04-04 Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Lip Reading Sentences in the Wild

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an…

Computer Vision and Pattern Recognition · Computer Science 2020-11-05 Joon Son Chung, Andrew Senior, Oriol Vinyals, Andrew Zisserman

VALLR: Visual ASR Language Model for Lip Reading

Lip Reading, or Visual Automatic Speech Recognition (V-ASR), is a complex task requiring the interpretation of spoken language exclusively from visual cues, primarily lip movements and facial expressions. This task is especially challenging…

Computer Vision and Pattern Recognition · Computer Science 2026-01-06 Marshall Thomas, Edward Fish, Richard Bowden

LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition

Visual speech recognition (VSR), commonly known as lip reading, has garnered significant attention due to its wide-ranging practical applications. The advent of deep learning techniques and advancements in hardware capabilities have…

Computer Vision and Pattern Recognition · Computer Science 2025-01-09 Bowen Hao, Dongliang Zhou, Xiaojie Li, Xingyu Zhang, Liang Xie, Jianlong Wu, Erwei Yin

Advances in Online Audio-Visual Meeting Transcription

This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-12 Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou

Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database

Speech is the most used communication method between humans and it involves the perception of auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information…

Computer Vision and Pattern Recognition · Computer Science 2017-04-27 Adriana Fernandez-Lopez, Oriol Martinez, Federico M. Sukno

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Large-scale datasets have successively proven their fundamental importance in several research fields, especially for early progress in some emerging topics. In this paper, we focus on the problem of visual speech recognition, also known as…

Computer Vision and Pattern Recognition · Computer Science 2019-04-25 Shuang Yang, Yuanhang Zhang, Dalu Feng, Mingmin Yang, Chenhao Wang, Jingyun Xiao, Keyu Long, Shiguang Shan, Xilin Chen

Lipreading with Long Short-Term Memory

Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be achieved with a processing pipeline based solely on neural networks, yielding significantly better accuracy than conventional methods. Feed-forward…

Computer Vision and Pattern Recognition · Computer Science 2016-02-01 Michael Wand, Jan Koutník, Jürgen Schmidhuber

Word-level Persian Lipreading Dataset

Lip-reading has made impressive progress in recent years, driven by advances in deep learning. Nonetheless, the prerequisite for such advances is a suitable dataset. This paper provides a new in-the-wild dataset for Persian word-level…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Javad Peymanfard, Ali Lashini, Samin Heydarian, Hossein Zeinali, Nasser Mozayani

Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models

In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training. Impressive…

Multimedia · Computer Science 2022-07-13 Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr, Hesham M. Eraqi

A large-scale multimodal dataset of human speech recognition

Nowadays, non-privacy small-scale motion detection has attracted an increasing amount of research in remote sensing in speech recognition. These new modalities are employed to enhance and restore speech information from speakers of multiple…

Signal Processing · Electrical Eng. & Systems 2023-03-16 Yao Ge, Chong Tang, Haobo Li, Zikang Zhang, Wenda Li, Kevin Chetty, Daniele Faccio, Qammer H. Abbasi, Muhammad Imran

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

Lip-to-speech involves generating a natural-sounding speech synchronized with a soundless video of a person talking. Despite recent advances, current methods still cannot produce high-quality speech with high levels of intelligibility for…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-29 Yochai Yemini, Aviv Shamsian, Lior Bracha, Sharon Gannot, Ethan Fetaya

Deep word embeddings for visual speech recognition

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition. The embeddings summarize the information of the mouth region that is relevant to the problem of word recognition, while…

Computer Vision and Pattern Recognition · Computer Science 2017-11-01 Themos Stafylakis, Georgios Tzimiropoulos

Exploring Transformers for Large-Scale Speech Recognition

While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition. Most studies with Transformers…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-13 Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong

Alternative Visual Units for an Optimized Phoneme-Based Lipreading System

Lipreading is understanding speech from observed lip movements. An observed series of lip motions is an ordered sequence of visual lip gestures. These gestures are commonly known, but as yet are not formally defined, as 'visemes'. In this…

Image and Video Processing · Electrical Eng. & Systems 2019-09-17 Helen Bear, Richard Harvey

LipNet: End-to-End Sentence-level Lipreading

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are…

Machine Learning · Computer Science 2016-12-19 Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas