Related papers: Large-Scale Visual Speech Recognition

200 papers

Lip reading is a challenging task that has many potential applications in speech recognition, human-computer interaction, and security systems. However, existing lip reading systems often suffer from low accuracy due to the limitations of…

Computer Vision and Pattern Recognition · Computer Science 2023-07-20 Javad Peymanfard , Vahid Saeedi , Mohammad Reza Mohammadi , Hossein Zeinali , Nasser Mozayani

Lipreading, also known as visual speech recognition, aims to identify the speech content from videos by analyzing the visual deformations of lips and nearby areas. One of the significant obstacles for research in this field is the lack of…

Computer Vision and Pattern Recognition · Computer Science 2021-09-15 Evgeniy Egorov , Vasily Kostyumov , Mikhail Konyk , Sergey Kolesnikov

Lip reading, also known as visual speech recognition, aims to recognize the speech content from videos by analyzing the lip dynamics. There has been appealing progress in recent years, benefiting much from the rapidly developed…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Dalu Feng , Shuang Yang , Shiguang Shan , Xilin Chen

The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. Most prior works deal with the open-set visual speech recognition problem by adapting existing automatic speech recognition techniques…

Computer Vision and Pattern Recognition · Computer Science 2021-12-06 K R Prajwal , Triantafyllos Afouras , Andrew Zisserman

Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on increasingly large amounts of video data, while the publicly available transcribed video datasets are limited in size. In this paper, for the first…

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an…

Computer Vision and Pattern Recognition · Computer Science 2020-11-05 Joon Son Chung , Andrew Senior , Oriol Vinyals , Andrew Zisserman

Lip Reading, or Visual Automatic Speech Recognition (V-ASR), is a complex task requiring the interpretation of spoken language exclusively from visual cues, primarily lip movements and facial expressions. This task is especially challenging…

Computer Vision and Pattern Recognition · Computer Science 2026-01-06 Marshall Thomas , Edward Fish , Richard Bowden

Visual speech recognition (VSR), commonly known as lip reading, has garnered significant attention due to its wide-ranging practical applications. The advent of deep learning techniques and advancements in hardware capabilities have…

Computer Vision and Pattern Recognition · Computer Science 2025-01-09 Bowen Hao , Dongliang Zhou , Xiaojie Li , Xingyu Zhang , Liang Xie , Jianlong Wu , Erwei Yin

This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved…

Speech is the most used communication method between humans and it involves the perception of auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information…

Computer Vision and Pattern Recognition · Computer Science 2017-04-27 Adriana Fernandez-Lopez , Oriol Martinez , Federico M. Sukno

Large-scale datasets have successively proven their fundamental importance in several research fields, especially for early progress in some emerging topics. In this paper, we focus on the problem of visual speech recognition, also known as…

Computer Vision and Pattern Recognition · Computer Science 2019-04-25 Shuang Yang , Yuanhang Zhang , Dalu Feng , Mingmin Yang , Chenhao Wang , Jingyun Xiao , Keyu Long , Shiguang Shan , Xilin Chen

Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be achieved with a processing pipeline based solely on neural networks, yielding significantly better accuracy than conventional methods. Feed-forward…

Computer Vision and Pattern Recognition · Computer Science 2016-02-01 Michael Wand , Jan Koutník , Jürgen Schmidhuber

Lip-reading has made impressive progress in recent years, driven by advances in deep learning. Nonetheless, the prerequisite for such advances is a suitable dataset. This paper provides a new in-the-wild dataset for Persian word-level…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Javad Peymanfard , Ali Lashini , Samin Heydarian , Hossein Zeinali , Nasser Mozayani

In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training. Impressive…

Multimedia · Computer Science 2022-07-13 Hadeel Mabrouk , Omar Abugabal , Nourhan Sakr , Hesham M. Eraqi

Nowadays, non-privacy small-scale motion detection has attracted an increasing amount of research in remote sensing in speech recognition. These new modalities are employed to enhance and restore speech information from speakers of multiple…

Signal Processing · Electrical Eng. & Systems 2023-03-16 Yao Ge , Chong Tang , Haobo Li , Zikang Zhang , Wenda Li , Kevin Chetty , Daniele Faccio , Qammer H. Abbasi , Muhammad Imran

Lip-to-speech involves generating a natural-sounding speech synchronized with a soundless video of a person talking. Despite recent advances, current methods still cannot produce high-quality speech with high levels of intelligibility for…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-29 Yochai Yemini , Aviv Shamsian , Lior Bracha , Sharon Gannot , Ethan Fetaya

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition. The embeddings summarize the information of the mouth region that is relevant to the problem of word recognition, while…

Computer Vision and Pattern Recognition · Computer Science 2017-11-01 Themos Stafylakis , Georgios Tzimiropoulos

While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition. Most studies with Transformers…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-13 Liang Lu , Changliang Liu , Jinyu Li , Yifan Gong

Lipreading is understanding speech from observed lip movements. An observed series of lip motions is an ordered sequence of visual lip gestures. These gestures are commonly known, but as yet are not formally defined, as 'visemes'. In this…

Image and Video Processing · Electrical Eng. & Systems 2019-09-17 Helen Bear , Richard Harvey

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are…

Machine Learning · Computer Science 2016-12-19 Yannis M. Assael , Brendan Shillingford , Shimon Whiteson , Nando de Freitas