Lip reading using external viseme decoding
Computer Vision and Pattern Recognition
2021-11-09 v2
Abstract
Lip reading is the task of recognizing speech from lip movements. It is difficult because some words produce very similar lip movements when pronounced. Visemes are used to describe the lip movements made during speech. This paper shows how to use external text data (for viseme-to-character mapping) by dividing video-to-character conversion into two stages, converting video to visemes and then converting visemes to characters, each with a separate model. Our proposed method improves the word error rate by 4% compared to a standard sequence-to-sequence lip-reading model on the BBC-Oxford Lip Reading Sentences 2 (LRS2) dataset.
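One way to see why text-only data can help the second stage: any sentence can be converted deterministically into a viseme sequence, which yields parallel (viseme sequence, text) pairs for training a viseme-to-character model without any video. The Python sketch below illustrates this under loud assumptions: the letter-to-viseme table `VISEME_OF`, the class names such as `V_BILABIAL`, and the helpers `text_to_visemes` and `make_training_pairs` are all hypothetical. Viseme classes are commonly defined over phonemes rather than letters, so the paper's actual viseme set and conversion may differ.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy grouping of letters into viseme classes. Real viseme sets
# are usually defined over phonemes (via a pronunciation lexicon); this
# letter-level table only illustrates the many-to-one mapping.
_GROUPS: Dict[str, str] = {
    "V_BILABIAL": "pbm",
    "V_LABIODENTAL": "fv",
    "V_ALVEOLAR": "tdnlsz",
    "V_VELAR": "kgcqx",
    "V_ROUND": "ouw",
    "V_OPEN": "aeiy",
    "V_OTHER": "hjr",
}

# Invert the groups into a letter -> viseme lookup covering a-z.
VISEME_OF: Dict[str, str] = {
    ch: viseme for viseme, letters in _GROUPS.items() for ch in letters
}


def text_to_visemes(text: str) -> List[str]:
    """Convert a sentence into a viseme token sequence (stage-2 model input)."""
    tokens: List[str] = []
    for ch in text.lower():
        if ch == " ":
            tokens.append("<sp>")  # keep word boundaries as a token
        elif ch in VISEME_OF:
            tokens.append(VISEME_OF[ch])
        # Characters outside the table (digits, punctuation) are dropped.
    return tokens


def make_training_pairs(corpus: List[str]) -> List[Tuple[List[str], str]]:
    """Build (viseme sequence, target text) pairs from text-only data."""
    return [(text_to_visemes(s), s.lower()) for s in corpus]


if __name__ == "__main__":
    for visemes, text in make_training_pairs(["bat", "mat", "pat"]):
        print(text, visemes)
```

With this toy table, "bat", "mat", and "pat" all map to the same viseme sequence, which mirrors the ambiguity described above: the viseme-to-character model has to rely on context learned from large text corpora to choose the right word.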
Cite
@article{arxiv.2104.04784,
title = {Lip reading using external viseme decoding},
author = {Javad Peymanfard and Mohammad Reza Mohammadi and Hossein Zeinali and Nasser Mozayani},
journal = {arXiv preprint arXiv:2104.04784},
year = {2021}
}