Lip reading using external viseme decoding

Computer Vision and Pattern Recognition 2021-11-09 v2

Abstract

Lip-reading is the task of recognizing speech from lip movements. It is difficult because many words produce similar lip movements when spoken. A viseme is the visual unit used to describe lip movements during speech. This paper shows how to exploit external text data (for viseme-to-character mapping) by splitting video-to-character conversion into two stages, video to viseme and then viseme to character, each handled by a separate model. The proposed method improves word error rate by 4% compared to a standard sequence-to-sequence lip-reading model on the BBC-Oxford Lip Reading Sentences 2 (LRS2) dataset.
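
The ambiguity described above can be illustrated with a minimal Python sketch. The phoneme-to-viseme table, the toy lexicon, and the helper names `to_visemes` and `candidates` are all hypothetical and are not taken from the paper; the sketch only shows why several words can collapse to one viseme sequence, so that the second stage (viseme to character) has to rely on text context.

```python
# Hypothetical, simplified phoneme-to-viseme table (an assumption for
# illustration, not the mapping used in the paper). Several phonemes
# share one viseme because they look the same on the lips.
PHONEME_TO_VISEME = {
    "p": "P", "b": "P", "m": "P",   # bilabial sounds
    "f": "F", "v": "F",             # labiodental sounds
    "t": "T", "d": "T", "n": "T",   # alveolar sounds
    "ae": "A",                      # open vowel
}

# Toy pronunciation lexicon (also an assumption).
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fat": ["f", "ae", "t"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence (many-to-one)."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def candidates(viseme_seq):
    """Return every lexicon word whose visemes match the given sequence."""
    target = tuple(viseme_seq)
    return sorted(w for w, ph in LEXICON.items() if to_visemes(ph) == target)


if __name__ == "__main__":
    seq = to_visemes(LEXICON["pat"])
    # Several words share this viseme sequence, so visemes alone
    # cannot decide which word was spoken.
    print(seq, candidates(seq))
```

In this toy setting, a viseme-to-character model trained on text alone could learn which candidate is most plausible in context, which is the role the abstract assigns to external text data.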

Cite

@article{arxiv.2104.04784,
  title  = {Lip reading using external viseme decoding},
  author = {Javad Peymanfard and Mohammad Reza Mohammadi and Hossein Zeinali and Nasser Mozayani},
  journal= {arXiv preprint arXiv:2104.04784},
  year   = {2021}
}