Yet Another Computer Vision Index To Datasets (YACVID) - Details

As of: 2020-06-01 16:06:30

Attribute Current Content New
Name (Institute + Shorttitle): Visual Lip Reading Feasibility (VLRF)
Description (include details on usage, files and paper references): The VLRF database is designed to contribute to research in visual-only speech recognition. A key difference between the VLRF database and existing corpora is that it was designed from a novel point of view: instead of trying to lip-read people who are speaking naturally (normal speed, normal intonation, ...), we propose to lip-read people who strive to be understood.

We recruited 24 adult volunteers (3 male and 21 female). Each participant was asked to read 25 different sentences from a total pool of 500 sentences. Each sentence contains between 3 and 12 words, with an average duration of 7 seconds per sentence and a total database duration of 180 minutes (540,162 frames). The sentences were unrelated to one another so that lip-readers could not benefit from conversational context. The camera recorded a close-up shot at 50 fps with a resolution of 1280x720 pixels, and audio was recorded at 48 kHz mono with 16-bit resolution.
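As a quick sanity check on the figures quoted above, the total duration can be recomputed from the frame count and the frame rate. This is only a minimal sketch: it assumes a constant 50 fps across all recordings and that the 540,162 frames cover the whole database. The function name `duration_minutes` is made up for this illustration.

```python
# Figures taken from the description above.
TOTAL_FRAMES = 540_162
FPS = 50  # assumed constant for every recording


def duration_minutes(frames: int, fps: float) -> float:
    """Return the running time in minutes for a frame count at a fixed frame rate."""
    return frames / fps / 60.0


minutes = duration_minutes(TOTAL_FRAMES, FPS)
print(f"{minutes:.2f} minutes")
```

Under these assumptions the result agrees with the stated total of roughly 180 minutes.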

The database is freely available for research purposes. It includes: a) the audio-visual recordings; b) the text of the uttered sentences; c) the phonetic transcription of the uttered sentences. To obtain a copy of the database, please download the License Agreement listed below and send a signed copy to the following e-mail address: vlrf dot database at upf dot edu.

For additional information, please refer to the following publication:

A. Fernandez-Lopez, O. Martinez and F.M. Sukno. Towards estimating the upper bound of visual-speech recognition: The Visual Lip-Reading Feasibility Database. In Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, 2017. 
URL: Link
Files (#): 540162
References: (SKIPPED)
Category: (SKIPPED)
Tags (single words, spaced): lip reading recognition speaker spanish language mouth face speech
Last Changed: 2020-06-01