|Description (include details on usage, files and paper references)||The VLRF database is designed to contribute to research in visual-only speech recognition. A key difference between the VLRF database and existing corpora is its design perspective: instead of trying to lip-read people who are speaking naturally (normal speed, normal intonation, etc.), we propose to lip-read people who strive to be understood.
We recruited 24 adult volunteers (3 male and 21 female). Each participant was asked to read 25 different sentences from a total pool of 500 sentences. Each sentence contains between 3 and 12 words, with an average duration of 7 seconds per sentence; the total database duration is 180 minutes (540,162 frames). The sentences were unrelated to one another, so that lip-readers could not benefit from conversational context. The camera recorded a close-up shot at 50 fps with a resolution of 1280x720 pixels, and audio was recorded at 48 kHz, mono, with 16-bit resolution.
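As a quick sanity check on the figures above, the total duration can be recovered from the frame count and the frame rate. The snippet below is only an illustrative sketch: the constant and function names are hypothetical and are not part of the database or of any tooling distributed with it.

```python
# Figures quoted in the description above.
FPS = 50                 # video frame rate (frames per second)
TOTAL_FRAMES = 540_162   # total number of video frames in the database


def frames_to_minutes(frames: int, fps: int = FPS) -> float:
    """Convert a frame count to a duration in minutes at the given frame rate."""
    return frames / fps / 60


total_minutes = frames_to_minutes(TOTAL_FRAMES)
print(f"Total duration: {total_minutes:.1f} minutes")
```

At 50 fps the quoted frame count corresponds to roughly 180 minutes, which matches the stated total duration.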
The database is freely available for research purposes. It includes the following: a) the audio-visual recordings; b) the text of the uttered sentences; c) the phonetic transcription of the uttered sentences. To obtain a copy of the database, please download the License Agreement listed below and send a signed copy by e-mail to: vlrf dot database at upf dot edu.
For additional information, please refer to the following publication:
A. Fernandez-Lopez, O. Martinez and F.M. Sukno. Towards estimating the upper bound of visual-speech recognition: The Visual Lip-Reading Feasibility Database. In Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, 2017.