Evaluation Dataset

Librispeech

#Word Error Rate (WER)

Adapter

CTC Decoder

Data description:

LibriSpeech is a corpus of approximately 1,000 hours of 16 kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from audiobooks read for the LibriVox project, and has been carefully segmented and aligned.

Dataset structure:

Amount of source data:

The training data totals 960 hours across three subsets: train-clean-100, train-clean-360, and train-other-500.

There are two pairs of validation and test sets: dev-clean and test-clean, and dev-other and test-other.

Amount of testing data:

train-clean-100: 100 h, 28,539 utterances; test-dev: 5.1 h, 2,939 utterances; test-clean: 5.4 h, 2,620 utterances.

Data detail:

The LibriSpeech transcripts are organized by audio ID. For example, the text for the audio file 19-198-0001.flac is in the file 19-198.trans.txt. Each line of a transcript file has two space-separated fields: wav_id text

Sample of source dataset:

78-368-0000 CHAPTER TWENTY THREE IT WAS EIGHT O'CLOCK WHEN WE LANDED WE WALKED FOR A SHORT TIME ON THE SHORE ENJOYING THE TRANSITORY LIGHT AND THEN RETIRED TO THE INN
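
The layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset tooling: the function names are hypothetical, and it assumes that each audio ID has the form speaker-chapter-utterance and that the first space on a line separates wav_id from text.

```python
from pathlib import Path
from typing import Dict, Tuple


def parse_transcript_line(line: str) -> Tuple[str, str]:
    """Split one line of a *.trans.txt file into (wav_id, text)."""
    # Assumption: the first space separates the ID from the transcript.
    wav_id, _, text = line.strip().partition(" ")
    return wav_id, text


def transcript_name_for(wav_id: str) -> str:
    """Return the name of the transcript file that holds the text for wav_id."""
    # Assumption: IDs look like <speaker>-<chapter>-<utterance>,
    # e.g. 19-198-0001 maps to 19-198.trans.txt.
    speaker, chapter, _utterance = wav_id.split("-")
    return f"{speaker}-{chapter}.trans.txt"


def load_transcripts(trans_file: Path) -> Dict[str, str]:
    """Read a transcript file into a {wav_id: text} dictionary."""
    transcripts = {}
    with open(trans_file, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                wav_id, text = parse_transcript_line(line)
                transcripts[wav_id] = text
    return transcripts
```

For instance, transcript_name_for("19-198-0001") gives "19-198.trans.txt", and the .flac file with the same ID sits next to that transcript file in the same chapter directory.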

Citation information:

@inproceedings{panayotov2015librispeech,
title={Librispeech: an asr corpus based on public domain audio books},
author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
booktitle={2015 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
pages={5206--5210},
year={2015},
organization={IEEE}
}

Licensing information:

CC BY 4.0