ASR Evaluation Data

Librispeech

#Word Error Rate (WER)

Adaptation Method

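CTC decoder: the features output by the upstream model are passed through a two-layer LSTM followed by a fully connected linear classifier. The input dimension equals the feature vector dimension, and the output dimension equals the vocabulary size.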
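The classifier described above emits one score vector over the vocabulary per frame. Under CTC, such frame-level outputs are commonly turned into a label sequence by taking the best label per frame, merging consecutive repeats, and dropping the blank symbol. The following is a minimal sketch of that greedy (best-path) decoding step, not the decoding procedure confirmed by this page. The function name `ctc_greedy_decode`, the toy vocabulary, and the choice of index 0 as the blank are all assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring vocabulary index for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Merge runs of identical consecutive indices into a single index.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Remove the blank symbol and map the remaining indices to tokens.
    return [vocab[idx] for idx in collapsed if idx != blank_id]


# Hypothetical 4-symbol vocabulary; index 0 is assumed to be the CTC blank.
vocab = ["<blank>", "A", "B", "C"]
frame_scores = [
    [0.10, 0.70, 0.10, 0.10],  # A
    [0.10, 0.60, 0.20, 0.10],  # A again (merged with the previous frame)
    [0.80, 0.10, 0.05, 0.05],  # blank (separates the two A tokens)
    [0.10, 0.70, 0.10, 0.10],  # A
    [0.10, 0.10, 0.70, 0.10],  # B
]
decoded = ctc_greedy_decode(frame_scores, vocab)
print(decoded)
```

The blank frame between the two runs of `A` is what allows a repeated token to survive the merge step. Without it, the three `A` frames would collapse into one.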

Related paper citation:

@inproceedings{graves2006connectionist, title={Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks}, author={Graves, Alex and Fern{\'a}ndez, Santiago and Gomez, Faustino and Schmidhuber, J{\"u}rgen}, booktitle={Proceedings of the 23rd international conference on Machine learning}, pages={369--376}, year={2006} }

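Data Description

Librispeech is a corpus of approximately 1000 hours of 16 kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from audiobooks of the LibriVox project and has been carefully segmented and aligned.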

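Source Data Size

The training data totals 960 h: train-clean-100, train-clean-360, and train-other-500.

There are two pairs of validation and test sets: dev-clean and test-clean; dev-other and test-other.

Evaluation data size:

train-clean-100: 100 h, 28539 utterances; test-other: 5.1 h, 2939 utterances; test-clean: 5.4 h, 2620 utterances.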

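Data Fields

The Librispeech dataset is organized by audio ID. For example, the transcript for the audio file 19-198-0001.flac is in the file 19-198.trans.txt, with the following fields: wav_id text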
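As a rough illustration of the layout described above, the sketch below splits one transcript line into its `wav_id` and `text` fields and derives the matching file names from the ID. The helper names `parse_trans_line`, `flac_name`, and `trans_name` are hypothetical and not part of any Librispeech tooling. The sketch assumes the ID and the text are separated by a single space.

```python
def parse_trans_line(line):
    """Split one transcript line into (wav_id, text)."""
    # The first space separates the utterance ID from the transcript text.
    wav_id, _, text = line.strip().partition(" ")
    return wav_id, text


def flac_name(wav_id):
    """Audio file name for an utterance ID, e.g. 19-198-0001 -> 19-198-0001.flac."""
    return f"{wav_id}.flac"


def trans_name(wav_id):
    """Transcript file name for an utterance ID, e.g. 19-198-0001 -> 19-198.trans.txt."""
    # Dropping the trailing utterance index leaves the shared file prefix.
    prefix, _, _ = wav_id.rpartition("-")
    return f"{prefix}.trans.txt"


line = "19-198-0001 EXAMPLE TRANSCRIPT TEXT"  # placeholder text, not real corpus content
wav_id, text = parse_trans_line(line)
print(wav_id, flac_name(wav_id), trans_name(wav_id))
```

Splitting only on the first space keeps the transcript intact, since the text itself contains spaces.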

Dataset Sample

78-368-0000 CHAPTER TWENTY THREE IT WAS EIGHT O'CLOCK WHEN WE LANDED WE WALKED FOR A SHORT TIME ON THE SHORE ENJOYING THE TRANSITORY LIGHT AND THEN RETIRED TO THE INN

Evaluation Metric

Word Error Rate (WER)
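WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that definition using a word-level edit distance. The function name `word_error_rate` is hypothetical, and the sketch assumes whitespace tokenization with no text normalization, so it may differ from whatever scoring tool is actually used for this benchmark.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "IT WAS EIGHT O'CLOCK WHEN WE LANDED"
hypothesis = "IT WAS EIGHT CLOCK WHEN LANDED"  # one substitution, one deletion
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}")
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.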

Paper Citation

@inproceedings{panayotov2015librispeech, title={Librispeech: an asr corpus based on public domain audio books}, author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev}, booktitle={2015 IEEE international conference on acoustics, speech and signal processing (ICASSP)}, pages={5206--5210}, year={2015}, organization={IEEE} }

Dataset license and usage terms:

CC BY 4.0

Note:

Testing on subsets of Kespeech and Librispeech is under consideration for future work.