ASR Evaluation Data
LibriSpeech
#Word Error Rate (WER)
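Adaptation Method
CTC decoder: the features output by the upstream model are passed through two LSTM layers followed by a fully connected linear classifier. The input dimension equals the feature vector dimension, and the output dimension equals the vocabulary size.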
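The per-frame scores produced by such a classifier still have to be turned into a token sequence. The source does not say which decoding strategy is used, so the following is only a minimal sketch of greedy (best-path) CTC decoding: take the highest-scoring class in each frame, merge consecutive repeats, and drop blanks. The function name `ctc_greedy_decode` and the choice of index 0 for the blank symbol are assumptions made for illustration.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores holds one score vector per frame, with one entry per
    vocabulary item. The blank index (0 here) is an assumption.
    """
    decoded: List[int] = []
    previous = blank
    for scores in frame_scores:
        # Index of the highest-scoring class in this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a token only when it is not blank and differs from the previous frame.
        if best != blank and best != previous:
            decoded.append(best)
        previous = best
    return decoded


frames = [
    [0.1, 0.8, 0.1],    # class 1
    [0.2, 0.7, 0.1],    # class 1 again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank, separates the two occurrences of class 1
    [0.1, 0.6, 0.3],    # class 1
    [0.1, 0.2, 0.7],    # class 2
    [0.1, 0.2, 0.7],    # class 2 again, merged
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The blank frame between the two runs of class 1 is what allows the same token to appear twice in a row in the output.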
Related paper citation:
@inproceedings{graves2006connectionist, title={Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks}, author={Graves, Alex and Fern{\'a}ndez, Santiago and Gomez, Faustino and Schmidhuber, J{\"u}rgen}, booktitle={Proceedings of the 23rd international conference on Machine learning}, pages={369--376}, year={2006} }
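Data Description
LibriSpeech is a corpus of approximately 1000 hours of 16 kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from audiobooks of the LibriVox project and has been carefully segmented and aligned.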
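Source Data Size
The training data totals 960 hours across three subsets: train-clean-100, train-clean-360, and train-other-500.
There are two pairs of validation and test sets: dev-clean and test-clean; dev-other and test-other.
Evaluation data size: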
train-clean-100: 100 h, 28539 utterances; test-other: 5.1 h, 2939 utterances; test-clean: 5.4 h, 2620 utterances.
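Data Fields
The LibriSpeech dataset is organized by audio ID. For example, the transcript for the audio file 19-198-0001.flac is stored in 19-198.trans.txt. Each line of that file has the following fields: wav_id text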
Dataset Sample
78-368-0000 CHAPTER TWENTY THREE IT WAS EIGHT O'CLOCK WHEN WE LANDED WE WALKED FOR A SHORT TIME ON THE SHORE ENJOYING THE TRANSITORY LIGHT AND THEN RETIRED TO THE INN
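As an illustration of the field layout, the sketch below splits a transcript line into `wav_id` and `text`, and derives the transcript file name from an audio ID. The helper names `parse_transcript_line` and `transcript_filename` are hypothetical. The code assumes that the ID has three hyphen-separated parts (speaker, chapter, utterance) and that the first space on a line separates the ID from the text.

```python
from typing import Tuple


def parse_transcript_line(line: str) -> Tuple[str, str]:
    """Split one line of a *.trans.txt file into (wav_id, text)."""
    # The ID contains no spaces, so the first space ends the wav_id field.
    wav_id, _, text = line.strip().partition(" ")
    return wav_id, text


def transcript_filename(wav_id: str) -> str:
    """Map an audio ID such as 19-198-0001 to its transcript file name."""
    # Assumed ID layout: <speaker>-<chapter>-<utterance>.
    speaker, chapter, _utterance = wav_id.split("-")
    return f"{speaker}-{chapter}.trans.txt"


wav_id, text = parse_transcript_line(
    "78-368-0000 CHAPTER TWENTY THREE IT WAS EIGHT O'CLOCK WHEN WE LANDED"
)
print(wav_id)                              # 78-368-0000
print(transcript_filename("19-198-0001"))  # 19-198.trans.txt
```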
Evaluation Metric
Word error rate (WER)
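WER is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words in the hypothesis relative to the reference, and N is the number of reference words. The following is a minimal sketch that computes it with a word-level edit distance. The function name `word_error_rate` and the whitespace tokenization are assumptions; an actual evaluation pipeline may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substitution (SAT -> SIT) and one deletion (THE) over six reference words.
wer = word_error_rate("THE CAT SAT ON THE MAT", "THE CAT SIT ON MAT")
print(round(wer, 3))  # 0.333
```

This sketch scores a single utterance. A corpus-level figure is usually obtained by summing the errors over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.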
Paper Citation
@inproceedings{panayotov2015librispeech, title={Librispeech: an asr corpus based on public domain audio books}, author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev}, booktitle={2015 IEEE international conference on acoustics, speech and signal processing (ICASSP)}, pages={5206--5210}, year={2015}, organization={IEEE} }
Dataset License and Usage Terms:
CC BY 4.0
Note:
Testing on KeSpeech and LibriSpeech subsets is under consideration for future work.