
ASR Evaluation Data

Librispeech

# Word Error Rate (WER)

Dataset Description

Librispeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from audiobooks from the LibriVox project, carefully segmented and aligned.

Source Data Volume

Training data: 960h (train-clean-100, train-clean-360, train-other-500)

Validation and test sets: two of each, a "clean" pair and a more challenging "other" pair: dev-clean, test-clean; dev-other, test-other

Evaluation Data Volume

test-clean: 5.4h, 2620 utterances
test-other: 5.1h, 2939 utterances

Data Fields

The dataset is organized by utterance IDs of the form speaker-chapter-utterance. For example, the audio file 19-198-0001.flac (speaker 19, chapter 198) has its transcript on a line of 19-198.trans.txt, with the fields: wav_id text
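A minimal sketch of reading one chapter's transcript file. It assumes each line holds the utterance ID, a single space, and the upper-case transcript, and that the matching .flac files sit in the same directory as the *.trans.txt file. The function names are hypothetical.

```python
from collections.abc import Iterable
from pathlib import Path


def parse_trans_lines(lines: Iterable[str]) -> dict[str, str]:
    """Map utterance IDs (speaker-chapter-utterance) to transcripts."""
    transcripts: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, _, text = line.partition(" ")
        transcripts[utt_id] = text
    return transcripts


def load_chapter(trans_file: Path) -> list[tuple[Path, str]]:
    """Pair each transcript with its audio file, e.g. 19-198-0001 -> 19-198-0001.flac."""
    lines = trans_file.read_text(encoding="utf-8").splitlines()
    return [
        (trans_file.parent / f"{utt_id}.flac", text)
        for utt_id, text in parse_trans_lines(lines).items()
    ]
```

Keeping the parser separate from file access makes it easy to reuse on lines from any source, for example an archive member.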

Dataset Example

78-368-0000 CHAPTER TWENTY THREE IT WAS EIGHT O'CLOCK WHEN WE LANDED WE WALKED FOR A SHORT TIME ON THE SHORE ENJOYING THE TRANSITORY LIGHT AND THEN RETIRED TO THE INN

Evaluation Metric

Word Error Rate (WER)
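WER counts the substitutions S, deletions D and insertions I in a minimum edit-distance alignment of the hypothesis against the reference, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below computes this with a standard dynamic-programming Levenshtein distance and pools errors over the whole test set (total errors over total reference words), the usual way a test-set WER is reported, rather than averaging per-utterance rates. It assumes transcripts are already normalized and applies no normalization of its own; the function names are hypothetical.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                # deletion of r
                curr[j - 1] + 1,            # insertion of h
                prev[j - 1] + int(r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Test-set WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    ref_words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / ref_words
```

For example, scoring "IT WAS EIGHT CLOCK" against the reference "IT WAS EIGHT O'CLOCK" gives one substitution over four words. Because insertions count as errors, WER is not bounded by 100%.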

Citation

@inproceedings{panayotov2015librispeech,
  title={Librispeech: an asr corpus based on public domain audio books},
  author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
  booktitle={2015 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
  pages={5206--5210},
  year={2015},
  organization={IEEE}
}

License

CC BY 4.0


AISHELL-1

# Character Error Rate (CER)

Dataset Description

AISHELL-1 is a 178-hour open-source Mandarin speech dataset released by Beijing Shell Shell Technology Co., Ltd. It is one of the most widely used Mandarin speech corpora. It was recorded by 400 speakers in a quiet indoor environment using high-fidelity microphones, and the audio was downsampled to 16kHz. Manual transcription accuracy exceeds 95%.

Dataset Composition and Specifications

Source Data Volume

Training set: 150h
Validation set: 10h
Test set: 5h

Evaluation Data Volume

Public test set: 5h, 7176 utterances

Data Fields

Each of the training, validation, and test sets contains two files: wav.scp and text.

  • wav.scp: wav_id wav_path
  • text: wav_id text
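The two files share the same two-column layout that Kaldi data directories use, so pairing audio with transcripts is a join on wav_id. A minimal sketch, assuming the value is everything after the first run of whitespace (so paths or transcripts containing spaces survive intact); the function names are hypothetical.

```python
from collections.abc import Iterable


def read_id_map(lines: Iterable[str]) -> dict[str, str]:
    """Parse "<wav_id> <value>" lines; the value is everything after the first whitespace run."""
    mapping: dict[str, str] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:  # skip blank or malformed lines
            mapping[parts[0]] = parts[1]
    return mapping


def pair_audio_with_text(wav_scp: Iterable[str], text: Iterable[str]) -> list[tuple[str, str, str]]:
    """Join wav.scp and text on wav_id, keeping only IDs present in both files."""
    paths = read_id_map(wav_scp)
    transcripts = read_id_map(text)
    shared = sorted(paths.keys() & transcripts.keys())
    return [(wav_id, paths[wav_id], transcripts[wav_id]) for wav_id in shared]
```

Both files can be opened with encoding="utf-8" and passed in directly, since iterating a text file yields its lines. Silently dropping unmatched IDs is a design choice; for scoring you may prefer to raise on a mismatch so that missing utterances are not ignored.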

Dataset Example

wav.scp:
BAC009S0002W0122 /mnt/sda/jiaming_space/datasets/aishell/data_aishell/wav/train/S0002/BAC009S0002W0122.wav

text:
BAC009S0002W0122 而对楼市成交抑制作用最大的限购

Evaluation Metric

Character Error Rate (CER)
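CER is the character-level counterpart of WER: (S + D + I) / N, where N is the number of reference characters. The sketch below assumes whitespace is not scored (some Mandarin transcripts carry word-segmentation spaces) and leaves punctuation handling to the caller. It also reports S, D and I separately; when several alignments tie, the backtrace prefers the diagonal (match or substitution) move, so the breakdown, though not the total, may differ from other scoring tools. Function names are hypothetical.

```python
def align_counts(ref: str, hyp: str) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # d[i][j] is the edit distance between ref[:i] and hyp[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = int(ref[i - 1] != hyp[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    # Trace back from d[n][m] to attribute each error, preferring the diagonal on ties.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            subs += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def cer(ref: str, hyp: str) -> float:
    """Character error rate; whitespace is removed first and ref must be non-empty."""
    ref_chars = "".join(ref.split())
    hyp_chars = "".join(hyp.split())
    s, d, i = align_counts(ref_chars, hyp_chars)
    return (s + d + i) / len(ref_chars)
```

For a test-set CER, sum S + D + I over all utterances and divide by the total number of reference characters, as with WER.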

Citation

@inproceedings{bu2017aishell,
  title={Aishell-1: An open-source mandarin speech corpus and a speech recognition baseline},
  author={Bu, Hui and Du, Jiayu and Na, Xingyu and Wu, Bengu and Zheng, Hao},
  booktitle={2017 20th conference of the oriental chapter of the international coordinating committee on speech databases and speech I/O systems and assessment (O-COCOSDA)},
  pages={1--5},
  year={2017},
  organization={IEEE}
}

License

Apache License 2.0


ChildMandarin

# Character Error Rate (CER)

Dataset Description

ChildMandarin is an open-source Mandarin speech dataset of young children aged 3–5, jointly released by Nankai University and the Beijing Academy of Artificial Intelligence (BAAI). It contains 41.25 hours of speech from 397 child speakers across 22 provincial-level regions of China.

Dataset Composition and Specifications

Source Data Volume

Training set: 33.35h
Validation set: 3.78h
Test set: 4.12h

Evaluation Data Volume

Public test set: 4.12h, 4198 utterances

Data Fields

Each of the training, validation, and test sets contains two files: wav.scp and text.

  • wav.scp: wav_id wav_path
  • text: wav_id text

Dataset Example

text:
./data/148/148_5_F_L_ZIBO_Android_021.pcm 小鱼跳出水面没地方游泳了。
./data/148/148_5_F_L_ZIBO_Android_071.pcm 我很乖,我没有哭。
./data/148/148_5_F_L_ZIBO_Android_088.pcm 我喜欢画画想当画家。
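In the example above the first field is a relative .pcm path rather than a bare ID, and the transcripts mix an ASCII comma with a full-width full stop. A minimal sketch, assuming the path and transcript are separated by whitespace and that the file stem can serve as the utterance ID; the file name appears to encode further metadata, which this sketch does not interpret. Stripping punctuation before computing CER is an assumed scoring choice, not one documented here, and reading the .pcm audio itself would also require its sample rate and sample format. Function names are hypothetical.

```python
import unicodedata
from pathlib import PurePosixPath


def parse_path_keyed_line(line: str) -> tuple[str, str, str]:
    """Split "<audio path> <transcript>" into (utterance ID, path, transcript)."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty line")
    path = parts[0]
    transcript = parts[1] if len(parts) > 1 else ""
    utt_id = PurePosixPath(path).stem  # hypothetical ID: the file name without ".pcm"
    return utt_id, path, transcript


def strip_punctuation(text: str) -> str:
    """Drop every Unicode punctuation character (general categories starting with "P")."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
```

Filtering on the Unicode general category treats ASCII and full-width punctuation alike, so "," and "。" are both removed without maintaining a hand-written list.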

Evaluation Metric

Character Error Rate (CER)

Citation

@inproceedings{zhou-etal-2025-childmandarin,
    title = "{C}hild{M}andarin: A Comprehensive {M}andarin Speech Dataset for Young Children Aged 3-5",
    author = "Zhou, Jiaming  and
      Wang, Shiyao  and
      Zhao, Shiwan  and
      He, Jiabei  and
      Sun, Haoqin  and
      Wang, Hui  and
      Liu, Cheng  and
      Kong, Aobo  and
      Guo, Yujie  and
      Yang, Xi  and
      Wang, Yequan  and
      Lin, Yonghua  and
      Qin, Yong",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.614/",
    doi = "10.18653/v1/2025.acl-long.614",
    pages = "12524--12537",
    ISBN = "979-8-89176-251-0",
}

License

CC BY-NC-SA 4.0


SeniorTalk

# Character Error Rate (CER)

Dataset Description

SeniorTalk is a comprehensive open-source Mandarin speech dataset specifically targeting elderly speakers aged 75–85. It was released by Nankai University to address the severe lack of publicly available resources for this demographic, aiming to advance research in Automatic Speech Recognition (ASR) and related fields.

Dataset Composition and Specifications

Source Data Volume

Training set: 29.95h
Validation set: 4.09h
Test set: 3.77h

Evaluation Data Volume

Public test set: 3.77h, 5869 utterances

Data Fields

The sentence-level data includes audio files, sentence transcriptions, and speaker annotations, organized as follows:

sentence_data/  
├── wav  
│   ├── train/*.tar
│   ├── dev/*.tar 
│   └── test/*.tar   
└── transcript/*.txt   
UTTERANCEINFO.txt  # annotation of topics and duration
SPKINFO.txt        # annotation of location, age, gender, and device
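The audio for each split ships as tar shards. A minimal sketch that streams WAV files out of sentence_data/wav/<split>/*.tar without unpacking them to disk, using only the standard-library tarfile module. Since the directory layout inside the shards is not described here, it assumes the .wav files are named like the transcript entries (e.g. Elderly0122S0001W0003.wav) and yields only their base names. Function names are hypothetical.

```python
import tarfile
from pathlib import Path
from typing import Iterator


def iter_wav_members(tar_path: Path) -> Iterator[tuple[str, bytes]]:
    """Yield (base file name, raw WAV bytes) for every .wav file in one shard."""
    with tarfile.open(tar_path) as tar:  # default mode "r" also handles compressed tars
        for member in tar:
            if member.isfile() and member.name.endswith(".wav"):
                fileobj = tar.extractfile(member)
                if fileobj is not None:
                    yield Path(member.name).name, fileobj.read()


def iter_split(root: Path, split: str) -> Iterator[tuple[str, bytes]]:
    """Walk root/wav/<split>/*.tar in sorted order, e.g. split="test"."""
    for shard in sorted((root / "wav" / split).glob("*.tar")):
        yield from iter_wav_members(shard)
```

Reading members in place avoids writing thousands of small files and sidesteps the path-safety concerns of extracting untrusted archives.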

Dataset Example

Elderly0122S0001W0003.wav	找个有趣的地方玩一玩。
Elderly0122S0001W0005.wav	朝阳公园吧。
Elderly0122S0001W0010.wav	又遮阳光。
Elderly0122S0001W0016.wav	现在。
Elderly0122S0001W0023.wav	就是不太新鲜。
Elderly0122S0001W0026.wav	要新鲜一点的。
Elderly0122S0001W0027.wav	吃了才有营养。
Elderly0122S0001W0029.wav	好。
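The transcript lines above appear to be tab-separated: a WAV file name, then the sentence. A minimal sketch that turns them into an {utterance ID: sentence} mapping, assuming one utterance per line and stripping the .wav suffix so the keys match the audio file stems. The function name is hypothetical.

```python
from collections.abc import Iterable


def parse_sentence_transcript(lines: Iterable[str]) -> dict[str, str]:
    """Map "<name>.wav<TAB><sentence>" lines to {"<name>": "<sentence>"}."""
    transcripts: dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        filename, _, sentence = line.rstrip("\r\n").partition("\t")
        transcripts[filename.removesuffix(".wav")] = sentence.strip()
    return transcripts
```

A transcript file opened with encoding="utf-8" can be passed directly, since iterating it yields lines with their trailing newlines, which the parser removes.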

Evaluation Metric

Character Error Rate (CER)

Citation

@misc{chen2025seniortalkchineseconversationdataset,
      title={SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors}, 
      author={Yang Chen and Hui Wang and Shiyao Wang and Junyang Chen and Jiabei He and Jiaming Zhou and Xi Yang and Yequan Wang and Yonghua Lin and Yong Qin},
      year={2025},
      eprint={2503.16578},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.16578}, 
}

License

CC BY-NC-SA 4.0