Evaluation Dataset
Voicebank-DEMAND
# PESQ, STOI
Data description:
Voicebank-DEMAND is a synthetic dataset created by mixing clean speech with noise. The clean speech is taken from the Voice Bank corpus, and the noise comes from the Diverse Environments Multichannel Acoustic Noise Database (DEMAND). The training set contains 28 speakers at 4 signal-to-noise ratios (SNRs): 15, 10, 5, and 0 dB. The test set contains 2 speakers at 4 SNRs: 17.5, 12.5, 7.5, and 2.5 dB. The training set has 11,572 utterances (9.4 h) and the test set has 824 utterances (0.6 h). Utterance lengths range from 1.1 s to 15.1 s, with an average of 2.9 s.
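As a rough sanity check, the utterance counts and the average length quoted above can be multiplied out to estimate the total duration of each set. This is only a sketch: it assumes the 2.9 s average applies equally to the training and test sets, which the description does not state, and the helper name `estimated_hours` is made up for illustration.

```python
# Figures quoted in the data description.
TRAIN_UTTS = 11_572
TEST_UTTS = 824
AVG_LEN_S = 2.9  # assumed to hold for both sets (not stated in the source)


def estimated_hours(num_utts, avg_seconds):
    """Estimate total duration in hours from a count and a mean length."""
    return num_utts * avg_seconds / 3600


train_h = estimated_hours(TRAIN_UTTS, AVG_LEN_S)
test_h = estimated_hours(TEST_UTTS, AVG_LEN_S)
print(f"train: about {train_h:.1f} h, test: about {test_h:.1f} h")
```

The estimates come out at about 9.3 h and 0.7 h, close to the reported 9.4 h and 0.6 h; the small gaps are what one would expect from rounding the average length to one decimal place.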
Dataset structure:
Amount of source data:
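Training set: 8.8 h; validation set: 0.6 h; test set: 0.6 h (the training and validation sets together account for the 9.4 h of training data described above)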
Data detail:
The training, validation, and test sets each contain three files (spk2utt, utt2spk, and wav.scp), with one entry per line:
- wav.scp: wav_id wav_path
- spk2utt: wav_id wav_id
- utt2spk: wav_id wav_id
Sample of source dataset:
wav.scp:
p226_001 /home/datasets/noisy-vctk-16k/noisy_trainset_28spk_wav_16k/p226_001.wav
spk2utt:
p226_001 p226_001
utt2spk:
p226_001 p226_001
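The three files share the same two-column, whitespace-separated layout, so one small reader can handle all of them. The following is a minimal sketch, not the loader used by any particular toolkit; the function name `parse_table` is hypothetical, and the input lines are the sample entries shown above.

```python
def parse_table(lines):
    """Parse 'key value' lines into a dict, skipping blank or malformed lines."""
    table = {}
    for line in lines:
        parts = line.split(maxsplit=1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue
        key, value = parts
        table[key] = value.strip()
    return table


# Sample entries copied from the listing above.
wav_scp = parse_table([
    "p226_001 /home/datasets/noisy-vctk-16k/noisy_trainset_28spk_wav_16k/p226_001.wav",
])
utt2spk = parse_table(["p226_001 p226_001"])
spk2utt = parse_table(["p226_001 p226_001"])

for wav_id, wav_path in wav_scp.items():
    print(wav_id, utt2spk[wav_id], wav_path)
```

In the sample, spk2utt and utt2spk both map a wav_id to itself, so each utterance is effectively treated as its own speaker. To read from disk, the same function can be given an open file object instead of a list of strings.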
Citation information:
@inproceedings{ValentiniBotinhao2017NoisySD,
title={Noisy speech database for training speech enhancement algorithms and TTS models},
author={Cassia Valentini-Botinhao},
year={2017},
url={https://api.semanticscholar.org/CorpusID:64530884}
}
Licensing information:
Licensed under CC BY 4.0