Evaluation Dataset
Voicebank-DEMAND
Evaluation Metrics
PESQ, STOI
Adaptation Method
We adopt an adaptation method similar to that in speech separation tasks, where a three-layer Bi-LSTM (SepRNN) is applied to the output features of the upstream model to generate a predicted spectral mask of the clean signal.
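The final masking step can be sketched as follows. This is an illustrative example, not the actual SepRNN implementation: it omits the upstream model and the Bi-LSTM, and only shows how a predicted mask in [0, 1] would be applied element-wise to the noisy magnitude spectrogram to estimate the clean magnitude. The function name `apply_mask` and the toy arrays are assumptions for illustration.

```python
import numpy as np

def apply_mask(noisy_mag: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Apply a predicted spectral mask to a noisy magnitude spectrogram.

    noisy_mag, mask: arrays of shape (frames, freq_bins).
    The mask is clipped to [0, 1] before element-wise multiplication.
    """
    return noisy_mag * np.clip(mask, 0.0, 1.0)

# Toy 2-frame, 2-bin example (values chosen for illustration only).
noisy = np.array([[2.0, 4.0], [1.0, 3.0]])
mask = np.array([[0.5, 1.0], [0.0, 1.0]])
enhanced = apply_mask(noisy, mask)
```

In the actual pipeline the masked magnitude would be combined with the noisy phase and inverted back to a waveform before scoring with PESQ and STOI.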
Data description
Voicebank-DEMAND is a synthetic dataset created by mixing clean speech with noise. The clean speech is extracted from the Voice Bank corpus, and the noise is from the Diverse Environments Multichannel Acoustic Noise Database (DEMAND). The training set contains 28 speakers mixed at 4 signal-to-noise ratios (SNRs) (15, 10, 5, and 0 dB), and the test set contains 2 speakers mixed at 4 SNRs (17.5, 12.5, 7.5, and 2.5 dB). The training set contains 11,572 utterances (9.4h) and the test set contains 824 utterances (0.6h). Utterance lengths range from 1.1s to 15.1s, with an average of 2.9s.
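Mixing at a target SNR, as done when constructing such a dataset, amounts to scaling the noise so that the clean-to-noise power ratio in dB equals the target. A minimal sketch (the function name `mix_at_snr` is an assumption, not from the dataset's actual creation scripts):

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix clean speech with noise at a target SNR in dB.

    The noise is scaled so that
        10 * log10(P_clean / P_noise_scaled) == snr_db,
    where P_x is the mean power of signal x.
    """
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for 1s of 16 kHz speech
noise = rng.standard_normal(16000)   # stand-in for DEMAND noise
mixed = mix_at_snr(clean, noise, 5.0)
```

Note that a 0 dB mixture means clean and noise have equal power, which is why the 0 dB training condition is the hardest.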
Dataset structure
Amount of source data
Training set 8.8h, validation set 0.6h, test set 0.6h
Data detail
The training set, validation set, and test set all contain three files: spk2utt, utt2spk, and wav.scp:
- wav.scp: wav_id wav_path
- spk2utt: wav_id wav_id
- utt2spk: wav_id wav_id
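All three files share the same Kaldi-style two-column layout, so one small parser covers them. A minimal sketch (the helper name `read_scp` is an assumption; it reads lines rather than a file path so it stays self-contained):

```python
def read_scp(lines):
    """Parse Kaldi-style two-column lines (wav.scp, spk2utt, utt2spk).

    Each non-empty line is '<key> <value>', split on the first
    whitespace so paths containing no spaces are kept whole.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.split(maxsplit=1)
        table[key] = value
    return table

wav_scp = read_scp([
    "p226_001 /home/datasets/noisy-vctk-16k/noisy_trainset_28spk_wav_16k/p226_001.wav",
])
utt2spk = read_scp(["p226_001 p226_001"])
```

Because each utterance is treated as its own speaker here, spk2utt and utt2spk both map an id to itself.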
Sample of source dataset
wav.scp:
p226_001 /home/datasets/noisy-vctk-16k/noisy_trainset_28spk_wav_16k/p226_001.wav
spk2utt:
p226_001 p226_001
utt2spk:
p226_001 p226_001
Citation information
@inproceedings{ValentiniBotinhao2017NoisySD,
title={Noisy speech database for training speech enhancement algorithms and TTS models},
author={Cassia Valentini-Botinhao},
year={2017},
url={https://api.semanticscholar.org/CorpusID:64530884}
}
Licensing information
Licensed under CC BY 4.0