Speech Separation Evaluation Data
LibriMix
#si-sdr
Adaptation Method
On top of the upstream model's output features, we apply a three-layer bidirectional LSTM (SepRNN). Two independent linear layers on the LSTM output produce two mask matrices. Each mask is multiplied element-wise with the mixture's original audio features, which yields the features of the two separated speakers.
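The mask step described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the SepRNN code: the BiLSTM itself is omitted (its output `hidden` is taken as given), the sigmoid on the masks is an assumption, and `apply_two_masks` and its parameter names are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic function (assumed mask activation, keeps values in (0, 1))."""
    return 1.0 / (1.0 + np.exp(-x))


def apply_two_masks(hidden, mixture_feats, w1, b1, w2, b2):
    """Hypothetical helper: two linear heads -> two masks -> two masked feature sets.

    hidden:        (T, H) output of the omitted 3-layer BiLSTM (H = 2 * hidden size)
    mixture_feats: (T, F) features of the mixture that the masks are applied to
    w1, w2:        (H, F) weights of the two independent linear layers
    b1, b2:        (F,)   biases of the two linear layers
    """
    mask1 = sigmoid(hidden @ w1 + b1)  # (T, F), mask for speaker 1
    mask2 = sigmoid(hidden @ w2 + b2)  # (T, F), mask for speaker 2
    # Element-wise products give one feature estimate per speaker.
    return mask1 * mixture_feats, mask2 * mixture_feats


# Usage with random stand-in values (all sizes are arbitrary).
rng = np.random.default_rng(0)
T, H, F = 100, 2 * 64, 40
hidden = rng.standard_normal((T, H))
mix = np.abs(rng.standard_normal((T, F)))
params = [0.01 * rng.standard_normal(shape) for shape in [(H, F), (F,), (H, F), (F,)]]
est1, est2 = apply_two_masks(hidden, mix, *params)
print(est1.shape, est2.shape)
```

Because each mask lies between 0 and 1, each speaker estimate is a scaled-down copy of the mixture features at every time-feature position.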
Data Description
LibriMix is an open-source dataset for separating the speech of different speakers, including in noisy environments. It is built from the clean subsets of LibriSpeech (including train-clean-100 and train-clean-360) combined with WHAM! noise. Mixing scripts generate several versions of LibriMix, including two-speaker mixtures, three-speaker mixtures, and noisy mixtures.
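A clean two-speaker mixture in LibriMix's "min" mode (the `min` directory that appears in the example paths) is truncated to the shorter utterance and then summed. The sketch below is hedged: the helper `mix_two_sources_min` is hypothetical, and the official scripts choose the source gains through loudness normalization, whereas here the gains are plain arguments.

```python
import numpy as np


def mix_two_sources_min(s1: np.ndarray, s2: np.ndarray, gain1: float = 1.0, gain2: float = 1.0):
    """Hypothetical "min"-mode clean mixing: truncate to the shorter source, scale, sum.

    Returns the mixture together with the two scaled, truncated sources, which play
    the role of the separation targets.
    """
    n = min(len(s1), len(s2))  # "min" mode keeps only the overlapping part
    src1 = gain1 * s1[:n]
    src2 = gain2 * s2[:n]
    return src1 + src2, src1, src2


# Usage with random stand-in signals of different lengths.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(16_000)
s2 = rng.standard_normal(12_000)
mixture, target1, target2 = mix_two_sources_min(s1, s2, gain1=0.5, gain2=0.8)
print(len(mixture))
```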
Composition and Specification of the Dataset
Source Data Volume
We use Libri2Mix, the LibriMix subset in which the speech of two speakers overlaps, in its noise-free (mix_clean) version. Its data volumes are as follows.
Two training sets are available: train-360, mixed from LibriSpeech's train-clean-360, and train-100, mixed from LibriSpeech's train-clean-100:
- train-360: 212 h
- train-100: 58 h
There is also one validation set and one test set:
- dev: 11 h
- test: 11 h
Training and Evaluation Data Volume
- Training set: train-100, 58 h, 13,900 mixtures
- Validation set: dev, 11 h, 3,000 mixtures
- Test set: test, 11 h, 3,000 mixtures
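Dividing the listed durations by the number of mixtures gives the average mixture length per split. The hour totals are rounded, so these averages are only approximate.

```python
# Approximate average mixture duration per split, from the rounded totals above.
splits = {
    "train-100": (58, 13_900),  # (hours, number of mixtures)
    "dev": (11, 3_000),
    "test": (11, 3_000),
}

averages = {name: hours * 3600 / count for name, (hours, count) in splits.items()}
for name, seconds in averages.items():
    print(f"{name}: {seconds:.1f} s per mixture")
# train-100: 15.0 s per mixture
# dev: 13.2 s per mixture
# test: 13.2 s per mixture
```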
Data Fields
Each of the training, validation, and test sets has a metadata file, mixture_{dataset_name}_mix_clean.csv, with the following columns:
- mixture_{dataset_name}_mix_clean.csv: mixture_ID,mixture_path,source_1_path,source_2_path,length (length is the number of audio samples in the mixture)
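The metadata files can be read with Python's standard `csv` module. This sketch assumes each file starts with a header row naming the columns listed above and that audio is sampled at 16 kHz (as the `wav16k` directory in the example paths suggests); `load_metadata`, `total_hours`, and the demo rows are hypothetical.

```python
import csv
import io

SAMPLE_RATE = 16_000  # assumption: the 16 kHz ("wav16k") version of Libri2Mix


def load_metadata(f):
    """Read a mixture_{dataset_name}_mix_clean.csv file object into a list of dicts."""
    rows = []
    for row in csv.DictReader(f):
        row["length"] = int(row["length"])  # number of audio samples
        rows.append(row)
    return rows


def total_hours(rows, sample_rate=SAMPLE_RATE):
    """Total audio duration of all mixtures in hours."""
    return sum(r["length"] for r in rows) / sample_rate / 3600


# Made-up rows for illustration; for a real file use:
#   with open("mixture_train-100_mix_clean.csv", newline="") as f:
#       rows = load_metadata(f)
demo = io.StringIO(
    "mixture_ID,mixture_path,source_1_path,source_2_path,length\n"
    "a_b,mix_clean/a_b.wav,s1/a_b.wav,s2/a_b.wav,160000\n"
    "c_d,mix_clean/c_d.wav,s1/c_d.wav,s2/c_d.wav,200000\n"
)
rows = load_metadata(demo)
print(len(rows), rows[0]["source_2_path"], total_hours(rows) * 3600)  # 2 s2/a_b.wav 22.5
```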
Dataset Example
Below is an example row from mixture_train-100_mix_clean.csv, split across lines at each comma:
2911-12359-0018_1723-141149-0013,
/media/hlt/chenyang_space/chenyang_space/speech_editing_and_tts/projects/s3prl/LibriMix/storage_dir/Libri2Mix/wav16k/min/train-100/mix_clean/2911-12359-0018_1723-141149-0013.wav,
/media/hlt/chenyang_space/chenyang_space/speech_editing_and_tts/projects/s3prl/LibriMix/storage_dir/Libri2Mix/wav16k/min/train-100/s1/2911-12359-0018_1723-141149-0013.wav,
/media/hlt/chenyang_space/chenyang_space/speech_editing_and_tts/projects/s3prl/LibriMix/storage_dir/Libri2Mix/wav16k/min/train-100/s2/2911-12359-0018_1723-141149-0013.wav,
221120
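The example row can be checked with a few lines of Python. The sketch below drops the machine-specific directory prefix from the paths and assumes a 16 kHz sample rate (from the `wav16k` directory name). It shows that every file in the row is named after the mixture ID, that the ID joins two LibriSpeech utterance IDs with `_`, and that a length of 221120 samples corresponds to 13.82 s.

```python
from pathlib import PurePosixPath

SAMPLE_RATE = 16_000  # assumption taken from the "wav16k" directory name

# Example row, with the machine-specific directory prefix removed.
mixture_id = "2911-12359-0018_1723-141149-0013"
paths = {
    "mixture_path": "Libri2Mix/wav16k/min/train-100/mix_clean/2911-12359-0018_1723-141149-0013.wav",
    "source_1_path": "Libri2Mix/wav16k/min/train-100/s1/2911-12359-0018_1723-141149-0013.wav",
    "source_2_path": "Libri2Mix/wav16k/min/train-100/s2/2911-12359-0018_1723-141149-0013.wav",
}
length = 221120

# All three files share the mixture ID as their file name.
ids_match = all(PurePosixPath(p).stem == mixture_id for p in paths.values())

# The mixture ID is two LibriSpeech utterance IDs joined by "_".
utt1, utt2 = mixture_id.split("_")
duration_s = length / SAMPLE_RATE
print(ids_match, utt1, utt2, duration_s)  # True 2911-12359-0018 1723-141149-0013 13.82
```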
Evaluation Metric
- SI-SDR (scale-invariant signal-to-distortion ratio)
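SI-SDR projects the estimate onto the reference and compares the energy of that projection with the energy of the remainder, in dB, so rescaling the estimate does not change the score. Below is a NumPy sketch of the standard definition with mean removal. The permutation-invariant wrapper `pit_si_sdr`, which pairs the two outputs with the two references in whichever order scores best, is an assumption about how two-speaker outputs are matched; both function names are hypothetical.

```python
import itertools

import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between two 1-D signals."""
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference: projection of the estimate onto it.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)))


def pit_si_sdr(estimates, references) -> float:
    """Mean SI-SDR under the output-to-speaker permutation that maximizes it."""
    best = -np.inf
    for perm in itertools.permutations(range(len(references))):
        score = np.mean([si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)])
        best = max(best, score)
    return float(best)


# A residual orthogonal to the reference with equal energy gives about 0 dB,
# and rescaling the estimate leaves the score unchanged.
ref = np.array([1.0, 0.0, -1.0, 0.0])
noise = np.array([0.0, 1.0, 0.0, -1.0])

# Two-speaker case where the model outputs the speakers in swapped order.
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal((2, 16_000))
est = [s2 + 0.1 * rng.standard_normal(16_000), s1 + 0.1 * rng.standard_normal(16_000)]
print(pit_si_sdr(est, [s1, s2]))  # roughly 20 dB
```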
Paper Citation
@article{Cosentino_Pariente_Cornell_Deleforge_Vincent_2020,
  title={LibriMix: An open-source dataset for generalizable speech separation},
  journal={Le Centre pour la Communication Scientifique Directe - HAL - memSIC},
  author={Cosentino, Joris and Pariente, Manuel and Cornell, Samuele and Deleforge, Antoine and Vincent, Emmanuel},
  year={2020},
  month={May},
  language={en-US}
}
Dataset Copyright and Usage Information
The LibriSpeech dataset is licensed under CC BY 4.0. The scripts to generate LibriMix are licensed under the MIT License.