Speech Separation Evaluation Data

LibriMix

#si-sdr

Adapting Method

On top of the upstream model's output features, we apply a three-layer bidirectional LSTM (SepRNN). Two independent linear layers then produce two mask matrices, and each mask is multiplied element-wise with the original audio features to obtain the separated audio features of one of the two speakers.
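The masking step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the benchmark's implementation: the BLSTM itself is not modelled (its output is replaced by random `hidden` frames), the sigmoid activation on the masks is an assumption, and the names `linear`, `separate`, and `random_head` are hypothetical helpers introduced only for this example.

```python
import math
import random

def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def separate(hidden, mixture, heads):
    """Return one masked copy of `mixture` per mask head.

    hidden  : T x H frames standing in for the BLSTM output (not modelled here)
    mixture : T x F frames of the original (mixed) audio features
    heads   : list of (weight, bias) pairs, one independent linear layer per speaker
    """
    outputs = []
    for weight, bias in heads:
        masked = []
        for h_t, x_t in zip(hidden, mixture):
            # Assumed activation: sigmoid keeps every mask value in (0, 1).
            mask_t = [sigmoid(z) for z in linear(h_t, weight, bias)]
            # Element-wise product of the mask with the mixture features.
            masked.append([m * x for m, x in zip(mask_t, x_t)])
        outputs.append(masked)
    return outputs

def random_head(rng, in_dim, out_dim):
    """Hypothetical helper: a randomly initialised linear layer."""
    weight = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

rng = random.Random(0)
T, H, F = 4, 6, 3  # toy sizes: frames, hidden size, feature size
hidden = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(T)]
mixture = [[rng.uniform(0, 1) for _ in range(F)] for _ in range(T)]
heads = [random_head(rng, H, F), random_head(rng, H, F)]

speaker_1, speaker_2 = separate(hidden, mixture, heads)
```

Both outputs have the same shape as `mixture`; only the mask differs between the two heads, which is what lets a single shared recurrent network serve both speakers.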

Data Description

LibriMix is an open-source dataset for separating the speech of different speakers in noisy environments. It is built from the clean subsets of LibriSpeech (including train-clean-100 and train-clean-360) together with WHAM! noise. The mixing scripts generate several versions of LibriMix, including two-speaker mixes, three-speaker mixes, and noisy mixes.

Composition and Specification of the Dataset

Source Data Volume

From LibriMix we selected the noise-free version of Libri2Mix, in which the speech of two speakers overlaps. Its data volumes are as follows.

Two training sets are available: train-360 (mixed from LibriSpeech's train-clean-360) and train-100 (mixed from LibriSpeech's train-clean-100):

  • train-360: 212h
  • train-100: 58h

There is also one validation set and one test set, with the following durations:

  • dev: 11h
  • test: 11h

Training and Evaluation Data Volume

Training set: train-100, 58h, 13,900 samples

Validation set: dev, 11h, 3,000 samples

Test set: test, 11h, 3,000 samples

Data Fields

The training, validation, and test sets each contain a metadata file named mixture_{dataset_name}_mix_clean.csv with the following columns:

  • mixture_ID,mixture_path,source_1_path,source_2_path,length

Dataset Example

Below is an example of a data entry from mixture_train-100_mix_clean.csv:

2911-12359-0018_1723-141149-0013,/media/hlt/chenyang_space/chenyang_space/speech_editing_and_tts/projects/s3prl/LibriMix/storage_dir/Libri2Mix/wav16k/min/train-100/mix_clean/2911-12359-0018_1723-141149-0013.wav,/media/hlt/chenyang_space/chenyang_space/speech_editing_and_tts/projects/s3prl/LibriMix/storage_dir/Libri2Mix/wav16k/min/train-100/s1/2911-12359-0018_1723-141149-0013.wav,/media/hlt/chenyang_space/chenyang_space/speech_editing_and_tts/projects/s3prl/LibriMix/storage_dir/Libri2Mix/wav16k/min/train-100/s2/2911-12359-0018_1723-141149-0013.wav,221120
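A metadata file with these columns can be read with Python's standard csv module. The sketch below makes several assumptions: `length` is taken to be a sample count, the 16 kHz sample rate is inferred from the wav16k directory in the example paths, and `ROOT` is a hypothetical directory standing in for the long machine-specific prefix. The helper `read_metadata` is likewise introduced only for illustration.

```python
import csv
import io

SAMPLE_RATE = 16_000  # assumed from the "wav16k" directory in the example paths

# Hypothetical root directory replacing the machine-specific path prefix.
ROOT = "/data/Libri2Mix/wav16k/min/train-100"
NAME = "2911-12359-0018_1723-141149-0013"

# In-memory stand-in for mixture_train-100_mix_clean.csv with a single row.
metadata = io.StringIO(
    "mixture_ID,mixture_path,source_1_path,source_2_path,length\n"
    f"{NAME},{ROOT}/mix_clean/{NAME}.wav,{ROOT}/s1/{NAME}.wav,"
    f"{ROOT}/s2/{NAME}.wav,221120\n"
)

def read_metadata(handle):
    """Parse the metadata rows and add a duration in seconds to each."""
    rows = []
    for row in csv.DictReader(handle):
        row["length"] = int(row["length"])  # assumed to be a sample count
        row["duration_s"] = row["length"] / SAMPLE_RATE
        rows.append(row)
    return rows

entries = read_metadata(metadata)
entry = entries[0]
print(entry["mixture_ID"], entry["source_1_path"], entry["duration_s"])
```

Under these assumptions the example entry is about 13.8 seconds long. To read a real file, pass `open(path, newline="")` to `read_metadata` instead of the in-memory string.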

Evaluation Metric

  • SI-SDR (scale-invariant signal-to-distortion ratio)
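For reference, SI-SDR projects the estimated signal onto the reference to obtain a scaled target, treats the remainder as distortion, and reports the energy ratio of the two in decibels. The sketch below is a plain-Python illustration, not the benchmark's scoring code. The mean removal, the `eps` stabiliser, and the permutation search in `pit_si_sdr` (which tries every assignment of estimates to speakers) are common conventions assumed here; the source does not state them.

```python
import math
from itertools import permutations

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length signals."""
    # Remove the mean of each signal (a common convention, assumed here).
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything not explained by the scaled target counts as distortion.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))

def pit_si_sdr(estimates, references):
    """Best mean SI-SDR over all assignments of estimates to references."""
    best = -math.inf
    for perm in permutations(range(len(references))):
        score = sum(
            si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)
        ) / len(references)
        best = max(best, score)
    return best

# Toy signals: each estimate is its reference plus a small orthogonal error.
ref_a = [1.0, -1.0, 1.0, -1.0]
ref_b = [1.0, 1.0, -1.0, -1.0]
est_a = [1.1, -0.9, 0.9, -1.1]
est_b = [1.1, 0.9, -0.9, -1.1]

score_a = si_sdr(est_a, ref_a)
# The estimates are passed in swapped order; the permutation search undoes it.
score_pit = pit_si_sdr([est_b, est_a], [ref_a, ref_b])
```

In this toy case the error energy is one hundredth of the target energy, so each matched pair scores about 20 dB, and rescaling an estimate leaves its score unchanged.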

Paper Citation

@article{Cosentino_Pariente_Cornell_Deleforge_Vincent_2020,
  title={LibriMix: An open-source dataset for generalizable speech separation},
  journal={Le Centre pour la Communication Scientifique Directe - HAL - memSIC},
  author={Cosentino, Joris and Pariente, Manuel and Cornell, Samuele and Deleforge, Antoine and Vincent, Emmanuel},
  year={2020},
  month={May},
  language={en-US}
}

The LibriSpeech dataset is licensed under CC BY 4.0. The scripts to generate LibriMix are licensed under the MIT License.