SF Evaluation Data

SNIPS

#Accuracy, Robustness

Adaptation Method:

CTC Decoder: Features output by the upstream model are passed through two LSTM layers followed by a fully connected linear classifier. The input dimension equals the feature vector dimension, and the output dimension equals the number of slot types.
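As a rough illustration of how per-frame classifier outputs can become a label sequence, the sketch below applies the standard CTC collapse rule (merge consecutive repeats, then drop blanks). It assumes greedy decoding and a blank symbol named `<blank>`. Both are assumptions made for illustration and are not stated in this card, and the function name `ctc_greedy_collapse` is hypothetical.

```python
from itertools import groupby

# Assumption: the CTC blank symbol is written "<blank>"; the card does not name it.
BLANK = "<blank>"


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame best-label path into an output label sequence.

    Standard CTC rule: merge consecutive repeated labels first,
    then remove every blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Hypothetical per-frame argmax path for a short utterance.
frames = [
    "<blank>", "O", "O", "<blank>", "O",
    "music_item", "music_item", "<blank>",
    "playlist", "<blank>", "playlist",
]
print(ctc_greedy_collapse(frames))
```

A blank between two identical labels keeps them as separate outputs, which is why the two `playlist` frames above survive as two labels while the repeated `music_item` frames merge into one.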

Related paper citation:

@inproceedings{graves2006connectionist,
title={Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks},
author={Graves, Alex and Fern{\'a}ndez, Santiago and Gomez, Faustino and Schmidhuber, J{\"u}rgen},
booktitle={Proceedings of the 23rd international conference on Machine learning},
pages={369--376},
year={2006}
}

Data Description:

The SNIPS Natural Language Understanding benchmark is a dataset containing over 16,000 crowdsourced queries in English, distributed across 7 user intents of varying complexity: SearchCreativeWork (e.g., find me a robot TV show), GetWeather (e.g., is it windy in Boston, Massachusetts right now?), BookRestaurant (e.g., I want to book a highly rated restaurant in Paris for tomorrow night), PlayMusic (e.g., play the last track by Beyoncé on Spotify), AddToPlaylist (e.g., add Diamonds to my road trip playlist), RateBook (e.g., give six stars to Of Mice and Men), SearchScreeningEvent (e.g., check the screening time for Wonder Woman in Paris).

Dataset structure:

Amount of source data:

Training set: 13,084 items, Validation set: 700 items, Test set: 700 items.

Amount of Evaluation data:

Evaluation uses the public test set of 700 items.

Data detail:

| KEYS | EXPLAIN |
| --- | --- |
| id | Path to the data's MP3 file |
| text | Text corresponding to the MP3 file |
| label | Slot type for each token |

Sample of source dataset:

{
    "id":"Aditi-snips-test-0",
    "text":"BOS I'D LIKE TO HAVE THIS TRACK ONTO MY CLASSICAL RELAXATIONS PLAYLIST EOS",
    "label":"O O O O O O music_item O playlist_owner playlist playlist O AddToPlaylist"
}
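To make the token-to-label alignment concrete, here is a minimal sketch that parses a well-formed copy of the sample record and pairs each token with its label. It assumes, based only on this one sample, that the label aligned with `EOS` carries the utterance intent and that adjacent tokens with the same slot type form one slot. The helper name `parse_record` is hypothetical.

```python
import json

# Well-formed copy of the sample record shown above.
record_json = """
{
    "id": "Aditi-snips-test-0",
    "text": "BOS I'D LIKE TO HAVE THIS TRACK ONTO MY CLASSICAL RELAXATIONS PLAYLIST EOS",
    "label": "O O O O O O music_item O playlist_owner playlist playlist O AddToPlaylist"
}
"""


def parse_record(record):
    """Return (intent, slots) for one record; slots is a list of (slot_type, phrase)."""
    tokens = record["text"].split()
    labels = record["label"].split()
    if len(tokens) != len(labels):
        raise ValueError("text and label must contain the same number of tokens")

    # Assumption: the label aligned with the final EOS token is the intent.
    intent = labels[-1]

    slots = []
    prev = "O"
    # Skip the BOS and EOS positions when collecting slots.
    for token, label in zip(tokens[1:-1], labels[1:-1]):
        if label != "O":
            if label == prev:
                # Assumption: adjacent tokens with the same slot type belong together.
                slots[-1] = (label, slots[-1][1] + " " + token)
            else:
                slots.append((label, token))
        prev = label
    return intent, slots


intent, slots = parse_record(json.loads(record_json))
print(intent)
print(slots)
```

In this sample the text and label fields both split into 13 whitespace-separated items, so the one-to-one pairing holds; the length check guards against records where it does not.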

Citation information:

@article{coucke2018snips,
  title={Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces},
  author={Coucke, Alice and Saade, Alaa and Ball, Adrien and Bluche, Th{\'e}odore and Caulier, Alexandre and Leroy, David and Doumouro, Cl{\'e}ment and Gisselbrecht, Thibault and Caltagirone, Francesco and Lavril, Thibaut and others},
  journal={arXiv preprint arXiv:1805.10190},
  year={2018}
  }

Licensing information:

Creative Commons Zero v1.0 Universal