Evaluation Dataset
The following datasets are all transformed into standard Evaluation Prompts before evaluation.
Dataset 1 (IMDB)
Data description:
IMDB is a large film review dataset for binary sentiment classification, containing substantially more data than previous benchmark datasets. It provides 25,000 distinct film reviews for training and 25,000 for testing. Additional unlabeled data is also available.
Dataset structure:
Size of downloaded dataset files: 84.13 MB
Size of the generated dataset: 133.23 MB
Total amount of disk used: 217.35 MB
Amount of source data:
The dataset is split into train (25,000), test (25,000), and unsupervised (50,000).
Data field:
Field | Description |
---|---|
label | a classification label; possible values are neg (0) and pos (1) |
text | a string feature |
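As a minimal sketch of how the two fields above might be handled in code, the snippet below maps the integer label to its name and rejects unknown ids. The names `ImdbRecord`, `LABEL_NAMES`, and `decode_label` are hypothetical helpers introduced for illustration; they are not part of the dataset or of any library.

```python
from typing import TypedDict

# Label ids as listed in the field table: neg (0), pos (1).
LABEL_NAMES = {0: "neg", 1: "pos"}


class ImdbRecord(TypedDict):
    """Shape of one record, assuming only the two documented fields."""
    label: int
    text: str


def decode_label(record: ImdbRecord) -> str:
    """Return the label name for a record, raising on unknown label ids."""
    label = record["label"]
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected label id: {label!r}")
    return LABEL_NAMES[label]
```

Raising on an unknown id, rather than silently defaulting, is a design choice of this sketch so that malformed records surface early.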
Sample of source dataset:
{
"label": 0,
"text": "Goodbye world2\n"
}
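Since every dataset is converted into an evaluation prompt before evaluation, the sample record above can illustrate the idea. The actual prompt template is not given in this document, so `PROMPT_TEMPLATE` and `build_prompt` below are assumptions made purely for illustration.

```python
import json

# Hypothetical template: the real evaluation prompt format is not specified here.
PROMPT_TEMPLATE = (
    "Review: {text}\n"
    "Sentiment (neg or pos):"
)


def build_prompt(raw_record: str) -> str:
    """Parse one JSON record and render it with the hypothetical template."""
    record = json.loads(raw_record)
    # Strip surrounding whitespace such as the trailing newline in the sample.
    return PROMPT_TEMPLATE.format(text=record["text"].strip())


# The sample record shown above, as a JSON string.
sample = '{"label": 0, "text": "Goodbye world2\\n"}'
prompt = build_prompt(sample)
```

The label is deliberately left out of the prompt in this sketch, on the assumption that it serves as the reference answer during scoring.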
Citation information:
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}
Dataset 2 (RAFT)
Data description:
The Real-world Annotated Few-shot Tasks (RAFT) dataset is an aggregation of English-language datasets found in the real world. Associated with each dataset is a binary or multiclass classification task, intended to improve our understanding of how language models perform on tasks that have concrete, real-world value. Only 50 labeled examples are provided in each dataset.
Dataset structure:
Sub-dataset | Amount of source data | Amount of sampled data | Sample of source dataset |
---|---|---|---|
Ade Corpus V2 | train (50), test (5,000) | 40 examples sampled from the source test set | Sentence: No regional side effects were noted. ID: 0 Label: 2 |
Banking 77 | train (50), test (5,000) | 40 examples sampled from the source test set | Query: Is it possible for me to change my PIN number? ID: 0 Label: 23 |
NeurIPS Impact Statement Risks | train (50), test (150) | 40 examples sampled from the source test set | Paper title: Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation... Paper link: https://proceedings.neurips.cc/paper/2020/file/ec1f764517b7ffb52057af6df18142b7-Paper.pdf... Impact statement: This work makes the first attempt to search for all key components of panoptic pipeline and manages to accomplish this via the p... ID: 0 Label: 1 |
One Stop English | train (50), test (516) | 40 examples sampled from the source test set | Article: For 85 years, it was just a grey blob on classroom maps of the solar system. But, on 15 July, Pluto was seen in high resolution ... ID: 0 Label: 3 |
Overruling | train (50), test (2,350) | 40 examples sampled from the source test set | Sentence: in light of both our holding today and previous rulings in johnson, dueser, and gronroos, we now explicitly overrule dupree.... ID: 0 Label: 2 |
Semiconductor Org Types | train (50), test (449) | 40 examples sampled from the source test set | Paper title: 3Gb/s AC-coupled chip-to-chip communication using a low-swing pulse receiver... Organization name: North Carolina State Univ.,Raleigh,NC,USA ID: 0 Label: 3 |
Systematic Review Inclusion | train (50), test (2,243) | 40 examples sampled from the source test set | Title: Prototyping and transforming facial textures for perception research... Abstract: Wavelet based methods for prototyping facial textures for artificially transforming the age of facial images were described. Pro... Authors: Tiddeman, B.; Burt, M.; Perrett, D. Journal: IEEE Comput Graphics Appl ID: 0 Label: 2 |
TAI Safety Research | train (50), test (1,639) | 40 examples sampled from the source test set | Title: Malign generalization without internal search Abstract Note: In my last post, I challenged the idea that inner alignment failures should be explained by appealing to agents which perform ex... Url: https://www.alignmentforum.org/posts/ynt9TD6PrYw6iT49m/malign-generalization-without-internal-search... Publication Year: 2020 Item Type: blogPost Author: Barnett, Matthew Publication Title: AI Alignment Forum ID: 0 Label: 1 |
Terms Of Service | train (50), test (5,000) | 40 examples sampled from the source test set | Sentence: Crowdtangle may change these terms of service, as described above, notwithstanding any provision to the contrary in any agreemen... ID: 0 Label: 2 |
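The table states that 40 examples are drawn from each sub-dataset's test split, but it does not describe how they are selected. The sketch below assumes uniform sampling without replacement and a fixed seed for reproducibility; `sample_test_split`, its default `seed`, and the use of Python's `random` module are all assumptions, not the documented procedure.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_test_split(test_examples: Sequence[T], k: int = 40, seed: int = 0) -> List[T]:
    """Draw k distinct examples uniformly at random from a test split.

    The seed is an assumption of this sketch; it only makes the draw repeatable.
    """
    if k > len(test_examples):
        raise ValueError(f"cannot sample {k} examples from {len(test_examples)}")
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    return rng.sample(test_examples, k)


# Usage with stand-in data: integer ids in place of real test records.
picked = sample_test_split(list(range(5000)))
```

Using a local `random.Random` instance keeps the draw independent of any other code that seeds or consumes the global generator.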
Licensing information:
Dataset | License |
---|---|
Ade Corpus V2 | Unlicensed |
Banking 77 | CC BY 4.0 |
NeurIPS Impact Statement Risks | MIT License/CC BY 4.0 |
One Stop English | CC BY-SA 4.0 |
Overruling | Unlicensed |
Semiconductor Org Types | CC BY-NC 4.0 |
Systematic Review Inclusion | CC BY 4.0 |
TAI Safety Research | CC BY-SA 4.0 |
Terms Of Service | Unlicensed |
Tweet Eval Hate | Unlicensed |
Twitter Complaints | Unlicensed |