Evaluation Datasets
All of the following datasets are converted into standard evaluation prompts before evaluation.
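The exact prompt template is not specified here. As a rough illustration only, the sketch below renders one sample as a multiple-choice style prompt. The function name `build_prompt` and the template layout (instruction, one "name: value" line per field, lettered options, then "Answer:") are hypothetical assumptions, not the format actually used for evaluation.

```python
from typing import Mapping, Sequence


def build_prompt(instruction: str, fields: Mapping[str, str], options: Sequence[str]) -> str:
    """Render one sample as a multiple-choice prompt (hypothetical template)."""
    lines = [instruction]
    # One "name: value" line per input field, in insertion order.
    for name, value in fields.items():
        lines.append(f"{name}: {value}")
    # Enumerate the candidate answers as A, B, C, ...
    for index, option in enumerate(options):
        lines.append(f"{chr(ord('A') + index)}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


prompt = build_prompt(
    "Classify the sentiment of the product review.",
    {"Review": "外包装上有点磨损,试听后感觉不错"},
    ["Positive", "Negative"],
)
print(prompt)
```

Keeping the fields and options as parameters lets the same helper serve single-sentence tasks (EPRSTMT, TNEWS) and sentence-pair tasks (OCNLI, BUSTM).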
Dataset 1 (EPRSTMT)
Data description:
EPRSTMT (E-commerce Product Review Dataset for Sentiment Analysis) is a sentiment analysis dataset built from reviews of e-commerce products.
Dataset structure:
Amount of source data:
The dataset is split into train (32), validation (32), public test (610), test (753), and unsupervised (19,565) sets.
Data detail:
Key | Description |
---|---|
id | ID of the sample in the JSON file |
sentence | Review text |
label | Sentiment label: 'Positive' or 'Negative' |
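The fields above can be read with a small loader. This is a minimal sketch, assuming the source file is JSON Lines (one object per line) and that a labeled split is being read; the integer mapping in `LABEL_TO_ID` and the function name `load_eprstmt` are illustrative choices, not part of the dataset.

```python
import json
from typing import Iterable, List, Tuple

# Assumed mapping from the string labels to integer class IDs.
LABEL_TO_ID = {"Negative": 0, "Positive": 1}


def load_eprstmt(lines: Iterable[str]) -> List[Tuple[int, str, int]]:
    """Parse JSON Lines records into (id, sentence, label_id) tuples."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        label = record["label"]
        # Reject anything other than the two documented labels.
        if label not in LABEL_TO_ID:
            raise ValueError(f"unexpected label {label!r} in record {record['id']}")
        samples.append((record["id"], record["sentence"], LABEL_TO_ID[label]))
    return samples


raw = ['{"id": 23, "sentence": "外包装上有点磨损,试听后感觉不错", "label": "Positive"}']
print(load_eprstmt(raw))  # [(23, '外包装上有点磨损,试听后感觉不错', 1)]
```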
Sample of source dataset:
{"id": 23,
"sentence": "外包装上有点磨损,试听后感觉不错",
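"label": "Positive"}
(English gloss of the sentence: "The outer packaging is slightly worn, but it sounded good after a test listen.")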
Citation information:
@misc{FewCLUE,
title={FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark},
author={Liang Xu and Xiaojing Lu and Chenyang Yuan and Xuanwei Zhang and Huilin Xu and Hu Yuan and Guoao Wei and Xiang Pan and Xin Tian and Libo Qin and Hai Hu},
year={2021},
howpublished={\url{https://arxiv.org/abs/2107.07498}},
}
Dataset 2 (TNEWS)
Data description:
TNEWS (Toutiao Short Text Classification for News) is drawn from the news section of Toutiao and covers news in 15 categories, including tourism, education, finance, and military.
Dataset structure:
Amount of source data:
618 samples drawn from the test split of the source dataset.
Data detail:
Key | Description |
---|---|
label | Category ID |
label_des | Category name |
sentence | News text (headline only) |
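Because each record carries both a numeric `label` and its `label_des` name, the category vocabulary can be recovered from the data itself. The sketch below assumes JSON Lines input; `build_label_map` is a hypothetical helper that also checks that no ID appears with two different names.

```python
import json
from typing import Dict, Iterable


def build_label_map(lines: Iterable[str]) -> Dict[str, str]:
    """Collect the label -> label_des mapping, checking each ID has a single name."""
    id_to_name: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        label, name = record["label"], record["label_des"]
        # The same ID must always carry the same category name.
        previous = id_to_name.setdefault(label, name)
        if previous != name:
            raise ValueError(f"label {label} maps to both {previous!r} and {name!r}")
    return id_to_name


raw = ['{"label": "102", "label_des": "news_entertainment", "sentence": "江疏影甜甜圈自拍,迷之角度竟这么好看,美吸引一切事物"}']
label_map = build_label_map(raw)
print(label_map)  # {'102': 'news_entertainment'}
```

Run over the full data, the resulting map would be expected to contain the 15 categories named in the description.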
Sample of source dataset:
{"label": "102",
"label_des": "news_entertainment",
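"sentence": "江疏影甜甜圈自拍,迷之角度竟这么好看,美吸引一切事物"}
(English gloss of the sentence: "Jiang Shuying's donut selfie: the mysterious angle turns out so flattering; beauty attracts everything.")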
Citation information:
@misc{FewCLUE,
title={FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark},
author={Liang Xu and Xiaojing Lu and Chenyang Yuan and Xuanwei Zhang and Huilin Xu and Hu Yuan and Guoao Wei and Xiang Pan and Xin Tian and Libo Qin and Hai Hu},
year={2021},
howpublished={\url{https://arxiv.org/abs/2107.07498}},
}
Dataset 3 (OCNLI)
Data description:
OCNLI (Original Chinese Natural Language Inference) is the first large-scale Chinese natural language inference dataset built from native, non-translated Chinese text. It contains more than 50,000 training examples, 3,000 validation examples, and 3,000 test examples. Labels are provided for all splits except the test set, which contains inputs only. OCNLI is part of the Chinese Language Understanding Evaluation benchmark (CLUE).
Dataset structure:
Amount of source data:
The dataset is split into train (32), validation (32), public test (2,520), test (3,000), and unsupervised (20,000) sets.
Data detail:
Key | Description |
---|---|
level | Difficulty: 'easy', 'medium', and 'hard' mark, respectively, the first, second, and third hypothesis that an annotator wrote for a given label (such as entailment) |
sentence1 | Sentence 1, the premise |
sentence2 | Sentence 2, the hypothesis |
label | Gold label, the majority vote over label0 to label4; samples labeled '-' should be removed |
label0 to label4 | Five individual annotator labels; every validation and test sample has all five, while only part of the training set does |
genre | Text genre, one of 5: government bulletins, news, literature, TV talk shows, and telephone transcripts |
prem_id | Premise ID |
id | Overall sample ID |
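The relationship between `label` and `label0` to `label4` can be made concrete in code. This is a sketch under an assumed voting rule: a label wins only when more than half of the available annotator votes agree, otherwise the result is '-'. The helper names `majority_label` and `drop_unlabeled` are hypothetical; filtering uses the stored `label` field, as the table describes.

```python
from collections import Counter
from typing import Any, Dict, List

# Individual annotator label fields, as listed in the table above.
ANNOTATION_KEYS = ["label0", "label1", "label2", "label3", "label4"]


def majority_label(record: Dict[str, Any]) -> str:
    """Recompute the gold label from label0-label4, returning '-' when there is no majority."""
    # Training samples may carry fewer than five labels, so skip missing or empty ones.
    votes = [record[key] for key in ANNOTATION_KEYS if record.get(key)]
    if not votes:
        return "-"
    label, count = Counter(votes).most_common(1)[0]
    # Assumed rule: a label wins only if more than half of the available votes agree.
    return label if 2 * count > len(votes) else "-"


def drop_unlabeled(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove samples whose stored gold label is '-'."""
    return [record for record in records if record.get("label") != "-"]


sample = {"id": 0, "label": "entailment", **{key: "entailment" for key in ANNOTATION_KEYS}}
print(majority_label(sample))  # entailment
```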
Sample of source dataset:
{
"level":"medium",
"sentence1":"身上裹一件工厂发的棉大衣,手插在袖筒里",
"sentence2":"身上至少一件衣服",
"label":"entailment",
"label0":"entailment","label1":"entailment","label2":"entailment","label3":"entailment","label4":"entailment",
"genre":"lit",
"prem_id":"lit_635",
"id":0
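}
(English gloss: sentence1 "Wrapped in a cotton-padded coat issued by the factory, hands tucked into the sleeves"; sentence2 "(The person) is wearing at least one piece of clothing")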
Citation information:
@inproceedings{ocnli,
title={OCNLI: Original Chinese Natural Language Inference},
author={Hai Hu and Kyle Richardson and Liang Xu and Lu Li and Sandra Kuebler and Larry Moss},
booktitle={Findings of EMNLP},
year={2020},
url={https://arxiv.org/abs/2010.05444}
}
Licensing information:
• Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0)
• Premises of the news genre were sampled from the LCMC corpus (ISLRN ID: 990-638-120-222-2, ELRA reference: ELRA-W0039) with the permission of ELRA.
Dataset 4 (BUSTM)
Data description:
BUSTM is a conversational short-text semantic matching dataset derived from the XiaoBu assistant (Breeno), a voice assistant that OPPO developed for its own smartphones and IoT devices to provide users with convenient conversational services. Intent recognition is a core task in dialogue systems, and short-text semantic matching is one of the main approaches to it. Given a pair of short-text queries, the task is to predict whether the two have the same meaning.
Dataset structure:
Amount of source data:
The dataset is split into train (32), validation (32), public test (1,772), test (2,000), and unsupervised (4,251) sets.
Data detail:
Key | Description |
---|---|
id | Sample ID |
sentence1 | First query |
sentence2 | Second query |
label | Binary label: "1" means the two sentences have the same meaning, "0" means they do not |
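Since `label` is stored as the string "1" or "0", it is convenient to turn it into a boolean before scoring. The sketch below assumes JSON Lines input and uses plain accuracy as the metric; both the metric choice and the helper names `parse_pairs` and `accuracy` are assumptions for illustration.

```python
import json
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str, bool]]:
    """Turn BUSTM records into (sentence1, sentence2, same_meaning) tuples."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record["label"] not in ("0", "1"):
            raise ValueError(f"unexpected label {record['label']!r}")
        pairs.append((record["sentence1"], record["sentence2"], record["label"] == "1"))
    return pairs


def accuracy(gold: List[bool], predicted: List[bool]) -> float:
    """Fraction of pairs whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy of an empty set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


raw = ['{"id": 5, "sentence1": "女孩子到底是不是你", "sentence2": "你不是女孩子吗", "label": "1"}']
pairs = parse_pairs(raw)
gold = [same for _, _, same in pairs]
print(accuracy(gold, [True]))  # 1.0
```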
Sample of source dataset:
{"id": 5,
"sentence1": "女孩子到底是不是你",
"sentence2": "你不是女孩子吗",
"label": "1"}
(English gloss: sentence1 "Are you a girl or not?"; sentence2 "Aren't you a girl?")
Citation information:
@misc{FewCLUE,
title={FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark},
author={Liang Xu and Xiaojing Lu and Chenyang Yuan and Xuanwei Zhang and Huilin Xu and Hu Yuan and Guoao Wei and Xiang Pan and Xin Tian and Libo Qin and Hai Hu},
year={2021},
howpublished={\url{https://arxiv.org/abs/2107.07498}},
}