
Evaluation Dataset

The following datasets are all transformed into standard Evaluation Prompts before evaluation.
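The exact prompt templates are not given in this document. The following is a minimal sketch, assuming a hypothetical per-dataset template. The `TEMPLATES` dict, its wording, and `to_prompt` are illustrative assumptions, not the evaluation's actual prompts. It shows how a source record could be turned into a prompt string.

```python
# Hypothetical templates; the real "standard Evaluation Prompts" are not specified here.
TEMPLATES = {
    "EPRSTMT": "Review: {sentence}\nIs the sentiment Positive or Negative?\nAnswer:",
    "BUSTM": "Sentence 1: {sentence1}\nSentence 2: {sentence2}\nDo they mean the same thing? Answer 1 or 0:",
}


def to_prompt(dataset: str, record: dict) -> str:
    """Fill the dataset's (hypothetical) template with fields from one source record."""
    # str.format ignores unused keyword arguments, so extra keys such as
    # "id" and "label" are dropped and the gold label never leaks into the prompt.
    return TEMPLATES[dataset].format(**record)


if __name__ == "__main__":
    print(to_prompt("EPRSTMT", {"id": 23, "sentence": "外包装上有点磨损,试听后感觉不错", "label": "Positive"}))
```

Keeping the gold `label` out of the prompt and comparing it to the model's answer afterwards is what makes exact-match scoring meaningful.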

Dataset 1 (EPRSTMT)

Metric: Exact match
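Exact match counts a prediction as correct only when it is identical to the gold label. The sketch below is a minimal version under one assumption: only surrounding whitespace is stripped. The normalization the evaluation actually applies is not specified here.

```python
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to their reference after stripping surrounding whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Each True counts as 1 in the sum.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

For example, `exact_match(["Positive", "Negative"], ["Positive", "Positive"])` gives 0.5.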

Data description:

EPRSTMT (E-commerce Product Review Dataset for Sentiment Analysis) is a sentiment analysis dataset of e-commerce product reviews.

Dataset structure:

Amount of source data:

The dataset is split into train (32), validation (32), public test (610), test (753), and unsupervised (19,565).

Data detail:

KEY | EXPLAIN
id | ID of the item in the JSON file
sentence | Review text
label | Sentiment label: 'Positive' means positive, 'Negative' means negative

Sample of source dataset:

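{"id": 23, "sentence": "外包装上有点磨损,试听后感觉不错", "label": "Positive"}

(sentence in English: "The outer packaging is a little worn, but it sounded good after a test listen.")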
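The following is a minimal loading sketch. It assumes the split files are JSON Lines, with one object per line like the sample above. The helper names and the file name in the usage note are hypothetical. The loader parses each record, rejects labels other than 'Positive' and 'Negative', and counts the label distribution.

```python
import json
from collections import Counter
from typing import Iterable

VALID_LABELS = {"Positive", "Negative"}


def load_eprstmt(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines records and check that each label is 'Positive' or 'Negative'."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        if rec["label"] not in VALID_LABELS:
            raise ValueError(f"unexpected label {rec['label']!r} in record {rec['id']}")
        records.append(rec)
    return records


def label_counts(records: list[dict]) -> Counter:
    """Count how many records carry each sentiment label."""
    return Counter(r["label"] for r in records)
```

Usage, with a hypothetical file name: `with open("eprstmt_test.json", encoding="utf-8") as f: records = load_eprstmt(f)`.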

Citation information:

@misc{FewCLUE,
  title={FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark},
  author={Liang Xu and Xiaojing Lu and Chenyang Yuan and Xuanwei Zhang and Huilin Xu and Hu Yuan and Guoao Wei and Xiang Pan and Xin Tian and Libo Qin and Hu Hai},
  year={2021},
  howpublished={\url{https://arxiv.org/abs/2107.07498}},
}

Dataset 2 (TNEWS)

Metric: Exact match

Data description:

TNEWS (Toutiao Short Text Classification for News) comes from the news section of Toutiao and contains news in 15 categories, including tourism, education, finance, and military.

Dataset structure:

Amount of source data:

618 items sampled from the test split of the source dataset.

Data detail:

KEY | EXPLAIN
label | Classification ID
label_desc | Classification name
sentence | News text (title only)

Sample of source dataset:

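{"label": "102", "label_des": "news_entertainment", "sentence": "江疏影甜甜圈自拍,迷之角度竟这么好看,美吸引一切事物"}

(sentence in English: "Jiang Shuying's donut selfie: the mysterious angle looks surprisingly good, beauty attracts everything")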
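Each TNEWS record carries both a classification ID (`label`, stored as a string such as "102") and its name (`label_des`). A lookup between the two is useful if a prompt asks the model for the category name. This document does not state whether the evaluation scores IDs or names. The sketch below is minimal, and the helper names are hypothetical.

```python
def build_label_map(records: list[dict]) -> dict[str, str]:
    """Map each classification ID (e.g. "102") to its name, checking that no ID has two names."""
    id_to_name: dict[str, str] = {}
    for rec in records:
        label, name = rec["label"], rec["label_des"]
        # setdefault stores the first name seen for this ID and returns the stored value.
        if id_to_name.setdefault(label, name) != name:
            raise ValueError(f"label {label} maps to both {id_to_name[label]!r} and {name!r}")
    return id_to_name


def invert(id_to_name: dict[str, str]) -> dict[str, str]:
    """Reverse lookup from classification name to ID, e.g. to turn a predicted name back into an ID."""
    return {name: label for label, name in id_to_name.items()}
```

On the full data, the map should contain at most 15 entries, one per category.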

Citation information:

@misc{FewCLUE,
  title={FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark},
  author={Liang Xu and Xiaojing Lu and Chenyang Yuan and Xuanwei Zhang and Huilin Xu and Hu Yuan and Guoao Wei and Xiang Pan and Xin Tian and Libo Qin and Hu Hai},
  year={2021},
  howpublished={\url{https://arxiv.org/abs/2107.07498}},
}

Dataset 3 (OCNLI)

Metric: Exact match

Data description:

OCNLI (Original Chinese Natural Language Inference) is the first large-scale Chinese natural language inference dataset built from native Chinese text rather than translations. It contains more than 50,000 training examples, 3,000 validation examples, and 3,000 test examples. Labels are released for all splits except the test set, which provides only the data. OCNLI is part of the Chinese Language Understanding Evaluation benchmark (CLUE).

Dataset structure:

Amount of source data:

The FewCLUE few-shot version of the dataset is split into train (32), validation (32), public test (2,520), test (3,000), and unsupervised (20,000).

Data detail:

KEY | EXPLAIN
level | Difficulty: 'easy', 'medium', and 'hard' mark the first, second, and third hypothesis an annotator wrote for a given label (such as entailment)
sentence1 | Sentence 1, the premise
sentence2 | Sentence 2, the hypothesis
label | Gold label, the majority vote over label0 to label4; items labeled '-' should be removed
label0 to label4 | Five annotator labels; every item in the validation and test sets has all five, while in the training set only part of the data has them
genre | Text genre, one of 5 categories: government bulletins, news, literature, TV talk shows, and telephone conversation transcripts
prem_id | Premise ID
id | Overall ID
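The `label` field is the majority vote over `label0` to `label4`, and items labeled '-' should be dropped. The sketch below covers both steps under one assumption: a majority means at least 3 of the 5 annotator labels agree. This threshold is not stated above. The sketch targets items that carry all five labels, such as the validation and test sets. The helper names are hypothetical.

```python
from collections import Counter


def majority_label(votes: list[str], min_votes: int = 3) -> str:
    """Return the label chosen by at least `min_votes` annotators, or '-' if there is no such majority."""
    votes = [v for v in votes if v]  # ignore empty annotations
    if not votes:
        return "-"
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_votes else "-"


def keep_labeled(records: list[dict]) -> list[dict]:
    """Drop items whose gold label is '-', as the field description recommends."""
    return [r for r in records if r["label"] != "-"]
```

For a record `rec`, the votes are `[rec[f"label{i}"] for i in range(5)]`. For the OCNLI sample, all five votes are "entailment", so the majority is "entailment".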

Sample of source dataset:

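{
  "level": "medium",
  "sentence1": "身上裹一件工厂发的棉大衣,手插在袖筒里",
  "sentence2": "身上至少一件衣服",
  "label": "entailment",
  "label0": "entailment", "label1": "entailment", "label2": "entailment", "label3": "entailment", "label4": "entailment",
  "genre": "lit",
  "prem_id": "lit_635",
  "id": 0
}

(sentence1 in English: "Wrapped in a cotton-padded coat issued by the factory, hands tucked into the sleeves"; sentence2 in English: "Wearing at least one piece of clothing")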

Citation information:

@inproceedings{ocnli,
	title={OCNLI: Original Chinese Natural Language Inference},
	author={Hai Hu and Kyle Richardson and Liang Xu and Lu Li and Sandra Kuebler and Larry Moss},
	booktitle={Findings of EMNLP},
	year={2020},
	url={https://arxiv.org/abs/2010.05444}
}

Licensing information:

• Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0)
• News-genre premises were sampled from the LCMC corpus (ISLRN ID: 990-638-120-222-2, ELRA reference: Elra-W0039) with the permission of ELRA.

Dataset 4 (BUSTM)

Metric: Exact match

Data description:

BUSTM is a conversational short-text semantic matching dataset derived from Xiaobu Assistant (Breeno), a voice assistant that OPPO developed for its mobile phones and IoT devices to provide users with convenient conversational services. Intent recognition is a core task in dialogue systems, and short-text semantic matching is one of the main algorithms used for it. The task is to predict whether the two short texts in a query pair have the same meaning.

Dataset structure:

Amount of source data:

The dataset is split into train (32), validation (32), public test (1,772), test (2,000), and unsupervised (4,251).

Data detail:

KEY | EXPLAIN
id | Item ID
sentence1 | First sentence
sentence2 | Second sentence
label | Binary label: "1" means the two sentences have the same meaning, "0" means they do not

Sample of source dataset:

{"id": 5, "sentence1": "女孩子到底是不是你", "sentence2": "你不是女孩子吗", "label": "1"}

(sentence1 in English: "Are you a girl or not?"; sentence2 in English: "Aren't you a girl?")
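BUSTM labels are the strings "1" and "0", not integers or booleans. The sketch below converts a record into a typed pair. The helper name `to_pair` is hypothetical.

```python
def to_pair(record: dict) -> tuple[str, str, bool]:
    """Return (sentence1, sentence2, same_meaning) from a BUSTM record whose label is "1" or "0"."""
    label = record["label"]
    if label not in ("0", "1"):
        raise ValueError(f"unexpected BUSTM label {label!r}")
    # Compare against the string "1": bool("0") is True in Python, so bool(label) would be wrong.
    return record["sentence1"], record["sentence2"], label == "1"
```

Comparing the raw strings also keeps the labels in the same form that exact-match scoring uses.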

Citation information:

@misc{FewCLUE,
  title={FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark},
  author={Liang Xu and Xiaojing Lu and Chenyang Yuan and Xuanwei Zhang and Huilin Xu and Hu Yuan and Guoao Wei and Xiang Pan and Xin Tian and Libo Qin and Hu Hai},
  year={2021},
  howpublished={\url{https://arxiv.org/abs/2107.07498}},
}