Evaluation Dataset

The following datasets are all transformed into standard Evaluation Prompts before evaluation.

Dataset 1 (CLCC)

The Chinese Linguistics & Cognition Challenge (CLCC) mainly consists of two parts:

CLCC-H (190): sampled from open-source datasets and manually screened (manual evaluation required)
CLCC-H-v2.0: screens and revises the CLCC-H-v1.0 data and adds a large number of manually written questions. All evaluation results were produced by manual evaluation.

KEY	DESCRIPTION
ID	question ID
question	question text
answer	reference answer
source	data source
ability	corresponding ability label
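
The field list above can be illustrated with a minimal sketch of loading and checking one record. Only the five key names come from the table. The JSON Lines storage format, the helper name parse_record, and the sample values are assumptions made for illustration and are not taken from the dataset itself.

```python
import json

# The five key names are taken from the table above.
REQUIRED_KEYS = ("ID", "question", "answer", "source", "ability")


def parse_record(line: str) -> dict:
    """Parse one JSON-encoded record and check that every documented key is present.

    Assumption: each record is stored as a single JSON object per line.
    The real CLCC files may use a different format.
    """
    record = json.loads(line)
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"record is missing keys: {missing}")
    return record


# Hypothetical record with placeholder values, invented for illustration only.
sample_line = json.dumps(
    {
        "ID": 1,
        "question": "placeholder question",
        "answer": "placeholder reference answer",
        "source": "placeholder source",
        "ability": "placeholder ability label",
    }
)

record = parse_record(sample_line)
print(sorted(record.keys()))
```

Checking for missing keys up front is a design choice in this sketch: CLCC-H is evaluated manually, so catching an incomplete record before it reaches a reviewer is cheaper than finding it during review.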