Evaluation Dataset
All of the following datasets are converted into standard Evaluation Prompts before evaluation.
Dataset 1 (CLCC)
Data description:
The Chinese Linguistics & Cognition Challenge (CLCC) mainly consists of two parts:
CLCC-H (190): Sampled from open-source datasets and manually screened (manual evaluation required).
CLCC-H-v2.0: Screens and revises the data from CLCC-H-v1.0 and adds a large number of manually written questions. All evaluation results were produced by manual evaluation.
Dataset structure:
Data detail:
KEY | DESCRIPTION |
---|---|
ID | question id |
question | question text |
answer | reference answer |
source | data source |
ability | corresponding ability label |
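
As a rough illustration of the record layout in the table above, the sketch below checks that a record carries all five keys and turns it into a prompt. The key names come from the table; everything else is an assumption made for the example: the helper names `missing_keys` and `to_prompt` are hypothetical, the prompt template is invented because the standard Evaluation Prompt format is not specified here, and the sample record holds placeholder values rather than real CLCC data.

```python
from typing import Dict, List

# Field names taken from the table above.
REQUIRED_KEYS = ("ID", "question", "answer", "source", "ability")


def missing_keys(record: Dict[str, object]) -> List[str]:
    """Return the required keys that are absent from a record, in table order."""
    return [key for key in REQUIRED_KEYS if key not in record]


def to_prompt(record: Dict[str, object]) -> str:
    """Build a prompt string from a record.

    Hypothetical template: the actual standard Evaluation Prompt format
    is not described in this document.
    """
    return f"Question: {record['question']}\nAnswer:"


# Placeholder record for illustration only; not real CLCC data.
# The type of ID (integer vs. string) is an assumption.
example = {
    "ID": 1,
    "question": "<question text>",
    "answer": "<reference answer>",
    "source": "<data source>",
    "ability": "<ability label>",
}

if __name__ == "__main__":
    print(missing_keys(example))     # keys absent from a complete record
    print(missing_keys({"ID": 2}))   # keys absent from an incomplete record
    print(to_prompt(example))
```

Checking for missing keys before building prompts keeps malformed records out of the evaluation run; the reference `answer` is deliberately left out of the prompt so it can be used only for scoring.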