Evaluation Dataset

The following datasets are all transformed into standard Evaluation Prompts before evaluation.

Dataset 1 (CLCC)

Metric: Accuracy

Data description:

The Chinese Linguistics & Cognition Challenge (CLCC) consists mainly of two parts:

  1. CLCC-H (190): sampled from open-source datasets and manually screened (manual evaluation required).

  2. CLCC-H-v2.0: screens and revises the data from CLCC-H-v1.0 and adds a large number of manually written questions. All evaluation results were produced by manual evaluation.

Dataset structure:

Data detail:
| Key | Explanation |
| --- | --- |
| ID | question id |
| question | question |
| answer | reference answer |
| source | data source |
| ability | corresponding ability label |
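
As a rough illustration of how the fields listed above fit together, the sketch below models one record and computes accuracy from manual verdicts. The class name `CLCCRecord`, the helper `accuracy`, the string type for every field, and all values are assumptions made for this example; they are not part of the dataset or of any official tooling.

```python
from dataclasses import dataclass


@dataclass
class CLCCRecord:
    """One CLCC question; field names follow the key list above.

    Field types are assumed to be strings for illustration only.
    """
    ID: str        # question id
    question: str  # question text
    answer: str    # reference answer
    source: str    # data source
    ability: str   # corresponding ability label


def accuracy(judgments: list[bool]) -> float:
    """Return the fraction of answers judged correct (hypothetical helper)."""
    if not judgments:
        raise ValueError("no judgments to score")
    return sum(judgments) / len(judgments)


# Placeholder record; the values are illustrative, not real CLCC data.
record = CLCCRecord(
    ID="0001",
    question="<question text>",
    answer="<reference answer>",
    source="<data source>",
    ability="<ability label>",
)

# Hypothetical manual verdicts for four model answers:
# True means the reviewer judged the answer correct.
manual_judgments = [True, False, True, True]
score = accuracy(manual_judgments)
```

Because CLCC results come from manual evaluation, the verdicts here stand in for reviewer decisions rather than an automatic string match against `answer`.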