Introduction to Evaluation Tasks and Evaluation Data

Natural Language Processing (NLP)

This category focuses on evaluating the different capabilities of large language models. In addition to our self-constructed datasets, for several mainstream capability categories we have also selected a number of public datasets that are not yet saturated, meaning current models have not yet reached the ceiling on them: