Basic Concepts

Below are explanations of commonly used terms on the FlagEval evaluation platform, to help users understand its core concepts:

Evaluation Domain

Each model to be evaluated generally targets a specific domain, which means it can usually perform only the functions of that domain. The platform currently supports the following domains: NLP (natural language processing), CV (computer vision/image), audio, and multimodal.

Evaluation Task

Within a domain, a model can be evaluated on multiple tasks. For example, the NLP domain includes tasks such as English Choice Q&A, Chinese Choice Q&A, English Classification, Chinese Classification, and Chinese Open Q&A.

Evaluation Object

Foundation Model: foundation models are models obtained by pre-training on huge unlabeled datasets. After fine-tuning on a small amount of data, they can be applied to a variety of downstream tasks.
Pre-training algorithm: a pre-training algorithm is a technique for training a new model from scratch on huge unlabeled datasets. Models trained this way capture general patterns and features in the data.
Fine-tuning/Compression algorithm: a fine-tuning algorithm performs transfer learning and parameter adjustment so that a model adapts to new tasks; a compression algorithm reduces model size and improves runtime efficiency, using methods such as quantization and pruning.
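To make the two compression methods named above more concrete, here is a minimal, framework-free sketch of symmetric int8 quantization and magnitude pruning applied to a flat list of weights. It is illustrative only: the function names are hypothetical, and this is not how FlagEval or any particular submitted compression algorithm works. Real toolkits operate on tensors and usually quantize or prune per layer or per channel.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of float weights into the int8 range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The scale maps the largest magnitude onto 127; guard against an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values and the scale."""
    return [v * scale for v in q]


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices sorted by |w|; the first k are the ones to drop.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
print(q)                                # [50, -127, 0, 100]
print(magnitude_prune(weights, 0.5))    # [0.0, -1.27, 0.0, 1.0]
```

Quantization shrinks storage by keeping small integers plus one scale factor; pruning keeps the original precision but makes the weights sparse. Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`).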

Dataset

Multiple datasets can be run under a single evaluation task. For example, the Chinese Classification task is evaluated on EPRSTMT, TNEWS, OCNLI, and BUSTM. To see the datasets for each task, click the task name on the homepage. More datasets are being integrated.

Data Instance

Each dataset is composed of a series of data instances.
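The concepts above nest inside one another: a domain contains tasks, a task is evaluated on several datasets, and each dataset is a collection of data instances. The sketch below models that hierarchy with Python dataclasses. The class and field names (such as `prompt` and `reference`) are hypothetical and do not reflect FlagEval's internal data model; the dataset names come from the example in this section, and the instance contents are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataInstance:
    # Hypothetical fields: one model input and its reference answer.
    prompt: str
    reference: str


@dataclass
class Dataset:
    name: str
    instances: List[DataInstance] = field(default_factory=list)


@dataclass
class EvaluationTask:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class EvaluationDomain:
    name: str  # e.g. "NLP", "CV", "audio", "multimodal"
    tasks: List[EvaluationTask] = field(default_factory=list)


def count_instances(domain: EvaluationDomain) -> int:
    """Total number of data instances across all tasks and datasets in a domain."""
    return sum(len(ds.instances) for task in domain.tasks for ds in task.datasets)


# Dataset names are taken from the text; instance contents are placeholders.
nlp = EvaluationDomain(
    name="NLP",
    tasks=[
        EvaluationTask(
            name="Chinese Classification",
            datasets=[
                Dataset("EPRSTMT", [DataInstance("<text>", "<label>")]),
                Dataset("TNEWS"),
                Dataset("OCNLI"),
                Dataset("BUSTM"),
            ],
        )
    ],
)
print(count_instances(nlp))  # 1
```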

Card types and AI frameworks supported by the platform

Currently, the platform supports the following card types and AI frameworks:

Card Type	Brand	Framework	Base Image
NVIDIA_A100-SXM4-40GB	NVIDIA	PyTorch	ngc-pytorch-2303:flageval-refine
NVIDIA_A800-SXM4-80GB	NVIDIA	PyTorch	ngc-pytorch-2303:flageval-refine
NVIDIA_V100-PCIE-32GB	NVIDIA	PyTorch	ngc-pytorch-2303:flageval-refine
NVIDIA_T4	NVIDIA	PyTorch	ngc-pytorch-2303:flageval-refine
CAMBRICON_MLU370-X8	Cambricon	PyTorch	cambricon-pytorch-v1-6-0-torch1-9-ubuntu2004:flageval
KUNLUN_R300	KUNLUNXIN	PyTorch	kulun-xtcl-ubuntu2004:flageval
ASCEND 910A	Ascend	MindSpore	--
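If you script submissions, it can help to encode this table as a lookup so you can check that a card type is supported and find its base image before preparing an environment. The sketch below does that; the names `SUPPORTED_CARDS`, `base_image_for`, and `cards_for_framework` are hypothetical helpers, not part of any FlagEval API, and the values are copied from the table above, so they may change as the platform adds hardware.

```python
from typing import Dict, List, NamedTuple, Optional


class CardSupport(NamedTuple):
    brand: str
    framework: str
    base_image: Optional[str]  # None where the table lists "--"


# Values copied from the table above.
SUPPORTED_CARDS: Dict[str, CardSupport] = {
    "NVIDIA_A100-SXM4-40GB": CardSupport("NVIDIA", "PyTorch", "ngc-pytorch-2303:flageval-refine"),
    "NVIDIA_A800-SXM4-80GB": CardSupport("NVIDIA", "PyTorch", "ngc-pytorch-2303:flageval-refine"),
    "NVIDIA_V100-PCIE-32GB": CardSupport("NVIDIA", "PyTorch", "ngc-pytorch-2303:flageval-refine"),
    "NVIDIA_T4": CardSupport("NVIDIA", "PyTorch", "ngc-pytorch-2303:flageval-refine"),
    "CAMBRICON_MLU370-X8": CardSupport(
        "Cambricon", "PyTorch", "cambricon-pytorch-v1-6-0-torch1-9-ubuntu2004:flageval"
    ),
    "KUNLUN_R300": CardSupport("KUNLUNXIN", "PyTorch", "kulun-xtcl-ubuntu2004:flageval"),
    "ASCEND 910A": CardSupport("Ascend", "MindSpore", None),
}


def base_image_for(card_type: str) -> Optional[str]:
    """Return the base image for a card type, or None if the table lists none."""
    if card_type not in SUPPORTED_CARDS:
        raise ValueError(f"unsupported card type: {card_type!r}")
    return SUPPORTED_CARDS[card_type].base_image


def cards_for_framework(framework: str) -> List[str]:
    """List the card types that support the given AI framework, sorted by name."""
    return sorted(card for card, s in SUPPORTED_CARDS.items() if s.framework == framework)


print(base_image_for("NVIDIA_T4"))       # ngc-pytorch-2303:flageval-refine
print(cards_for_framework("MindSpore"))  # ['ASCEND 910A']
```

Raising an error for unknown card types, rather than returning None, keeps "unsupported card" distinct from "supported, but no base image listed".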

Image

An image is a special file system that contains the programs, libraries, resources, configurations, and other files a container needs at runtime; in effect, it packages the software runtime environment. In addition to hardware computing resources, every evaluation task requires a Docker image as its runtime environment.
Dockerfile: a text file used to build an image. It contains the build instructions, the list of dependencies, and other important descriptions required to build the image.
