Basic Concepts

Below are explanations of terms commonly used on the FlagEval evaluation platform, intended to help users understand the platform's core concepts:

Evaluation Domain

  • Each model to be evaluated generally targets one domain, meaning it can only perform the functions of that domain. The platform currently supports the following domains: NLP (natural language processing), CV (computer vision), audio, and multimodal.

Evaluation Task

  • A model can perform multiple tasks within a domain. For example, tasks in the NLP domain include English Choice Q&A, Chinese Choice Q&A, English Classification, Chinese Classification, Chinese Open Q&A, etc.

Evaluation Object

  • Foundation model: a model obtained by pre-training on huge unlabeled datasets. After fine-tuning on a small amount of data, it can be used for different downstream tasks.
  • Pre-training algorithm: a technique for training a new model from scratch on huge unlabeled datasets. Models trained this way can capture general patterns and features in the data.
  • Fine-tuning/compression algorithm: a fine-tuning algorithm is a technique for adapting a model to new tasks through transfer learning and parameter adjustment; a compression algorithm is a technique for reducing model size and improving runtime efficiency, for example through quantization or pruning.

Dataset

  • Multiple datasets can be run under a single evaluation task. For example, the Chinese Classification task is evaluated on EPRSTMT, TNEWS, OCNLI, and BUSTM. To see the datasets for each task, click the task name on the [homepage]. More datasets are being integrated.

Data Instance

  • Each dataset is composed of a series of data instances.
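The concepts above nest inside one another: a domain contains tasks, a task contains datasets, and a dataset contains data instances. The following is a minimal sketch of that hierarchy, not the platform's actual data model. The class names and the `prompt`/`reference` fields are hypothetical; only the domain, task, and dataset names come from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataInstance:
    # Hypothetical fields: the text only says a dataset is a series of instances.
    prompt: str
    reference: str


@dataclass
class Dataset:
    name: str
    instances: List[DataInstance] = field(default_factory=list)


@dataclass
class EvaluationTask:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class EvaluationDomain:
    name: str
    tasks: List[EvaluationTask] = field(default_factory=list)


def dataset_names(domain: EvaluationDomain, task_name: str) -> List[str]:
    """Return the names of the datasets registered under one task of a domain."""
    for task in domain.tasks:
        if task.name == task_name:
            return [dataset.name for dataset in task.datasets]
    raise KeyError(f"no task named {task_name!r} in domain {domain.name!r}")


# Example taken from the text: four datasets under the Chinese Classification task.
chinese_classification = EvaluationTask(
    name="Chinese Classification",
    datasets=[Dataset(name=n) for n in ("EPRSTMT", "TNEWS", "OCNLI", "BUSTM")],
)
nlp = EvaluationDomain(name="NLP", tasks=[chinese_classification])

print(dataset_names(nlp, "Chinese Classification"))
```

Keeping each level as its own type mirrors the one-to-many relationships described above (one task, many datasets; one dataset, many instances).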

Card types and AI frameworks supported by the platform

  • Currently, the platform supports the following card types and AI frameworks:
| Card Type | Brand | Framework | Base Image |
| --- | --- | --- | --- |
| NVIDIA_A100-SXM4-40GB | NVIDIA | PyTorch | ngc-pytorch-2303:flageval-refine |
| NVIDIA_A800-SXM4-80GB | NVIDIA | PyTorch | ngc-pytorch-2303:flageval-refine |
| NVIDIA_V100-PCIE-32GB | NVIDIA | PyTorch | ngc-pytorch-2303:flageval-refine |
| NVIDIA_T4 | NVIDIA | PyTorch | ngc-pytorch-2303:flageval-refine |
| CAMBRICON_MLU370-X8 | Cambricon | PyTorch | cambricon-pytorch-v1-6-0-torch1-9-ubuntu2004:flageval |
| KUNLUN_R300 | KUNLUNXIN | PyTorch | kulun-xtcl-ubuntu2004:flageval |
| ASCEND 910A | Ascend | MindSpore | -- |
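For scripting against this list, the support matrix can be written as a lookup table. This is a sketch only: `CardSupport`, `SUPPORTED_CARDS`, and `cards_for_framework` are hypothetical names, not part of any FlagEval API, and the `--` entry is represented as `None` on the assumption that it means no base image is listed.

```python
from typing import Dict, List, NamedTuple, Optional


class CardSupport(NamedTuple):
    brand: str
    framework: str
    base_image: Optional[str]  # None where the list shows "--"


# Values copied from the list above; the structure itself is illustrative.
SUPPORTED_CARDS: Dict[str, CardSupport] = {
    "NVIDIA_A100-SXM4-40GB": CardSupport("NVIDIA", "PyTorch", "ngc-pytorch-2303:flageval-refine"),
    "NVIDIA_A800-SXM4-80GB": CardSupport("NVIDIA", "PyTorch", "ngc-pytorch-2303:flageval-refine"),
    "NVIDIA_V100-PCIE-32GB": CardSupport("NVIDIA", "PyTorch", "ngc-pytorch-2303:flageval-refine"),
    "NVIDIA_T4": CardSupport("NVIDIA", "PyTorch", "ngc-pytorch-2303:flageval-refine"),
    "CAMBRICON_MLU370-X8": CardSupport(
        "Cambricon", "PyTorch", "cambricon-pytorch-v1-6-0-torch1-9-ubuntu2004:flageval"
    ),
    "KUNLUN_R300": CardSupport("KUNLUNXIN", "PyTorch", "kulun-xtcl-ubuntu2004:flageval"),
    "ASCEND 910A": CardSupport("Ascend", "MindSpore", None),
}


def cards_for_framework(framework: str) -> List[str]:
    """List the card types whose supported framework matches `framework`."""
    return [card for card, info in SUPPORTED_CARDS.items() if info.framework == framework]


print(cards_for_framework("MindSpore"))
print(SUPPORTED_CARDS["NVIDIA_T4"].base_image)
```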

Image

  • A special file system that contains the programs, libraries, resources, and configuration a container needs at runtime; in effect, it packages the software runtime environment. In addition to hardware computing resources, an evaluation task also requires a Docker image as its runtime environment.
  • Dockerfile: a text file used to build an image, containing the build instructions, the list of dependencies, and other descriptions needed for the build.
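To make the relationship between a base image, a dependency list, and a Dockerfile concrete, here is a minimal sketch that renders Dockerfile text in Python. The function name `render_dockerfile`, the example packages, and the `/workspace` directory are all assumptions for illustration; the platform's own build flow may differ, and the base image tag may need a registry prefix that is not given here.

```python
from typing import Sequence


def render_dockerfile(base_image: str, pip_packages: Sequence[str]) -> str:
    """Render a small Dockerfile: a base image plus optional pip dependencies."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # Install all dependencies in a single layer.
        lines.append("RUN pip install --no-cache-dir " + " ".join(pip_packages))
    # Hypothetical working directory for the evaluation code.
    lines.append("WORKDIR /workspace")
    return "\n".join(lines) + "\n"


# The base image tag is taken from the support list; the packages are placeholders.
dockerfile_text = render_dockerfile(
    "ngc-pytorch-2303:flageval-refine",
    ["transformers", "sentencepiece"],
)
print(dockerfile_text)
```

The rendered text can be saved as `Dockerfile` and built with `docker build`.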