
Quick Start

Register and Login

By default, unregistered users can view the platform’s homepage, news, leaderboard, and user manual, and can try the Large Model Arena and Debate Competition features.

To access the evaluation functions, users must register and log in to the platform, apply for participation in the evaluation through the Evaluation Management section, and complete their personal information. Please ensure that all submitted information is accurate and valid. Once the information is submitted and approved by the administrator, users will be granted access to the evaluation features of the platform.

Detailed instructions are as follows:

Register

When users click the [Login/Register] button, the following interface will appear. For first-time users, please scan the QR code using WeChat to follow the “BAAI Community Assistant” official WeChat account.

After scanning the QR code and following the account, the interface updates to the registration form, where users can register by entering their email, phone number, and verification code.

After completing registration, the system will redirect to the platform homepage. By clicking [Evaluation Console], users can apply for evaluation participation. Users are required to complete their personal information, which the platform administrator will review. Only users who pass the review will be granted access to the evaluation features. The review result will be sent to users via email.


Parameter | Explanation
Username
  • The username is the user's unique identifier on the platform. It is recommended to use the spelling of your full name followed by numbers. The username cannot be modified after it is submitted.
  • Must be 3 to 32 characters long, contain only lowercase letters and numbers, and start with a lowercase letter.
Real Name
  • Users should fill in their real name, and the platform administrator will give priority to real names during the approval process.
Organization
  • It is recommended to use a combination of organization + department, such as Beijing Academy of Artificial Intelligence Computing Power Platform, or Tsinghua University Computer Science Department. The platform administrator will give priority to real organizations during the approval process.
  • The organization must be filled in in both Chinese and English.
Task to Register
  • Select "Online Evaluation" or "Offline Evaluation":
  • Online Evaluation: Users only need to provide the evaluation interface API, and the evaluation platform provides test data for inference evaluation. (Not supported yet.)
  • Offline Evaluation: Users need to upload the trained model and inference code. The evaluation platform provides inference computing power and data for inference evaluation.
Whether to evaluate self-developed models
  • Yes or No (single choice).
Agreement statement
  • Users need to read and provide consent to the agreement before they can use the platform's evaluation function.
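
The username rule above (3 to 32 characters, lowercase letters and numbers only, starting with a lowercase letter) can be checked locally before submitting the form. The following is a minimal sketch, not the platform's own validation logic: the pattern is derived only from the rule as stated in this manual, and the names `USERNAME_PATTERN` and `is_valid_username` are hypothetical.

```python
import re

# Assumption: this pattern encodes only the rule stated in the manual:
# first character is a lowercase letter, followed by 2 to 31 lowercase
# letters or digits, for a total length of 3 to 32 characters.
USERNAME_PATTERN = re.compile(r"[a-z][a-z0-9]{2,31}")


def is_valid_username(name: str) -> bool:
    """Return True if `name` satisfies the documented username rule."""
    return USERNAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_username("zhangsan01")` is accepted, while `"Zhangsan"` (uppercase letter), `"1zhang"` (starts with a digit), and `"ab"` (too short) are rejected. The platform may apply additional checks, such as uniqueness, that this sketch does not cover.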

The registration process is shown in the following image:

Note:

  • Please fill in your personal information carefully, as the administrator will review your application based on the provided details.
  • Please fill in a valid business email address of your own. The review status will be notified via email and SMS, and future evaluation task updates will be sent by email.
  • If a personal email address is used, the administrator will send an email requesting you to update it. Please change it to a business email address and wait for the review again. Each user is allowed to modify their email only once per month.

Login

If the user has already completed registration, clicking the [Login/Register] button will bring up the Login page. The user can choose to log in by scanning the QR code at the top of the screen using WeChat Scan, or by using the Mobile Verification Code option.

Alternatively, the user can click [Hugging Face] to log in via a third-party platform, which will redirect to the Hugging Face login page.

The login process is as follows:

Enter the following two pieces of information: Username / Email Address and Password.


If you do not have a Hugging Face account, please register one first. You can also log in by authorizing with your Hugging Face account.

Create Evaluation

When users click [Evaluation Management], they will enter the Evaluation Management page, which mainly includes: Model Evaluation, Innovative Algorithm Evaluation, and Image Management.

Users can choose either Model Evaluation or Innovative Algorithm Evaluation based on their needs. Clicking [Create Evaluation] opens the Create Evaluation dialog box. Users should fill in the form that corresponds to the evaluation domain and submit it to generate an evaluation task.

After an evaluation is created, the system automatically redirects to the task details page. There, users can view the "Upload Model & Code" specification and use flageval-serving to upload the model and code. After uploading, click "Inference Verification" to quickly check whether the inference evaluation code can run. Once verification passes, click "Start Inference Evaluation" to begin the formal inference evaluation. When the evaluation ends, the results can be viewed on the same page. If a problem causes the task to terminate or fail, the error message can be found in the logs.
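
The order of steps in this workflow can be pictured as a small state machine. The sketch below is only a mental model built from the steps described in this section. The stage names, the `ALLOWED` table, and `can_advance` are hypothetical and do not correspond to any status values or API exposed by the platform. Treating a failure as a terminal stage is likewise an assumption.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names, chosen only to mirror the steps in the text.
    CREATED = "created"        # evaluation task created
    UPLOADED = "uploaded"      # model and code uploaded
    VERIFIED = "verified"      # "Inference Verification" passed
    EVALUATING = "evaluating"  # "Start Inference Evaluation" clicked
    FINISHED = "finished"      # results available
    FAILED = "failed"          # terminated; check the logs


# Assumed transitions: each step is only available after the previous one.
ALLOWED = {
    Stage.CREATED: {Stage.UPLOADED},
    Stage.UPLOADED: {Stage.VERIFIED, Stage.FAILED},
    Stage.VERIFIED: {Stage.EVALUATING},
    Stage.EVALUATING: {Stage.FINISHED, Stage.FAILED},
    Stage.FINISHED: set(),
    Stage.FAILED: set(),
}


def can_advance(current: Stage, target: Stage) -> bool:
    """Return True if `target` directly follows `current` in this sketch."""
    return target in ALLOWED[current]
```

In this model, `can_advance(Stage.UPLOADED, Stage.EVALUATING)` is false, which reflects the requirement that inference verification must pass before the formal evaluation can start.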

Upload Image

In [Image Management], several preset images are provided. If users need to use a custom image during actual evaluations, they can upload their own image under [Custom Image].

When users click [Image Management / Custom Image / Import Image], the [Import Image] dialog box will pop up. Users need to fill in and submit the form. After submission, the platform administrator will review it. Once approved, the image will be automatically imported. Only after a successful import can the image be used in evaluation tasks.

Currently, the platform only supports importing existing images. It does not support building images on the platform using a Dockerfile. The Dockerfile provided by the user is for review purposes only.
