Evaluation Metrics

1. Accuracy

Accuracy is the model's average correctness across all evaluation instances. Because what counts as correct varies by context, we list the main accuracy metrics used in the evaluation work, the scenarios in which each applies, and their formal definitions.

1.1 Accuracy

Accuracy is the ratio of correct predictions to the total number of predictions. In the MVBench evaluation, it is the number of questions the model answers correctly divided by the total number of questions. For multiple-choice questions, the model must select the correct option from the candidate answers; for open-ended questions, a semantic matching method determines whether an answer is correct.
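
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the MVBench implementation: the function names `accuracy`, `exact_match`, and `token_overlap_match` are hypothetical, and because the text does not say which semantic matching method is used for open-ended questions, a crude token-overlap check stands in for it here.

```python
from typing import Callable, Sequence


def exact_match(prediction: str, reference: str) -> bool:
    """Multiple-choice check: compare option labels, ignoring case and spaces."""
    return prediction.strip().lower() == reference.strip().lower()


def token_overlap_match(prediction: str, reference: str,
                        threshold: float = 0.5) -> bool:
    """Hypothetical placeholder for a semantic matcher (an assumption).

    Uses Jaccard overlap of lowercase tokens; a real evaluation would
    likely use a stronger method, which the text does not specify.
    """
    pred_tokens = set(prediction.lower().split())
    ref_tokens = set(reference.lower().split())
    if not pred_tokens or not ref_tokens:
        return False
    overlap = len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)
    return overlap >= threshold


def accuracy(predictions: Sequence[str], references: Sequence[str],
             is_correct: Callable[[str, str], bool] = exact_match) -> float:
    """Number of correct answers divided by the total number of questions."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(1 for p, r in zip(predictions, references) if is_correct(p, r))
    return correct / len(predictions)


# Multiple-choice: three of four option labels match the answer key.
mc_accuracy = accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"])

# Open-ended: correctness is decided by the placeholder matcher.
open_accuracy = accuracy(
    ["a man opens the door", "a dog runs"],
    ["the man opens a door", "the man opens a door"],
    is_correct=token_overlap_match,
)
```

Passing the correctness check as a parameter keeps the ratio itself identical for both question types; only the notion of "correct" changes.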