Evaluation Metrics

1. Accuracy

Accuracy refers to the average correctness of the model across all evaluation instances. What counts as correct varies by context, so we list the main accuracy metrics used in the evaluation, the scenarios in which each applies, and their definitions. The accuracy metrics are as follows:

| Metric | Description |
| --- | --- |
| Slot_type_f1 | Assesses the accuracy of slot type prediction in natural language understanding tasks. |
| Slot_value_cer | Calculates the character error rate (CER) between predicted and ground truth slot values. |
| Slot_value_wer | Computes the word error rate (WER) between predicted and ground truth slot values. |
| Slot_edit_f1_full | Evaluates the slot filling performance considering all slots, using the slot edit F1 score. |
| Slot_edit_f1_part | Measures the slot filling performance focusing only on specific slots, using the slot edit F1 score. |
| WER | Computes the word error rate (WER) between predicted and ground truth sequences. |
| CER | Calculates the character error rate (CER) between predicted and ground truth sequences. |
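
To make the WER and CER entries concrete, here is a minimal sketch using the standard definition: the Levenshtein edit distance between prediction and ground truth, divided by the length of the ground truth. The function names `edit_distance`, `wer`, and `cer` are illustrative, and whitespace tokenization with no text normalization is an assumption; the evaluation's actual tokenization and normalization rules are not specified here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes whitespace-separated words (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters, spaces included (an assumption)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("turn on the light", "turn on light")` is 0.25 (one deleted word out of four), and `cer("cat", "cut")` is one substitution out of three characters. The same functions could be applied to slot values for Slot_value_wer and Slot_value_cer, though how slots are aligned before scoring is not described here.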

2. Robustness

The dataset is designed to contain certain errors and noise, such as repetitions, hesitations, corrections, meaningless syllables, and environmental noise, in order to measure the model's accuracy on such data (approximately 10%).
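
One way to report robustness is to score the noisy and clean instances separately and compare the two averages. The sketch below assumes each evaluation instance has already been scored and tagged; the function name `score_by_subset` and the keys `score` and `is_noisy` are hypothetical, not part of the evaluation's actual data format.

```python
from statistics import mean


def score_by_subset(examples):
    """Average a per-instance score separately over noisy and clean instances.

    `examples` is a list of dicts with hypothetical keys:
    'score' (float, e.g. a per-instance error rate) and 'is_noisy' (bool).
    """
    noisy = [e["score"] for e in examples if e["is_noisy"]]
    clean = [e["score"] for e in examples if not e["is_noisy"]]
    return {
        "noisy_fraction": len(noisy) / len(examples) if examples else 0.0,
        "noisy_mean": mean(noisy) if noisy else None,  # None when subset is empty
        "clean_mean": mean(clean) if clean else None,
    }
```

A large gap between `noisy_mean` and `clean_mean` would indicate that the model degrades on disfluent or noisy input.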