Evaluation Metrics
1. Accuracy
Accuracy refers to the average correctness of the model across all evaluation instances. The concept of correctness may vary in different contexts, so we enumerate the main accuracy metrics considered in the evaluation work, the application scenarios of these metrics, and their formal definitions. The accuracy metrics include the following:
Metrics | Description |
---|---|
Slot_type_f1 | Assesses the accuracy of slot type prediction in natural language understanding tasks. |
Slot_value_cer | Calculates the character error rate (CER) between predicted and ground truth slot values. |
Slot_value_wer | Computes the word error rate (WER) between predicted and ground truth slot values. |
Slot_edit_f1_full | Evaluates the slot filling performance considering all slots, using the slot edit F1 score. |
Slot_edit_f1_part | Measures the slot filling performance focusing only on specific slots, using the slot edit F1 score. |
WER | Computes the word error rate (WER) between predicted and ground truth sequences. |
CER | Calculates the character error rate (CER) between predicted and ground truth sequences. |
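
As a rough illustration of the WER and CER rows above, the sketch below computes both from the Levenshtein edit distance, divided by the reference length. It is a minimal sketch, not the evaluation code used in this work. The function names `edit_distance`, `wer`, and `cer` are hypothetical. Whitespace tokenization for words and counting spaces as characters are assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] is the distance between the ref prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / number of reference chars."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    # assumption: spaces are counted as ordinary characters
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("turn on the light", "turn off the light")` gives 0.25, since one of the four reference words is substituted. The same two functions could be applied to slot values instead of full sequences to obtain slot-level variants, although the exact normalization used for Slot_value_wer and Slot_value_cer is not specified here.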
2. Robustness
The dataset deliberately includes errors and noise, such as repetitions, hesitations, corrections, meaningless syllables, and environmental noise, so that the model's accuracy can be measured on such data (approximately 10%).
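
One way to report robustness is to aggregate the same error metric separately for the noisy and clean portions of the data. The sketch below assumes each evaluated utterance has already been reduced to a subset label, an edit-error count, and a reference word count. The function name `wer_by_subset`, the record layout, and the labels `"clean"` and `"noisy"` are hypothetical and not taken from the dataset itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (subset_label, edit_errors, reference_word_count)
Record = Tuple[str, int, int]


def wer_by_subset(records: Iterable[Record]) -> Dict[str, float]:
    """Corpus-level WER per subset: total errors / total reference words."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for subset, n_errors, n_ref_words in records:
        errors[subset] += n_errors
        words[subset] += n_ref_words
    # Skip subsets with no reference words to avoid dividing by zero.
    return {s: errors[s] / words[s] for s in words if words[s] > 0}


if __name__ == "__main__":
    sample = [("clean", 1, 10), ("clean", 0, 10), ("noisy", 3, 10)]
    print(wer_by_subset(sample))
```

Summing errors and reference lengths before dividing weights long utterances more heavily than averaging per-utterance rates would; either choice may be reasonable, so this is a design assumption rather than the method used here.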