Evaluation Metrics

1. Accuracy

Accuracy refers to the average correctness of the model across all evaluation instances. Because the notion of correctness varies across contexts, we enumerate the main accuracy metrics considered in the evaluation, their application scenarios, and their formal definitions. The accuracy metrics are as follows:

Metrics	Description
Slot_type_f1	Assesses slot type prediction in natural language understanding tasks, using the F1 score.
Slot_value_cer	Calculates the character error rate (CER) between predicted and ground truth slot values.
Slot_value_wer	Computes the word error rate (WER) between predicted and ground truth slot values.
Slot_edit_f1_full	Evaluates the slot filling performance considering all slots, using the slot edit F1 score.
Slot_edit_f1_part	Measures the slot filling performance focusing only on specific slots, using the slot edit F1 score.
WER	Computes the word error rate (WER) between predicted and ground truth sequences.
CER	Calculates the character error rate (CER) between predicted and ground truth sequences.
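
As an illustration of the WER and CER rows, the sketch below computes both rates from the Levenshtein edit distance (substitutions, deletions, and insertions) divided by the reference length. It is a minimal sketch, not the evaluation code itself: the function names `edit_distance`, `wer`, and `cer` are hypothetical, and it assumes whitespace tokenization for words and that spaces count as characters for CER.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumes spaces count as characters."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Under these assumptions, the same two functions could also be applied to slot values rather than full sequences, which is one plausible reading of Slot_value_wer and Slot_value_cer. Note that both rates can exceed 1.0 when the prediction is much longer than the reference.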

2. Robustness

The dataset is designed to include certain errors and noise, such as repetitions, hesitations, corrections, meaningless syllables, and environmental noise, so that the model's accuracy (see 1. Accuracy) can be measured on such data (approximately 10%).
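
One way to read this is that accuracy is reported separately for the noisy and the clean portions of the dataset. The sketch below shows that split under stated assumptions: the sample fields `noisy`, `reference`, and `prediction`, the function name `accuracy_by_subset`, and the use of exact match as the correctness test are all hypothetical and stand in for whichever accuracy metric is actually used.

```python
from typing import Callable, Dict, List


def accuracy_by_subset(
    samples: List[dict],
    is_correct: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Average correctness, reported separately for clean and noisy samples."""
    totals = {"clean": 0, "noisy": 0}
    hits = {"clean": 0, "noisy": 0}
    for sample in samples:
        # "noisy" is an assumed per-sample flag marking utterances that
        # contain repetitions, hesitations, corrections, and similar noise.
        key = "noisy" if sample["noisy"] else "clean"
        totals[key] += 1
        hits[key] += int(is_correct(sample["reference"], sample["prediction"]))
    # Subsets with no samples are left out to avoid dividing by zero.
    return {k: hits[k] / totals[k] for k in totals if totals[k] > 0}


def exact_match(reference: str, prediction: str) -> bool:
    """Placeholder correctness test; any metric from 1. Accuracy could be used."""
    return reference.strip() == prediction.strip()
```

Comparing the two values gives a rough indication of how much accuracy degrades on noisy input.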
