
Introduction to Robustness

Robustness refers to a model's ability to remain stable and effective in the face of different types of anomalies, noise, interference, changes, or malicious attacks. In the abstract, a typical model today (including learned deep models) takes a data input X, and the parametric model Fθ(·) carries out its defined computation to produce the expected output Y. Robustness can usually be understood as whether the model can still give the correct output in the presence of noise: given a disturbance ΔX, we ask whether the model's output Fθ(X + ΔX) still equals the expected output Y, and we quantify the difference as ΔY. In addition, the constructed disturbance must not change how a human understands X. Therefore, when constructing text noise, the evaluation designs ΔX so that X + ΔX differs little from the original X in human understanding, yet readily causes the model to produce a wrong output.
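
Written compactly (with d(·, ·) standing for whatever discrepancy measure is used to compare the model's output with the expected output, which the description above leaves unspecified):

$$\Delta Y = d\big(F_\theta(X + \Delta X),\ Y\big), \quad \text{subject to } X + \Delta X \text{ being essentially indistinguishable from } X \text{ to a human reader}$$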

We evaluate the robustness of a model by perturbing evaluation instances. Specifically, we perturb the dataset to varying degrees, using two kinds of perturbation. The first simulates common mistakes made by humans in the real world and is divided into three levels: character level, word level, and sentence level. Character-level perturbations replace characters with visually similar ones or with adjacent characters on the keyboard; word-level perturbations replace words with synonyms or with nearby words in a word-embedding semantic space; sentence-level perturbations mainly use back translation. The second is targeted perturbation, such as using a proxy model to conduct adversarial attacks. After applying these perturbations, we generate several perturbed datasets for each original dataset and compute the model's robustness index on that dataset from its evaluation results on the perturbed datasets.
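
The source does not specify the tooling used to generate these perturbations; the following is a minimal, self-contained sketch of character-level noise in that spirit (the OCR and keyboard confusion tables are small illustrative subsets, not the actual tables used):

    import random

    # Illustrative confusion tables: characters that look alike (OCR-style)
    # and characters adjacent on a QWERTY keyboard.
    OCR_CONFUSIONS = {"o": "0", "l": "1", "i": "1", "s": "5", "e": "3"}
    KEYBOARD_NEIGHBORS = {"f": "dgrtcv", "a": "qwsz", "t": "rfgy", "n": "bhjm", "e": "wrsd"}

    def perturb_chars(text, table, n_words=(1, 3), n_chars=(1, 2)):
        """Replace 1-2 characters in 1-3 randomly chosen words using `table`."""
        words = text.split()
        chosen = random.sample(range(len(words)), k=min(len(words), random.randint(*n_words)))
        for idx in chosen:
            chars = list(words[idx])
            positions = [i for i, c in enumerate(chars) if c.lower() in table]
            for p in random.sample(positions, k=min(len(positions), random.randint(*n_chars))):
                chars[p] = random.choice(table[chars[p].lower()])
            words[idx] = "".join(chars)
        return " ".join(words)

    text = "I found this movie really hard to sit through."
    print(perturb_chars(text, OCR_CONFUSIONS))      # e.g. "I f0und this movie really hard to 5it through."
    print(perturb_chars(text, KEYBOARD_NEIGHBORS))  # e.g. "I dound this movie rwally hard to sit through."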

Datasets

IMDB

The robustness datasets are constructed in two ways. The first applies perturbations without a proxy model, at the character level and the word level; the second uses an adversarial perturbation algorithm that perturbs through a proxy model, with the goal of flipping the proxy model's prediction within a specified maximum number of attempts.

The names of the perturbed datasets and their corresponding perturbation methods are as follows:

+--------------------------+---------------------------------------+
| disturbance dataset name | disturbance method                    |
+--------------------------+---------------------------------------+
| C-keyboard               | disturbance-char-keyboard             |
| C-ocr                    | disturbance-char-ocr                  |
| W-synonym                | disturbance-word-synonym              |
| W-wordembedding          | disturbance-word-word-embedding       |
| S-backtranslation        | disturbance-sentence-back-translation |
+--------------------------+---------------------------------------+

C, W, S, and Adv are short for Char, Word, Sentence, and Adversarial.

  • char level

    Pick 1 to 3 words at random, choose 1 to 2 characters to replace in each chosen word, and perturb as follows:

    • ocr (o -> 0)

      ocr imdb perturbed dataset example

      {
      	"label":0,
      	"text":"I found this movie keally hard to sit through, my attention kept wandering off the tv. As far as romantic movies go. . this one is the worst I ' ve seen. D0n ' t bother with it."
      }
    • keyboard (f -> E)

      keyboard imdb perturbed dataset example

      {
      	"label":0,
      	"text":"I Eound this movie really hSrd to sit through, my attentioh kDpt wandering off the tv. As far as romantic movies go. . this one is the worst I ' ve seen. Don ' t boRher with it."
      }
  • word level

    Choose 3 to t words at random and replace them, perturbing as follows:

    • word_embedding (the GloVe 6B-100d embeddings are used to replace selected words with semantically similar words; see the sketch after this list)

      word_embedding imdb perturbed dataset example

      {
      	"label":0,
      	"text":"I found this cast really hard to sit through, my questions kept strangers off the tv. As far as romantic movies come. . this one important the worst I ' ve seen. Don ' t bother with it."
      }
    • synonym (replace selected words with synonyms)

      synonym imdb perturbed dataset example

      {
      	"label":0,
      	"text":"I found this movie really hard to sit through, my attention kept wandering away the tv. Every bit far as romantic movies go. . this one is the worst I ' ve seen. Father ' t fuss with it."
      }
  • Adversarial perturbation

    The adversarial perturbation algorithm is TextFooler and the proxy model is lannelin/bert-imdb-1hidden. The attack results are as follows:

    +-------------------------------+--------+
    | Attack Results                |        |
    +-------------------------------+--------+
    | Number of successful attacks: | 21621  |
    | Number of failed attacks:     | 162    |
    | Number of skipped attacks:    | 3217   |
    | Original accuracy:            | 87.13% |
    | Accuracy under attack:        | 0.65%  |
    | Attack success rate:          | 99.26% |
    | Average perturbed word %:     | 3.45%  |
    | Average num. words per input: | 230.46 |
    | Avg num queries:              | 341.95 |
    +-------------------------------+--------+

    There are 25000 examples in the test set, and the proxy model's classification accuracy on it is 87.13% (Original accuracy: 87.13%), i.e. 21783 samples are classified correctly and 3217 incorrectly. Misclassified samples do not need to be perturbed, so the attack algorithm skips them (Number of skipped attacks: 3217). For the 21783 samples the proxy model classifies correctly, word-level perturbations are applied to the 'text' field; an attack counts as successful once the proxy model misclassifies the perturbed text (Number of successful attacks: 21621), and as failed if the proxy model still classifies the text correctly after the specified number of attempts (Number of failed attacks: 162). On average each input contains 230.46 words (Average num. words per input: 230.46), 3.45% of the words in an input are perturbed (Average perturbed word %: 3.45%), and 341.95 queries are made to the proxy model per input (Avg num queries: 341.95). Under attack, the proxy model's accuracy drops to 0.65% (Accuracy under attack: 0.65%).
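
The source does not name the attack tooling, but the summary table above matches the output format of the TextAttack library, which ships a TextFooler recipe; a run against the proxy model might therefore look like this minimal sketch (the model name and test split are taken from the text above, everything else is an assumption):

    # pip install textattack transformers
    import textattack
    import transformers

    model_name = "lannelin/bert-imdb-1hidden"  # proxy model named above
    model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
    wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

    # TextFooler recipe (Jin et al., 2020) run over the IMDB test split
    attack = textattack.attack_recipes.TextFoolerJin2019.build(wrapper)
    dataset = textattack.datasets.HuggingFaceDataset("imdb", split="test")

    attack_args = textattack.AttackArgs(
        num_examples=-1,                   # attack every example in the split
        log_to_csv="textfooler_imdb.csv",  # keep perturbed texts for the Adv dataset
    )
    textattack.Attacker(attack, dataset, attack_args).attack_dataset()
    # attack_dataset() prints a summary table in the same format as the one above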
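
For the W-wordembedding perturbation described in the word-level item above, the replacement step amounts to a nearest-neighbour lookup in the GloVe 6B-100d vector space; a minimal sketch, assuming the standard glove.6B.100d.txt file, might look like this:

    import numpy as np

    def load_glove(path="glove.6B.100d.txt"):
        """Load GloVe vectors into {word: unit-normalised vector}."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *values = line.rstrip().split(" ")
                v = np.asarray(values, dtype=np.float32)
                vectors[word] = v / np.linalg.norm(v)
        return vectors

    def embedding_neighbors(word, vectors, top_k=5):
        """Return the top_k words closest to `word` by cosine similarity."""
        if word not in vectors:
            return []
        sims = [(float(vectors[word] @ v), w) for w, v in vectors.items() if w != word]
        return [w for _, w in sorted(sims, reverse=True)[:top_k]]

    # vectors = load_glove()
    # A selected word is then swapped for one of its neighbours, e.g.
    # "movie" -> one of embedding_neighbors("movie", vectors)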

Robustness Metrics (RB-index)

For the original dataset and the different perturbation datasets we have $Acc_{org}, Acc_{dist_1}, Acc_{dist_2}, Acc_{dist_3}, \dots, Acc_{dist_T}$ (Acc refers to the model's evaluation score on the given dataset, org refers to the original dataset, and $dist_{1 \dots T}$ refer to the different perturbation datasets).

The calculation formula of the robustness index on this data set is:

$$\mathrm{Robustness} = \frac{1}{T \cdot Acc_{org}} \sum_{i=1}^{T} \left( Acc_{org} - Acc_{dist_i} \right)$$

Smaller values of the robustness metric indicate better model robustness. The value can be negative (this mostly occurs in NLP), which happens when the model scores higher on the perturbed datasets than on the original one.
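
For instance, the index can be computed directly from the per-dataset accuracies; a minimal sketch (the numbers in the example are made up for illustration):

    def rb_index(acc_org, acc_dist):
        """Robustness index: average accuracy drop over the T perturbed
        datasets, normalised by the accuracy on the original dataset."""
        T = len(acc_dist)
        return sum(acc_org - a for a in acc_dist) / (T * acc_org)

    # Made-up example: original accuracy 0.90, three perturbed datasets.
    # The third perturbed accuracy is higher than the original, so its term
    # is negative; an overall negative value means the model did better on
    # the perturbed data than on the original.
    print(rb_index(0.90, [0.85, 0.80, 0.92]))  # ~= 0.0481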