
Introduction to Robustness

Robustness refers to the ability of a model to remain stable and effective in the face of different types of anomalies, noise, interference, changes, or malicious attacks. Abstractly, a typical model (including learning-based deep models) takes a data input X, and the parametric model Fθ(·) performs its defined computation to produce the expected output Y. Robustness can usually be understood as whether the model still gives the correct output in the presence of noise. Specifically, given a perturbation ΔX, we ask whether the model's output Fθ(X + ΔX) still equals the expected output Y, and we quantify the difference as ΔY. In addition, the constructed perturbation must not change how a human understands X. Therefore, when constructing textual noise, the evaluation designs ΔX so that X + ΔX differs little from the original X in human understanding, yet easily causes the model to produce wrong outputs.
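As a minimal formalization of this setup (the distance function d and the similarity constraint sim(·,·) ≥ τ are assumed notation for illustration, not definitions from the original):

$$\Delta Y \;=\; d\!\left(F_{\theta}(X + \Delta X),\; Y\right), \qquad \text{s.t. } \operatorname{sim}\!\left(X,\; X + \Delta X\right) \ge \tau$$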

We evaluate the robustness of a model by perturbing the evaluation instances. Specifically, we perturb the data set to varying degrees at two levels. The first simulates common mistakes made by humans in the real world and is divided into three granularities: character level, word level, and sentence level. The character level replaces characters with visually similar characters or with characters adjacent on the keyboard; the word level replaces words with synonyms or with substitutes drawn from the semantic space of a surrogate model; the sentence level mainly uses back-translation. The second is targeted perturbation, such as adversarial attacks carried out through a surrogate model. After applying these perturbations, we generate a perturbed counterpart for each original data set and compute the model's robustness index from its evaluation results on the perturbed data sets (a sketch of this evaluation loop follows below).
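A minimal sketch of that evaluation loop, assuming JSONL files and a `model_predict` callable (file names and interfaces are assumptions, not this project's actual code):

```python
import json

def load_jsonl(path):
    # Each line is one example, as in the perturbation examples shown below.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def accuracy(model_predict, examples):
    correct = sum(
        model_predict(ex["sentence1"], ex["sentence2"]) == ex["label"]
        for ex in examples
    )
    return correct / len(examples)

def collect_accuracies(model_predict, original_path, perturbed_paths):
    # Returns the accuracy on the original set and on each perturbed set,
    # which are the inputs to the robustness index defined at the end.
    acc_org = accuracy(model_predict, load_jsonl(original_path))
    acc_dist = [accuracy(model_predict, load_jsonl(p)) for p in perturbed_paths]
    return acc_org, acc_dist
```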

Datasets

OCNLI

Robust data sets are constructed in two ways. The first applies perturbations that do not rely on a surrogate model; these are divided into character level, word level, and sentence level. The second uses an adversarial perturbation algorithm driven by a surrogate model, whose goal is to change the surrogate model's prediction as much as possible within a specified number of queries (a sketch of this loop follows below).
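A minimal sketch of a query-limited, word-level adversarial loop against a surrogate classifier; the interfaces `surrogate_predict` and `candidate_substitutions` and the budget are assumptions, not the attack algorithm actually used:

```python
def word_level_attack(text, label, surrogate_predict,
                      candidate_substitutions, max_queries=200):
    # For Chinese text, individual characters act as the tokens here.
    tokens = list(text)
    queries = 0
    for i in range(len(tokens)):
        for cand in candidate_substitutions(text, i):
            perturbed = "".join(tokens[:i] + [cand] + tokens[i + 1:])
            queries += 1
            if queries > max_queries:
                return None          # query budget exhausted: attack failed
            if surrogate_predict(perturbed) != label:
                return perturbed     # surrogate misclassifies: attack succeeded
    return None
```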

The names of the perturbed data sets are as follows:

| Perturbed data set name | Perturbation method                   |
|-------------------------|---------------------------------------|
| C-morphonym             | disturbance-char-morphonym            |
| W-maskedlm              | disturbance-word-masked-lm            |
| S-backtranslation       | disturbance-sentence-back-translation |
| Adv                     | adversarial                           |

C, W, S, and Adv are short for Character, Word, Sentence, and Adversarial.

  • Character (char) level

Randomly select 3 to 15 characters and replace them (a minimal code sketch follows the two examples below). The perturbation methods are as follows:

    1. Similar-character (morphonym) transformation; a perturbation example is as follows:
{
      "level": "hard", 
      "sentence1": "可是你好移是给别人打的呀.", 
      "sentence2": "你役哈我打过", 
      "label": "neutral", 
      "label0": "neutral", 
      "label1": "neutral", 
      "label2": "neutral", 
      "label3": "neutral", 
      "label4": "neutral", 
      "genre": "phone", 
      "prem_id": "phone_1894", 
      "id": 0
    }
    2. Homophone character transformation; a perturbation example is as follows:
{
      "level": "medium", 
      "sentence1": "怒厉促进祖国和平统一大业", 
      "sentence2": "祖国会采痛和平的举惜促进统一大业", 
      "label": "entailment", 
      "label0": "entailment", 
      "label1": "neutral", 
      "label2": "entailment", 
      "label3": "entailment", 
      "label4": "neutral", 
      "genre": "gov", 
      "prem_id": "gov_240", 
      "id": 2
    }
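A minimal sketch of this character-level perturbation, assuming a small confusion table that maps a character to visually similar (morphonym) or homophone characters; the table entries and the selection policy here are illustrative assumptions, not the actual resources used:

```python
import random

# Hypothetical confusion table: character -> similar-looking or homophone characters.
CONFUSION = {
    "好": ["号", "郝"],
    "是": ["事", "市"],
    "力": ["历", "厉"],
}

def perturb_chars(sentence: str, n_min: int = 3, n_max: int = 15) -> str:
    chars = list(sentence)
    candidates = [i for i, c in enumerate(chars) if c in CONFUSION]
    n = min(len(candidates), random.randint(n_min, n_max))
    for i in random.sample(candidates, n):
        chars[i] = random.choice(CONFUSION[chars[i]])
    return "".join(chars)
```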

  • Sentence level

Perform back-translation perturbation on the sentences (translate Chinese into English and then back into Chinese; a minimal sketch follows the example below). A perturbation example is as follows:

{
    "level": "easy", 
    "sentence1": "继续改革城市工人基本医疗保险制度、保健和毒品生产流通制度",
    "sentence2": "基本医疗保险制度尚未建立。", 
    "label": "contradiction", 
    "label0": "contradiction",
    "label1": "contradiction", 
    "label2": "contradiction", 
    "label3": "contradiction", 
    "label4": "contradiction", 
    "genre": "gov", 
    "prem_id": "gov_874", 
    "id": 3
  }
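A minimal sketch of back-translation (zh → en → zh) using two translation models from the Hugging Face Hub; the model names are assumptions, and the evaluation may have used a different translation system:

```python
from transformers import pipeline

# Assumed translation models; any zh<->en pair could be substituted.
zh2en = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
en2zh = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")

def back_translate(sentence: str) -> str:
    english = zh2en(sentence)[0]["translation_text"]
    return en2zh(english)[0]["translation_text"]
```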
  • Word level

Use masked language modeling to replace words (a minimal sketch follows the example below). A perturbation example is as follows:

{
    "level": "hard", 
    "sentence1": "随着我国汽车、建筑、家电、国防等制造业的崛起,冷轧油成为市场中的紧俏产品了", 
    "sentence2": "近年来我国的经济发展已经取得了长足的进步", 
    "label": "neutral", 
    "label0": "neutral", 
    "label1": "entailment", 
    "label2": "entailment", 
    "label3": "neutral", 
    "label4": "neutral", 
    "genre": "news", 
    "prem_id": "news_1726",
    "id": 4
  }
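A minimal sketch of masked-LM replacement using bert-base-chinese (the model mentioned in this document); the single-character masking and top-prediction selection here are simplifications assumed for illustration:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-chinese")

def replace_token(sentence: str, index: int) -> str:
    # Mask one character and let the language model propose a replacement.
    masked = sentence[:index] + fill_mask.tokenizer.mask_token + sentence[index + 1:]
    best = fill_mask(masked)[0]["token_str"]
    return sentence[:index] + best + sentence[index + 1:]
```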
  • Adversarial perturbation

The bert-base-chinese model (https://huggingface.co/bert-base-chinese) was fine-tuned on the OCNLI data set and then used as a proxy model for text attacks (a sketch of the attack setup follows the analysis below). The attack results are as follows:

+--------------------------------+--------+
| Attack Results                 |        |
+--------------------------------+--------+
| Number of successful attacks:  | 1494   |
| Number of failed attacks:      | 174    |
| Number of skipped attacks:     | 852    |
| Original accuracy:             | 66.18% |
| Accuracy under attack:         | 6.91%  |
| Attack success rate:           | 89.57% |
| Average perturbed word %:      | 20.83% |
| Average num. words per input:  | 19.24  |
| Avg num queries:               | 111.86 |
+--------------------------------+--------+

The test set contains 2520 examples, and the proxy model's classification accuracy is 66.18% (Original accuracy: 66.18%); that is, 1668 samples are classified correctly and 852 incorrectly. Misclassified examples do not need to be perturbed, so the attack algorithm skips them (Number of skipped attacks: 852). The 1668 examples classified correctly by the proxy model were perturbed with the word-level method. Perturbation stops as soon as the proxy model misclassifies the perturbed text, and the attack is counted as successful (Number of successful attacks: 1494); if the model still classifies the text correctly, the attack is counted as failed (Number of failed attacks: 174). Of the 1668 samples, 89.57% were attacked successfully (Attack success rate: 89.57%). Each sample contains 19.24 words on average (Average num. words per input: 19.24), 20.83% of the words in a sample are perturbed on average (Average perturbed word %: 20.83%), and the proxy model is queried 111.86 times per sample on average (Avg num queries: 111.86). Under attack, the proxy model's accuracy drops to 6.91% (Accuracy under attack: 6.91%).
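A minimal sketch of this attack setup using TextAttack with a generic word-substitution recipe; the fine-tuned checkpoint path, the attack recipe, and the dataset loading are assumptions, not the exact configuration behind the reported results:

```python
import transformers
import textattack

# Hypothetical path to the bert-base-chinese model fine-tuned on OCNLI.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "path/to/bert-base-chinese-finetuned-ocnli")
tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-chinese")
wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

# A standard word-substitution recipe; the project may use a custom attack.
attack = textattack.attack_recipes.TextFoolerJin2019.build(wrapper)
dataset = textattack.datasets.HuggingFaceDataset("clue", "ocnli", split="validation")

attack_args = textattack.AttackArgs(num_examples=-1, log_to_csv="attack_log.csv")
attacker = textattack.Attacker(attack, dataset, attack_args)
attacker.attack_dataset()  # prints an "Attack Results" summary like the one above
```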

Robustness Metrics (RB-index)

For the original data set and the different perturbation data sets we have $Acc_{org}, Acc_{dist_1}, Acc_{dist_2}, Acc_{dist_3}, \dots, Acc_{dist_T}$ (where $Acc$ is the model's evaluation metric on the corresponding data set, $org$ refers to the original data set, and $dist_1 \dots dist_T$ refer to the different perturbation data sets).

The calculation formula of the robustness index on this data set is:

$$\text{Robustness} = \frac{1}{T \cdot Acc_{org}} \sum_{i=1}^{T} \left( Acc_{org} - Acc_{dist_i} \right)$$

Smaller values of the robustness metric indicate better model robustness. The value can be negative (when accuracy on the perturbed data sets exceeds accuracy on the original data set), which is mostly observed in NLP tasks.
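A minimal sketch of the RB-index computation defined above; the function and the example numbers are purely illustrative:

```python
def rb_index(acc_org: float, acc_dist: list[float]) -> float:
    # acc_org: metric on the original data set;
    # acc_dist: metrics on the T perturbed data sets.
    T = len(acc_dist)
    return sum(acc_org - a for a in acc_dist) / (T * acc_org)

# Example: original accuracy 0.66, accuracies on three perturbed data sets.
print(rb_index(0.66, [0.60, 0.55, 0.07]))  # larger accuracy drops give a larger index
```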