Introduction to Robustness
Robustness refers to a model's ability to maintain stable and effective behavior in the face of anomalies, noise, interference, changes, or malicious attacks. In the abstract, a model (including learning-based deep models) maps a data input to an output; robustness asks whether that output remains correct when the input is perturbed.
We evaluate a model's robustness by perturbing its input instances. Specifically, we perturb each dataset to varying degrees at two broad levels. The first simulates common human mistakes in the real world and is divided into three sub-levels: character level, word level, and sentence level. The character level covers replacement of visually similar characters and of keyboard-adjacent characters; the word level covers replacement of words by synonyms and by neighbors in a proxy model's semantic (embedding) space; the sentence level mainly covers back-translation. The second is targeted perturbation, such as adversarial attacks mounted with a proxy model. After applying these perturbations, we obtain a perturbed counterpart of each original dataset, and we compute the model's robustness index from its evaluation results on the original and perturbed datasets.
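As a rough sketch (function and variable names here are hypothetical, not the benchmark's actual code), the first step of this procedure, producing a perturbed copy of a dataset, might look like:

```python
import random

def perturb_dataset(records, perturb_fn, seed=0):
    """Apply a perturbation function to the natural-language part of each record.

    `records` is a list of dicts with a "prompt" field, as in HumanEval;
    `perturb_fn` stands for any of the character/word/sentence-level
    perturbations described above.
    """
    rng = random.Random(seed)  # fixed seed so the perturbed set is reproducible
    return [{**r, "prompt": perturb_fn(r["prompt"], rng)} for r in records]

# Example with a trivial character-level perturbation (o -> 0, OCR-style):
ocr = lambda text, rng: text.replace("o", "0")
perturbed = perturb_dataset([{"task_id": "HumanEval/0", "prompt": "sort the list"}], ocr)
```

The perturbed copy keeps every field except `prompt` intact, so the original test harness can still score it.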
Datasets
HumanEval
The robustness datasets are constructed without using a proxy model to evaluate the perturbation results, and the perturbations are grouped into character level, word level, and sentence level. Specifically, the `prompt` field of each record is perturbed: the code-description part of the `prompt` (the natural-language docstring text that precedes the `>>>` doctest examples) is extracted and perturbed.
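Extracting the description from a HumanEval-style prompt can be sketched as follows (a simplification under the assumption that the description sits in the first triple-quoted docstring, before any `>>>` doctest lines):

```python
import re

def extract_description(prompt):
    """Pull the natural-language docstring text out of a HumanEval-style prompt."""
    m = re.search(r'"""(.*?)"""', prompt, re.DOTALL)
    if not m:
        return ""
    doc = m.group(1)
    # Keep only the prose before the first doctest example.
    return doc.split(">>>")[0].strip()

prompt = 'def add(a, b):\n    """ Return the sum of a and b.\n    >>> add(1, 2)\n    3\n    """\n'
```

Only this extracted span is perturbed; the surrounding code and doctests are left untouched so the task remains well-defined.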
The names of the perturbed datasets are as follows:
| Perturbed dataset | Perturbation method |
|---|---|
| C-keyboard | disturbance-char-keyboard |
| C-ocr | disturbance-char-ocr |
| C-morphonym | disturbance-char-morphonym |
| W-synonym | disturbance-word-synonym |
| W-wordembedding | disturbance-word-word-embedding |
| W-maskedlm | disturbance-word-masked-lm |
| S-backtranslation | disturbance-sentence-back-translation |
| Adv | adversarial |
C, W, S, and Adv are short for character, word, sentence, and adversarial, respectively.
Character level
Pick 1 to 3 words at random, choose 1 to 2 characters in each word to replace, and perturb as follows:
ocr (o → 0)
ocr-perturbed HumanEval example:

```json
{
  "task_id": "HumanEval/0",
  "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, ake any tw0 numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n",
  "canonical_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n",
  "test": "METADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n    assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n",
  "entry_point": "has_close_elements"
}
```
keyboard (q → w)
keyboard-perturbed HumanEval example:

```json
{
  "task_id": "HumanEval/0",
  "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each oFher thqn\n    giFen threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n",
  "canonical_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n",
  "test": "METADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n    assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n",
  "entry_point": "has_close_elements"
}
```
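The character-level procedure described above can be sketched as follows. The confusion tables here are small illustrative subsets, not the benchmark's full OCR and keyboard-adjacency maps:

```python
import random

# Toy confusion tables (illustrative subsets, not the benchmark's full maps).
OCR_MAP = {"o": "0", "l": "1", "s": "5"}
KEYBOARD_MAP = {"q": "w", "a": "s", "o": "p"}  # adjacent keys on a QWERTY layout

def perturb_chars(text, table, n_words=(1, 3), n_chars=(1, 2), seed=0):
    """Pick 1-3 random words and replace 1-2 characters in each via `table`."""
    rng = random.Random(seed)
    words = text.split()
    k = min(rng.randint(*n_words), len(words))
    for i in rng.sample(range(len(words)), k):
        chars = list(words[i])
        # Positions whose character has an entry in the confusion table.
        hits = [j for j, c in enumerate(chars) if c in table]
        for j in rng.sample(hits, min(rng.randint(*n_chars), len(hits))):
            chars[j] = table[chars[j]]
        words[i] = "".join(chars)
    return " ".join(words)
```

Each output differs from the input only at mapped character positions, which keeps the perturbation plausible as a human typo or OCR error.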
Word level
Choose 1 to 3 words at random to replace, perturbing as follows:
word_embedding (the GloVe 6B-300d embeddings are used to replace selected words with semantically similar words)
word_embedding-perturbed HumanEval example:

```json
{
  "task_id": "HumanEval/0",
  "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in good list of numbers, are any between numbers closer to each most than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n",
  "canonical_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n",
  "test": "METADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n    assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n",
  "entry_point": "has_close_elements"
}
```
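The embedding-based replacement is a nearest-neighbor lookup in vector space. The sketch below substitutes a tiny hand-made vector table for the real GloVe 6B-300d vectors, purely to show the mechanism (note how "given" maps to "good", as in the example above):

```python
import numpy as np

# Tiny stand-in for GloVe vectors (the real pipeline loads glove.6B.300d).
VECTORS = {
    "given":  np.array([1.0, 0.2, 0.0]),
    "good":   np.array([0.9, 0.3, 0.1]),
    "list":   np.array([0.0, 1.0, 0.2]),
    "number": np.array([0.1, 0.0, 1.0]),
}

def nearest_neighbor(word):
    """Replace `word` with its most cosine-similar neighbor in the table."""
    if word not in VECTORS:
        return word  # out-of-vocabulary words are left unchanged
    v = VECTORS[word]
    best, best_sim = word, -1.0
    for other, u in VECTORS.items():
        if other == word:
            continue
        sim = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        if sim > best_sim:
            best, best_sim = other, sim
    return best
```

Because the neighbor is only semantically *similar*, not synonymous, this perturbation can change the meaning of the description more than the synonym-based one.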
synonym (replace selected words with synonyms)
synonym-perturbed HumanEval example:

```json
{
  "task_id": "HumanEval/0",
  "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Condition if in sacrifice list of numbers, exist any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n",
  "canonical_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n",
  "test": "METADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n    assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n",
  "entry_point": "has_close_elements"
}
```
Sentence level
A language model is used to perturb whole sentences.

back_translation (use the Helsinki-NLP/opus-mt-en-ROMANCE and Helsinki-NLP/opus-mt-ROMANCE-en models to translate sentences in the code-description part into another language and back again)
back_translation-perturbed HumanEval example:

```json
{
  "task_id": "HumanEval/0",
  "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in the given list of numbers, there are two numbers closer to each other than\n    the given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n",
  "canonical_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n",
  "test": "METADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n    assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n",
  "entry_point": "has_close_elements"
}
```
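A minimal round-trip wrapper for back-translation might look like the following. The two translator callables are placeholders standing in for the Helsinki-NLP Marian models named above; any `str -> str` functions fit the same shape:

```python
def back_translate(text, to_romance, to_english):
    """English -> ROMANCE language -> English round trip.

    `to_romance` / `to_english` stand in for the Helsinki-NLP
    opus-mt-en-ROMANCE and opus-mt-ROMANCE-en models.
    """
    return to_english(to_romance(text))

# With the real models this would be roughly (not run here; requires a
# model download via the Hugging Face `transformers` library):
#   from transformers import pipeline
#   mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ROMANCE")
#   ...
```

The round trip tends to paraphrase rather than corrupt, which is why the back-translated example above reads as fluent English that merely rephrases the original description.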
Robustness Metrics (RB-index)
For the original dataset and each perturbed dataset, we evaluate the model and obtain a score on each. The robustness index (RB-index) for a perturbed dataset is then computed from these two scores.
Smaller values of the robustness metric indicate better model robustness; the value can be negative (most often observed on NLP tasks).
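The exact formula is not reproduced above; one common definition consistent with the description (smaller is better, zero means no degradation, negative when the perturbed score exceeds the original) is the relative performance drop. This is an assumption about the intended metric, not the benchmark's verbatim formula:

```python
def rb_index(score_original, score_perturbed):
    """Hypothetical robustness index: relative performance drop.

    0 means no degradation, larger values mean less robust, and the
    value is negative when the model scores *higher* on the perturbed
    set than on the original one.
    """
    return (score_original - score_perturbed) / score_original
```

For example, with pass@1 scores of 0.50 on the original set and 0.40 on a perturbed set, the index is positive (a robustness loss); with 0.50 and 0.55 it is negative (the perturbation happened to help).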