
Introduction to Robustness

Robustness refers to a model's ability to remain stable and effective in the face of anomalies, noise, interference, distribution changes, or malicious attacks. Abstractly, a parametric model Fθ (including learning-based deep models) takes a data input X and, through its defined computation, produces the expected output Y. Robustness asks whether the model still gives the correct output in the presence of noise: given a perturbation ΔX, is the model's output Fθ(X + ΔX) still equal to the expected output Y? We quantify the difference as ΔY. In addition, the constructed perturbation must not affect a human's understanding of X. Therefore, when constructing text noise, the evaluation designs ΔX so that X + ΔX differs little from the original X in human understanding, yet easily causes the model's output to be wrong.
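The definition above can be sketched in a few lines. This is an illustrative toy, not the benchmark's code: `model` is a hypothetical stand-in for the parametric model Fθ, and `output_shift` measures ΔY for a given ΔX.

```python
def model(x):
    # Toy parametric model F_theta: a simple linear map, for illustration only.
    return 2.0 * x + 1.0

def output_shift(model, x, delta_x):
    # Delta Y: how far the output on the perturbed input X + delta_X
    # drifts from the output on the clean input X.
    return abs(model(x + delta_x) - model(x))

shift = output_shift(model, x=1.0, delta_x=0.01)
```

For a robust model, a small ΔX that leaves the input's meaning intact should produce a small ΔY.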

We evaluate the robustness of a model by perturbing its input instances. Specifically, we perturb each dataset to varying degrees at two levels. The first covers common mistakes that humans make in the real world, divided into three granularities: character level, word level, and sentence level. The character level replaces characters with visually similar ones (morphonyms) or with adjacent characters on the keyboard; the word level replaces words with synonyms or with neighbours in the semantic space of an agent model; and the sentence level mainly uses back-translation. The second is targeted perturbation, such as adversarial attacks conducted with an agent model. After applying these perturbations, we generate a perturbed counterpart for each original dataset and compute the model's robustness index from its evaluation results on the perturbed datasets.
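The overall workflow is: apply a perturbation function to the natural-language field of every instance, producing one perturbed copy of the dataset per perturbation. A minimal sketch, with illustrative names (`perturb_dataset`, `char_swap`, and the `"text"` field are assumptions, not the benchmark's actual API):

```python
def perturb_dataset(dataset, perturb_fn):
    # Apply a text perturbation to the natural-language field of each
    # instance, leaving all other fields untouched.
    return [{**item, "text": perturb_fn(item["text"])} for item in dataset]

def char_swap(text):
    # Toy character-level perturbation: swap the first two characters.
    return text[1] + text[0] + text[2:] if len(text) > 1 else text

original = [{"text": "sort the list", "label": 0}]
perturbed = perturb_dataset(original, char_swap)  # text -> "osrt the list"
```

The model is then evaluated on both `original` and each `perturbed` copy, and the gap between the two scores feeds the robustness index defined below.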

Datasets

HumanEval

The robust dataset is constructed without using a proxy model to evaluate the perturbation results, and the perturbations are grouped into character level, word level, and sentence level. Specifically, the prompt field of each instance is perturbed: the code-description part of the prompt (the natural-language text inside the docstring, before the `>>>` doctest examples) is extracted and perturbed.
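One way to isolate the code-description text from a HumanEval-style prompt is sketched below. The regex is an assumption about the prompt layout (description between the opening triple quotes and the first `>>>` line), not the benchmark's exact parser:

```python
import re

def extract_description(prompt: str) -> str:
    # Capture the natural-language docstring text that precedes either the
    # first ">>>" doctest line or the closing triple quotes.
    match = re.search(r'"""(.*?)(?=>>>|""")', prompt, flags=re.DOTALL)
    return match.group(1).strip() if match else ""

prompt = (
    'def add(a, b):\n'
    '    """Return the sum of a and b.\n'
    '    >>> add(1, 2)\n'
    '    3\n'
    '    """\n'
)
desc = extract_description(prompt)  # "Return the sum of a and b."
```

Only this extracted description is perturbed; the function signature, the doctest examples, and the tests are left intact so the task itself is unchanged.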

The names of the perturbed datasets are as follows:

| disturbance dataset name | disturbance method |
| --- | --- |
| C-keyboard | disturbance-char-keyboard |
| C-ocr | disturbance-char-ocr |
| C-morphonym | disturbance-char-morphonym |
| W-synonym | disturbance-word-synonym |
| W-wordembedding | disturbance-word-word-embedding |
| W-maskedlm | disturbance-word-masked-lm |
| S-backtranslation | disturbance-sentence-back-translation |
| Adv | adversarial |

C, W, S, and Adv are short for Char, Word, Sentence, and adversarial.

  • char level

    Randomly pick 1 to 3 words, choose 1 to 2 characters in each chosen word to replace, and perturb as follows:

    • ocr (o → 0)

      ocr humaneval perturbed dataset example

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Check if in given list of numbers, ake any tw0 numbers closer to each other than 		given threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
        "canonical_solution":
        "    for idx, elem in enumerate(numbers):
                for idx2, elem2 in enumerate(numbers):
                    if idx != idx2:
                        distance = abs(elem - elem2)
                        if distance < threshold:
                            return True
            return False
        ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
    • keyboard (q → w)

      keyboard humaneval perturbed dataset example

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Check if in given list of numbers, are any two numbers closer to each oFher thqn 		giFen threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
        "canonical_solution":
        "    for idx, elem in enumerate(numbers):
                for idx2, elem2 in enumerate(numbers):
                    if idx != idx2:
                        distance = abs(elem - elem2)
                        if distance < threshold:
                            return True
            return False
        ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
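The character-level procedure above can be sketched as follows. The confusion maps are small illustrative samples (the benchmark's full OCR and keyboard tables are not shown in this document), and `perturb_chars` is an assumed name:

```python
import random

# Small illustrative confusion maps: OCR-style look-alikes and keyboard
# neighbours. The benchmark's actual tables are larger.
OCR_MAP = {"o": "0", "l": "1", "s": "5", "e": "3"}
KEYBOARD_MAP = {"q": "w", "a": "s", "o": "p", "e": "r"}

def perturb_chars(text, char_map, rng):
    words = text.split()
    # Randomly pick 1 to 3 words (bounded by the number of words available).
    n_words = rng.randint(1, min(3, len(words)))
    for wi in rng.sample(range(len(words)), n_words):
        chars = list(words[wi])
        hits = [ci for ci, c in enumerate(chars) if c in char_map]
        # Replace 1 to 2 mappable characters in the chosen word, if any exist.
        for ci in rng.sample(hits, min(len(hits), rng.randint(1, 2))):
            chars[ci] = char_map[chars[ci]]
        words[wi] = "".join(chars)
    return " ".join(words)

rng = random.Random(0)
noisy = perturb_chars("check if two numbers are close", OCR_MAP, rng)
```

Because each replacement is one character for one character, the perturbed text keeps the same length and word count as the original, which helps keep it readable to humans.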
  • word level

    Choose 1 to 3 words at random to replace, perturbing as follows:

    • word_embedding (the GloVe 6B-300d embeddings are used to replace selected words with semantically similar words)

      word_embedding humaneval perturbed dataset example

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Check if in good list of numbers, are any between numbers closer to each most than 		  given threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
        "canonical_solution":
        "    for idx, elem in enumerate(numbers):
                for idx2, elem2 in enumerate(numbers):
                    if idx != idx2:
                        distance = abs(elem - elem2)
                        if distance < threshold:
                            return True
            return False
        ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
    • synonym (replace selected words with synonyms)

      synonym humaneval perturbed dataset example

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Condition if in sacrifice list of numbers, exist any two numbers closer to each    		  other than given threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
        "canonical_solution":
        "    for idx, elem in enumerate(numbers):
                for idx2, elem2 in enumerate(numbers):
                    if idx != idx2:
                        distance = abs(elem - elem2)
                        if distance < threshold:
                            return True
            return False
        ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
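The word-level procedure can be sketched the same way. The synonym table below is a tiny hand-written stand-in: the benchmark draws replacements from real synonym lists and from GloVe 6B-300d nearest neighbours instead, and `perturb_words` is an assumed name:

```python
import random

# Tiny illustrative synonym table; the benchmark uses real synonym resources
# and embedding-space neighbours.
SYNONYMS = {
    "check": ["verify", "test"],
    "numbers": ["values", "figures"],
    "close": ["near", "adjacent"],
}

def perturb_words(text, synonyms, rng):
    words = text.split()
    # Choose 1 to 3 words at random to replace.
    n_words = rng.randint(1, min(3, len(words)))
    for wi in rng.sample(range(len(words)), n_words):
        options = synonyms.get(words[wi])
        if options:
            words[wi] = rng.choice(options)
    return " ".join(words)

rng = random.Random(0)
noisy = perturb_words("check if two numbers are close", SYNONYMS, rng)
```

Every output word is either the original word or one of its listed synonyms, so the sentence stays understandable while its surface form changes.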
  • sentence level

    A language model is used to perturb whole sentences, as follows:

    • back_translation (use the Helsinki-NLP/opus-mt-en-ROMANCE and Helsinki-NLP/opus-mt-ROMANCE-en models to translate sentences in the code-description part into another language and back again)

      back_translation humaneval perturbed dataset example

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Check if in the given list of numbers, there are two numbers closer to each other           than the given threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
        "canonical_solution":
        "    for idx, elem in enumerate(numbers):
                for idx2, elem2 in enumerate(numbers):
                    if idx != idx2:
                        distance = abs(elem - elem2)
                        if distance < threshold:
                            return True
            return False
        ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
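The back-translation round trip has this structural shape. The two translate functions below are hypothetical stubs standing in for the Helsinki-NLP/opus-mt-en-ROMANCE and opus-mt-ROMANCE-en models (loading the real MarianMT checkpoints requires the `transformers` package and a model download, omitted here):

```python
def en_to_romance(sentence: str) -> str:
    # Stub: a real implementation would call the opus-mt-en-ROMANCE model.
    return f"<fr> {sentence}"

def romance_to_en(sentence: str) -> str:
    # Stub: a real implementation would call the opus-mt-ROMANCE-en model.
    return sentence.removeprefix("<fr> ")

def back_translate(sentence: str) -> str:
    # Round trip: EN -> pivot language -> EN. With the real models the round
    # trip paraphrases the sentence; with these stubs it is the identity.
    return romance_to_en(en_to_romance(sentence))

out = back_translate("Check if any two numbers are closer than the threshold.")
```

With real translation models, the round trip preserves the meaning of the description while reordering and rewording it, which is exactly the sentence-level noise being evaluated.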

Robustness Metric (RB-index)

For the original dataset and the different perturbed datasets we have Acc_org, Acc_dist1, Acc_dist2, Acc_dist3, ..., Acc_distT (Acc refers to the model's evaluation score on a dataset, org refers to the original dataset, and dist1...T refers to the T perturbed datasets).

The calculation formula of the robustness index on this data set is:

Robustness = (1 / (T · Acc_org)) · Σ_{i=1}^{T} (Acc_org − Acc_disti)

Smaller values of the robustness metric indicate better model robustness. The value can be negative (seen mostly on NLP tasks), which means the model scored higher on the perturbed datasets than on the original one.
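The formula above translates directly into code. A minimal sketch (`rb_index` is an assumed name):

```python
def rb_index(acc_org, acc_dist):
    # acc_org: accuracy on the original dataset.
    # acc_dist: accuracies on the T perturbed datasets.
    # Average relative accuracy drop, normalised by the original accuracy.
    # Smaller is better; negative means the model did better on the
    # perturbed data than on the original.
    t = len(acc_dist)
    return sum(acc_org - a for a in acc_dist) / (t * acc_org)

score = rb_index(0.80, [0.70, 0.60])  # (0.10 + 0.20) / (2 * 0.80) = 0.1875
```

Normalising by Acc_org makes the index comparable across datasets where the model's baseline accuracy differs.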