Introduction to Robustness Evaluation

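Robustness is a model's ability to remain stable and effective when faced with anomalies, noise, interference, distribution shifts, or malicious attacks. Abstractly, a current foundation model (including learning-based deep learning models) is a parameterized model Fθ(·) that, given an input X, applies its defined computation to produce the expected output Y. Robustness can usually be understood as whether the model still gives the correct output in the presence of noise. Specifically, given a perturbation ΔX, we ask whether the model output Fθ(X+ΔX) still equals the expected output Y, and we quantify the difference as ΔY. In addition, the perturbation must not change how a human understands X. Therefore, when constructing textual noise, the test examples are generated by designing ΔX so that X+ΔX differs little from the original X in human understanding, yet readily causes the model to produce a wrong output.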

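We evaluate model robustness by perturbing instances. Specifically, we apply perturbations of varying degrees to each dataset, in two main categories. The first covers common mistakes that humans make in the real world, at three levels: character level, word level, and sentence level. Character-level perturbations replace characters with visually similar characters or with keyboard-adjacent characters; word-level perturbations replace words with synonyms or with words that are close in a proxy model's semantic space; sentence-level perturbations mainly use back-translation. The second category is targeted perturbation, for example adversarial attacks carried out with a proxy model. Applying these perturbations yields a set of perturbed datasets for each original dataset, and we compute the model's robustness metric on that dataset from its evaluation results on the perturbed datasets.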

Evaluation Datasets

HumanEval

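The robustness dataset is constructed without using a proxy model to evaluate the perturbation results. Perturbations fall into three categories: character level, word level, and sentence level. They are applied to the prompt field of the dataset: the code description part of the prompt (the text between """ and >>>) is extracted and perturbed.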
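The extraction step described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the helper names `split_prompt` and `perturb_prompt` are hypothetical, and the sketch assumes the description is the text between the first `"""` and the first `>>>` in the prompt, with surrounding whitespace left untouched.

```python
import re

# Assumption: the description runs from the first opening triple quote
# up to (but not including) the first doctest marker ">>>".
_DESCRIPTION = re.compile(r'"""(.*?)(?=>>>)', re.DOTALL)


def split_prompt(prompt):
    """Return (prefix, description, suffix) with prefix + description + suffix == prompt."""
    match = _DESCRIPTION.search(prompt)
    if match is None:
        raise ValueError('prompt has no """ ... >>> description')
    raw = match.group(1)
    core = raw.strip()  # keep leading/trailing whitespace out of the description
    start = match.start(1) + raw.index(core)
    end = start + len(core)
    return prompt[:start], core, prompt[end:]


def perturb_prompt(prompt, perturb):
    """Apply `perturb` (any str -> str function) to the description only."""
    prefix, description, suffix = split_prompt(prompt)
    return prefix + perturb(description) + suffix
```

Because only the description is passed to `perturb`, the function signature and the doctest examples in the prompt are left unchanged, which matches the samples shown in this section.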

The perturbed datasets are named as follows:

Perturbed dataset    Perturbation method
C-keyboard           disturbance-char-keyboard
C-ocr                disturbance-char-ocr
C-morphonym          disturbance-char-morphonym
W-synonym            disturbance-word-synonym
W-wordembedding      disturbance-word-word-embedding
W-maskedlm           disturbance-word-masked-lm
S-backtranslation    disturbance-sentence-back-translation
Adv                  adversarial

C, W, S, and Adv are abbreviations of Char, Word, Sentence, and adversarial, respectively.

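  • Character level

    Randomly select 1 to 3 words and replace 1 to 2 characters in each word. The perturbation methods are as follows:

    • ocr (o -> 0)

      Sample from the ocr-perturbed HumanEval dataset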

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Check if in given list of numbers, ake any tw0 numbers closer to each other than 		given threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
          "canonical_solution":
          "    
              for idx, elem in enumerate(numbers):
                  for idx2, elem2 in enumerate(numbers):
                      if idx != idx2:
                          distance = abs(elem - elem2)
                          if distance < threshold:
                              return True
              return False
          ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
    • keyboard (q -> w)

      Sample from the keyboard-perturbed HumanEval dataset

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Check if in given list of numbers, are any two numbers closer to each oFher thqn 		giFen threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
          "canonical_solution":
          "    
              for idx, elem in enumerate(numbers):
                  for idx2, elem2 in enumerate(numbers):
                      if idx != idx2:
                          distance = abs(elem - elem2)
                          if distance < threshold:
                              return True
              return False
          ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
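  • Word level

    Randomly select 1 to 3 words and replace them. The perturbation methods are as follows:

    • word_embedding (uses the glove6B-300d model to replace the selected words with semantically similar words)

      Sample from the word_embedding-perturbed HumanEval dataset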

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Check if in good list of numbers, are any between numbers closer to each most than 		  given threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
          "canonical_solution":
          "    
              for idx, elem in enumerate(numbers):
                  for idx2, elem2 in enumerate(numbers):
                      if idx != idx2:
                          distance = abs(elem - elem2)
                          if distance < threshold:
                              return True
              return False
          ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
    • synonym (replaces the selected words with synonyms)

      Sample from the synonym-perturbed HumanEval dataset

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Condition if in sacrifice list of numbers, exist any two numbers closer to each    		  other than given threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
          "canonical_solution":
          "    
              for idx, elem in enumerate(numbers):
                  for idx2, elem2 in enumerate(numbers):
                      if idx != idx2:
                          distance = abs(elem - elem2)
                          if distance < threshold:
                              return True
              return False
          ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
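  • Sentence level

    Sentences are perturbed with a language model.

    • back_translation (uses the Helsinki-NLP/opus-mt-ROMANCE-en and Helsinki-NLP/opus-mt-en-ROMANCE models to translate the sentences of the code description into another language and then back)

      Sample from the back_translation-perturbed HumanEval dataset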

      {
          "task_id": "HumanEval/0",
          "prompt": 
          "	from typing import List    
          	def has_close_elements(numbers: List[float], threshold: float) -> bool:
                  """Check if in the given list of numbers, there are two numbers closer to each other           than the given threshold.
                  >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
                      False
                  >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
                      True
                  """
          ",
          "canonical_solution":
          "    
              for idx, elem in enumerate(numbers):
                  for idx2, elem2 in enumerate(numbers):
                      if idx != idx2:
                          distance = abs(elem - elem2)
                          if distance < threshold:
                              return True
              return False
          ",
          "test": 
          "   
          METADATA = {
              'author': 'jt',
              'dataset': 'test'
          }       
          def check(candidate):
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
              assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True
              assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False
              assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True
              assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False            
          ",
          "entry_point":"has_close_elements"
      }
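The character-level and word-level selection rules listed above (pick 1 to 3 words, then replace 1 to 2 characters or the whole word) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the code used to build the datasets: the substitution tables `OCR_MAP`, `KEYBOARD_MAP`, and `SYNONYMS` are tiny hypothetical stand-ins, the function names are made up, words are split on single spaces, and only words that contain a replaceable character (or have a listed substitute) are considered as candidates.

```python
import random

# Hypothetical substitution tables, kept tiny for illustration only.
OCR_MAP = {"o": "0", "l": "1", "s": "5", "b": "6"}
KEYBOARD_MAP = {"q": "w", "w": "e", "a": "s", "e": "r", "o": "p", "t": "r"}
SYNONYMS = {"check": "verify", "given": "provided", "list": "sequence"}


def perturb_chars(text, table, rng):
    """Character level: pick 1-3 words, replace 1-2 characters in each."""
    words = text.split(" ")
    # Only words with at least one replaceable character are candidates.
    candidates = [i for i, w in enumerate(words) if any(c in table for c in w)]
    if not candidates:
        return text
    for i in rng.sample(candidates, min(len(candidates), rng.randint(1, 3))):
        chars = list(words[i])
        positions = [p for p, c in enumerate(chars) if c in table]
        for p in rng.sample(positions, min(len(positions), rng.randint(1, 2))):
            chars[p] = table[chars[p]]
        words[i] = "".join(chars)
    return " ".join(words)


def perturb_words(text, substitutes, rng):
    """Word level: pick 1-3 words and replace each with its listed substitute."""
    words = text.split(" ")
    candidates = [i for i, w in enumerate(words) if w.lower() in substitutes]
    if not candidates:
        return text
    for i in rng.sample(candidates, min(len(candidates), rng.randint(1, 3))):
        words[i] = substitutes[words[i].lower()]
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    sentence = ("Check if in given list of numbers, are any two numbers "
                "closer to each other than given threshold.")
    print(perturb_chars(sentence, OCR_MAP, rng))
    print(perturb_chars(sentence, KEYBOARD_MAP, rng))
    print(perturb_words(sentence, SYNONYMS, rng))
```

For the word_embedding method, the lookup table would be replaced by a nearest-neighbor query in the embedding space, and for back_translation the whole description would be passed through the two translation models instead; neither is shown here.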

Robustness Metric

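For the original dataset and the different perturbed datasets, we have $Acc_{org}$, $Acc_{dist_1}$, $Acc_{dist_2}$, $Acc_{dist_3}$, ..., $Acc_{dist_T}$ (Acc is the model's evaluation metric on a given dataset, org denotes the original dataset, and $dist_1, \dots, dist_T$ denote the different perturbed datasets).

The robustness metric on the dataset is computed as:

$$Robustness = \frac{1}{T \cdot Acc_{org}} \sum_{i=1}^{T} \left( Acc_{org} - Acc_{dist_i} \right)$$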

A smaller robustness value indicates a more robust model. The value can be negative (this occurs mostly in NLP tasks).
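As a worked illustration, the sketch below computes the metric under the assumption that it is the mean accuracy drop across the T perturbed datasets, divided by the accuracy on the original dataset. The function name `robustness` and the accuracy values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def robustness(acc_org, acc_dist):
    """Mean relative accuracy drop over the perturbed datasets.

    acc_org  -- accuracy on the original dataset (must be non-zero)
    acc_dist -- accuracies on the T perturbed datasets
    """
    if not acc_dist:
        raise ValueError("at least one perturbed dataset is required")
    if acc_org == 0:
        raise ValueError("acc_org must be non-zero")
    t = len(acc_dist)
    return sum(acc_org - acc for acc in acc_dist) / (t * acc_org)


if __name__ == "__main__":
    # Made-up accuracies: one original dataset and four perturbed ones.
    print(robustness(0.5, [0.4, 0.45, 0.5, 0.55]))
    # A model that scores higher on the perturbed data gets a negative value.
    print(robustness(0.5, [0.6]))
```

In the first call the per-dataset drops are 0.10, 0.05, 0.00, and -0.05, which sum to 0.10; dividing by T · Acc_org = 4 × 0.5 = 2 gives a value of about 0.05. In the second call the single drop is -0.10, so the result is about -0.2, which shows how the metric becomes negative.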