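Introduction to Robustness Evaluation
Robustness is a model's ability to remain stable and effective when faced with different kinds of anomalies, noise, interference, variation, or malicious attacks. Viewed abstractly, current foundation models (including learning-based deep learning models), given a data input
We evaluate a model's robustness by perturbing instances. Specifically, we apply perturbations of varying strength to each dataset, in two categories. The first covers common mistakes that humans make in the real world and is divided into three levels: character level, word level, and sentence level. Character-level perturbations replace a character with a visually similar character or with a neighboring key on the keyboard. Word-level perturbations replace a word with a synonym or with a word drawn from a proxy model's semantic space. Sentence-level perturbations mainly consist of back-translation. The second category is targeted perturbation, for example adversarial attacks driven by a proxy model. Applying these perturbations to each original dataset yields a set of perturbed datasets, and the model's robustness metric on a dataset is computed from its evaluation results on the corresponding perturbed datasets.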
Evaluation Datasets
HumanEval
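The robustness dataset is built without using a proxy model to evaluate the perturbation results. Perturbations are classified as character level, word level, and sentence level. The perturbation is applied to the prompt field of the dataset: the code description part of the prompt field ("""code description>>>) is extracted and perturbed.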
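The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the tooling used to build the datasets: the function name `perturb_description` and the regular expression are assumptions, and the description is taken to be the text between the opening `"""` and the first `>>>` doctest marker.

```python
import re
from typing import Callable

# Assumed pattern: the description is everything between the opening
# triple quote and the first ">>>" doctest marker.
_DESCRIPTION = re.compile(r'"""(.*?)(?=>>>)', re.DOTALL)


def perturb_description(prompt: str, perturb: Callable[[str], str]) -> str:
    """Apply `perturb` to the description only, leaving the code untouched."""
    match = _DESCRIPTION.search(prompt)
    if match is None:
        # No docstring followed by a doctest: return the prompt unchanged.
        return prompt
    start, end = match.span(1)
    return prompt[:start] + perturb(match.group(1)) + prompt[end:]


if __name__ == "__main__":
    example = 'def f(x):\n    """Return x plus one.\n    >>> f(1)\n    2\n    """\n'
    # str.upper stands in for a real perturbation function.
    print(perturb_description(example, str.upper))
```

Keeping the signature, the doctest examples, and the closing quotes outside the perturbed span means that only the natural-language description changes between the original and the perturbed prompt.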
The perturbed datasets are named as follows:
Perturbed dataset name | Perturbation method |
---|---|
C-keyboard | disturbance-char-keyboard |
C-ocr | disturbance-char-ocr |
C-morphonym | disturbance-char-morphonym |
W-synonym | disturbance-word-synonym |
W-wordembedding | disturbance-word-word-embedding |
W-maskedlm | disturbance-word-masked-lm |
S-backtranslation | disturbance-sentence-back-translation |
Adv | adversarial |
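C, W, S, and Adv are abbreviations of Char, Word, Sentence, and Adversarial, respectively.
Character level
Randomly select 1 to 3 words and replace 1 to 2 characters in each selected word. The perturbation methods are as follows:
ocr (e.g. o -> 0)
Sample from the ocr-perturbed HumanEval dataset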
{ "task_id": "HumanEval/0", "prompt": " from typing import List def has_close_elements(numbers: List[float], threshold: float) -> bool: """Check if in given list of numbers, ake any tw0 numbers closer to each other than given threshold. >>> has_close_elements([1.0, 2.0, 3.0], 0.5) False >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) True """ ", "canonical_solution": " for idx, elem in enumerate(numbers): for idx2, elem2 in enumerate(numbers): if idx != idx2: distance = abs(elem - elem2) if distance < threshold: return True return False ", "test": " METADATA = { 'author': 'jt', 'dataset': 'test' } def check(candidate): assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False ", "entry_point":"has_close_elements" }
keyboard (e.g. q -> w)
Sample from the keyboard-perturbed HumanEval dataset
{ "task_id": "HumanEval/0", "prompt": " from typing import List def has_close_elements(numbers: List[float], threshold: float) -> bool: """Check if in given list of numbers, are any two numbers closer to each oFher thqn giFen threshold. >>> has_close_elements([1.0, 2.0, 3.0], 0.5) False >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) True """ ", "canonical_solution": " for idx, elem in enumerate(numbers): for idx2, elem2 in enumerate(numbers): if idx != idx2: distance = abs(elem - elem2) if distance < threshold: return True return False ", "test": " METADATA = { 'author': 'jt', 'dataset': 'test' } def check(candidate): assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False ", "entry_point":"has_close_elements" }
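The character-level procedure (1 to 3 words, 1 to 2 characters per word) can be sketched as below. The substitution tables `OCR_MAP` and `KEYBOARD_MAP` are deliberately tiny and hypothetical, and `perturb_chars` is an illustrative name. The actual tables and sampling details used to build the datasets are not given in this document.

```python
import random

# Tiny illustrative substitution tables (assumptions, not the real ones).
OCR_MAP = {"o": "0", "l": "1", "i": "1", "s": "5", "e": "c"}
KEYBOARD_MAP = {"q": "w", "a": "s", "e": "r", "t": "y", "o": "p", "n": "m"}


def perturb_chars(text: str, table: dict, rng: random.Random) -> str:
    """Replace 1-2 characters in each of 1-3 randomly chosen words."""
    words = text.split(" ")
    # Only words that contain at least one replaceable character qualify.
    candidates = [i for i, w in enumerate(words) if any(c in table for c in w)]
    if not candidates:
        return text
    n_words = min(len(candidates), rng.randint(1, 3))
    for i in rng.sample(candidates, n_words):
        chars = list(words[i])
        positions = [j for j, c in enumerate(chars) if c in table]
        n_chars = min(len(positions), rng.randint(1, 2))
        for j in rng.sample(positions, n_chars):
            chars[j] = table[chars[j]]
        words[i] = "".join(chars)
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbation is reproducible
    sentence = "are any two numbers closer to each other than given threshold"
    print(perturb_chars(sentence, OCR_MAP, rng))
    print(perturb_chars(sentence, KEYBOARD_MAP, rng))
```

Passing an explicit `random.Random` instance rather than using the module-level functions makes it straightforward to regenerate the same perturbed dataset from a seed.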
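Word level
Randomly select 1 to 3 words to replace. The perturbation methods are as follows:
word_embedding (uses the glove6B-300d model to replace each selected word with a semantically similar word)
Sample from the word_embedding-perturbed HumanEval dataset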
{ "task_id": "HumanEval/0", "prompt": " from typing import List def has_close_elements(numbers: List[float], threshold: float) -> bool: """Check if in good list of numbers, are any between numbers closer to each most than given threshold. >>> has_close_elements([1.0, 2.0, 3.0], 0.5) False >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) True """ ", "canonical_solution": " for idx, elem in enumerate(numbers): for idx2, elem2 in enumerate(numbers): if idx != idx2: distance = abs(elem - elem2) if distance < threshold: return True return False ", "test": " METADATA = { 'author': 'jt', 'dataset': 'test' } def check(candidate): assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False ", "entry_point":"has_close_elements" }
synonym (replaces each selected word with a synonym)
Sample from the synonym-perturbed HumanEval dataset
{ "task_id": "HumanEval/0", "prompt": " from typing import List def has_close_elements(numbers: List[float], threshold: float) -> bool: """Condition if in sacrifice list of numbers, exist any two numbers closer to each other than given threshold. >>> has_close_elements([1.0, 2.0, 3.0], 0.5) False >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) True """ ", "canonical_solution": " for idx, elem in enumerate(numbers): for idx2, elem2 in enumerate(numbers): if idx != idx2: distance = abs(elem - elem2) if distance < threshold: return True return False ", "test": " METADATA = { 'author': 'jt', 'dataset': 'test' } def check(candidate): assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False ", "entry_point":"has_close_elements" }
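The idea behind the word_embedding method, replacing a word with its nearest neighbor in an embedding space, can be illustrated as follows. The 3-dimensional `TOY_VECTORS` are made-up values standing in for real GloVe vectors, and the function names are hypothetical. The words to replace are passed in explicitly here rather than sampled at random, to keep the example deterministic.

```python
import math

# Made-up 3-d vectors standing in for real embeddings (illustration only).
TOY_VECTORS = {
    "given": (0.9, 0.1, 0.0),
    "provided": (0.8, 0.2, 0.1),
    "numbers": (0.1, 0.9, 0.0),
    "values": (0.2, 0.8, 0.1),
    "threshold": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_word(word, vectors):
    """Return the most similar other word, or the word itself if unknown."""
    if word not in vectors:
        return word
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


def replace_words(text, targets, vectors):
    """Replace each word in `targets` with its nearest neighbor."""
    return " ".join(
        nearest_word(w, vectors) if w in targets else w for w in text.split(" ")
    )


if __name__ == "__main__":
    print(replace_words("in given list of numbers", {"given", "numbers"}, TOY_VECTORS))
```

Nearest neighbors in an embedding space are related words rather than strict synonyms, which may explain why the word_embedding sample above contains substitutions such as "given" to "good" that change the meaning of the description.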
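Sentence level
Sentences are perturbed using a language model.
back_translation (uses the Helsinki-NLP/opus-mt-ROMANCE-en and Helsinki-NLP/opus-mt-en-ROMANCE models to translate the sentences of the code description into another language and then back again)
Sample from the back_translation-perturbed HumanEval dataset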
{ "task_id": "HumanEval/0", "prompt": " from typing import List def has_close_elements(numbers: List[float], threshold: float) -> bool: """Check if in the given list of numbers, there are two numbers closer to each other than the given threshold. >>> has_close_elements([1.0, 2.0, 3.0], 0.5) False >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) True """ ", "canonical_solution": " for idx, elem in enumerate(numbers): for idx2, elem2 in enumerate(numbers): if idx != idx2: distance = abs(elem - elem2) if distance < threshold: return True return False ", "test": " METADATA = { 'author': 'jt', 'dataset': 'test' } def check(candidate): assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False ", "entry_point":"has_close_elements" }
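The back-translation round trip can be sketched independently of any particular translation model. In the sketch below, `back_translate` and `table_translator` are hypothetical names, and the two phrase tables are stand-ins for real translation models so that the example runs without downloading weights.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate `text` into a pivot language and back again."""
    return from_pivot(to_pivot(text))


def table_translator(table: Dict[str, str]) -> Translator:
    """Build a stand-in translator from a phrase table.

    Text that is missing from the table passes through unchanged.
    """
    return lambda sentence: table.get(sentence, sentence)


if __name__ == "__main__":
    # Hypothetical phrase tables standing in for real translation models.
    en_to_fr = table_translator(
        {"are any two numbers closer": "y a-t-il deux nombres plus proches"}
    )
    fr_to_en = table_translator(
        {"y a-t-il deux nombres plus proches": "are there two numbers closer"}
    )
    print(back_translate("are any two numbers closer", en_to_fr, fr_to_en))
```

In practice the two callables could wrap the two Helsinki-NLP models named above. The round trip usually preserves the meaning while changing the wording, as in the sample above where "are any two numbers" becomes "there are two numbers".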
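Robustness Metric
For the original dataset and the different perturbed datasets, we have
The robustness metric on this dataset is computed as:
The smaller the robustness metric, the more robust the model. The value can be negative (this occurs mostly in NLP).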
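The exact formula is not reproduced in this text, so the following is only an illustration of one definition that has both properties stated above (smaller is better, and negative values are possible): the relative drop from the original score to the mean score over the perturbed datasets. The function name `relative_drop` and the scores in the usage example are hypothetical, and the platform's actual metric may differ.

```python
from typing import Dict


def relative_drop(original_score: float, perturbed_scores: Dict[str, float]) -> float:
    """Relative drop from the original score to the mean perturbed score.

    ASSUMPTION: this is one plausible definition, not necessarily the
    formula used by the platform. Smaller means more robust; the value is
    negative when the model scores higher on perturbed data.
    """
    if original_score == 0:
        raise ValueError("original_score must be non-zero")
    if not perturbed_scores:
        raise ValueError("at least one perturbed score is required")
    mean_perturbed = sum(perturbed_scores.values()) / len(perturbed_scores)
    return (original_score - mean_perturbed) / original_score


if __name__ == "__main__":
    # Hypothetical scores, keyed by perturbed dataset name.
    scores = {"C-ocr": 0.40, "W-synonym": 0.50}
    print(relative_drop(0.50, scores))
```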