# Accuracy
ColorBench
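Data Description
ColorBench is a multimodal dataset designed to comprehensively evaluate how well vision-language models (VLMs) understand color, covering color perception, reasoning, and robustness. It was introduced in the paper "ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness".
- Size: 5,814 image-question pairs in total
- Scope: 3 major categories and 11 fine-grained tasks
- Core capability: focuses on evaluating the color understanding of vision-language models
The 11 fine-grained tasks: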
Task | Questions | Task | Questions |
---|---|---|---|
Color Counting | 102 | Color Robustness | 4858 |
Color Proportion | 81 | Color Comparison | 101 |
Color Recognition | 76 | Object Recognition | 77 |
Color Mimicry | 70 | Color Extraction | 96 |
Color Blindness | 157 | Color Illusion | 93 |
Object Counting | 103 | | |
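As a quick consistency check, the per-task counts in the table can be summed. The sketch below hard-codes the counts from the table (the name `TASK_COUNTS` is only illustrative) and checks them against the stated total. It also shows that removing Color Robustness leaves 956 pairs, which matches the size of the perception-and-reasoning evaluation subset reported in this card.

```python
# Per-task question counts copied from the task table.
TASK_COUNTS = {
    "Color Counting": 102,
    "Color Proportion": 81,
    "Color Recognition": 76,
    "Color Mimicry": 70,
    "Color Blindness": 157,
    "Object Counting": 103,
    "Color Robustness": 4858,
    "Color Comparison": 101,
    "Object Recognition": 77,
    "Color Extraction": 96,
    "Color Illusion": 93,
}

# Sum over all 11 tasks, then drop the robustness task.
total = sum(TASK_COUNTS.values())
non_robustness = total - TASK_COUNTS["Color Robustness"]

print(total)           # 5814
print(non_robustness)  # 956
```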
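Dataset Composition and Specification
Source data size:
- Test set: 5,814 questions
Evaluation data size:
- Color perception and reasoning subset: 956 questions (drawn from the original test set; this equals the 5,814 total minus the 4,858 Color Robustness questions)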
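One way to obtain such a subset is to filter records on the `task` source field. The following is a minimal sketch, not the benchmark's official tooling: `select_eval_subset` is a hypothetical helper, the demo records are made up, and it assumes the stored `task` value for the robustness split is exactly the string "Color Robustness".

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def select_eval_subset(
    records: Iterable[Record],
    excluded_task: str = "Color Robustness",  # assumed stored value
) -> List[Record]:
    """Keep every record whose `task` differs from the excluded task."""
    return [r for r in records if r["task"] != excluded_task]


# Made-up records for illustration; only the fields used here are filled in.
demo = [
    {"idx": 0, "task": "Object Recognition"},
    {"idx": 1, "task": "Color Robustness"},
    {"idx": 2, "task": "Color Counting"},
]

subset = select_eval_subset(demo)
print([r["idx"] for r in subset])  # [0, 2]
```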
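Source data fields:
Key | Description |
---|---|
idx | Globally unique ID |
id | ID within the category |
type | Evaluation category (e.g., Perception) |
task | Task name (e.g., Object Recognition) |
filename | Image file name |
image | Image object in PIL format |
prompt | Full prompt (including the choices) |
question | Question text |
choices | List of choices |
answer | Correct answer |
image_url | URL of the original image |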
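To illustrate how `question`, `choices`, and `prompt` relate, the sketch below labels a list of choices with letters and appends them to a question. The template sentence "Select from the following choices." is taken from the single sample record in this card; whether every record follows it is an assumption, and `format_choices` and `build_prompt` are hypothetical helpers. In practice the stored `prompt` field should be preferred, since it may differ in detail (for example, in capitalization).

```python
from string import ascii_uppercase
from typing import List


def format_choices(choices: List[str]) -> str:
    """Label choices as "(A) first (B) second ..."."""
    return " ".join(
        f"({letter}) {choice}"
        for letter, choice in zip(ascii_uppercase, choices)
    )


def build_prompt(question: str, choices: List[str]) -> str:
    """Join question and lettered choices using the assumed template."""
    return f"{question} Select from the following choices. {format_choices(choices)}"


print(format_choices(["ID", "OK", "TX", "MO"]))
# (A) ID (B) OK (C) TX (D) MO
```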
Source dataset sample:
```json
{
  "idx": 0,
  "id": 1,
  "type": "Perception",
  "task": "Object Recognition",
  "filename": "ObjectRecognition/1.jpg",
  "image": "<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1682x1072>",
  "prompt": "which state is not light pink in this image? Select from the following choices. (A) ID (B) OK (C) TX (D) MO",
  "question": "Which state is not light pink in this image?",
  "choices": ["ID", "OK", "TX", "MO"],
  "answer": "(B)",
  "image_url": "https://www.shutterstock.com/image-vector/grunge-watercolor-map-usa-united-states-2169490363"
}
```
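Because `answer` is stored as a parenthesized letter such as "(B)", accuracy can be computed by pulling a letter out of each model response and comparing it with the reference. This card does not specify the answer-extraction rule, so the sketch below assumes a simple one (take the first parenthesized capital letter); `extract_choice` and `accuracy` are hypothetical helpers, not the benchmark's official scorer.

```python
import re
from typing import List, Optional

# Matches a parenthesized capital letter such as "(B)".
_CHOICE_RE = re.compile(r"\(([A-Z])\)")


def extract_choice(text: str) -> Optional[str]:
    """Return the first parenthesized option letter, or None if absent."""
    match = _CHOICE_RE.search(text)
    return f"({match.group(1)})" if match else None


def accuracy(predictions: List[str], answers: List[str]) -> float:
    """Fraction of predictions whose extracted choice equals the answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(extract_choice(p) == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Made-up model outputs; responses without a recognizable letter count as wrong.
preds = ["The answer is (B).", "(C) TX", "I am not sure"]
golds = ["(B)", "(B)", "(A)"]
print(f"{accuracy(preds, golds):.3f}")
```

A stricter or more lenient extraction rule (for example, accepting a bare "B") would change the reported score, so the rule should be fixed before comparing models.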
Citation
```bibtex
@article{liang2025colorbench,
  title={Colorbench: Can vlms see and understand the colorful world? a comprehensive benchmark for color perception, reasoning, and robustness},
  author={Liang, Yijun and Li, Ming and Fan, Chenrui and Li, Ziyue and Nguyen, Dang and Cobbina, Kwesi and Bhardwaj, Shweta and Chen, Jiuhai and Liu, Fuxiao and Zhou, Tianyi},
  journal={arXiv preprint arXiv:2504.10514},
  year={2025}
}
```
Source dataset license:
Apache License 2.0