Evaluation Data

ColorBench

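Data Description

ColorBench is a multimodal dataset designed to comprehensively evaluate how well vision-language models (VLMs) understand color, covering color perception, reasoning, and robustness. It was introduced in the paper "ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness".

Dataset size: 5,814 image-question pairs in total
Evaluation scope: 3 major categories and 11 fine-grained tasks
Core capability: focuses on evaluating the color understanding ability of vision-language models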

The 11 fine-grained tasks:

Task	Questions	Task	Questions
Color Counting	102	Color Robustness	4858
Color Proportion	81	Color Comparison	101
Color Recognition	76	Object Recognition	77
Color Mimicry	70	Color Extraction	96
Color Blindness	157	Color Illusion	93
Object Counting	103

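Dataset Composition and Specification

Source data size:

Test set: 5,814 questions

Evaluation data size:

Color perception and reasoning subset: 956 questions (drawn from the original test set)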
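The two sizes above can be cross-checked against the per-task question counts. The following is a minimal sketch, not part of the official tooling: the counts are copied from the task table, and the assumption that the 956-question subset is everything except Color Robustness is an inference from the arithmetic (5814 - 4858 = 956), not something the source states explicitly.

```python
# Question counts per task, copied from the task table in this document.
TASK_COUNTS = {
    "Color Counting": 102,
    "Color Proportion": 81,
    "Color Recognition": 76,
    "Color Mimicry": 70,
    "Color Blindness": 157,
    "Object Counting": 103,
    "Color Robustness": 4858,
    "Color Comparison": 101,
    "Object Recognition": 77,
    "Color Extraction": 96,
    "Color Illusion": 93,
}

# Size of the full test set.
total = sum(TASK_COUNTS.values())

# Assumption: the perception-and-reasoning subset excludes only Color Robustness.
subset = total - TASK_COUNTS["Color Robustness"]

print(total)   # 5814
print(subset)  # 956
```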

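Source data fields:

KEY	Description
`idx`	Globally unique ID
`id`	ID within the category
`type`	Evaluation category (e.g., Perception)
`task`	Task name (e.g., Object Recognition)
`filename`	Image file name
`image`	Image object in PIL format
`prompt`	Full prompt (including the answer choices)
`question`	Question text
`choices`	List of answer choices
`answer`	Correct answer
`image_url`	Web link to the original image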
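When working with records in this schema, it can help to check that every documented field is present and to tally records per task. The sketch below assumes each record is a plain Python dict keyed by the field names in the table. The helper names `missing_keys` and `count_by_task` are hypothetical, and the two toy records are illustrative stand-ins rather than real dataset rows.

```python
from collections import Counter
from typing import Any, Iterable

# Field names as listed in the source data fields table.
EXPECTED_KEYS = frozenset({
    "idx", "id", "type", "task", "filename", "image",
    "prompt", "question", "choices", "answer", "image_url",
})


def missing_keys(record: dict[str, Any]) -> set[str]:
    """Return the documented fields that are absent from a record."""
    return set(EXPECTED_KEYS) - set(record)


def count_by_task(records: Iterable[dict[str, Any]]) -> Counter:
    """Count how many records belong to each task."""
    return Counter(record["task"] for record in records)


# Toy, deliberately incomplete records for illustration only.
records = [
    {"idx": 0, "id": 1, "type": "Perception", "task": "Object Recognition"},
    {"idx": 1, "id": 2, "type": "Perception", "task": "Object Recognition"},
]

print(sorted(missing_keys(records[0])))
print(count_by_task(records))
```

Running `count_by_task` over the full test set and comparing the result with the task table is one way to confirm that a download is complete.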

Source dataset sample:

json

{
  "idx": 0,
  "id": 1,
  "type": "Perception",
  "task": "Object Recognition",
  "filename": "ObjectRecognition/1.jpg",
  "image": "<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1682x1072>",
  "prompt": "which state is not light pink in this image? Select from the following choices. (A) ID (B) OK (C) TX (D) MO",
  "question": "Which state is not light pink in this image?",
  "choices": ["ID", "OK", "TX", "MO"],
  "answer": "(B)",
  "image_url": "https://www.shutterstock.com/image-vector/grunge-watercolor-map-usa-united-states-2169490363"
}
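In the sample above, `answer` is a parenthesized letter and `prompt` lists the choices with matching labels. The sketch below shows one way to map the answer back to its choice and to rebuild the option string. It assumes every answer follows the "(X)" pattern seen in this single sample, which the source does not guarantee. The helper names `answer_index` and `format_choices` are hypothetical.

```python
import re


def answer_index(answer: str) -> int:
    """Map an answer label such as "(B)" to a zero-based choice index."""
    match = re.fullmatch(r"\(([A-Z])\)", answer.strip())
    if match is None:
        raise ValueError(f"unexpected answer format: {answer!r}")
    return ord(match.group(1)) - ord("A")


def format_choices(choices: list[str]) -> str:
    """Render choices as "(A) ... (B) ..." in the style of the sample prompt."""
    return " ".join(
        f"({chr(ord('A') + i)}) {choice}" for i, choice in enumerate(choices)
    )


# Fields copied from the sample record above.
sample = {"choices": ["ID", "OK", "TX", "MO"], "answer": "(B)"}

index = answer_index(sample["answer"])
print(sample["choices"][index])           # OK
print(format_choices(sample["choices"]))  # (A) ID (B) OK (C) TX (D) MO
```

A model's reply can then be scored by extracting its chosen letter and comparing the resulting index with `answer_index(record["answer"])`.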

Citation

bibtex

@article{liang2025colorbench,
  title={ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness},
  author={Liang, Yijun and Li, Ming and Fan, Chenrui and Li, Ziyue and Nguyen, Dang and Cobbina, Kwesi and Bhardwaj, Shweta and Chen, Jiuhai and Liu, Fuxiao and Zhou, Tianyi},
  journal={arXiv preprint arXiv:2504.10514},
  year={2025}
}

Source dataset license:

Apache License 2.0
