
#Accuracy

ColorBench

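Data Description

ColorBench is a multimodal dataset designed to comprehensively evaluate how well vision-language models (VLMs) understand color, covering color perception, reasoning, and robustness. It was introduced in the paper "ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness".

  • Data size: 5814 image-question pairs in total
  • Evaluation scope: 3 major categories and 11 fine-grained tasks
  • Core capability: focuses on evaluating the color understanding of vision-language models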

11 Fine-Grained Tasks

| Task | Questions | Task | Questions |
| --- | --- | --- | --- |
| Color Counting | 102 | Color Robustness | 4858 |
| Color Proportion | 81 | Color Comparison | 101 |
| Color Recognition | 76 | Object Recognition | 77 |
| Color Mimicry | 70 | Color Extraction | 96 |
| Color Blindness | 157 | Color Illusion | 93 |
| Object Counting | 103 | | |

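Dataset Composition and Specifications

Source Data Volume

  • Test set: 5814 questions

Evaluation Data Volume

  • Color perception and reasoning subset: 956 questions (drawn from the original test set)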
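The two totals above can be cross-checked against the per-task counts. The short sketch below sums the 11 task counts and then removes Color Robustness; that the 956-question evaluation subset is exactly "every task except Color Robustness" is an assumption inferred from the arithmetic, not something the page states outright.

```python
# Question counts per task, copied from the task table on this page.
TASK_COUNTS = {
    "Color Counting": 102,
    "Color Proportion": 81,
    "Color Recognition": 76,
    "Color Mimicry": 70,
    "Color Blindness": 157,
    "Object Counting": 103,
    "Color Robustness": 4858,
    "Color Comparison": 101,
    "Object Recognition": 77,
    "Color Extraction": 96,
    "Color Illusion": 93,
}

# Size of the full source test set.
total = sum(TASK_COUNTS.values())

# Assumption: the evaluation subset is every task except Color Robustness.
subset = total - TASK_COUNTS["Color Robustness"]

print(total)   # 5814
print(subset)  # 956
```

Both numbers match the figures listed above, which suggests the perception and reasoning subset simply leaves out the robustness questions.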

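Source Data Fields

| KEY | Description |
| --- | --- |
| idx | Globally unique ID |
| id | ID within the category |
| type | Evaluation category (e.g. Perception) |
| task | Task name (e.g. Object Recognition) |
| filename | Image file name |
| image | Image object in PIL format |
| prompt | Full prompt (including the choices) |
| question | Question text |
| choices | List of choices |
| answer | Correct answer |
| image_url | Web link to the original image |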
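Since this page is tagged with the accuracy metric, a per-task accuracy could be computed from the `idx`, `task`, and `answer` fields roughly as follows. This is a minimal sketch, not the benchmark's official scorer: the helper names `extract_letter` and `accuracy_by_task`, the shape of `predictions` (a dict from `idx` to raw model output), and the rule of taking the first parenthesised letter are all assumptions, and the toy records are made up for illustration.

```python
import re
from collections import defaultdict


def extract_letter(text):
    """Return the first parenthesised option letter such as "(B)", or None."""
    match = re.search(r"\(([A-Z])\)", text)
    return f"({match.group(1)})" if match else None


def accuracy_by_task(records, predictions):
    """Compute accuracy per `task`.

    records:     iterable of dicts with at least `idx`, `task`, `answer`.
    predictions: dict mapping `idx` to the raw model output (hypothetical format).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        task = rec["task"]
        total[task] += 1
        # A missing prediction counts as wrong.
        predicted = extract_letter(predictions.get(rec["idx"], ""))
        if predicted == rec["answer"]:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy records (invented for this example, not taken from the dataset).
toy_records = [
    {"idx": 0, "task": "Object Recognition", "answer": "(B)"},
    {"idx": 1, "task": "Object Recognition", "answer": "(A)"},
    {"idx": 2, "task": "Color Counting", "answer": "(C)"},
]
toy_predictions = {0: "The answer is (B).", 1: "(D)", 2: "(C)"}

scores = accuracy_by_task(toy_records, toy_predictions)
print(scores)
```

Grouping by `task` rather than reporting one overall number keeps the large Color Robustness task from dominating the result if it is included.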

Source Dataset Sample

json
{
  "idx": 0,
  "id": 1,
  "type": "Perception",
  "task": "Object Recognition",
  "filename": "ObjectRecognition/1.jpg",
  "image": "<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1682x1072>",
  "prompt": "which state is not light pink in this image? Select from the following choices. (A) ID (B) OK (C) TX (D) MO",
  "question": "Which state is not light pink in this image?",
  "choices": ["ID", "OK", "TX", "MO"],
  "answer": "(B)",
  "image_url": "https://www.shutterstock.com/image-vector/grunge-watercolor-map-usa-united-states-2169490363"
}
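The sample suggests how `prompt`, `question`, `choices`, and `answer` relate: the prompt looks like the question followed by lettered choices, and the answer is a parenthesised letter indexing into `choices`. The sketch below illustrates that reading; `build_prompt` and `answer_to_choice` are hypothetical helpers, and the prompt template is inferred from this single record, so it may not hold for every task.

```python
import re

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_prompt(question, choices):
    """Join a question and its choices in the format seen in the sample record."""
    options = " ".join(f"({LETTERS[i]}) {choice}" for i, choice in enumerate(choices))
    return f"{question} Select from the following choices. {options}"


def answer_to_choice(answer, choices):
    """Map an answer such as "(B)" to the corresponding entry in `choices`."""
    match = re.fullmatch(r"\(([A-Z])\)", answer.strip())
    if match is None:
        raise ValueError(f"unexpected answer format: {answer!r}")
    return choices[LETTERS.index(match.group(1))]


choices = ["ID", "OK", "TX", "MO"]
prompt = build_prompt("Which state is not light pink in this image?", choices)
print(prompt)
print(answer_to_choice("(B)", choices))  # OK
```

The rebuilt prompt differs from the sample's `prompt` field only in the capitalization of the first word, so the stored `prompt` is best used as-is when querying a model.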

Citation

bibtex
@article{liang2025colorbench,
  title={{ColorBench}: Can {VLMs} See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness},
  author={Liang, Yijun and Li, Ming and Fan, Chenrui and Li, Ziyue and Nguyen, Dang and Cobbina, Kwesi and Bhardwaj, Shweta and Chen, Jiuhai and Liu, Fuxiao and Zhou, Tianyi},
  journal={arXiv preprint arXiv:2504.10514},
  year={2025}
}

Source dataset license:

Apache License 2.0