ColorBench

#Accuracy

Data Description

ColorBench is a multimodal dataset designed to comprehensively evaluate the capabilities of Vision-Language Models (VLMs) in color understanding, including color perception, reasoning, and robustness. This dataset was introduced in the paper titled "ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness."

The ColorBench dataset consists of 5,814 image-question pairs, covering diverse application scenarios and real-world challenges for VLM evaluation. It includes 3 major categories and 11 fine-grained tasks, specifically designed to assess VLMs' color-related comprehension abilities.

The 11 fine-grained tasks are:

  • Color Counting: 102 questions
  • Color Proportion: 81 questions
  • Color Recognition: 76 questions
  • Color Robustness: 4,858 questions
  • Color Comparison: 101 questions
  • Object Recognition: 77 questions
  • Color Mimicry: 70 questions
  • Color Extraction: 96 questions
  • Color Blindness Test: 157 questions
  • Color Illusion: 93 questions
  • Object Counting: 103 questions
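
As a quick consistency check, the per-task counts above can be summed in a few lines of Python. This is only a sketch: the dictionary name `TASK_COUNTS` is illustrative and is not part of the dataset, and the figures are copied from the list.

```python
# Per-task question counts, copied from the list above.
# TASK_COUNTS is an illustrative name, not something the dataset ships.
TASK_COUNTS = {
    "Color Counting": 102,
    "Color Proportion": 81,
    "Color Recognition": 76,
    "Color Robustness": 4858,
    "Color Comparison": 101,
    "Object Recognition": 77,
    "Color Mimicry": 70,
    "Color Extraction": 96,
    "Color Blindness Test": 157,
    "Color Illusion": 93,
    "Object Counting": 103,
}

num_tasks = len(TASK_COUNTS)
total_questions = sum(TASK_COUNTS.values())

print(num_tasks)        # 11
print(total_questions)  # 5814
```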

Dataset Structure

Amount of source data:

  • Test set: 5,814 items

Amount of evaluation data:
The evaluation set consists of 956 items from the original test set, covering the color perception and reasoning tasks. This matches the test set minus the 4,858 Color Robustness questions (5,814 - 4,858 = 956).
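
The selection can be sketched as a simple filter. This assumes the evaluation subset is obtained by dropping every record whose `task` is `Color Robustness`; that rule is inferred from the counts in this card and is not documented as the official procedure. The function name and the toy records are hypothetical.

```python
def select_evaluation_items(records):
    """Keep records whose task is not Color Robustness (assumed selection rule)."""
    return [r for r in records if r["task"] != "Color Robustness"]


# Tiny hand-made records for illustration; real records carry more fields.
records = [
    {"idx": 0, "task": "Object Recognition"},
    {"idx": 1, "task": "Color Robustness"},
    {"idx": 2, "task": "Color Illusion"},
]

evaluation = select_evaluation_items(records)
kept_ids = [r["idx"] for r in evaluation]
print(kept_ids)  # [0, 2]

# The figures in this card agree with that rule: 5,814 - 4,858 = 956.
expected_size = 5814 - 4858
print(expected_size)  # 956
```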

Data Detail

KEY | EXPLANATION
--- | ---
idx | Data ID
id | ID within the category
type | Evaluation category
task | Task name
filename | Image filename
image | Image (PIL format)
prompt | Prompt
question | Question
choices | Options
answer | Answer
image_url | Original image URL
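
The fields above can be written down as a typed record, which makes the expected shape explicit when loading or validating items. This is a minimal sketch: the class name `ColorBenchRecord` is hypothetical, and the field types are inferred from the sample record in this card rather than taken from an official schema.

```python
from typing import Any, List, TypedDict


class ColorBenchRecord(TypedDict):
    """Assumed shape of one ColorBench item; types are inferred, not official."""
    idx: int            # Data ID
    id: int             # ID within the category
    type: str           # Evaluation category, e.g. "Perception"
    task: str           # Task name, e.g. "Object Recognition"
    filename: str       # Image filename
    image: Any          # Image object (PIL format in the dataset)
    prompt: str         # Full prompt including the lettered choices
    question: str       # Question text only
    choices: List[str]  # Answer options
    answer: str         # Correct option label, e.g. "(B)"
    image_url: str      # Original image URL


FIELD_NAMES = list(ColorBenchRecord.__annotations__)
print(FIELD_NAMES)
```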

Sample of Source Dataset

json
{  
  'idx': 0,  
  'id': 1,  
  'type': 'Perception',  
  'task': 'Object Recognition',  
  'filename': 'ObjectRecognition/1.jpg',  
  'image': <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1682x1072 at 0x7F6762778C10>,  
  'prompt': 'which state is not light pink in this image? Select from the following choices. (A) ID (B) OK (C) TX (D) MO',  
  'question': 'Which state is not light pink in this image?',  
  'choices': ['ID', 'OK', 'TX', 'MO'],  
  'answer': '(B)',  
  'image_url': 'https://www.shutterstock.com/image-vector/grunge-watercolor-map-usa-united-states-2169490363'  
}
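
The sample suggests how `prompt` relates to `question` and `choices`, and how a model response could be scored against `answer`. The sketch below is an assumption-laden illustration, not the benchmark's official evaluation code: `build_prompt`, `extract_choice`, and `accuracy` are hypothetical helpers, and the parsing rule (first `(X)` pattern in the response) is a simplification. Note that the sample's `prompt` begins with a lowercase "which", so the prompt built here differs from it in that one character.

```python
import re
import string


def build_prompt(question, choices):
    """Append lettered choices to a question, mirroring the sample's prompt layout."""
    labelled = " ".join(
        f"({string.ascii_uppercase[i]}) {choice}" for i, choice in enumerate(choices)
    )
    return f"{question} Select from the following choices. {labelled}"


def extract_choice(response):
    """Return the first '(X)' style label found in a response, or None."""
    match = re.search(r"\(([A-Z])\)", response)
    return f"({match.group(1)})" if match else None


def accuracy(responses, answers):
    """Fraction of responses whose extracted label equals the reference answer."""
    if not answers:
        raise ValueError("answers must not be empty")
    correct = sum(extract_choice(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)


prompt = build_prompt(
    "Which state is not light pink in this image?", ["ID", "OK", "TX", "MO"]
)
print(prompt)

# Made-up model responses, for illustration only.
score = accuracy(["The answer is (B).", "I think (A)"], ["(B)", "(B)"])
print(score)  # 0.5
```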

Citation Information

bibtex
@article{liang2025colorbench,  
  title={Colorbench: Can vlms see and understand the colorful world? a comprehensive benchmark for color perception, reasoning, and robustness},  
  author={Liang, Yijun and Li, Ming and Fan, Chenrui and Li, Ziyue and Nguyen, Dang and Cobbina, Kwesi and Bhardwaj, Shweta and Chen, Jiuhai and Liu, Fuxiao and Zhou, Tianyi},  
  journal={arXiv preprint arXiv:2504.10514},  
  year={2025}  
}

Licensing Information

Apache License 2.0