Evaluation Data
MS-COCO
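Dataset description:
MS-COCO (Microsoft Common Objects in Context) originates from the Microsoft COCO dataset, whose annotation was funded by Microsoft in 2014. COCO covers 1.5 million object instances, 80 object categories, and 91 stuff categories, and is used for object detection, segmentation, text-to-image generation, image captioning, and other tasks.
Dataset composition and specification:
Source data size:
The dataset is split into a training set (118,287 images), a validation set (5,000), and a test set (40,670). Each image has 5 corresponding text descriptions.
Evaluation data size:
The evaluation data consists of the 40,670 images in the source test set and their corresponding text descriptions.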
Source data fields:
KEY | DESCRIPTION |
---|---|
img | Image |
texts | Corresponding text descriptions |
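The `img`/`texts` schema above pairs one image with several captions. A minimal Python sketch, assuming each record holds an image reference and a list of captions, shows how such records could be flattened into one (image, caption) pair per caption for text-to-image evaluation. The `Sample` class, the `to_pairs` helper, and the image path are hypothetical; the actual storage format of the evaluation data is not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sample:
    """One record: an image reference (hypothetical path) and its captions."""
    img: str
    texts: List[str] = field(default_factory=list)


def to_pairs(samples: List[Sample]) -> List[Tuple[str, str]]:
    """Flatten records into (image, caption) pairs, one pair per caption."""
    return [(s.img, t) for s in samples for t in s.texts]


if __name__ == "__main__":
    sample = Sample(
        img="images/example.jpg",  # hypothetical path
        texts=[
            "A red hair woman holding an open box of pizza.",
            "A young woman holding a pizza in a box.",
        ],
    )
    pairs = to_pairs([sample])
    print(len(pairs))  # prints 2
```

The same helper applies unchanged to the other image datasets below that use the `img`/`texts` fields; only the number of captions per record differs.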
Source dataset example:
img:
texts:
- A red hair woman holding an open box of pizza.
- A young woman holding a pizza in a box.
- a woman is holding a box of pizza.
- A woman is posing with an open pizza box.
- A woman holds an open box of pizza.
Source dataset license:
Creative Commons Attribution 4.0 License
Citation:
@inproceedings{MS-COCO,
title={Microsoft COCO: Common Objects in Context},
author={Lin, Tsung-Yi and others},
booktitle={European Conference on Computer Vision (ECCV)},
year={2014}
}
CUB
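Dataset description:
The CUB dataset, full name Caltech-UCSD Birds-200-2011 (CUB-200-2011), is a bird image database provided by the California Institute of Technology and UC San Diego. It contains 11,788 images of 200 bird species and is commonly divided by species into training (100 species), validation (50 species), and test (50 species) classes.
Dataset composition and specification:
Source data size:
The dataset is split into a training set (8,855 images) and a test set (2,933 images). Each image has 10 corresponding text descriptions.
Evaluation data size:
The evaluation data consists of the 2,933 images in the source test set and their corresponding text descriptions.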
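As a quick consistency check on the figures above, the image splits add up to the 11,788 images in the description, and the 100/50/50 species split covers all 200 species. The variable names are illustrative only.

```python
# Split sizes as stated in this document for CUB-200-2011.
train_images, test_images = 8855, 2933
train_classes, val_classes, test_classes = 100, 50, 50

total_images = train_images + test_images
total_classes = train_classes + val_classes + test_classes

assert total_images == 11788  # matches the 11,788 images in the description
assert total_classes == 200   # matches the 200 bird species
print(total_images, total_classes)  # prints 11788 200
```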
Source data fields:
KEY | DESCRIPTION |
---|---|
img | Image |
texts | Corresponding text descriptions |
Source dataset example:
img:
texts:
- this small blue bird has a white bill and black legs.
- this bird has a short white bill along with a vibrant blue belly, and fluffy blue breast.
- a small sized bird that is mostly blue and has a short thick bill
- small, but wide bird with a small beak and an almost non existent head, all blue body.
- small chubby bird with a blue body, and bluish green wings and tail
- this bird is blue with black and has a very short beak.
- the small bird is blue in color with a small grey beak.
- this bird is vivid blue and black in color, with a stubby multi colored beak.
- a small bird that is blue, has narrow legs, a long tail, and a short beak that curves downward.
- this bird has wings that are black and has a blue belly
Citation:
@techreport{WahCUB_200_2011,
Title = {The Caltech-UCSD Birds-200-2011 Dataset},
Author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},
Year = {2011},
Institution = {California Institute of Technology},
Number = {CNS-TR-2011-001}
}
Oxford-102 Flower
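Dataset description:
Oxford-102 Flower is a flower dataset released in 2008 by the Visual Geometry Group at the University of Oxford. It contains 102 flower categories, chosen from flowers commonly occurring in the United Kingdom.
Dataset composition and specification:
Source data size:
The dataset is split into a training set (6,149 images) and a test set (2,040 images). Each image has 10 corresponding text descriptions.
Evaluation data size:
The evaluation data consists of the 2,040 images in the source test set and their corresponding text descriptions.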
Source data fields:
KEY | DESCRIPTION |
---|---|
img | Image |
texts | Corresponding text descriptions |
Source dataset example:
img:
texts:
- the petals of the flower are pink in color and have a yellow center.
- this flower is pink and white in color, with petals that are multi colored.
- the geographical shapes of the bright purple petals set off the orange stamen and filament and the cross shaped stigma is beautiful.
- the purple petals have shades of white with white anther and filament
- this flower has large pink petals and a white stigma in the center
- this flower has petals that are pink and has a yellow stamen
- a flower with short and wide petals that is light purple.
- this flower has small pink petals with a yellow center.
- this flower has large rounded pink petals with curved edges and purple veins.
- this flower has purple petals as well as a white stamen.
Citation:
@inproceedings{nilsback2008automated,
title={Automated flower classification over a large number of classes},
author={Nilsback, Maria-Elena and Zisserman, Andrew},
booktitle={2008 Sixth Indian conference on computer vision, graphics \& image processing},
pages={722--729},
year={2008},
organization={IEEE}
}
mg18
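Dataset description:
A dataset for evaluating the quality of multilingual image generation, containing 7,000 high-quality image-text pairs covering 18 languages. It was built by extending the XM-3600 dataset with high-quality images from the WIT dataset, and is used to evaluate a model's ability to generate general-domain images.
Dataset composition:
The Chinese and English subsets are used, with 2,500 prompts per language.
Source dataset license:
apache-2.0
Citation:
@misc{ye2023altdiffusion,
title={AltDiffusion: A Multilingual Text-to-Image Diffusion Model},
author={Fulong Ye and Guang Liu and Xinya Wu and Ledell Wu},
year={2023},
eprint={2308.09991},
archivePrefix={arXiv},
primaryClass={cs.CV}
}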
Image-gen-v1.0
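Evaluation metric: human raters subjectively assess image-text consistency, image quality, and safety.
Dataset description:
A new text-to-image evaluation dataset created by BAAI (Beijing Academy of Artificial Intelligence), containing 414 prompts, mostly in Chinese and English. The prompts cover a wide range of entities (people, animals and plants, landscapes, weather, etc.), attributes (color, emotion, atmosphere, etc.), and styles (realistic, anime, photographic, etc.), as well as content that requires reasoning and complex text understanding, so that models are evaluated comprehensively across multiple dimensions.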
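The rating scale and the way scores are combined are not stated here. As a minimal sketch, assuming each prompt receives an integer rating on a hypothetical 1 to 5 scale for each of the three dimensions, per-dimension scores could be averaged over all rated prompts. The dimension keys and the `aggregate` helper are illustrative names, not part of the benchmark.

```python
from statistics import fmean
from typing import Dict, List

# The three human-evaluation dimensions described for Image-gen-v1.0
# (key names are hypothetical).
DIMENSIONS = ("consistency", "quality", "safety")


def aggregate(ratings: List[Dict[str, int]]) -> Dict[str, float]:
    """Average each dimension over all rated prompts.

    `ratings` holds one dict per prompt; the 1-5 scale is an assumption,
    not part of the benchmark definition.
    """
    return {d: fmean(r[d] for r in ratings) for d in DIMENSIONS}


if __name__ == "__main__":
    example = [
        {"consistency": 4, "quality": 5, "safety": 5},
        {"consistency": 2, "quality": 3, "safety": 5},
    ]
    print(aggregate(example))  # prints {'consistency': 3.0, 'quality': 4.0, 'safety': 5.0}
```

Reporting each dimension separately, rather than a single combined score, keeps a safety failure from being masked by high consistency or quality ratings.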
CelebA-HQ
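Dataset description:
CelebA (CelebFaces Attributes) is a large-scale face attributes dataset containing 202,599 face images of 10,177 celebrity identities. Released publicly by The Chinese University of Hong Kong, it is widely used for training face-related computer vision models. CelebA-HQ is a high-quality version consisting of 30,000 images selected from CelebA at 1024×1024 resolution (Karras et al., 2018).
Dataset composition and specification:
Source data size:
Training set (24,183 images), validation set (2,993), test set (2,824). Each image has 10 captions.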
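CelebA-HQ, introduced by Karras et al. (2018), contains 30,000 images. A short check confirms that the three splits above account for all of them, and that 10 captions per image give 300,000 captions in total. The variable names are illustrative only.

```python
# Split sizes and captions per image as stated in this document for CelebA-HQ.
splits = {"train": 24183, "val": 2993, "test": 2824}
captions_per_image = 10

total_images = sum(splits.values())
total_captions = total_images * captions_per_image

assert total_images == 30000  # the size of CelebA-HQ
print(total_images, total_captions)  # prints 30000 300000
```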
Data fields:
KEY | DESCRIPTION |
---|---|
img | Image |
texts | Image captions |
Source dataset example:
img:
texts:
- The person has pale skin, wavy hair, black hair, pointy nose, high cheekbones, big lips, and arched eyebrows and is wearing heavy makeup.
- This attractive person has wavy hair, and big nose.
- This person has black hair, wavy hair, arched eyebrows, pointy nose, pale skin, big nose, and big lips. She is attractive, and young and is wearing heavy makeup, and lipstick.
- The woman wears lipstick. She has big nose, high cheekbones, arched eyebrows, wavy hair, and big lips. She is smiling, and young.
- She wears earrings. She has big nose, and pointy nose. She is smiling.
- This attractive person has pale skin.
- She is wearing lipstick, and earrings. She is attractive, and smiling and has arched eyebrows, wavy hair, big lips, high cheekbones, big nose, and black hair.
- This smiling, and young woman has pointy nose.
- This woman is attractive and has wavy hair, high cheekbones, black hair, arched eyebrows, big nose, big lips, and pointy nose.
- This person has black hair, high cheekbones, wavy hair, big lips, pointy nose, and pale skin and is wearing heavy makeup.
Citation:
@inproceedings{liu2015faceattributes,
title = {Deep Learning Face Attributes in the Wild},
author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
year = {2015}
}
@inproceedings{karras2017progressive,
title={Progressive growing of gans for improved quality, stability, and variation},
author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
booktitle={International Conference on Learning Representations (ICLR)},
year={2018}
}
Dataset license:
Use of this dataset is limited to non-commercial research and educational purposes.
MSR-VTT
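Dataset description:
MSR-VTT (Microsoft Research Video to Text) is a large-scale dataset of video clips with corresponding text annotations. It contains 10,000 video clips from 20 categories, and each clip is annotated with 20 English sentences.
Dataset composition and specification:
Source data size:
Training set (6,513 clips), validation set (497), test set (2,990). Each video has 20 captions.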
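The split sizes above can be cross-checked against the description: they sum to the 10,000 clips, and 20 captions per clip give 200,000 sentences in total, of which 59,800 belong to the test split. The variable names are illustrative only.

```python
# Split sizes and captions per clip as stated in this document for MSR-VTT.
splits = {"train": 6513, "val": 497, "test": 2990}
captions_per_clip = 20

total_clips = sum(splits.values())
total_captions = total_clips * captions_per_clip
test_captions = splits["test"] * captions_per_clip

assert total_clips == 10000  # matches the 10,000 clips in the description
print(total_clips, total_captions, test_captions)  # prints 10000 200000 59800
```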
Data fields:
KEY | DESCRIPTION |
---|---|
vid | Video |
texts | Video captions |
Source dataset example:
vid:
texts:
- a baker is demonstrating a cooking technique
- a female giving a baking demonstration in her kitchen
- a girl explaining to prepare a dish
- a lady with a scarf is cooking with dough
- a person is preparing some food
- a person making pastries
- a woman is making a pastry
- a woman is rolling doe
- a woman is rolling dough around a stick
- a woman is rolling dough
- a woman is rolling dough
- a woman is wrapping dough around some food item
- a woman rolling up pastry while giving instructions
- a woman rolls dough
- a woman showing an easy way to make crescent rolls
- how to prepare food rolls
- the pastry should have five creases
- a person is preparing some food
- a woman is rolling dough around a stick
- a woman rolls dough
Citation:
@inproceedings{xu2016msr-vtt,
author = {Xu, Jun and Mei, Tao and Yao, Ting and Rui, Yong},
title = {MSR-VTT: A Large Video Description Dataset for Bridging Video and Language},
year = {2016},
month = {June},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
}
UCF-101
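Dataset description:
UCF101 is a video dataset collected by the University of Central Florida, containing 13,320 YouTube videos across 101 action categories.
Dataset composition and specification:
Source data size:
Training set (9,537 videos), test set (3,783 videos).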
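The train and test sizes above sum to the 13,320 videos in the description, which works out to roughly 132 videos per action class on average. The variable names are illustrative only.

```python
# Split sizes and class count as stated in this document for UCF101.
train_videos, test_videos = 9537, 3783
num_classes = 101

total_videos = train_videos + test_videos
assert total_videos == 13320  # matches the 13,320 videos in the description

# Average number of videos per action class over the whole dataset.
print(total_videos, round(total_videos / num_classes, 1))  # prints 13320 131.9
```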
Data fields:
KEY | DESCRIPTION |
---|---|
vid | Video |
labels | Video label |
Source dataset example:
vid:
labels:
Playing Basketball
Citation:
@article{soomro2012ucf101,
title={UCF101: A dataset of 101 human actions classes from videos in the wild},
author={Soomro, Khurram and Zamir, Amir Roshan and Shah, Mubarak},
journal={arXiv preprint arXiv:1212.0402},
year={2012}
}