Evaluation Data
All of the datasets below are converted into standard evaluation prompts before evaluation.
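Dataset 1 (RefCOCO)
#Evaluation metric 1. Accuracy (Precision or Accuracy)
Data description:
RefCOCO is a referring expression / visual grounding dataset. Its images are a subset of MS COCO (train2014), and the image regions and their phrases (referring expressions) were annotated by humans. RefCOCO is a small vision-language multimodal benchmark for training and testing, suitable for evaluating tasks such as referring expression comprehension and visual grounding. The images mostly show everyday scenes, and each expression usually describes the content of an image region directly.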
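Source data size:
The dataset is split into train (120,624), validation (10,834), testA (5,657) and testB (5,095) referring expressions, 142,210 in total, covering 50,000 annotated regions in 19,994 images.
Evaluation data size:
The evaluation data consists of all 5,657 testA instances and all 5,095 testB instances of the source data, where each instance pairs an image region with one referring expression.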
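Source data fields:
Key | Description |
---|---|
'images' | Image records in instances.json |
id | Image id |
file_name | Image file name |
width | Image width in pixels |
height | Image height in pixels |
coco_url | URL of the image on the MS COCO site |
flickr_url | URL of the image on Flickr |
license | License id of the image |
date_captured | Date and time the image was captured |
'annotations' | Region annotations in instances.json |
id | Region (annotation) id |
image_id | Id of the image containing the region |
category_id | Category id of the region |
bbox | Bounding box of the region, [x, y, width, height] in pixels |
segmentation | Polygon outline(s) of the region |
area | Area of the region's segmentation |
iscrowd | Whether the region is a crowd of objects (1) rather than a single object (0) |
'references' | Referring expression records in refs(unc).p |
ref_id | Reference id (one per referred region) |
ann_id | Id of the region the expressions refer to |
image_id | Id of the image containing the region |
category_id | Category id of the region |
file_name | File name of the reference (the COCO image name with an added suffix) |
split | Split the reference belongs to (train, val, testA, testB) |
sent_ids | List of ids of the referring expressions |
sentences | List of referring expressions, each with tokens, raw, sent and sent_id |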
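The `bbox` field uses the MS COCO `[x, y, width, height]` convention: in the first annotation sample below, 263.87 + 21.13 = 285.0, which is the largest x coordinate in its polygon. The metric listed for these datasets is accuracy, which for visual grounding is commonly computed by counting a predicted box as correct when its IoU with the ground-truth `bbox` is at least 0.5. This document does not state the threshold, so the 0.5 value and the helper names `xywh_to_xyxy`, `iou` and `accuracy_at_iou` are assumptions in this minimal sketch, not the evaluation harness actually used.

```python
from typing import Sequence

Box = Sequence[float]  # COCO-style [x, y, width, height]


def xywh_to_xyxy(box: Box) -> tuple[float, float, float, float]:
    """Convert [x, y, w, h] to (x1, y1, x2, y2) corner coordinates."""
    x, y, w, h = box
    return x, y, x + w, y + h


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax1, ay1, ax2, ay2 = xywh_to_xyxy(a)
    bx1, by1, bx2, by2 = xywh_to_xyxy(b)
    # Size of the overlap rectangle, clamped at 0 when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> float:
    """Hypothetical helper: fraction of predictions with IoU >= thresh against their ground truth."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must be aligned one-to-one")
    if not gts:
        return 0.0
    hits = sum(iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(gts)


gt = [263.87, 216.88, 21.13, 15.17]  # bbox of refcoco['annotations'][0]
print(xywh_to_xyxy(gt))
print(accuracy_at_iou([gt, [0, 0, 10, 10]], [gt, gt]))  # one exact hit, one miss -> 0.5
```

Reporting the score separately for testA and testB just means calling `accuracy_at_iou` once per split.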
Source data samples:
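import json
import pickle

# COCO-style image records and region annotations
with open('./refcoco/instances.json', 'r') as f:
    refcoco = json.load(f)
# Referring expressions with the UNC splits (train/val/testA/testB)
with open('./refcoco/refs(unc).p', 'rb') as f:
    refcoco_p = pickle.load(f, fix_imports=True)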
refcoco_p[0]
{'sent_ids': [0, 1, 2],
'file_name': 'COCO_train2014_000000581857_16.jpg',
'ann_id': 1719310,
'ref_id': 0,
'image_id': 581857,
'split': 'train',
'sentences':
[{'tokens': ['the', 'lady', 'with', 'the', 'blue', 'shirt'], 'raw': 'THE LADY WITH THE BLUE SHIRT', 'sent_id': 0, 'sent': 'the lady with the blue shirt'},
{'tokens': ['lady', 'with', 'back', 'to', 'us'], 'raw': 'lady w back to us', 'sent_id': 1, 'sent': 'lady with back to us'},
{'tokens': ['blue', 'shirt'], 'raw': 'blue shirt', 'sent_id': 2, 'sent': 'blue shirt'}
],
'category_id': 1
}
refcoco_p[50000-1]
{'sent_ids': [142208, 142209],
'file_name': 'COCO_train2014_000000000072_0.jpg',
'ann_id': 598731,
'ref_id': 49999,
'image_id': 72,
'split': 'train',
'sentences':
[{'tokens': ['right', 'giraffe'], 'raw': 'RIGHT GIRAFFE', 'sent_id': 142208, 'sent': 'right giraffe'},
{'tokens': ['right', 'girafe'], 'raw': 'right girafe', 'sent_id': 142209, 'sent': 'right girafe'}
],
'category_id': 25
}
refcoco['images'][0]
{'license': 1,
'file_name': 'COCO_train2014_000000098304.jpg',
'coco_url': 'http://mscoco.org/images/98304',
'height': 424,
'width': 640,
'date_captured': '2013-11-21 23:06:41',
'flickr_url': 'http://farm6.staticflickr.com/5062/5896644212_a326e96ea9_z.jpg',
'id': 98304
}
refcoco['images'][19994-1]
{'license': 6,
'file_name': 'COCO_train2014_000000458751.jpg',
'coco_url': 'http://mscoco.org/images/458751',
'height': 576,
'width': 592,
'date_captured': '2013-11-16 21:13:51',
'flickr_url': 'http://farm8.staticflickr.com/7018/6821165845_48ebd9590f_z.jpg',
'id': 458751
}
refcoco['annotations'][0]
{'segmentation': [[267.52, 229.75, 265.6, 226.68, 265.79, 223.6, 263.87, 220.15, 263.87, 216.88, 266.94, 217.07, 268.48, 221.3, 272.32, 219.95, 276.35, 220.15, 279.62, 218.03, 283.46, 218.42, 285.0, 220.92, 285.0, 223.22, 284.42, 224.95, 280.96, 225.14, 279.81, 226.48, 281.73, 228.41, 279.43, 229.37, 275.78, 229.17, 273.86, 229.56, 274.24, 232.05, 269.82, 231.67, 267.14, 231.48, 266.75, 228.6]],
'area': 197.29899999999986,
'iscrowd': 0,
'image_id': 98304,
'bbox': [263.87, 216.88, 21.13, 15.17],
'category_id': 18,
'id': 3007
}
refcoco['annotations'][196771-1]
{'segmentation': [[203.42, 96.23, 216.68, 104.44, 216.05, 114.54, 226.15, 118.96, 228.67, 132.21, 247.61, 138.52, 250.13, 156.83, 236.88, 159.35, 234.35, 167.56, 274.12, 168.19, 281.69, 185.87, 284.85, 213.01, 267.81, 237.62, 243.19, 236.36, 238.14, 223.74, 232.46, 232.57, 231.2, 284.33, 159.87, 283.07, 159.87, 218.06, 151.67, 206.7, 154.19, 190.92, 159.87, 184.6, 158.61, 166.3, 140.3, 153.04, 142.2, 144.84, 178.81, 147.99, 183.86, 142.94, 169.97, 125.9, 173.13, 114.54, 176.28, 113.91, 185.75, 96.87, 200.9, 94.97]],
'area': 16238.20485,
'iscrowd': 0,
'image_id': 458751,
'bbox': [140.3, 94.97, 144.55, 189.36],
'category_id': 11,
'id': 1808941
}
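Each record in `refs(unc).p` points to its region through `ann_id` and to its image through `image_id`, so sentence-level evaluation instances can be assembled by joining the three structures loaded above. The `file_name` stored in a ref carries an extra suffix (for example `COCO_train2014_000000581857_16.jpg`), so this sketch takes the image file name from `images` instead. It is a minimal sketch assuming the record shapes shown in the samples; `build_instances` and its output tuple layout are hypothetical.

```python
from typing import Any, Iterator


def build_instances(
    refs: list[dict[str, Any]],
    instances: dict[str, Any],
    split: str,
) -> Iterator[tuple[str, list[float], str, int]]:
    """Yield (sentence, bbox, image_file_name, sent_id) for every expression in `split`."""
    # Index regions by annotation id and images by image id for O(1) lookups
    anns = {ann["id"]: ann for ann in instances["annotations"]}
    imgs = {img["id"]: img for img in instances["images"]}
    for ref in refs:
        if ref["split"] != split:
            continue
        bbox = anns[ref["ann_id"]]["bbox"]
        file_name = imgs[ref["image_id"]]["file_name"]
        # One evaluation instance per referring expression, not per region
        for sent in ref["sentences"]:
            yield sent["sent"], bbox, file_name, sent["sent_id"]
```

With the objects loaded above, `list(build_instances(refcoco_p, refcoco, 'testA'))` collects the testA instances, and the same call with `'testB'` covers the second test split.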
Dataset composition and conventions:
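Dataset 2 (RefCOCO+)
#Evaluation metric 1. Accuracy (Precision or Accuracy)
Data description:
RefCOCO+ is a referring expression / visual grounding dataset. Its images are a subset of MS COCO (train2014), and the image regions and their phrases (referring expressions) were annotated by humans. Like RefCOCO, it is a small vision-language multimodal benchmark for training and testing, suitable for evaluating tasks such as referring expression comprehension and visual grounding, and its images mostly show everyday scenes. Unlike RefCOCO, absolute location words such as "left" or "right" were not allowed in its expressions, so they describe a region mainly by its appearance.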
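Source data size:
The dataset is split into train (120,191), validation (10,758), testA (5,726) and testB (4,889) referring expressions, 141,564 in total, covering 49,856 annotated regions in 19,992 images.
Evaluation data size:
The evaluation data consists of all 5,726 testA instances and all 4,889 testB instances of the source data, where each instance pairs an image region with one referring expression.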
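Because one entry of the refs list describes one region but may carry several expressions, region counts and expression counts per split differ. A minimal sketch for recomputing both from the loaded refs, assuming only the `split` and `sentences` fields shown in the samples below; `split_sizes` is a hypothetical helper name.

```python
from collections import Counter
from typing import Any


def split_sizes(refs: list[dict[str, Any]]) -> tuple[Counter, Counter]:
    """Return (regions per split, expressions per split) for a list of refs."""
    regions: Counter = Counter()
    expressions: Counter = Counter()
    for ref in refs:
        regions[ref["split"]] += 1  # each ref describes one annotated region
        expressions[ref["split"]] += len(ref["sentences"])  # a region may have several expressions
    return regions, expressions
```

After loading, `split_sizes(refcoco_plus_p)[1]['testA']` gives the testA expression count, which can be compared with the figures above.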
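Source data fields:
Key | Description |
---|---|
'images' | Image records in instances.json |
id | Image id |
file_name | Image file name |
width | Image width in pixels |
height | Image height in pixels |
coco_url | URL of the image on the MS COCO site |
flickr_url | URL of the image on Flickr |
license | License id of the image |
date_captured | Date and time the image was captured |
'annotations' | Region annotations in instances.json |
id | Region (annotation) id |
image_id | Id of the image containing the region |
category_id | Category id of the region |
bbox | Bounding box of the region, [x, y, width, height] in pixels |
segmentation | Polygon outline(s) of the region |
area | Area of the region's segmentation |
iscrowd | Whether the region is a crowd of objects (1) rather than a single object (0) |
'references' | Referring expression records in refs(unc).p |
ref_id | Reference id (one per referred region) |
ann_id | Id of the region the expressions refer to |
image_id | Id of the image containing the region |
category_id | Category id of the region |
file_name | File name of the reference (the COCO image name with an added suffix) |
split | Split the reference belongs to (train, val, testA, testB) |
sent_ids | List of ids of the referring expressions |
sentences | List of referring expressions, each with tokens, raw, sent and sent_id |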
Source data samples:
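import json
import pickle

# COCO-style image records and region annotations
with open('./refcoco+/instances.json', 'r') as f:
    refcoco_plus = json.load(f)
# Referring expressions with the UNC splits (train/val/testA/testB)
with open('./refcoco+/refs(unc).p', 'rb') as f:
    refcoco_plus_p = pickle.load(f, fix_imports=True)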
refcoco_plus_p[0]
{'sent_ids': [0, 1, 2],
'file_name': 'COCO_train2014_000000581857_16.jpg',
'ann_id': 1719310,
'ref_id': 0,
'image_id': 581857,
'split': 'train',
'sentences': [
{'tokens': ['navy', 'blue', 'shirt'], 'raw': 'navy blue shirt', 'sent_id': 0, 'sent': 'navy blue shirt'},
{'tokens': ['woman', 'back', 'in', 'blue'], 'raw': 'woman back in blue', 'sent_id': 1, 'sent': 'woman back in blue'},
{'tokens': ['blue', 'shirt'], 'raw': 'blue shirt', 'sent_id': 2, 'sent': 'blue shirt'}
],
'category_id': 1
}
refcoco_plus_p[49856-1]
{'sent_ids': [141560, 141561, 141562, 141563],
'file_name': 'COCO_train2014_000000000072_0.jpg',
'ann_id': 598731,
'ref_id': 49855,
'image_id': 72,
'split': 'train',
'sentences': [
{'tokens': ['shorter', 'giraffe'], 'raw': 'shorter giraffe', 'sent_id': 141560, 'sent': 'shorter giraffe'},
{'tokens': ['giraffe', 'closest', 'to', 'camera'], 'raw': 'giraffe closest to camera', 'sent_id': 141561, 'sent': 'giraffe closest to camera'},
{'tokens': ['bent', 'neck'], 'raw': 'bent neck', 'sent_id': 141562, 'sent': 'bent neck'},
{'tokens': ['shorter', 'animal'], 'raw': 'shorter animal', 'sent_id': 141563, 'sent': 'shorter animal'}
],
'category_id': 25
}
refcoco_plus['images'][0]
{'license': 1,
'file_name': 'COCO_train2014_000000098304.jpg',
'coco_url': 'http://mscoco.org/images/98304',
'height': 424,
'width': 640,
'date_captured': '2013-11-21 23:06:41',
'flickr_url': 'http://farm6.staticflickr.com/5062/5896644212_a326e96ea9_z.jpg',
'id': 98304
}
refcoco_plus['images'][19992-1]
{'license': 6,
'file_name': 'COCO_train2014_000000458751.jpg',
'coco_url': 'http://mscoco.org/images/458751',
'height': 576,
'width': 592,
'date_captured': '2013-11-16 21:13:51',
'flickr_url': 'http://farm8.staticflickr.com/7018/6821165845_48ebd9590f_z.jpg',
'id': 458751
}
refcoco_plus['annotations'][0]
{'segmentation': [[267.52, 229.75, 265.6, 226.68, 265.79, 223.6, 263.87, 220.15, 263.87, 216.88, 266.94, 217.07, 268.48, 221.3, 272.32, 219.95, 276.35, 220.15, 279.62, 218.03, 283.46, 218.42, 285.0, 220.92, 285.0, 223.22, 284.42, 224.95, 280.96, 225.14, 279.81, 226.48, 281.73, 228.41, 279.43, 229.37, 275.78, 229.17, 273.86, 229.56, 274.24, 232.05, 269.82, 231.67, 267.14, 231.48, 266.75, 228.6]],
'area': 197.29899999999986,
'iscrowd': 0,
'image_id': 98304,
'bbox': [263.87, 216.88, 21.13, 15.17],
'category_id': 18,
'id': 3007
}
refcoco_plus['annotations'][196737-1]
{'segmentation': [[203.42, 96.23, 216.68, 104.44, 216.05, 114.54, 226.15, 118.96, 228.67, 132.21, 247.61, 138.52, 250.13, 156.83, 236.88, 159.35, 234.35, 167.56, 274.12, 168.19, 281.69, 185.87, 284.85, 213.01, 267.81, 237.62, 243.19, 236.36, 238.14, 223.74, 232.46, 232.57, 231.2, 284.33, 159.87, 283.07, 159.87, 218.06, 151.67, 206.7, 154.19, 190.92, 159.87, 184.6, 158.61, 166.3, 140.3, 153.04, 142.2, 144.84, 178.81, 147.99, 183.86, 142.94, 169.97, 125.9, 173.13, 114.54, 176.28, 113.91, 185.75, 96.87, 200.9, 94.97]],
'area': 16238.20485,
'iscrowd': 0,
'image_id': 458751,
'bbox': [140.3, 94.97, 144.55, 189.36],
'category_id': 11,
'id': 1808941
}
Dataset composition and conventions:
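Dataset 3 (RefCOCOg)
#Evaluation metric 1. Accuracy (Precision or Accuracy)
Data description:
RefCOCOg is a referring expression / visual grounding dataset. Its images are a subset of MS COCO (train2014), and the image regions and their phrases (referring expressions) were annotated by humans. RefCOCOg is a small vision-language multimodal benchmark for training and testing, suitable for evaluating tasks such as referring expression comprehension and visual grounding. The images mostly show everyday scenes; its expressions describe the content of an image region directly but are generally longer and more detailed than those of RefCOCO and RefCOCO+.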
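Source data size:
The dataset (UMD split) is divided into train (80,512), validation (4,896) and test (9,602) referring expressions, 95,010 in total, covering 49,822 annotated regions in 25,799 images.
Evaluation data size:
The evaluation data consists of all 9,602 test instances of the source data, where each instance pairs an image region with one referring expression.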
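Source data fields:
Key | Description |
---|---|
'images' | Image records in instances.json |
id | Image id |
file_name | Image file name |
width | Image width in pixels |
height | Image height in pixels |
coco_url | URL of the image on the MS COCO site |
flickr_url | URL of the image on Flickr |
license | License id of the image |
date_captured | Date and time the image was captured |
'annotations' | Region annotations in instances.json |
id | Region (annotation) id |
image_id | Id of the image containing the region |
category_id | Category id of the region |
bbox | Bounding box of the region, [x, y, width, height] in pixels |
segmentation | Polygon outline(s) of the region |
area | Area of the region's segmentation |
iscrowd | Whether the region is a crowd of objects (1) rather than a single object (0) |
'references' | Referring expression records in refs(umd).p |
ref_id | Reference id (one per referred region) |
ann_id | Id of the region the expressions refer to |
image_id | Id of the image containing the region |
category_id | Category id of the region |
file_name | File name of the reference (the COCO image name with the ann_id appended) |
split | Split the reference belongs to (train, val, test) |
sent_ids | List of ids of the referring expressions |
sentences | List of referring expressions, each with tokens, raw, sent and sent_id |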
Source data samples:
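import json
import pickle

# COCO-style image records and region annotations
with open('./refcocog/instances.json', 'r') as f:
    refcoco_g = json.load(f)
# Referring expressions with the UMD splits (train/val/test)
with open('./refcocog/refs(umd).p', 'rb') as f:
    refcoco_g_p = pickle.load(f, fix_imports=True)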
refcoco_g_p[0]
{'image_id': 380440,
'split': 'test',
'sentences': [
{'tokens': ['the', 'man', 'in', 'yellow', 'coat'], 'raw': 'the man in yellow coat', 'sent_id': 8, 'sent': 'the man in yellow coat'},
{'tokens': ['skiier', 'in', 'red', 'pants'], 'raw': 'Skiier in red pants.', 'sent_id': 9, 'sent': 'skiier in red pants'}
],
'file_name': 'COCO_train2014_000000380440_491042.jpg',
'category_id': 1,
'ann_id': 491042,
'sent_ids': [8, 9],
'ref_id': 0
}
refcoco_g_p[49822-1]
{'image_id': 573297,
'split': 'train',
'sentences': [
{'tokens': ['a', 'person', 'in', 'red', 'dress', 'and', 'he', 'is', 'seeing', 'his', 'mobile'], 'raw': 'A person in red dress and he is seeing his mobile.', 'sent_id': 104558, 'sent': 'a person in red dress and he is seeing his mobile'},
{'tokens': ['man', 'wearing', 'a', 'red', 'costume'], 'raw': 'Man wearing a red costume.', 'sent_id': 104559, 'sent': 'man wearing a red costume'}
],
'file_name': 'COCO_train2014_000000573297_472971.jpg',
'category_id': 1,
'ann_id': 472971,
'sent_ids': [104558, 104559],
'ref_id': 49821
}
refcoco_g['images'][0]
{'license': 1,
'file_name': 'COCO_train2014_000000131074.jpg',
'coco_url': 'http://mscoco.org/images/131074',
'height': 428,
'width': 640,
'date_captured': '2013-11-21 01:03:06',
'flickr_url': 'http://farm9.staticflickr.com/8308/7908210548_33e532d119_z.jpg',
'id': 131074
}
refcoco_g['images'][25799-1]
{'license': 5,
'file_name': 'COCO_train2014_000000524286.jpg',
'coco_url': 'http://mscoco.org/images/524286',
'height': 480,
'width': 640,
'date_captured': '2013-11-22 01:08:02',
'flickr_url': 'http://farm4.staticflickr.com/3286/3160643026_c2691d2c55_z.jpg',
'id': 524286
}
refcoco_g['annotations'][0]
{'segmentation': [[21.11, 239.09, 16.31, 274.6, 198.65, 349.45, 240.87, 336.98, 320.52, 293.79, 334.91, 248.69, 357.95, 273.64, 353.15, 289.0, 398.25, 267.88, 437.6, 251.57, 412.65, 228.54, 240.87, 210.31, 219.76, 141.21, 113.24, 153.69, 63.34, 156.57, 26.87, 169.04]],
'area': 48667.84089999999,
'iscrowd': 0,
'image_id': 131074,
'bbox': [16.31, 141.21, 421.29, 208.24],
'category_id': 65,
'id': 318235
}
refcoco_g['annotations'][208960-1]
{'segmentation': [[158.56, 212.49, 158.56, 94.92, 467.06, 85.21, 476.76, 209.26]],
'area': 37887.193,
'iscrowd': 0,
'image_id': 524286,
'bbox': [158.56, 85.21, 318.2, 127.28],
'category_id': 76,
'id': 1635174
}
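For polygon annotations (`iscrowd` = 0), `segmentation` is a list of flattened `[x1, y1, x2, y2, ...]` vertex lists. For the last RefCOCOg annotation sample above, the shoelace-formula area of its single polygon reproduces the stored `area` (37887.193), and the polygon's extent reproduces `bbox`. The sketch below checks this on that one sample only; it does not claim that every stored `area` was computed this way, and summing per-polygon areas assumes the polygons do not overlap.

```python
def polygon_area(flat: list[float]) -> float:
    """Shoelace area of one polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice_area = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice_area) / 2.0


def polygon_bbox(segmentation: list[list[float]]) -> list[float]:
    """Tight [x, y, width, height] box around all polygons of one annotation."""
    xs = [v for poly in segmentation for v in poly[0::2]]
    ys = [v for poly in segmentation for v in poly[1::2]]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


# segmentation of refcoco_g['annotations'][208960-1]
seg = [[158.56, 212.49, 158.56, 94.92, 467.06, 85.21, 476.76, 209.26]]
print(round(sum(polygon_area(p) for p in seg), 3))  # stored area: 37887.193
print([round(v, 2) for v in polygon_bbox(seg)])     # stored bbox: [158.56, 85.21, 318.2, 127.28]
```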
Dataset composition and conventions:
Citations:
RefCOCO, RefCOCO+:
@inproceedings{yu2016modeling,
title={Modeling context in referring expressions},
author={Yu, Licheng and Poirson, Patrick and Yang, Shan and Berg, Alexander C and Berg, Tamara L},
booktitle={Computer Vision--ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14},
pages={69--85},
year={2016},
organization={Springer}
}
RefCOCOg:
@inproceedings{mao2016generation,
title={Generation and comprehension of unambiguous object descriptions},
author={Mao, Junhua and Huang, Jonathan and Toshev, Alexander and Camburu, Oana and Yuille, Alan L and Murphy, Kevin},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={11--20},
year={2016}
}
Source dataset copyright and usage notes:
[Datasets] The RefCOCO family of datasets (RefCOCO, RefCOCO+, RefCOCOg), built on the MS COCO image dataset
[Licenses] Attribution-NonCommercial-ShareAlike License; Attribution-NonCommercial License; Attribution-NonCommercial-NoDerivs License; Attribution License; Attribution-ShareAlike License; Attribution-NoDerivs License; No known copyright restrictions; United States Government Work
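The `license` field of each image record is an integer id. Assuming the ids follow the MS COCO 2014 numbering, where ids 1 to 8 correspond to the licenses in the order listed above, an image's license can be looked up as in this sketch. The hard-coded `COCO_LICENSES` table and the `license_name` helper are assumptions for illustration; if the loaded `instances.json` contains a `licenses` list, as the MS COCO annotation files do, that list is the authoritative source.

```python
# Assumed MS COCO 2014 license ids; the order matches the license list above
COCO_LICENSES = {
    1: "Attribution-NonCommercial-ShareAlike License",
    2: "Attribution-NonCommercial License",
    3: "Attribution-NonCommercial-NoDerivs License",
    4: "Attribution License",
    5: "Attribution-ShareAlike License",
    6: "Attribution-NoDerivs License",
    7: "No known copyright restrictions",
    8: "United States Government Work",
}


def license_name(image: dict) -> str:
    """Hypothetical helper: license name for one entry of instances['images']."""
    return COCO_LICENSES.get(image["license"], "unknown")


# refcoco['images'][19994-1] has 'license': 6
print(license_name({"license": 6}))  # Attribution-NoDerivs License
```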