
Evaluation Data

All of the datasets below are converted into a standard evaluation prompt before evaluation.

Dataset 1 (RefCOCO)

# Evaluation Metric 1: Accuracy (Precision or Accuracy)

Data description:

RefCOCO is a referring-expression / visual-grounding dataset. Its images are a subset of those on the MS COCO website, and the region images with their corresponding phrases (referring expressions) come from human annotation. RefCOCO is a small vision-language multimodal training and test benchmark that can be used to evaluate tasks such as referring-expression comprehension and visual grounding. Most images depict everyday scenes, and the phrases (referring expressions) are typically direct descriptions of the content of an image region.

Source data volume:

The dataset is split into a training set (120,624), a validation set (10,834), test set A (5,657), and test set B (5,095).

Evaluation data volume:

The evaluation data consists of the 5,657 region-image/phrase (referring-expression) instances in test set A and the 5,095 region-image/phrase (referring-expression) instances in test set B of the source data.
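The metric heading above does not spell out how accuracy is computed for grounding. A common convention (an assumption here, not stated by this document) is that a predicted box counts as correct when its IoU with the ground-truth box is at least 0.5, with boxes in COCO's [x, y, w, h] format. A minimal sketch:

```python
def iou(box_a, box_b):
    """IoU of two boxes in [x, y, w, h] format (as in the COCO 'bbox' field)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Intersection rectangle (clamped to zero when boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def grounding_accuracy(preds, gts, thresh=0.5):
    """Fraction of predicted boxes whose IoU with ground truth >= thresh.

    The 0.5 threshold is the usual grounding convention, not something
    this document specifies."""
    hits = sum(iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(gts)
```

The same helper applies unchanged to test sets A and B, since both store regions as xywh bounding boxes.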

Source data fields:

KEYS            EXPLAIN
'images'
  id            image id
  file_name     image file name
  width         image width
  height        image height
  coco_url      URL of the image in the COCO dataset
  flickr_url    URL of the image on Flickr
  license       image license type
'annotations'
  id            region id
  image_id      id of the image the region belongs to
  category_id   category id of the region
  bbox          rectangular bounding box of the region
  segmentation  polygonal segmentation of the region
  area          area of the region's segmentation
  iscrowd       whether the segmentation covers a multi-object crowd
'references'
  ref_id        referring-expression id
  ann_id        id of the region the referring expression belongs to
  split         dataset split of the referring expression
  sent_ids      list of ids of the referring expressions
  sentences     contents of the referring expressions

Source dataset samples:

import json as jsonmod
import pickle

refcoco = jsonmod.load(open('./refcoco/instances.json', 'r'))
refcoco_p = pickle.load(open('./refcoco/refs(unc).p', 'rb'), fix_imports=True)

refcoco_p[0]
{'sent_ids': [0, 1, 2], 
 'file_name': 'COCO_train2014_000000581857_16.jpg', 
 'ann_id': 1719310, 
 'ref_id': 0, 
 'image_id': 581857, 
 'split': 'train', 
 'sentences': 
    [{'tokens': ['the', 'lady', 'with', 'the', 'blue', 'shirt'], 'raw': 'THE LADY WITH THE BLUE SHIRT', 'sent_id': 0, 'sent': 'the lady with the blue shirt'}, 
     {'tokens': ['lady', 'with', 'back', 'to', 'us'], 'raw': 'lady w back to us', 'sent_id': 1, 'sent': 'lady with back to us'}, 
     {'tokens': ['blue', 'shirt'], 'raw': 'blue shirt', 'sent_id': 2, 'sent': 'blue shirt'}
    ], 
 'category_id': 1
}
refcoco_p[50000-1]
{'sent_ids': [142208, 142209], 
 'file_name': 'COCO_train2014_000000000072_0.jpg', 
 'ann_id': 598731, 
 'ref_id': 49999, 
 'image_id': 72, 
 'split': 'train', 
 'sentences': 
    [{'tokens': ['right', 'giraffe'], 'raw': 'RIGHT GIRAFFE', 'sent_id': 142208, 'sent': 'right giraffe'}, 
     {'tokens': ['right', 'girafe'], 'raw': 'right girafe', 'sent_id': 142209, 'sent': 'right girafe'}
    ], 
 'category_id': 25
}

refcoco['images'][0]
{'license': 1, 
 'file_name': 'COCO_train2014_000000098304.jpg', 
 'coco_url': 'http://mscoco.org/images/98304', 
 'height': 424, 
 'width': 640, 
 'date_captured': '2013-11-21 23:06:41', 
 'flickr_url': 'http://farm6.staticflickr.com/5062/5896644212_a326e96ea9_z.jpg', 
 'id': 98304
}
refcoco['images'][19994-1]
{'license': 6, 
 'file_name': 'COCO_train2014_000000458751.jpg', 
 'coco_url': 'http://mscoco.org/images/458751', 
 'height': 576, 
 'width': 592, 
 'date_captured': '2013-11-16 21:13:51', 
 'flickr_url': 'http://farm8.staticflickr.com/7018/6821165845_48ebd9590f_z.jpg', 
 'id': 458751
}

refcoco['annotations'][0]
{'segmentation': [[267.52, 229.75, 265.6, 226.68, 265.79, 223.6, 263.87, 220.15, 263.87, 216.88, 266.94, 217.07, 268.48, 221.3, 272.32, 219.95, 276.35, 220.15, 279.62, 218.03, 283.46, 218.42, 285.0, 220.92, 285.0, 223.22, 284.42, 224.95, 280.96, 225.14, 279.81, 226.48, 281.73, 228.41, 279.43, 229.37, 275.78, 229.17, 273.86, 229.56, 274.24, 232.05, 269.82, 231.67, 267.14, 231.48, 266.75, 228.6]], 
 'area': 197.29899999999986, 
 'iscrowd': 0, 
 'image_id': 98304, 
 'bbox': [263.87, 216.88, 21.13, 15.17], 
 'category_id': 18, 
 'id': 3007
}
refcoco['annotations'][196771-1]
{'segmentation': [[203.42, 96.23, 216.68, 104.44, 216.05, 114.54, 226.15, 118.96, 228.67, 132.21, 247.61, 138.52, 250.13, 156.83, 236.88, 159.35, 234.35, 167.56, 274.12, 168.19, 281.69, 185.87, 284.85, 213.01, 267.81, 237.62, 243.19, 236.36, 238.14, 223.74, 232.46, 232.57, 231.2, 284.33, 159.87, 283.07, 159.87, 218.06, 151.67, 206.7, 154.19, 190.92, 159.87, 184.6, 158.61, 166.3, 140.3, 153.04, 142.2, 144.84, 178.81, 147.99, 183.86, 142.94, 169.97, 125.9, 173.13, 114.54, 176.28, 113.91, 185.75, 96.87, 200.9, 94.97]], 
 'area': 16238.20485, 
 'iscrowd': 0, 
 'image_id': 458751, 
 'bbox': [140.3, 94.97, 144.55, 189.36], 
 'category_id': 11, 
 'id': 1808941
}
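The three tables link together: a 'references' entry points at an 'annotations' record via ann_id and at an 'images' record via image_id. A minimal join sketch; the toy records below are illustrative, trimmed from the samples above (the bbox shown is not claimed to be the real one for this ann_id), but the lookup logic is the same on the full files:

```python
# Illustrative in-memory records mirroring the structures printed above.
refs = [{'ref_id': 0, 'ann_id': 1719310, 'image_id': 581857,
         'sentences': [{'sent': 'the lady with the blue shirt'}]}]
annotations = [{'id': 1719310, 'image_id': 581857,
                'bbox': [263.87, 216.88, 21.13, 15.17]}]
images = [{'id': 581857, 'file_name': 'COCO_train2014_000000581857.jpg'}]

# Index the two target tables by their primary key once, then join.
ann_by_id = {a['id']: a for a in annotations}
img_by_id = {i['id']: i for i in images}

def resolve(ref):
    """Return (expression, bbox, file_name) for one 'references' entry."""
    ann = ann_by_id[ref['ann_id']]
    img = img_by_id[ref['image_id']]
    return ref['sentences'][0]['sent'], ann['bbox'], img['file_name']
```

With the real data you would build `refs` from refs(unc).p and the other two tables from instances.json, loaded as in the snippet at the top of this section.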

Dataset composition and conventions:

Dataset 2 (RefCOCO+)

# Evaluation Metric 1: Accuracy (Precision or Accuracy)

Data description:

RefCOCO+ is a referring-expression / visual-grounding dataset. Its images are a subset of those on the MS COCO website, and the region images with their corresponding phrases (referring expressions) come from human annotation. RefCOCO+ is a small vision-language multimodal training and test benchmark that can be used to evaluate tasks such as referring-expression comprehension and visual grounding. Most images depict everyday scenes, and the phrases (referring expressions) are typically direct descriptions of the content of an image region.

Source data volume:

The dataset is split into a training set (120,191), a validation set (10,758), test set A (5,726), and test set B (4,889).

Evaluation data volume:

The evaluation data consists of the 5,726 region-image/phrase (referring-expression) instances in test set A and the 4,889 region-image/phrase (referring-expression) instances in test set B of the source data.

Source data fields:

KEYS            EXPLAIN
'images'
  id            image id
  file_name     image file name
  width         image width
  height        image height
  coco_url      URL of the image in the COCO dataset
  flickr_url    URL of the image on Flickr
  license       image license type
'annotations'
  id            region id
  image_id      id of the image the region belongs to
  category_id   category id of the region
  bbox          rectangular bounding box of the region
  segmentation  polygonal segmentation of the region
  area          area of the region's segmentation
  iscrowd       whether the segmentation covers a multi-object crowd
'references'
  ref_id        referring-expression id
  ann_id        id of the region the referring expression belongs to
  split         dataset split of the referring expression
  sent_ids      list of ids of the referring expressions
  sentences     contents of the referring expressions

Source dataset samples:

import json as jsonmod
import pickle

refcoco_plus = jsonmod.load(open('./refcoco+/instances.json', 'r'))
refcoco_plus_p = pickle.load(open('./refcoco+/refs(unc).p', 'rb'), fix_imports=True)

refcoco_plus_p[0]
{'sent_ids': [0, 1, 2], 
 'file_name': 'COCO_train2014_000000581857_16.jpg', 
 'ann_id': 1719310, 
 'ref_id': 0, 
 'image_id': 581857, 
 'split': 'train', 
 'sentences': [
    {'tokens': ['navy', 'blue', 'shirt'], 'raw': 'navy blue shirt', 'sent_id': 0, 'sent': 'navy blue shirt'}, 
    {'tokens': ['woman', 'back', 'in', 'blue'], 'raw': 'woman back in blue', 'sent_id': 1, 'sent': 'woman back in blue'}, 
    {'tokens': ['blue', 'shirt'], 'raw': 'blue shirt', 'sent_id': 2, 'sent': 'blue shirt'}
 ], 
 'category_id': 1
}
refcoco_plus_p[49856-1]
{'sent_ids': [141560, 141561, 141562, 141563], 
 'file_name': 'COCO_train2014_000000000072_0.jpg', 
 'ann_id': 598731, 
 'ref_id': 49855, 
 'image_id': 72, 
 'split': 'train', 
 'sentences': [
    {'tokens': ['shorter', 'giraffe'], 'raw': 'shorter giraffe', 'sent_id': 141560, 'sent': 'shorter giraffe'}, 
    {'tokens': ['giraffe', 'closest', 'to', 'camera'], 'raw': 'giraffe closest to camera', 'sent_id': 141561, 'sent': 'giraffe closest to camera'}, 
    {'tokens': ['bent', 'neck'], 'raw': 'bent neck', 'sent_id': 141562, 'sent': 'bent neck'}, 
    {'tokens': ['shorter', 'animal'], 'raw': 'shorter animal', 'sent_id': 141563, 'sent': 'shorter animal'}
 ], 
 'category_id': 25
}

refcoco_plus['images'][0]
{'license': 1, 
 'file_name': 'COCO_train2014_000000098304.jpg', 
 'coco_url': 'http://mscoco.org/images/98304', 
 'height': 424, 
 'width': 640, 
 'date_captured': '2013-11-21 23:06:41', 
 'flickr_url': 'http://farm6.staticflickr.com/5062/5896644212_a326e96ea9_z.jpg', 
 'id': 98304
}
refcoco_plus['images'][19992-1]
{'license': 6, 
 'file_name': 'COCO_train2014_000000458751.jpg', 
 'coco_url': 'http://mscoco.org/images/458751', 
 'height': 576, 
 'width': 592, 
 'date_captured': '2013-11-16 21:13:51', 
 'flickr_url': 'http://farm8.staticflickr.com/7018/6821165845_48ebd9590f_z.jpg', 
 'id': 458751
}

refcoco_plus['annotations'][0]
{'segmentation': [[267.52, 229.75, 265.6, 226.68, 265.79, 223.6, 263.87, 220.15, 263.87, 216.88, 266.94, 217.07, 268.48, 221.3, 272.32, 219.95, 276.35, 220.15, 279.62, 218.03, 283.46, 218.42, 285.0, 220.92, 285.0, 223.22, 284.42, 224.95, 280.96, 225.14, 279.81, 226.48, 281.73, 228.41, 279.43, 229.37, 275.78, 229.17, 273.86, 229.56, 274.24, 232.05, 269.82, 231.67, 267.14, 231.48, 266.75, 228.6]], 
 'area': 197.29899999999986, 
 'iscrowd': 0, 
 'image_id': 98304, 
 'bbox': [263.87, 216.88, 21.13, 15.17], 
 'category_id': 18, 
 'id': 3007
}
refcoco_plus['annotations'][196737-1]
{'segmentation': [[203.42, 96.23, 216.68, 104.44, 216.05, 114.54, 226.15, 118.96, 228.67, 132.21, 247.61, 138.52, 250.13, 156.83, 236.88, 159.35, 234.35, 167.56, 274.12, 168.19, 281.69, 185.87, 284.85, 213.01, 267.81, 237.62, 243.19, 236.36, 238.14, 223.74, 232.46, 232.57, 231.2, 284.33, 159.87, 283.07, 159.87, 218.06, 151.67, 206.7, 154.19, 190.92, 159.87, 184.6, 158.61, 166.3, 140.3, 153.04, 142.2, 144.84, 178.81, 147.99, 183.86, 142.94, 169.97, 125.9, 173.13, 114.54, 176.28, 113.91, 185.75, 96.87, 200.9, 94.97]], 
 'area': 16238.20485, 
 'iscrowd': 0, 
 'image_id': 458751, 
 'bbox': [140.3, 94.97, 144.55, 189.36], 
 'category_id': 11, 
 'id': 1808941
}

Dataset composition and conventions:

Dataset 3 (RefCOCOg)

# Evaluation Metric 1: Accuracy (Precision or Accuracy)

Data description:

RefCOCOg is a referring-expression / visual-grounding dataset. Its images are a subset of those on the MS COCO website, and the region images with their corresponding phrases (referring expressions) come from human annotation. RefCOCOg is a small vision-language multimodal training and test benchmark that can be used to evaluate tasks such as referring-expression comprehension and visual grounding. Most images depict everyday scenes, and the phrases (referring expressions) are typically direct descriptions of the content of an image region.

Source data volume:

The dataset is split into a training set (80,512), a validation set (4,896), and a test set (9,602).

Evaluation data volume:

The evaluation data consists of the 9,602 region-image/phrase (referring-expression) instances in the test set of the source data.

Source data fields:

KEYS            EXPLAIN
'images'
  id            image id
  file_name     image file name
  width         image width
  height        image height
  coco_url      URL of the image in the COCO dataset
  flickr_url    URL of the image on Flickr
  license       image license type
'annotations'
  id            region id
  image_id      id of the image the region belongs to
  category_id   category id of the region
  bbox          rectangular bounding box of the region
  segmentation  polygonal segmentation of the region
  area          area of the region's segmentation
  iscrowd       whether the segmentation covers a multi-object crowd
'references'
  ref_id        referring-expression id
  ann_id        id of the region the referring expression belongs to
  split         dataset split of the referring expression
  sent_ids      list of ids of the referring expressions
  sentences     contents of the referring expressions

Source dataset samples:

import json as jsonmod
import pickle

refcoco_g = jsonmod.load(open('./refcocog/instances.json', 'r'))
refcoco_g_p = pickle.load(open('./refcocog/refs(umd).p', 'rb'), fix_imports=True)

refcoco_g_p[0]
{'image_id': 380440, 
 'split': 'test', 
 'sentences': [
    {'tokens': ['the', 'man', 'in', 'yellow', 'coat'], 'raw': 'the man in yellow coat', 'sent_id': 8, 'sent': 'the man in yellow coat'}, 
    {'tokens': ['skiier', 'in', 'red', 'pants'], 'raw': 'Skiier in red pants.', 'sent_id': 9, 'sent': 'skiier in red pants'}
 ], 
 'file_name': 'COCO_train2014_000000380440_491042.jpg', 
 'category_id': 1, 
 'ann_id': 491042, 
 'sent_ids': [8, 9], 
 'ref_id': 0
}
refcoco_g_p[49822-1]
{'image_id': 573297, 
 'split': 'train', 
 'sentences': [
    {'tokens': ['a', 'person', 'in', 'red', 'dress', 'and', 'he', 'is', 'seeing', 'his', 'mobile'], 'raw': 'A person in red dress and he is seeing his mobile.', 'sent_id': 104558, 'sent': 'a person in red dress and he is seeing his mobile'}, 
    {'tokens': ['man', 'wearing', 'a', 'red', 'costume'], 'raw': 'Man wearing a red costume.', 'sent_id': 104559, 'sent': 'man wearing a red costume'}
 ], 
 'file_name': 'COCO_train2014_000000573297_472971.jpg', 
 'category_id': 1, 
 'ann_id': 472971, 
 'sent_ids': [104558, 104559], 
 'ref_id': 49821
}

refcoco_g['images'][0]
{'license': 1, 
 'file_name': 'COCO_train2014_000000131074.jpg', 
 'coco_url': 'http://mscoco.org/images/131074', 
 'height': 428, 
 'width': 640, 
 'date_captured': '2013-11-21 01:03:06', 
 'flickr_url': 'http://farm9.staticflickr.com/8308/7908210548_33e532d119_z.jpg', 
 'id': 131074
}
refcoco_g['images'][25799-1]
{'license': 5, 
 'file_name': 'COCO_train2014_000000524286.jpg', 
 'coco_url': 'http://mscoco.org/images/524286', 
 'height': 480, 
 'width': 640, 
 'date_captured': '2013-11-22 01:08:02', 
 'flickr_url': 'http://farm4.staticflickr.com/3286/3160643026_c2691d2c55_z.jpg', 
 'id': 524286
}

refcoco_g['annotations'][0]
{'segmentation': [[21.11, 239.09, 16.31, 274.6, 198.65, 349.45, 240.87, 336.98, 320.52, 293.79, 334.91, 248.69, 357.95, 273.64, 353.15, 289.0, 398.25, 267.88, 437.6, 251.57, 412.65, 228.54, 240.87, 210.31, 219.76, 141.21, 113.24, 153.69, 63.34, 156.57, 26.87, 169.04]], 
 'area': 48667.84089999999, 
 'iscrowd': 0, 
 'image_id': 131074, 
 'bbox': [16.31, 141.21, 421.29, 208.24], 
 'category_id': 65, 
 'id': 318235
}
refcoco_g['annotations'][208960-1]
{'segmentation': [[158.56, 212.49, 158.56, 94.92, 467.06, 85.21, 476.76, 209.26]], 
 'area': 37887.193, 
 'iscrowd': 0, 
 'image_id': 524286, 
 'bbox': [158.56, 85.21, 318.2, 127.28], 
 'category_id': 76, 
 'id': 1635174
}

Dataset composition and conventions:

Paper citations:

RefCOCO / RefCOCO+:
@inproceedings{yu2016modeling,
  title={Modeling context in referring expressions},
  author={Yu, Licheng and Poirson, Patrick and Yang, Shan and Berg, Alexander C and Berg, Tamara L},
  booktitle={Computer Vision -- ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016, Proceedings, Part II},
  pages={69--85},
  year={2016},
  organization={Springer}
}

RefCOCOg:
@inproceedings{mao2016generation,
  title={Generation and comprehension of unambiguous object descriptions},
  author={Mao, Junhua and Huang, Jonathan and Toshev, Alexander and Camburu, Oana and Yuille, Alan L and Murphy, Kevin},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={11--20},
  year={2016}
}

Source dataset copyright and usage notes:

[Datasets] RefCOCO-family datasets; MS COCO image dataset; RefCOCO; RefCOCO+; RefCOCOg

[Licenses] Attribution-NonCommercial-ShareAlike License; Attribution-NonCommercial License; Attribution-NonCommercial-NoDerivs License; Attribution License; Attribution-ShareAlike License; Attribution-NoDerivs License; No known copyright restrictions; United States Government Work

Dataset 4 (ARPGrounding)

# Evaluation Metric 1: Accuracy (Precision or Accuracy)

Data description:

ARPGrounding is a visual-grounding dataset built specifically to evaluate the compositional-reasoning ability of vision-language models (VLMs). It is constructed on top of the Visual Genome (VG) dataset via dependency parsing followed by manual filtering. ARPGrounding assesses a model's understanding of fine-grained vision-language compositionality, targeting exactly the scenes where existing models are easily confused. Images come from real, complex visual scenes, and the phrases (referring expressions) are designed as ambiguous pairs: depending on differences in attributes, relations, or priority, the model must distinguish two visually similar or semantically related objects within the same image. The data comes in pairs; each sample is matched with a distractor negative sample, and the model must reject the negative sample and correctly localize the positive one.

Dataset composition and conventions:

Source data volume:

The dataset is derived from the 108,249 images of Visual Genome, through filtering and reconstruction.

Evaluation data volume:

The evaluation data contains 11,425 region-image/phrase (referring-expression) instances, split into three subsets: the Attribute subset with 6,632 samples, the Relation subset with 370 samples, and the Priority subset with 4,423 samples.

Source data fields:

ARPGrounding dataset field definitions

KEYS                     EXPLAIN
'attribute_vg.pkl'       attribute-recognition subset
  phrase                 positive/negative phrase describing the object (e.g. red building)
  x                      x-coordinate of the box's top-left corner
  y                      y-coordinate of the box's top-left corner
  w                      box width
  h                      box height
  attributes             list of the object's attributes (e.g. ['brown', 'red'])
  names                  list of the object's names (e.g. ['building'])
  object_id              unique object ID in Visual Genome
  synsets                WordNet synset definition (e.g. ['building.n.01'])
'relationship_vg.pkl'    relation-reasoning subset
  phrase                 phrase describing a spatial or action relation (e.g. grass on top of sand)
  x, y, w, h             box coordinates of the region involved in the relation (xywh format)
'priority_vg.pkl'        priority/saliency subset
  phrase                 phrase describing positional priority or a contrastive relation
  x, y, w, h             box coordinates of the salient region (xywh format)

Data structure hierarchy
  num_images             total number of images in the dataset
  pairs                  list of positive/negative sample pairs per image
  pair[0] (pos)          positive-sample dictionary
  pair[1] (neg)          negative-sample dictionary

Source dataset samples:

import pickle

data = pickle.load(open("attribute_vg.pkl", "rb"))

Sample 1: Attribute - distinguishing colors

Data example (Attribute VG pair, positive and negative)

    [
        {
            'synsets': ['building.n.01'],
            'h': 298,
            'object_id': 1023846,
            'names': ['building'],
            'w': 282,
            'attributes': ['brown', 'red'],
            'y': 13,
            'x': 165,
            'phrase': 'red building'
        },
        {
            'synsets': ['building.n.01'],
            'h': 384,
            'object_id': 1023819,
            'names': ['building'],
            'w': 251,
            'attributes': ['orange', 'brown', 'tall'],
            'y': 4,
            'x': 547,
            'phrase': 'orange building'
        }
    ]

Sample 2: Relation - distinguishing spatial positions

Data example (Relation VG pair, positive and negative)

    [
        {
            'h': 156,
            'w': 799,
            'y': 264,
            'x': 1,
            'phrase': 'grass on top of sand'
        },
        {
            'h': 80,
            'w': 374,
            'y': 468,
            'x': 273,
            'phrase': 'grass in sand'
        },
    ]

Sample 3: Priority - distinguishing the subject object

Data example (Priority VG pair, positive and negative)

    [
        {
            'h': 200,
            'w': 196,
            'y': 391,
            'x': 294,
            'phrase': 'cpu on floor'
        },
        {
            'h': 73,
            'w': 402,
            'y': 526,
            'x': 237,
            'phrase': 'floor under cpu'
        },
    ]

The ARPGrounding dataset consists of three core parts, designed to test a model's compositional-reasoning ability along different dimensions:

  1. Attribute: 6,632 samples. Tests whether a model can distinguish objects of the same category that differ in attributes (color, material, state, etc.), e.g. "the brown dog" vs. "the black dog".
  2. Relation: 370 samples. Tests whether a model can distinguish targets purely by the relation between objects (usually spatial or action relations), e.g. "the laptop on the table" vs. "the laptop under the table".
  3. Priority: 4,423 samples. Tests whether a model can identify the grammatical subject of the text without being misled by other nouns the text mentions, typically involving complex phrasings where the subject and object positions are swapped.
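Because every sample ships with a distractor box, a prediction can be scored against both members of the pair. A minimal sketch of one plausible scoring rule (an assumption for illustration; the paper's exact protocol may differ): the prediction for the positive phrase is correct if it clears an IoU threshold against the positive box and overlaps the positive box more than the negative one.

```python
def iou(a, b):
    """IoU of two boxes given as dicts with 'x', 'y', 'w', 'h' keys,
    matching the ARPGrounding sample records above."""
    iw = max(0, min(a['x'] + a['w'], b['x'] + b['w']) - max(a['x'], b['x']))
    ih = max(0, min(a['y'] + a['h'], b['y'] + b['h']) - max(a['y'], b['y']))
    inter = iw * ih
    union = a['w'] * a['h'] + b['w'] * b['h'] - inter
    return inter / union if union else 0.0

def pair_correct(pred, pair, thresh=0.5):
    """pred: predicted box for the positive phrase; pair: [pos, neg] dicts.

    Correct only if pred both clears the IoU threshold against the
    positive box and matches it better than the distractor box."""
    pos, neg = pair
    return iou(pred, pos) >= thresh and iou(pred, pos) > iou(pred, neg)
```

For example, with the Relation pair shown above, a prediction equal to the "grass on top of sand" box scores correct, while a prediction on the "grass in sand" distractor box does not.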

Paper citation:

@inproceedings{zeng2024investigating,
  title={Investigating Compositional Challenges in Vision-Language Models for Visual Grounding},
  author={Zeng, Yunan and Huang, Yan and Zhang, Jinjin and Jie, Zequn and Chai, Zhenhua and Wang, Liang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={14141--14151},
  year={2024}
}

Source dataset copyright and usage notes:

[Datasets] ARPGrounding (based on Visual Genome)

[Licenses] This dataset is derived from Visual Genome. Image licenses follow the original Visual Genome image sources (e.g., Flickr) and may vary per image (e.g., Attribution, NonCommercial, NoDerivs, ShareAlike, etc.). Please refer to the Visual Genome website for licensing information and use the images according to their original licenses. ARPGrounding annotations and processing scripts are provided under the same license as this repository unless otherwise stated.