[Dataset] Introduction to COCO

An introduction to the COCO dataset and its annotation file format. Official website: COCO

Overview

COCO is a large-scale object detection, segmentation, and captioning dataset. It has the following features:

  1. Object segmentation
  2. Recognition in context
  3. Superpixel stuff segmentation
  4. 330K images (>200K labeled)
  5. 1.5 million object instances
  6. 80 object categories
  7. 91 stuff categories
  8. 5 captions per image
  9. 250,000 people with keypoints

Paper

The paper Microsoft COCO: Common Objects in Context describes the COCO dataset in detail:

Abstract—We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 objects types that would be easily recognizable by a 4 year old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

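Dataset

Reference: Test Guidelines

COCO holds a new challenge every year, each using different train/val/test splits, as shown in the figure below:

[Figure: train/val/test splits used in each year's challenge]

Note: the 2017 train/val splits contain the same images as the 2014 train/val splits; only the way they are organized has changed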

Data format

Note: the ID attributes used in the nodes below (id/image_id/category_id) all start from 1

Input data format

Reference: Data format

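COCO provides five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning.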

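The annotations for these tasks share one basic data structure, and each task adds its own extra fields on top of it. The basic annotation format is described first, followed by the annotations for the object detection task

Basic data structure

It contains four root nodes: info, images, annotations, and licenses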

{
"info": info,
"images": [image],
"annotations": [annotation],
"licenses": [license],
}

The info, image, and license nodes have the same format for every task:

info{
"year": int,
"version": str,
"description": str,
"contributor": str,
"url": str,
"date_created": datetime,
}

image{
"id": int,
"width": int,
"height": int,
"file_name": str,
"license": int,
"flickr_url": str,
"coco_url": str,
"date_captured": datetime,
}

license{
"id": int,
"name": str,
"url": str,
}

The annotations field, by contrast, has a different format for each task
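As a rough illustration of how the root nodes fit together, here is a minimal sketch that indexes a COCO-style dictionary by image id. The helper name index_coco and the tiny hand-written dataset are assumptions made for this example; they are not part of the official COCO API, and the values are not real COCO data.

```python
import json
from collections import defaultdict


def index_coco(dataset):
    """Index a COCO-style dict (hypothetical helper, not the official API).

    Returns (images_by_id, annotations_by_image).
    """
    images_by_id = {img["id"]: img for img in dataset.get("images", [])}
    annotations_by_image = defaultdict(list)
    for ann in dataset.get("annotations", []):
        # Each annotation points back to its image through image_id.
        annotations_by_image[ann["image_id"]].append(ann)
    return images_by_id, dict(annotations_by_image)


# Made-up data that only mimics the structure described above.
raw = json.dumps({
    "info": {"year": 2017, "version": "1.0"},
    "licenses": [{"id": 1, "name": "demo", "url": ""}],
    "images": [
        {"id": 1, "width": 640, "height": 480, "file_name": "a.jpg", "license": 1}
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 18},
        {"id": 11, "image_id": 1, "category_id": 1},
    ],
})
dataset = json.loads(raw)
images_by_id, annotations_by_image = index_coco(dataset)
print(images_by_id[1]["file_name"], len(annotations_by_image[1]))
```

With a real annotation file, the json.loads call would typically be replaced by json.load on the opened file; the indexing step stays the same.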

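Object detection data structure

The object detection task covers both bounding-box detection and instance segmentation. It adds the following fields: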

annotation{
"id": int,
"image_id": int,
"category_id": int,
"segmentation": RLE or [polygon],
"area": float,
"bbox": [x,y,width,height],
"iscrowd": 0 or 1,
}

categories[{
"id": int,
"name": str,
"supercategory": str,
}]

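The fields relevant to bounding-box detection are:

  • id: annotation ID
  • image_id: ID of the image the annotation belongs to
  • category_id: object category ID, used to look up the categories field
  • bbox: bounding-box coordinates, [x, y, width, height], where x/y is the top-left corner of the box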
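Many detection libraries expect corner or center coordinates instead, so a conversion step is often needed. The sketch below assumes bbox is stored as [x, y, width, height] with x/y at the top-left corner, which matches the note in the results format section. The function names xywh_to_xyxy, xywh_to_center, and category_name are made up for this example.

```python
def xywh_to_xyxy(bbox):
    """[x, y, width, height] (top-left origin) -> [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xywh_to_center(bbox):
    """[x, y, width, height] (top-left origin) -> [center_x, center_y, w, h]."""
    x, y, w, h = bbox
    return [x + w / 2, y + h / 2, w, h]


def category_name(categories, category_id):
    """Look up a category name by id; returns None if the id is not listed."""
    by_id = {cat["id"]: cat["name"] for cat in categories}
    return by_id.get(category_id)


# Illustrative values only.
categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 3, "name": "car"},
]
box = [10.0, 20.0, 30.0, 40.0]
print(xywh_to_xyxy(box))
print(xywh_to_center(box))
print(category_name(categories, 3))
```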

Results format

Reference: Results Format

For object detection (bounding-box detection), results are written in the following format:

[
{
"image_id": int,
"category_id": int,
"bbox": [x,y,width,height],
"score": float,
}
]

Note: x/y are the coordinates of the top-left corner of the bounding box, and all bounding-box values are written as floats
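Detectors often emit boxes as corner coordinates, so writing a results file usually involves one conversion pass. The following is a minimal sketch under that assumption: the input tuple layout (image_id, category_id, (x1, y1, x2, y2), score) and the function name to_coco_results are hypothetical, chosen only for illustration.

```python
import json


def to_coco_results(detections):
    """Convert detections with corner boxes into COCO-style result entries.

    detections: iterable of (image_id, category_id, (x1, y1, x2, y2), score).
    This input layout is an assumption for the example.
    """
    results = []
    for image_id, category_id, (x1, y1, x2, y2), score in detections:
        results.append({
            "image_id": int(image_id),
            "category_id": int(category_id),
            # The results format expects [x, y, width, height] as floats.
            "bbox": [float(x1), float(y1), float(x2 - x1), float(y2 - y1)],
            "score": float(score),
        })
    return results


detections = [(42, 18, (10.0, 20.0, 40.0, 60.0), 0.9)]
results = to_coco_results(detections)
results_json = json.dumps(results)
print(results_json)
```

The resulting string can then be written to a .json file for evaluation.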

Appendix

Example 1: input data format

{
"info": {
"description": "COCO 2017 Dataset",
"url": "http://cocodataset.org",
"version": "1.0",
"year": 2017,
"contributor": "COCO Consortium",
"date_created": "2017/09/01"
},
"licenses": [
{
"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
"id": 1,
"name": "Attribution-NonCommercial-ShareAlike License"
},
{
"url": "http://creativecommons.org/licenses/by-nc/2.0/",
"id": 2,
"name": "Attribution-NonCommercial License"
}
],
"images": [
{
"license": 4,
"file_name": "000000397133.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
"height": 427,
"width": 640,
"date_captured": "2013-11-14 17:02:52",
"flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
"id": 397133
},
{
"license": 1,
"file_name": "000000037777.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
"height": 230,
"width": 352,
"date_captured": "2013-11-14 20:55:31",
"flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg",
"id": 37777
}
],
"annotations": [
{
"segmentation": [
[
510.66,
423.01,
511.72,
...
...
]
],
"area": 702.1057499999998,
"iscrowd": 0,
"image_id": 289343,
"bbox": [
473.07,
395.93,
38.65,
28.67
],
"category_id": 18,
"id": 1768
},
{
"segmentation": [
[
289.74,
443.39,
302.29,
...
...
]
],
"area": 27718.476299999995,
"iscrowd": 0,
"image_id": 61471,
"bbox": [
272.1,
200.23,
151.97,
279.77
],
"category_id": 18,
"id": 1773
}
],
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person"
},
{
"supercategory": "vehicle",
"id": 2,
"name": "bicycle"
},
{
"supercategory": "vehicle",
"id": 3,
"name": "car"
}
]
}
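In the excerpt above, some ids point to entries that are not listed (for example license 4, or image_id 289343), presumably because the listing is abridged. When assembling or trimming an annotation file by hand, a quick reference check can catch this kind of mismatch. The sketch below is one possible way to do it; find_dangling_references is a hypothetical helper, and the excerpt dict keeps only the id fields from the example.

```python
def find_dangling_references(dataset):
    """Report ids that refer to entries missing from a COCO-style dict.

    Returns a list of (node, id, field, missing_value) tuples.
    """
    image_ids = {img["id"] for img in dataset.get("images", [])}
    license_ids = {lic["id"] for lic in dataset.get("licenses", [])}
    category_ids = {cat["id"] for cat in dataset.get("categories", [])}

    problems = []
    for img in dataset.get("images", []):
        if img.get("license") not in license_ids:
            problems.append(("image", img["id"], "license", img.get("license")))
    for ann in dataset.get("annotations", []):
        if ann["image_id"] not in image_ids:
            problems.append(("annotation", ann["id"], "image_id", ann["image_id"]))
        if ann["category_id"] not in category_ids:
            problems.append(("annotation", ann["id"], "category_id", ann["category_id"]))
    return problems


# Only the id fields of the abridged example are reproduced here.
excerpt = {
    "licenses": [{"id": 1}, {"id": 2}],
    "images": [{"id": 397133, "license": 4}, {"id": 37777, "license": 1}],
    "annotations": [{"id": 1768, "image_id": 289343, "category_id": 18}],
    "categories": [{"id": 1}, {"id": 2}, {"id": 3}],
}
problems = find_dangling_references(excerpt)
for problem in problems:
    print(problem)
```

On a complete annotation file the returned list would be expected to be empty.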

Example 2: output data format

[
{
"image_id": 42,
"category_id": 18,
"bbox": [
258.15,
41.29,
348.26,
243.78
],
"score": 0.236
},
{
"image_id": 74,
"category_id": 18,
"bbox": [
87.87,
276.25,
296.42,
103.18
],
"score": 0.546
}
]