[YOLO][k-means聚类]寻找锚点

YOLOv2YOLO的基础上进行了很多的调整,其中一个就是设置锚点框。相对于Faster R-CNN/SSD通过人工方式设置锚点的尺度和长宽比,YOLOv2提出了通过k-means聚类的方式自动计算出锚点的大小

k-means聚类

k-means聚类算法通过计算距离作为相似度来衡量两个对象是否属于同一簇

锚点计算

数据

  • 输入数据:训练数据集中各个GT
  • 输出数据:锚点框

改进

在原始的k-means聚类算法中,通过欧式距离来进行判断。在锚点计算过程中,这会导致更大的边界框会产生更多的误差,所以YOLOv2选择了计算聚类中心和对象的IoU作为评判标准

\[ d(box, centroid) = 1- IOU(box, centroid) \]

属性

锚点边界框由中心点坐标和长宽组成:\([center_{x}, center_{y}, w, h]\)。在训练过程中,中心点坐标已经固定(各个栅格中心),所以仅需计算长/宽即可

实现

仓库lars76/kmeans-anchor-boxesYOLOv2k-means锚点计算过程进行了很好的复现。关键代码如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
#############################  kmeans.py
import numpy as np


def iou(box, clusters):
"""
Calculates the Intersection over Union (IoU) between a box and k clusters.
:param box: tuple or array, shifted to the origin (i. e. width and height)
:param clusters: numpy array of shape (k, 2) where k is the number of clusters
:return: numpy array of shape (k, 0) where k is the number of clusters
"""
x = np.minimum(clusters[:, 0], box[0])
y = np.minimum(clusters[:, 1], box[1])
if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
raise ValueError("Box has no area")

intersection = x * y
box_area = box[0] * box[1]
cluster_area = clusters[:, 0] * clusters[:, 1]

iou_ = intersection / (box_area + cluster_area - intersection)

return iou_


def avg_iou(boxes, clusters):
"""
Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
:param boxes: numpy array of shape (r, 2), where r is the number of rows
:param clusters: numpy array of shape (k, 2) where k is the number of clusters
:return: average IoU as a single float
"""
return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])


def translate_boxes(boxes):
"""
Translates all the boxes to the origin.
:param boxes: numpy array of shape (r, 4)
:return: numpy array of shape (r, 2)
"""
new_boxes = boxes.copy()
for row in range(new_boxes.shape[0]):
new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
return np.delete(new_boxes, [0, 1], axis=1)


def kmeans(boxes, k, dist=np.median):
"""
Calculates k-means clustering with the Intersection over Union (IoU) metric.
:param boxes: numpy array of shape (r, 2), where r is the number of rows
:param k: number of clusters
:param dist: distance function
:return: numpy array of shape (k, 2)
"""
rows = boxes.shape[0]

distances = np.empty((rows, k))
last_clusters = np.zeros((rows,))

np.random.seed()

# the Forgy method will fail if the whole array contains the same rows
clusters = boxes[np.random.choice(rows, k, replace=False)]

while True:
for row in range(rows):
distances[row] = 1 - iou(boxes[row], clusters)

nearest_clusters = np.argmin(distances, axis=1)

if (last_clusters == nearest_clusters).all():
break

for cluster in range(k):
clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

last_clusters = nearest_clusters

return clusters
########################### examples.py
import glob
import xml.etree.ElementTree as ET

import numpy as np

from kmeans import kmeans, avg_iou

ANNOTATIONS_PATH = "Annotations"
CLUSTERS = 5

def load_dataset(path):
dataset = []
for xml_file in glob.glob("{}/*xml".format(path)):
tree = ET.parse(xml_file)

height = int(tree.findtext("./size/height"))
width = int(tree.findtext("./size/width"))

for obj in tree.iter("object"):
xmin = int(obj.findtext("bndbox/xmin")) / width
ymin = int(obj.findtext("bndbox/ymin")) / height
xmax = int(obj.findtext("bndbox/xmax")) / width
ymax = int(obj.findtext("bndbox/ymax")) / height

dataset.append([xmax - xmin, ymax - ymin])

return np.array(dataset)


data = load_dataset(ANNOTATIONS_PATH)
out = kmeans(data, k=CLUSTERS)
print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
print("Boxes:\n {}".format(out))

ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))

整体代码还是比较容易理解的。首先提取训练数据集的标注框长宽坐标(比例形式);然后进行k-means聚类计算,每一轮都重新计算所有标注框和聚类中心的IoU;最后输出计算得到的\(k\)个聚类中心,也就是锚点长宽(比例形式

使用了VOC 2007进行了测试,其实现结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
############# k = 5
Accuracy: 60.42%
Boxes:
[[0.1 0.16266667]
[0.044 0.07733333]
[0.184 0.33866667]
[0.796 0.77333333]
[0.38 0.528 ]]
Ratios:
[0.54, 0.57, 0.61, 0.72, 1.03]
############# k = 9
Accuracy: 67.29%
Boxes:
[[0.4 0.70570571]
[0.51 0.37 ]
[0.102 0.10149254]
[0.06 0.18318318]
[0.114 0.31466667]
[0.228 0.46246246]
[0.824 0.7957958 ]
[0.038 0.068 ]
[0.2 0.19466667]]
Ratios:
[0.33, 0.36, 0.49, 0.56, 0.57, 1.0, 1.03, 1.04, 1.38]

Average IoU和论文结果一致:

相关阅读