## Abstract

Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement is preceded by extreme manual annotation in order to perform either training from scratch or fine-tuning for the target task. In this work, we propose to fine-tune CNN for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes.

## Introduction

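1. Use SfM (Structure-from-Motion) to clean the training data, and use hard positive and hard negative examples during CNN training;
2. Conventional whitening is unstable in some cases; the paper proposes learning the whitening parameters on the same training data, which complements fine-tuning and further improves performance;
3. On the Oxford5k and Paris6k datasets, the method matches state-of-the-art retrieval performance with short (compact) vectors.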

## Siamese Learning

$L(i,j)=\frac{1}{2}\left(Y(i,j)\left\| \bar{f}(i) - \bar{f}(j) \right\|^{2} + (1-Y(i,j)) \left(\max\{0, \tau - \left\| \bar{f}(i) - \bar{f}(j) \right\|\}\right)^{2}\right)$

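• $$Y(i,j)\in \{0, 1\}$$ indicates whether the two images of a training pair share the same label ($$0$$ for non-matching, $$1$$ for matching);
• $$\bar{f}(i)$$ is the L2-normalized MAC feature vector of image $$i$$;
• $$\tau$$ is a margin hyperparameter: once the distance between the query image and a hard negative exceeds $$\tau$$, that pair contributes zero loss and no longer affects training.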
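
As a rough illustration of the loss above, here is a minimal pure-Python sketch for a single pair. The function names `l2_normalize` and `contrastive_loss`, the toy 2-D vectors, and the values of `tau` are assumptions made for this example only; they are not taken from the paper or its code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_loss(f_i, f_j, y, tau):
    """Contrastive loss L(i, j) for one pair of L2-normalized descriptors.

    y is 1 for a matching pair and 0 for a non-matching pair;
    tau is the margin applied to non-matching pairs.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_j)))
    return 0.5 * (y * dist ** 2 + (1 - y) * max(0.0, tau - dist) ** 2)


# Toy descriptors (hypothetical values, stand-ins for MAC vectors).
q = l2_normalize([1.0, 0.0])   # query
p = l2_normalize([2.0, 0.0])   # positive, identical to q after normalization
n = l2_normalize([0.0, 3.0])   # negative, orthogonal to q

loss_pos = contrastive_loss(q, p, 1, 0.7)        # matching pair, zero distance
loss_neg_far = contrastive_loss(q, n, 0, 0.7)    # negative already beyond the margin
loss_neg_near = contrastive_loss(q, n, 0, 2.0)   # negative still inside the margin
```

With the smaller margin the negative is already farther away than `tau`, so it contributes nothing; only when the margin is larger than the pair's distance does the negative term become positive.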

## Learned Whitening

$C_{S}=\sum_{Y(i,j)=1}(\bar{f}(i)-\bar{f}(j))(\bar{f}(i)-\bar{f}(j))^{T}$

$C_{D}=\sum_{Y(i,j)=0}(\bar{f}(i)-\bar{f}(j))(\bar{f}(i)-\bar{f}(j))^{T}$
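
The two matrices above are sums of outer products of descriptor differences, taken over matching and non-matching pairs respectively. The sketch below accumulates them in plain Python; the function name `pair_covariances`, the `(i, j, y)` pair format, and the toy 2-D descriptors are assumptions for illustration, and the sketch stops at $C_S$ and $C_D$ without deriving a projection from them.

```python
def pair_covariances(features, pairs):
    """Accumulate C_S (matching pairs) and C_D (non-matching pairs).

    features: dict mapping an image id to its L2-normalized descriptor.
    pairs: iterable of (i, j, y) with y = 1 for matching, 0 for non-matching.
    Returns two dim x dim matrices as nested lists.
    """
    dim = len(next(iter(features.values())))
    c_s = [[0.0] * dim for _ in range(dim)]
    c_d = [[0.0] * dim for _ in range(dim)]
    for i, j, y in pairs:
        diff = [a - b for a, b in zip(features[i], features[j])]
        target = c_s if y == 1 else c_d
        # Add the outer product diff * diff^T to the selected matrix.
        for r in range(dim):
            for c in range(dim):
                target[r][c] += diff[r] * diff[c]
    return c_s, c_d


# Toy unit-norm descriptors (hypothetical values).
features = {"q": [1.0, 0.0], "p": [0.6, 0.8], "n": [0.0, 1.0]}
pairs = [("q", "p", 1), ("q", "n", 0)]
c_s, c_d = pair_covariances(features, pairs)
```

Both results are symmetric by construction, since each term is an outer product of a vector with itself.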

