## Standard gradient descent

$w_{t} = w_{t-1} - lr\triangledown f(w_{t-1})$

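$$lr$$ is a fixed value that sets the step size, so if the gradient stayed constant, every update would move the same distance along the negative gradient direction.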
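The update rule above can be sketched in a few lines of plain Python. The objective $$f(w) = w^{2}$$, the starting point, and the hyperparameters are assumptions chosen for illustration; they do not come from the original text.

```python
def grad_f(w):
    # Gradient of the assumed toy objective f(w) = w**2
    return 2.0 * w


def gradient_descent(w0, lr, steps):
    """Apply w_t = w_{t-1} - lr * grad_f(w_{t-1}) for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad_f(w)
    return w


w_final = gradient_descent(w0=1.0, lr=0.1, steps=50)
print(w_final)
```

For this particular objective each step multiplies $$w$$ by $$1 - 2lr$$, so the iterate shrinks geometrically toward the minimum at 0.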

## Classical momentum

$v_{t} = \mu v_{t-1} - lr\triangledown f(w_{t-1}) \\ w_{t} = w_{t-1} + v_{t}$

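$$v$$ is the velocity, initialized to 0.

$$\mu$$ is the momentum coefficient, with values in $$[0,1]$$ (below 1 in practice). It controls how strongly the current weight update is influenced by the gradients accumulated in the past.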
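A minimal sketch of the classical momentum update, again on the assumed toy objective $$f(w) = w^{2}$$ (the objective and all hyperparameter values are illustrative, not taken from the original):

```python
def grad_f(w):
    # Gradient of the assumed toy objective f(w) = w**2
    return 2.0 * w


def momentum_descent(w0, lr, mu, steps):
    """Classical momentum: v_t = mu * v_{t-1} - lr * grad; w_t = w_{t-1} + v_t."""
    w, v = w0, 0.0  # velocity starts at 0
    for _ in range(steps):
        v = mu * v - lr * grad_f(w)
        w = w + v
    return w


print(momentum_descent(w0=1.0, lr=0.1, mu=0.5, steps=50))
# With mu = 0 the velocity is just the current gradient step,
# so the method reduces to standard gradient descent.
print(momentum_descent(w0=1.0, lr=0.1, mu=0.0, steps=50))
```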

### Acceleration factor

$w_{1} = w_{0} - lr\triangledown f(w_{0})\\ w_{2} = w_{1} - lr\triangledown f(w_{1}) =w_{0} - lr\triangledown f(w_{0})- lr\triangledown f(w_{1})\\ w_{3} = w_{2} - lr\triangledown f(w_{2}) =w_{0} - lr\triangledown f(w_{0})- lr\triangledown f(w_{1})- lr\triangledown f(w_{2})\\ ...\\ \Rightarrow w_{t} = w_{0} - lr(\triangledown f(w_{0}) + \triangledown f(w_{1}) + ... + \triangledown f(w_{t-1}))$

$v_{1} = \mu v_{0} - lr\triangledown f(w_{0}) = - lr\triangledown f(w_{0})\\ w_{1} = w_{0} + v_{1}$

$v_{2} = \mu v_{1} - lr\triangledown f(w_{1})\\ w_{2} = w_{1} + v_{2} =w_{0} + v_{1} + v_{2}\\ =w_{0} + v_{1} + \mu v_{1} - lr\triangledown f(w_{1}) =w_{0} + (1+\mu)v_{1} - lr\triangledown f(w_{1})\\ =w_{0} - (1+\mu)lr\triangledown f(w_{0}) - lr\triangledown f(w_{1})$

$v_{3} = \mu v_{2} - lr\triangledown f(w_{2})\\ w_{3} = w_{2} + v_{3}\\ =w_{0} - (1 + \mu + \mu^{2})lr\triangledown f(w_{0}) - (1+\mu)lr\triangledown f(w_{1}) - lr\triangledown f(w_{2})$

$...$

$v_{t} = \mu v_{t-1} - lr\triangledown f(w_{t-1})\\ w_{t} = w_{t-1} + v_{t}\\ =w_{0} - (1 + \mu + \mu^{2} + ... + \mu^{t-1})lr\triangledown f(w_{0}) - (1 + \mu + \mu^{2} + ... + \mu^{t-2})lr\triangledown f(w_{1}) - ... - lr\triangledown f(w_{t-1})$
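The unrolled expression can be checked numerically. The sketch below runs the recursion, records each gradient $$\triangledown f(w_{k})$$, and rebuilds $$w_{t}$$ from the weighted sum; the toy objective $$f(w) = w^{2}$$ and the values of `lr`, `mu`, and `t` are assumptions for illustration.

```python
def grad_f(w):
    # Gradient of the assumed toy objective f(w) = w**2
    return 2.0 * w


lr, mu, t = 0.1, 0.9, 10
w0 = 1.0

# Recursive form: v_t = mu * v_{t-1} - lr * grad, w_t = w_{t-1} + v_t
w, v = w0, 0.0
grads = []
for _ in range(t):
    g = grad_f(w)
    grads.append(g)  # grads[k] holds grad f(w_k)
    v = mu * v - lr * g
    w = w + v
w_recursive = w

# Unrolled form: grad f(w_k) is weighted by 1 + mu + ... + mu^(t-1-k)
w_unrolled = w0
for k, g in enumerate(grads):
    coeff = sum(mu ** j for j in range(t - k))
    w_unrolled -= coeff * lr * g

print(w_recursive, w_unrolled)
```

The two values should agree up to floating-point rounding, which shows that older gradients carry longer geometric sums as their weights.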

Take the coefficient of $$\triangledown f(w_{0})$$ as an example:

$1 + \mu + \mu^{2} + ... + \mu^{t-1}$

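This is a geometric series with $$t$$ terms, first term $$a_{1}=1$$ and ratio $$q=\mu$$:

$S_{n} = a_{1}\frac {1-q^{n}}{1-q} \Rightarrow S_{t} = \frac {1-\mu^{t}}{1-\mu} \rightarrow \frac {1}{1-\mu}\ (t \rightarrow \infty,\ 0 \le \mu < 1)$

• When $$\mu=0.5$$, $$S_{t} \rightarrow 2$$
• When $$\mu=0.9$$, $$S_{t} \rightarrow 10$$
• When $$\mu=0.99$$, $$S_{t} \rightarrow 100$$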
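The limiting values listed here are only approached as $$t$$ grows. The following sketch compares the explicit sum $$1 + \mu + ... + \mu^{t-1}$$ with the closed form $$\frac{1-\mu^{t}}{1-\mu}$$ and the limit $$\frac{1}{1-\mu}$$; the helper names `partial_sum` and `closed_form` are made up for this example.

```python
def partial_sum(mu, t):
    # Explicit sum 1 + mu + mu^2 + ... + mu^(t-1), i.e. t terms
    return sum(mu ** k for k in range(t))


def closed_form(mu, t):
    # Geometric series formula with first term 1 and ratio mu (requires mu != 1)
    return (1 - mu ** t) / (1 - mu)


for mu in (0.5, 0.9, 0.99):
    limit = 1 / (1 - mu)
    for t in (10, 100, 1000):
        print(mu, t, partial_sum(mu, t), closed_form(mu, t), limit)
```

The closer $$\mu$$ is to 1, the more steps it takes for the sum to get near its limit, so the full acceleration factor only shows up after many iterations.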

## NumPy test

$h = x^{2} + 50y^{2}$
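A minimal NumPy sketch of such a test on $$h$$ is given below. It compares standard gradient descent with classical momentum; the starting point, learning rate, momentum coefficient, and step count are all assumptions chosen for illustration and are not taken from the original experiment.

```python
import numpy as np


def h(w):
    x, y = w
    return x ** 2 + 50 * y ** 2


def grad_h(w):
    # Analytic gradient of h: (2x, 100y)
    x, y = w
    return np.array([2.0 * x, 100.0 * y])


def run_gd(w0, lr, steps):
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad_h(w)
    return w


def run_momentum(w0, lr, mu, steps):
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)  # velocity starts at 0
    for _ in range(steps):
        v = mu * v - lr * grad_h(w)
        w = w + v
    return w


w0 = (10.0, 1.0)  # assumed starting point
w_gd = run_gd(w0, lr=0.016, steps=100)
w_mom = run_momentum(w0, lr=0.016, mu=0.9, steps=100)
print("gradient descent:", w_gd, h(w_gd))
print("momentum:        ", w_mom, h(w_mom))
```

The steep $$y$$ direction caps the usable learning rate (plain gradient descent diverges on this function once $$lr \ge 0.02$$), which leaves progress along the shallow $$x$$ direction slow. With the settings assumed here, the momentum run should end with a smaller value of $$h$$ after the same number of steps.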

## PyTorch implementation

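The momentum update implemented by torch.optim.SGD is written differently from the classical momentum above. It uses the form referred to here as the heavy ball method (HBM), in which the learning rate scales the velocity rather than the gradient: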

$v_{t} = \mu v_{t-1} + \triangledown f(w_{t-1})\\ w_{t} = w_{t-1} - lr v_{t}$
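To see how the two formulations relate, the sketch below implements both update rules in plain Python and compares the results. It does not call torch.optim.SGD; it only mirrors the formulas in the text, and the toy objective $$f(w) = w^{2}$$ and hyperparameters are assumptions.

```python
def grad_f(w):
    # Gradient of the assumed toy objective f(w) = w**2
    return 2.0 * w


def classical(w0, lr, mu, steps):
    # v_t = mu * v_{t-1} - lr * grad;  w_t = w_{t-1} + v_t
    w, v = w0, 0.0
    for _ in range(steps):
        v = mu * v - lr * grad_f(w)
        w = w + v
    return w


def lr_on_velocity(w0, lr, mu, steps):
    # v_t = mu * v_{t-1} + grad;  w_t = w_{t-1} - lr * v_t
    w, v = w0, 0.0
    for _ in range(steps):
        v = mu * v + grad_f(w)
        w = w - lr * v
    return w


a = classical(1.0, lr=0.1, mu=0.9, steps=20)
b = lr_on_velocity(1.0, lr=0.1, mu=0.9, steps=20)
print(a, b)
```

With a constant learning rate the two velocities differ only by the factor $$-lr$$, so both rules trace the same sequence of weights. They diverge once $$lr$$ changes during training: in the second form, a new learning rate immediately rescales the whole accumulated velocity.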