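The PyTorch `autograd` package computes gradients using Jacobian matrices. These notes collect the partial derivatives of real-valued scalar functions, real vector functions, and real matrix functions with respect to a real vector variable or a real matrix variable.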

## Notation

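- Real vector variable: $$x=[x_{1},...,x_{m}]^T\in R^{m}$$
- Real matrix variable: $$X=[x_{1},...,x_{n}]\in R^{m\times n}$$
- Real-valued scalar functions
  - $$f(x)\in R$$, whose variable is the $$m\times 1$$ real vector $$x$$, written $$f:R^{m}\rightarrow R$$
  - $$f(X)\in R$$, whose variable is the $$m\times n$$ real matrix $$X$$, written $$f:R^{m\times n}\rightarrow R$$
- $$p$$-dimensional real column-vector functions
  - $$f(x)\in R^{p}$$, whose variable is the $$m\times 1$$ real vector $$x$$, written $$f:R^{m}\rightarrow R^{p}$$
  - $$f(X)\in R^{p}$$, whose variable is the $$m\times n$$ real matrix $$X$$, written $$f:R^{m\times n}\rightarrow R^{p}$$
- $$p\times q$$ real matrix functions
  - $$F(x)\in R^{p\times q}$$, whose variable is the $$m\times 1$$ real vector $$x$$, written $$F:R^{m}\rightarrow R^{p\times q}$$
  - $$F(X)\in R^{p\times q}$$, whose variable is the $$m\times n$$ real matrix $$X$$, written $$F:R^{m\times n}\rightarrow R^{p\times q}$$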

## Row-vector partial derivative operator and the Jacobian matrix

### Real-valued scalar functions

$D_{x}=\frac {\partial }{\partial x^T} =[\frac {\partial }{\partial x_{1}},...,\frac {\partial }{\partial x_{m}}]$

$D_{x}f(x)=\frac {\partial f(x)}{\partial x^T} =[\frac {\partial f(x)}{\partial x_{1}},...,\frac {\partial f(x)}{\partial x_{m}}]$

$D_{X}f(X)=\frac {\partial f(X)}{\partial X^T}= \begin{bmatrix} \frac {\partial f(X)}{\partial x_{11}} & \dots & \frac {\partial f(X)}{\partial x_{m1}}\\ \vdots & \ddots & \vdots\\ \frac {\partial f(X)}{\partial x_{1n}} & \dots & \frac {\partial f(X)}{\partial x_{mn}} \end{bmatrix} \in R^{n\times m}$

$D_{vecX}f(X)=[\frac {\partial f(X)}{\partial x_{11}},...,\frac {\partial f(X)}{\partial x_{m1}},...,\frac {\partial f(X)}{\partial x_{1n}},...,\frac {\partial f(X)}{\partial x_{mn}}]$

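$$D_{X}f(X)$$ is called the Jacobian matrix of the real-valued scalar function $$f(X)$$ with respect to the matrix variable $$X$$.

$$D_{vecX}f(X)$$ is called the row partial derivative vector of the real-valued scalar function $$f(X)$$ with respect to the matrix variable $$X$$. Both contain the same $$mn$$ partial derivatives, arranged differently: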

$D_{vecX}f(X)=rvec(D_{X}f(X))=(vec(D_{X}^{T}f(X)))^T$
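A quick numerical check of this identity follows. It is a minimal pure-Python sketch with no NumPy or PyTorch assumed. The test function $$f(X)=\sum_{i,j}c_{ij}x_{ij}^{2}$$, the values in `C` and `X`, and all helper names are illustrative choices. The partial derivatives are central-difference approximations, not exact values.

```python
# Check D_vecX f(X) = rvec(D_X f(X)) = (vec(D_X^T f(X)))^T numerically.
# Matrices are plain lists of rows; indices in the code are 0-based.

def vec(M):
    """Stack the columns of M into one flat list (column-major order)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def rvec(M):
    """Concatenate the rows of M into one flat list (row-major order)."""
    return [v for row in M for v in row]

def transpose(M):
    return [list(col) for col in zip(*M)]

def partial(f, X, i, j, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. x_ij."""
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[i][j] += h
    Xm[i][j] -= h
    return (f(Xp) - f(Xm)) / (2 * h)

def jacobian_scalar(f, X):
    """D_X f(X): n x m matrix whose (j, i) entry is df/dx_ij."""
    m, n = len(X), len(X[0])
    return [[partial(f, X, i, j) for i in range(m)] for j in range(n)]

def row_partial_vector(f, X):
    """D_vecX f(X): partials ordered x_11..x_m1, ..., x_1n..x_mn."""
    m, n = len(X), len(X[0])
    return [partial(f, X, i, j) for j in range(n) for i in range(m)]

def allclose(u, v, tol=1e-5):
    return len(u) == len(v) and all(abs(a - b) < tol for a, b in zip(u, v))

# Illustrative f(X) = sum_ij c_ij * x_ij^2 with m = 2, n = 3.
C = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
X = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]

def f(X):
    return sum(C[i][j] * X[i][j] ** 2 for i in range(2) for j in range(3))

D = jacobian_scalar(f, X)      # 3 x 2
d = row_partial_vector(f, X)   # length 6
print(allclose(d, rvec(D)), allclose(d, vec(transpose(D))))  # True True
```

The code uses `rvec` on the $$n\times m$$ Jacobian matrix and `vec` on its transpose. Both produce the same ordering as differentiating entry by entry along $$vecX$$.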

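### Real-valued matrix functions

For a $$p\times q$$ matrix function $$F(X)$$ with entries $$f_{ij}(X)$$, first stack its columns into a single vector: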

$vec(F(X))= [f_{11}(X),...,f_{p1}(X),...,f_{1q}(X),...,f_{pq}(X)]^T\in R^{pq}$

$D_{X}F(X)=\frac {\partial vec(F(X))}{\partial (vecX)^T}\in R^{pq\times mn}$

$D_{X}F(X)= \begin{bmatrix} \frac {\partial f_{11}}{\partial (vecX)^T}\\ \vdots\\ \frac {\partial f_{p1}}{\partial (vecX)^T}\\ \vdots\\ \frac {\partial f_{1q}}{\partial (vecX)^T}\\ \vdots\\ \frac {\partial f_{pq}}{\partial (vecX)^T} \end{bmatrix}= \begin{bmatrix} \frac {\partial f_{11}}{\partial x_{11}} & \dots & \frac {\partial f_{11}}{\partial x_{m1}} & \dots & \frac {\partial f_{11}}{\partial x_{1n}} & \dots & \frac {\partial f_{11}}{\partial x_{mn}}\\ \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots\\ \frac {\partial f_{p1}}{\partial x_{11}} & \dots & \frac {\partial f_{p1}}{\partial x_{m1}} & \dots & \frac {\partial f_{p1}}{\partial x_{1n}} & \dots & \frac {\partial f_{p1}}{\partial x_{mn}}\\ \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots\\ \frac {\partial f_{1q}}{\partial x_{11}} & \dots & \frac {\partial f_{1q}}{\partial x_{m1}} & \dots & \frac {\partial f_{1q}}{\partial x_{1n}} & \dots & \frac {\partial f_{1q}}{\partial x_{mn}}\\ \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots\\ \frac {\partial f_{pq}}{\partial x_{11}} & \dots & \frac {\partial f_{pq}}{\partial x_{m1}} & \dots & \frac {\partial f_{pq}}{\partial x_{1n}} & \dots & \frac {\partial f_{pq}}{\partial x_{mn}} \end{bmatrix}$
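The sketch below shows one way to assemble $$D_{X}F(X)$$ column by column from the $$vec$$ ordering. It assumes the illustrative function $$F(X)=AX$$ with $$A\in R^{p\times m}$$, so that $$q=n$$. For this choice $$vec(AX)=(I_{n}\otimes A)vec(X)$$, so the numerical Jacobian should match the Kronecker product $$I_{n}\otimes A$$. Helper names and numeric values are hypothetical, and derivatives are approximated by central differences.

```python
# Numerically build D_X F(X) = d vec(F(X)) / d (vec X)^T  (pq x mn)
# for the illustrative F(X) = A X, and compare with I_n (x) A,
# which follows from vec(A X) = (I_n (x) A) vec(X).

def vec(M):
    """Column-major vectorization."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def unvec(v, m, n):
    """Inverse of vec: rebuild an m x n matrix from a column-major list."""
    return [[v[j * m + i] for j in range(n)] for i in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Kronecker product A (x) B."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def jacobian_matrix_fn(F, X, h=1e-6):
    """Row r <-> r-th entry of vec(F(X)); column c <-> c-th entry of vec(X)."""
    m, n = len(X), len(X[0])
    x = vec(X)
    columns = []
    for c in range(m * n):
        xp, xm = x[:], x[:]
        xp[c] += h
        xm[c] -= h
        fp, fm = vec(F(unvec(xp, m, n))), vec(F(unvec(xm, m, n)))
        columns.append([(s - t) / (2 * h) for s, t in zip(fp, fm)])
    return [list(row) for row in zip(*columns)]  # transpose to pq x mn

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # p = 3, m = 2
X = [[1.0, 0.0, -1.0], [2.0, 0.5, 3.0]]    # m = 2, n = 3, so q = 3
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

J = jacobian_matrix_fn(lambda M: matmul(A, M), X)   # 9 x 6
K = kron(I3, A)                                     # 9 x 6
err = max(abs(s - t) for rj, rk in zip(J, K) for s, t in zip(rj, rk))
print(len(J), len(J[0]), err < 1e-6)
```

Perturbing one entry of $$vecX$$ at a time fills one column of $$D_{X}F(X)$$, which mirrors the column layout of the matrix above.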

## Column-vector partial derivative operator and the gradient matrix

### Real-valued scalar functions

$\bigtriangledown_{x}=\frac {\partial }{\partial x} =[\frac {\partial }{\partial x_{1}},...,\frac {\partial }{\partial x_{m}}]^T$

$\bigtriangledown_{x}f(x)=\frac {\partial f(x)}{\partial x} =[\frac {\partial f(x)}{\partial x_{1}},...,\frac {\partial f(x)}{\partial x_{m}}]^T$

$\bigtriangledown_{vecX}f(X)=\frac {\partial f(X)}{\partial vecX} =[\frac {\partial f(X)}{\partial x_{11}},...,\frac {\partial f(X)}{\partial x_{m1}},...,\frac {\partial f(X)}{\partial x_{1n}},...,\frac {\partial f(X)}{\partial x_{mn}}]^T$

$\bigtriangledown_{X}f(X)=\frac {\partial f(X)}{\partial X}= \begin{bmatrix} \frac {\partial f(X)}{\partial x_{11}} & \dots & \frac {\partial f(X)}{\partial x_{1n}}\\ \vdots & \ddots & \vdots\\ \frac {\partial f(X)}{\partial x_{m1}} & \dots & \frac {\partial f(X)}{\partial x_{mn}} \end{bmatrix} \in R^{m\times n}$

$\bigtriangledown_{X}f(X)=D_{X}^T f(X)$
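The gradient matrix has the same shape as $$X$$. That is also the shape of the `.grad` attribute PyTorch attaches to a leaf tensor. The plain-Python sketch below uses hypothetical helper names and takes the illustrative choice $$f(X)=\sum_{i,j}x_{ij}^{2}$$, whose gradient is $$2X$$. It estimates both $$\bigtriangledown_{X}f(X)$$ and $$D_{X}f(X)$$ by central differences and checks that one is the transpose of the other.

```python
# grad_X f(X) has the shape of X and equals the transpose of D_X f(X).
# Illustrative f(X) = ||X||_F^2 = sum_ij x_ij^2, whose gradient is 2X.

def partial(f, X, i, j, h=1e-6):
    """Central-difference estimate of df/dx_ij (0-based indices)."""
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[i][j] += h
    Xm[i][j] -= h
    return (f(Xp) - f(Xm)) / (2 * h)

def jacobian_scalar(f, X):
    """D_X f(X): n x m matrix, entry (j, i) = df/dx_ij."""
    m, n = len(X), len(X[0])
    return [[partial(f, X, i, j) for i in range(m)] for j in range(n)]

def gradient_matrix(f, X):
    """grad_X f(X): m x n matrix, entry (i, j) = df/dx_ij."""
    m, n = len(X), len(X[0])
    return [[partial(f, X, i, j) for j in range(n)] for i in range(m)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def frob_sq(X):
    return sum(v * v for row in X for v in row)

X = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.5]]   # 2 x 3
G = gradient_matrix(frob_sq, X)            # 2 x 3, like X
D = jacobian_scalar(frob_sq, X)            # 3 x 2
print(G == transpose(D))                   # True: same partials, transposed layout
print(max(abs(G[i][j] - 2 * X[i][j]) for i in range(2) for j in range(3)) < 1e-6)
```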

### Real-valued matrix functions

$vec(F(X))= [f_{11}(X),...,f_{p1}(X),...,f_{1q}(X),...,f_{pq}(X)]^T\in R^{pq}$

$\bigtriangledown_{X}F(X)=\frac {\partial (vecF(X))^T}{\partial vecX}= \begin{bmatrix} \frac {\partial f_{11}}{\partial vecX} & \dots & \frac {\partial f_{p1}}{\partial vecX} & \dots & \frac {\partial f_{1q}}{\partial vecX} & \dots & \frac {\partial f_{pq}}{\partial vecX} \end{bmatrix}= \begin{bmatrix} \frac {\partial f_{11}}{\partial x_{11}} & \dots & \frac {\partial f_{p1}}{\partial x_{11}} & \dots & \frac {\partial f_{1q}}{\partial x_{11}} & \dots & \frac {\partial f_{pq}}{\partial x_{11}}\\ \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots\\ \frac {\partial f_{11}}{\partial x_{m1}} & \dots & \frac {\partial f_{p1}}{\partial x_{m1}} & \dots & \frac {\partial f_{1q}}{\partial x_{m1}} & \dots & \frac {\partial f_{pq}}{\partial x_{m1}}\\ \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots\\ \frac {\partial f_{11}}{\partial x_{1n}} & \dots & \frac {\partial f_{p1}}{\partial x_{1n}} & \dots & \frac {\partial f_{1q}}{\partial x_{1n}} & \dots & \frac {\partial f_{pq}}{\partial x_{1n}}\\ \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots\\ \frac {\partial f_{11}}{\partial x_{mn}} & \dots & \frac {\partial f_{p1}}{\partial x_{mn}} & \dots & \frac {\partial f_{1q}}{\partial x_{mn}} & \dots & \frac {\partial f_{pq}}{\partial x_{mn}} \end{bmatrix}\in R^{mn\times pq}$

$\bigtriangledown_{X}F(X)=(D_{X} F(X))^T$
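Reverse-mode differentiation such as PyTorch autograd generally avoids building $$D_{X}F(X)$$ explicitly. Given an upstream matrix $$V$$ with the shape of $$F(X)$$, it computes a vector-Jacobian product, $$\bigtriangledown_{X}F(X)\,vec(V)$$. That product equals $$vec(\bigtriangledown_{X}tr(V^{T}F(X)))$$. The sketch below forms the full matrix only to make this relation visible. The function $$F(X)=XX^{T}$$, the values of `X` and `V`, and the helper names are illustrative assumptions. For this $$F$$ the closed form is $$\bigtriangledown_{X}tr(V^{T}XX^{T})=(V+V^{T})X$$.

```python
# grad_X F = (D_X F)^T, and grad_X F . vec(V) = vec(grad_X tr(V^T F(X))).
# Illustrative F(X) = X X^T, for which grad_X tr(V^T X X^T) = (V + V^T) X.

def vec(M):
    """Column-major vectorization."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def unvec(v, m, n):
    """Inverse of vec for an m x n matrix."""
    return [[v[j * m + i] for j in range(n)] for i in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def jacobian_matrix_fn(F, X, h=1e-6):
    """D_X F(X): rows follow vec(F(X)), columns follow vec(X)."""
    m, n = len(X), len(X[0])
    x = vec(X)
    columns = []
    for c in range(m * n):
        xp, xm = x[:], x[:]
        xp[c] += h
        xm[c] -= h
        fp, fm = vec(F(unvec(xp, m, n))), vec(F(unvec(xm, m, n)))
        columns.append([(s - t) / (2 * h) for s, t in zip(fp, fm)])
    return transpose(columns)

def F(X):
    return matmul(X, transpose(X))

X = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0]]   # m = 2, n = 3; F(X) is 2 x 2
V = [[1.0, -2.0], [0.5, 4.0]]             # upstream matrix, same shape as F(X)

grad_F = transpose(jacobian_matrix_fn(F, X))   # mn x pq = 6 x 4
v = vec(V)
vjp = [sum(r * s for r, s in zip(row, v)) for row in grad_F]
G = unvec(vjp, 2, 3)                           # reshaped like X

VpVt = [[V[i][j] + V[j][i] for j in range(2)] for i in range(2)]
expected = matmul(VpVt, X)
print(max(abs(G[i][j] - expected[i][j]) for i in range(2) for j in range(3)) < 1e-5)
```

A real reverse-mode engine would compute `vjp` directly from local derivatives without ever storing the $$mn\times pq$$ matrix. The full matrix here is purely for illustration.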

## Computing partial derivatives and gradients

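1. Constant rule. If $$f(X)=c$$ is a constant, where $$X\in R^{m\times n}$$, then its gradient is the zero matrix $$\frac {\partial c}{\partial X}=O_{m\times n}$$. Its dimensions match those of $$X$$, by the dimension-compatibility principle.
2. Linearity rule. If $$f(X)$$ and $$g(X)$$ are real-valued functions of the matrix $$X$$, and $$c_{1}$$ and $$c_{2}$$ are real constants, then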

$\frac {\partial [c_{1}f(X)+c_{2}g(X)]}{\partial X} =c_{1}\frac {\partial f(X)}{\partial X} +c_{2}\frac {\partial g(X)}{\partial X}$

3. Product rule. If $$f(X)$$, $$g(X)$$, and $$h(X)$$ are all real-valued functions of the matrix $$X$$, then

$\frac {\partial [f(X)g(X)]}{\partial X} =g(X)\frac {\partial f(X)}{\partial X} +f(X)\frac {\partial g(X)}{\partial X}$

$\frac {\partial [f(X)g(X)h(X)]}{\partial X} =g(X)h(X)\frac {\partial f(X)}{\partial X} +f(X)h(X)\frac {\partial g(X)}{\partial X} +f(X)g(X)\frac {\partial h(X)}{\partial X}$

4. Quotient rule. If $$g(X)\neq 0$$, then

$\frac {\partial [f(X)/g(X)]}{\partial X} =\frac {1}{g(X)^2}[g(X)\frac {\partial f(X)}{\partial X}-f(X)\frac {\partial g(X)}{\partial X}]$

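5. Chain rule. Let $$X$$ be an $$m\times n$$ matrix. Let $$y=f(X)$$ be a real-valued function of the matrix $$X$$, and let $$g(y)$$ be a real-valued function of the scalar $$y$$. Then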

$\frac {\partial g(f(X))}{\partial X} =\frac {dg(y)}{dy} \frac {\partial f(X)}{\partial X}$
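These rules can be spot-checked numerically. The sketch below is plain Python with central differences and hypothetical helper names, applied to a $$2\times 2$$ matrix. It uses the illustrative functions $$f(X)=tr(X)$$ and $$g(X)=1+\sum_{i,j}x_{ij}^{2}$$, which is never zero. For the chain rule the inner function is $$g(X)$$ and the outer function is $$y\mapsto y^{3}$$. The roles of $$f$$ and $$g$$ therefore differ from the statement above.

```python
# Spot-check the product, quotient and chain rules for scalar functions of X.
# f, g and the outer function y -> y**3 are illustrative choices.

def gradient_matrix(fun, X, eps=1e-6):
    """Central-difference estimate of the gradient matrix; same shape as X."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += eps
            Xm[i][j] -= eps
            G[i][j] = (fun(Xp) - fun(Xm)) / (2 * eps)
    return G

def combine(a, G1, b, G2):
    """Entrywise a * G1 + b * G2."""
    return [[a * u + b * v for u, v in zip(r1, r2)] for r1, r2 in zip(G1, G2)]

def scale(a, G):
    """Entrywise a * G."""
    return [[a * u for u in row] for row in G]

def max_diff(G1, G2):
    return max(abs(u - v) for r1, r2 in zip(G1, G2) for u, v in zip(r1, r2))

def f(X):
    return X[0][0] + X[1][1]                           # tr(X) for a 2 x 2 matrix

def g(X):
    return 1.0 + sum(v * v for row in X for v in row)  # 1 + ||X||_F^2, never 0

X = [[1.0, -0.5], [2.0, 0.25]]
Gf, Gg = gradient_matrix(f, X), gradient_matrix(g, X)
fx, gx = f(X), g(X)

# Product rule: d(fg)/dX = g df/dX + f dg/dX
prod_err = max_diff(gradient_matrix(lambda M: f(M) * g(M), X),
                    combine(gx, Gf, fx, Gg))

# Quotient rule: d(f/g)/dX = (g df/dX - f dg/dX) / g^2
quot_err = max_diff(gradient_matrix(lambda M: f(M) / g(M), X),
                    combine(1.0 / gx, Gf, -fx / gx ** 2, Gg))

# Chain rule with inner y = g(X) and outer y**3: d/dX = 3 y^2 dg/dX
chain_err = max_diff(gradient_matrix(lambda M: g(M) ** 3, X),
                     scale(3.0 * gx ** 2, Gg))

print(prod_err < 1e-5, quot_err < 1e-5, chain_err < 1e-5)
```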

### Real-valued scalar functions: examples

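1. For the real-valued function $$f(x)=x^{T}Ax$$, with $$x\in R^{m}$$ and $$A\in R^{m\times m}$$, the row partial derivative vector is $$D_{x}f(x)=x^{T}(A+A^{T})$$. The gradient vector is $$\bigtriangledown_{x}f(x)=(D_{x}f(x))^{T}=(A+A^{T})x$$.
2. For the real-valued function $$f(X)=a^{T}XX^{T}b$$, where $$X\in R^{m\times n}$$ and $$a,b\in R^{m\times 1}$$, the Jacobian matrix is $$D_{X}f(X)=X^{T}(ba^{T}+ab^{T})$$. The gradient matrix is $$\bigtriangledown_{X}f(X)=(ab^{T}+ba^{T})X$$.
3. For the real-valued function $$f(X)=tr(XB)$$, where $$X\in R^{m\times n}$$ and $$B\in R^{n\times m}$$, note that $$tr(BX)=tr(XB)$$. The Jacobian matrix is therefore $$D_{X}tr(XB)=D_{X}tr(BX)=B$$, and the gradient matrix is $$\bigtriangledown_{X}tr(XB)=\bigtriangledown_{X}tr(BX)=B^{T}$$.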
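Items 2 and 3 can be checked the same way. In this pure-Python sketch, the sizes ($$m=2$$, $$n=3$$), the values of `X`, `a`, `b`, and `B`, and the helper names are illustrative assumptions. Gradients are estimated by central differences, and the Jacobian matrices are obtained as their transposes.

```python
# Numerical check of items 2 and 3 (illustrative sizes m = 2, n = 3):
#   f2(X) = a^T X X^T b  ->  grad = (a b^T + b a^T) X,  Jacobian = X^T (b a^T + a b^T)
#   f3(X) = tr(X B)      ->  grad = B^T,                Jacobian = B

def gradient_matrix(fun, X, eps=1e-6):
    """Central-difference estimate of the gradient matrix; same shape as X."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += eps
            Xm[i][j] -= eps
            G[i][j] = (fun(Xp) - fun(Xm)) / (2 * eps)
    return G

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def outer(u, v):
    """Outer product u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def max_diff(G1, G2):
    return max(abs(p - q) for r1, r2 in zip(G1, G2) for p, q in zip(r1, r2))

X = [[1.0, 0.5, -2.0], [0.0, 3.0, 1.0]]      # m x n = 2 x 3
a, b = [1.0, -1.0], [2.0, 0.5]               # a, b in R^m
B = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]    # n x m = 3 x 2

def f2(X):
    Xta = matmul(transpose(X), [[v] for v in a])   # X^T a, n x 1
    Xtb = matmul(transpose(X), [[v] for v in b])   # X^T b, n x 1
    return sum(u[0] * w[0] for u, w in zip(Xta, Xtb))

def f3(X):
    XB = matmul(X, B)                              # m x m
    return sum(XB[i][i] for i in range(len(XB)))

# S = a b^T + b a^T (symmetric, so X^T S = X^T (b a^T + a b^T))
S = [[u + w for u, w in zip(r1, r2)] for r1, r2 in zip(outer(a, b), outer(b, a))]
G2, G3 = gradient_matrix(f2, X), gradient_matrix(f3, X)
print(max_diff(G2, matmul(S, X)) < 1e-5,
      max_diff(transpose(G2), matmul(transpose(X), S)) < 1e-5,
      max_diff(G3, transpose(B)) < 1e-5,
      max_diff(transpose(G3), B) < 1e-5)
```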

Derivation of item 1 for $$m=2$$:

$x = \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} \ A=\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}$

$f(x)=x^{T}Ax= \begin{bmatrix} x_{1} & x_{2} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} =\sum_{k=1}^{2}\sum_{l=1}^{2}a_{kl}x_{k}x_{l}$

$=[x_{1}a_{11}+x_{2}a_{21}, x_{1}a_{12}+x_{2}a_{22}] \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} =x_{1}a_{11}x_{1}+x_{2}a_{21}x_{1}+x_{1}a_{12}x_{2}+x_{2}a_{22}x_{2}$

$\begin{aligned} D_{x}f(x)=\frac {\partial f(x)}{\partial x^{T}} &= [x_{1}a_{11}+a_{11}x_{1}+x_{2}a_{21}+a_{12}x_{2},\ a_{21}x_{1}+x_{1}a_{12}+x_{2}a_{22}+a_{22}x_{2}]\\ &= [x_{1}a_{11}+x_{2}a_{21},\ x_{1}a_{12}+x_{2}a_{22}] +[a_{11}x_{1}+a_{12}x_{2},\ a_{21}x_{1}+a_{22}x_{2}]\\ &= [x_{1},x_{2}]\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} +[x_{1},x_{2}]\begin{bmatrix} a_{11} & a_{21}\\ a_{12} & a_{22} \end{bmatrix}\\ &= x^{T}A+x^{T}A^{T} =x^{T}(A+A^{T}) \end{aligned}$
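Plugging in concrete numbers makes the result easy to verify by hand. Take the illustrative values $$A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$$ and $$x=[1,2]^{T}$$. Then $$x^{T}A=[7,10]$$ and $$x^{T}A^{T}=[5,11]$$, so $$D_{x}f(x)=[12,21]$$. The plain-Python sketch below reproduces this and compares it with a central-difference estimate. The helper names are hypothetical.

```python
# Concrete instance of the 2 x 2 derivation: D_x f(x) = x^T (A + A^T).
# A and x hold arbitrary illustrative values.

A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 2.0]

def f(x):
    """x^T A x as the double sum over a_kl * x_k * x_l."""
    return sum(A[k][l] * x[k] * x[l] for k in range(2) for l in range(2))

def row_partials(fun, x, h=1e-6):
    """[df/dx_1, df/dx_2] by central differences."""
    out = []
    for k in range(len(x)):
        xp, xm = x[:], x[:]
        xp[k] += h
        xm[k] -= h
        out.append((fun(xp) - fun(xm)) / (2 * h))
    return out

S = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]   # A + A^T
analytic = [sum(x[k] * S[k][j] for k in range(2)) for j in range(2)]
numeric = row_partials(f, x)
print(analytic)                                                 # [12.0, 21.0]
print(all(abs(u - v) < 1e-5 for u, v in zip(analytic, numeric)))
```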

## Further reading

- *Matrix Analysis and Applications* (《矩阵分析与应用》), Chapter 3, Section 3.1: The Jacobian Matrix and the Gradient Matrix