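• Use the single-layer neural network OneNet to classify logical OR, logical AND, and logical NOT
• Use the 2-layer neural network TwoNet to classify logical XOR
• Use the 3-layer neural network ThreeNet to classify the iris and mnist datasets

## Using the single-layer neural network OneNet to classify logical OR, logical AND, and logical NOT

• The input layer has 2 neurons
• The output layer has 1 neuron
• The scoring function is sigmoid
• The loss function is the cross-entropy loss

OneNet is simply a logistic regression model.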

• $L=1$
• $a^{(0)}\in R^{m\times 2}$
• $W^{(1)}\in R^{2\times 1}$
• $b^{(1)}\in R^{1\times 1}$
• $y\in R^{m\times 1}$, where each row holds the correct class (0 or 1)

$$\begin{aligned}
z^{(1)} &= a^{(0)}\cdot W^{(1)} +b^{(1)} \\
h(z^{(1)}) &= p(y=1)=sigmoid(z^{(1)})=\frac {1}{1+e^{-z^{(1)}}}
\end{aligned}$$

$$\begin{aligned}
probs &= [p(y=0),\ p(y=1)]=[1-h(z^{(1)}),\ h(z^{(1)})] \\
&= \left[\frac {e^{-z^{(1)}}}{1+e^{-z^{(1)}}},\ \frac {1}{1+e^{-z^{(1)}}}\right] \in R^{m\times 2}
\end{aligned}$$
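The forward pass above can be sketched in NumPy as follows. This is a minimal illustration, not the original implementation: the function names `sigmoid` and `one_net_forward` are made up for this example, and the all-zero weights are chosen only so that the result is easy to check by hand.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def one_net_forward(a0, W1, b1):
    """Return h(z1) with shape (m, 1) and probs with shape (m, 2)."""
    z1 = a0 @ W1 + b1                 # (m, 2) @ (2, 1) + (1, 1) -> (m, 1)
    h = sigmoid(z1)                   # p(y = 1)
    probs = np.hstack([1.0 - h, h])   # columns: p(y = 0), p(y = 1)
    return h, probs

# Hypothetical inputs: the four binary input pairs and zero-initialized parameters.
a0 = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
W1 = np.zeros((2, 1))
b1 = np.zeros((1, 1))
h, probs = one_net_forward(a0, W1, b1)
```

With zero weights every $z^{(1)}$ is 0, so both columns of `probs` equal 0.5 and each row sums to 1.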

$$J(z^{(1)})=-\frac {1}{m} 1^{T}\cdot (y* \ln h(z^{(1)})+(1-y)* \ln (1-h(z^{(1)})))$$

$$J(z^{(1)})=-\frac {1}{m} (y^{T}\cdot \ln h(z^{(1)})+(1-y)^{T}\cdot \ln (1-h(z^{(1)})))$$
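The two ways of writing the loss (element-wise product followed by a sum with $1^{T}$, or inner products with $y^{T}$ and $(1-y)^{T}$) should give the same number. A small sketch to check this, where the function names and the sample values of `h` and `y` are assumptions made for illustration:

```python
import numpy as np

def cross_entropy_elementwise(h, y):
    """-(1/m) * 1^T . (y * ln h + (1 - y) * ln(1 - h)), with * element-wise."""
    m = y.shape[0]
    ones = np.ones((m, 1))
    per_sample = y * np.log(h) + (1.0 - y) * np.log(1.0 - h)
    return float(-(ones.T @ per_sample)[0, 0] / m)

def cross_entropy_inner(h, y):
    """-(1/m) * (y^T . ln h + (1 - y)^T . ln(1 - h)), with . the matrix product."""
    m = y.shape[0]
    total = y.T @ np.log(h) + (1.0 - y).T @ np.log(1.0 - h)
    return float(-total[0, 0] / m)

# Hypothetical predictions h = p(y = 1) and labels y for m = 3 samples.
h = np.array([[0.9], [0.2], [0.7]])
y = np.array([[1.0], [0.0], [1.0]])
loss_a = cross_entropy_elementwise(h, y)
loss_b = cross_entropy_inner(h, y)
```

Both values equal $-(\ln 0.9+\ln 0.8+\ln 0.7)/3$ for this input.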

In the derivation below, $*$ is the element-wise product and $\cdot$ is the matrix product. It uses $dh(z^{(1)})=h(z^{(1)})* (1-h(z^{(1)}))* dz^{(1)}$ and the identity $1^{T}\cdot (u* v)=u^{T}\cdot v$.

$$\begin{aligned}
dJ &= d\left(-\frac {1}{m} 1^{T}\cdot (y* \ln h(z^{(1)})+(1-y)* \ln (1-h(z^{(1)})))\right)\\
&= d\left(-\frac {1}{m} 1^{T}\cdot (y* \ln h(z^{(1)}))\right)+d\left(-\frac {1}{m} 1^{T}\cdot ((1-y)* \ln (1-h(z^{(1)})))\right)
\end{aligned}$$

$$\begin{aligned}
d\left(-\frac {1}{m} 1^{T}\cdot (y* \ln h(z^{(1)}))\right) &= -\frac {1}{m} 1^{T}\cdot (y* h(z^{(1)})^{-1}* dh(z^{(1)}))\\
&= -\frac {1}{m} 1^{T}\cdot (y* h(z^{(1)})^{-1}* h(z^{(1)})* (1-h(z^{(1)}))* dz^{(1)})\\
&= -\frac {1}{m} 1^{T}\cdot (y* (1-h(z^{(1)}))* dz^{(1)})\\
&= -\frac {1}{m} y^{T}\cdot ((1-h(z^{(1)}))* dz^{(1)})\\
&= -\frac {1}{m} (y^{T} * (1-h(z^{(1)}))^{T})\cdot dz^{(1)}
\end{aligned}$$

$$\begin{aligned}
d\left(-\frac {1}{m} 1^{T}\cdot ((1-y)* \ln (1-h(z^{(1)})))\right) &= -\frac {1}{m} 1^{T}\cdot ((1-y)* (1-h(z^{(1)}))^{-1}* d(1-h(z^{(1)})))\\
&= -\frac {1}{m} 1^{T}\cdot ((1-y)* (1-h(z^{(1)}))^{-1}* (-1)* (1-h(z^{(1)}))* h(z^{(1)})* dz^{(1)})\\
&= -\frac {1}{m} 1^{T}\cdot ((1-y)* (-1)* h(z^{(1)})* dz^{(1)})\\
&= \frac {1}{m} 1^{T}\cdot ((1-y)* h(z^{(1)})* dz^{(1)})\\
&= \frac {1}{m} (1-y)^{T}\cdot (h(z^{(1)})* dz^{(1)})\\
&= \frac {1}{m} ((1-y)^{T}* h(z^{(1)})^{T})\cdot dz^{(1)}
\end{aligned}$$

$$\begin{aligned}
dJ &= -\frac {1}{m} (y^{T} * (1-h(z^{(1)}))^{T})\cdot dz^{(1)}+ \frac {1}{m} ((1-y)^{T}* h(z^{(1)})^{T})\cdot dz^{(1)}\\
&= \frac {1}{m} ((1-y)^{T}* h(z^{(1)})^{T} - y^{T} * (1-h(z^{(1)}))^{T})\cdot dz^{(1)}\\
&= \frac {1}{m} (h(z^{(1)})^{T}-y^{T}* h(z^{(1)})^{T} - y^{T} + y^{T}* h(z^{(1)})^{T})\cdot dz^{(1)}\\
&= \frac {1}{m} (h(z^{(1)})^{T}- y^{T})\cdot dz^{(1)}
\end{aligned}$$

$$\begin{aligned}
D_{z^{(1)}}f(z^{(1)}) &= \frac {1}{m}\cdot (h(z^{(1)})^{T}- y^{T})\\
\bigtriangledown_{z^{(1)}}f(z^{(1)}) &= \frac {1}{m}\cdot (h(z^{(1)})- y)
\end{aligned}$$

$$\begin{aligned}
z^{(1)} &= a^{(0)}\cdot W^{(1)} +b^{(1)}\\
dz^{(1)} &= a^{(0)}\cdot dW^{(1)} + db^{(1)}\\
dJ &= \frac {1}{m} (h(z^{(1)})^{T}- y^{T})\cdot dz^{(1)}\\
&= \frac {1}{m} (h(z^{(1)})^{T}- y^{T})\cdot (a^{(0)}\cdot dW^{(1)} + db^{(1)})\\
&= \frac {1}{m} (h(z^{(1)})^{T}- y^{T})\cdot a^{(0)}\cdot dW^{(1)}+\frac {1}{m} (h(z^{(1)})^{T}- y^{T})\cdot db^{(1)}
\end{aligned}$$

$$\begin{aligned}
D_{W^{(1)}}f(W^{(1)}) &= \frac {1}{m}\cdot (h(z^{(1)})^{T}- y^{T})\cdot a^{(0)}\\
\bigtriangledown_{W^{(1)}}f(W^{(1)}) &= \frac {1}{m}\cdot (a^{(0)})^{T}\cdot (h(z^{(1)})- y)
\end{aligned}$$

$$\begin{aligned}
D_{b^{(1)}}f(b^{(1)}) &= \frac {1}{m}\cdot \sum_{i=1}^{m} (h(z_{i}^{(1)})^{T}- y_{i}^{T})\\
\bigtriangledown_{b^{(1)}}f(b^{(1)}) &= \frac {1}{m}\cdot \sum_{i=1}^{m} (h(z_{i}^{(1)})- y_{i})
\end{aligned}$$

$$W^{(1)} = W^{(1)} - \alpha\cdot (\nabla_{W^{(1)}} J(W, b)+\lambda\cdot W^{(1)})$$

Here $\alpha$ is the learning rate and $\lambda$ is the regularization coefficient.
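The gradient formulas and the update rule can be checked numerically. The sketch below is an illustration under stated assumptions: the random data, the seed, and the names `loss` and `gradients` are all made up for this example, and the regularization term is taken to be $\lambda\cdot W^{(1)}$ (the gradient of an L2 penalty), with $\lambda=0$ here so that the update can be compared against the unregularized loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(a0, y, W1, b1):
    """Cross-entropy loss J averaged over the m samples."""
    h = sigmoid(a0 @ W1 + b1)
    return float(-np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)))

def gradients(a0, y, W1, b1):
    """Analytic gradients: dW1 = (1/m) a0^T (h - y), db1 = (1/m) sum_i (h_i - y_i)."""
    m = a0.shape[0]
    dz1 = (sigmoid(a0 @ W1 + b1) - y) / m
    dW1 = a0.T @ dz1
    db1 = np.sum(dz1, axis=0, keepdims=True)
    return dW1, db1

# Hypothetical data: 5 random samples with 2 features and binary labels.
rng = np.random.default_rng(0)
a0 = rng.normal(size=(5, 2))
y = np.array([[0.0], [1.0], [1.0], [0.0], [1.0]])
W1 = rng.normal(size=(2, 1))
b1 = np.zeros((1, 1))

dW1, db1 = gradients(a0, y, W1, b1)

# Central-difference approximation of the gradient with respect to W1.
eps = 1e-6
num_dW1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    W_plus, W_minus = W1.copy(), W1.copy()
    W_plus[i, 0] += eps
    W_minus[i, 0] -= eps
    num_dW1[i, 0] = (loss(a0, y, W_plus, b1) - loss(a0, y, W_minus, b1)) / (2 * eps)

# One gradient-descent step with learning rate alpha and regularization lam.
alpha, lam = 0.1, 0.0
loss_before = loss(a0, y, W1, b1)
W1_new = W1 - alpha * (dW1 + lam * W1)
b1_new = b1 - alpha * db1
loss_after = loss(a0, y, W1_new, b1_new)
```

If the analytic gradient is right, `dW1` and `num_dW1` agree to several decimal places, and a small step along the negative gradient lowers the loss.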

Logical AND:

• (0,0) - 0
• (0,1) - 0
• (1,0) - 0
• (1,1) - 1

Logical OR:

• (0,0) - 0
• (0,1) - 1
• (1,0) - 1
• (1,1) - 1

Logical NOT:

• (1) - 0
• (0) - 1
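Each of the three truth tables above is linearly separable, so a logistic regression model should be able to fit them. The following is a minimal training sketch, not the original OneNet code: the function names, learning rate, epoch count, and zero initialization are assumptions, and the NOT gate is fed a single input column, so its weight matrix is $1\times 1$ rather than $2\times 1$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_net(a0, y, alpha=1.0, epochs=5000):
    """Full-batch gradient descent on the cross-entropy loss, no regularization."""
    m, d = a0.shape
    W1 = np.zeros((d, 1))
    b1 = np.zeros((1, 1))
    for _ in range(epochs):
        dz1 = (sigmoid(a0 @ W1 + b1) - y) / m            # gradient w.r.t. z1
        W1 -= alpha * (a0.T @ dz1)                       # gradient w.r.t. W1
        b1 -= alpha * np.sum(dz1, axis=0, keepdims=True) # gradient w.r.t. b1
    return W1, b1

def predict(a0, W1, b1):
    """Predict class 1 when p(y = 1) >= 0.5, else class 0."""
    return (sigmoid(a0 @ W1 + b1) >= 0.5).astype(int)

pairs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_and = np.array([[0.0], [0.0], [0.0], [1.0]])
y_or = np.array([[0.0], [1.0], [1.0], [1.0]])
x_not = np.array([[1.0], [0.0]])
y_not = np.array([[0.0], [1.0]])

pred_and = predict(pairs, *train_one_net(pairs, y_and))
pred_or = predict(pairs, *train_one_net(pairs, y_or))
pred_not = predict(x_not, *train_one_net(x_not, y_not))
```

After training, the predictions can be compared row by row with the truth tables.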

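## Using the 2-layer neural network TwoNet to classify logical XOR

• Number of network layers $L=2$
• Batch size $N$
• Number of input-layer neurons $D$
• Number of hidden-layer neurons $H$
• Number of output-layer neurons $K$
• The activation function is relu
• The scoring function is softmax
• The loss function is the cross-entropy loss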

• $a^{(0)}\in R^{N\times D}$
• $W^{(1)}\in R^{D\times H}$
• $b^{(1)}\in R^{1\times H}$
• $W^{(2)}\in R^{H\times K}$
• $b^{(2)}\in R^{1\times K}$
• $Y\in R^{N\times K}$, where each row is 1 at the correct class and 0 elsewhere

$$\begin{aligned}
z^{(1)} &= a^{(0)}\cdot W^{(1)}+b^{(1)} \\
a^{(1)} &= relu(z^{(1)}) \\
z^{(2)} &= a^{(1)}\cdot W^{(2)}+b^{(2)}
\end{aligned}$$

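$$probs=h(z^{(2)})=\frac {exp(z^{(2)})}{exp(z^{(2)})\cdot A\cdot B^{T}}$$

Here $A\in R^{K\times 1}$ and $B\in R^{K\times 1}$ are all-ones vectors, so $exp(z^{(2)})\cdot A\cdot B^{T}\in R^{N\times K}$ repeats each row sum across the $K$ columns, and the division is element-wise.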
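The matrix form of softmax can be written almost literally in NumPy. This sketch assumes that $A$ and $B$ are all-ones column vectors of length $K$; the function name `softmax_rows` and the max-subtraction trick for numerical stability are additions for this example and are not part of the formula above.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax written as exp(z) / (exp(z) . A . B^T)."""
    K = z.shape[1]
    A = np.ones((K, 1))   # exp(z) @ A sums each row -> (N, 1)
    B = np.ones((K, 1))   # (...) @ B.T repeats the row sum K times -> (N, K)
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / (e @ A @ B.T)

# Hypothetical scores: one ordinary row and one row that would overflow a naive exp.
z2 = np.array([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
probs = softmax_rows(z2)
```

Each row of `probs` sums to 1, and the second row is uniform because its three scores are equal.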

$$dataLoss = -\frac {1}{N} 1^{T}\cdot \ln \frac {exp(z^{(2)}* Y\cdot A)}{exp(z^{(2)})\cdot A}$$

$$regLoss = 0.5\cdot reg\cdot ||W^{(1)}||^{2} + 0.5\cdot reg\cdot ||W^{(2)}||^{2}$$

$$J(z^{(2)})=dataLoss + regLoss$$
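A small sketch of how $dataLoss$ and $regLoss$ could be computed for TwoNet. It assumes that $||W||^{2}$ means the sum of squared entries (squared Frobenius norm); the function name `two_net_loss` and the example weights are made up so that the result can be verified by hand.

```python
import numpy as np

def two_net_loss(a0, Y, W1, b1, W2, b2, reg):
    """Return (dataLoss, regLoss) for a batch a0 with one-hot labels Y."""
    N = a0.shape[0]
    z1 = a0 @ W1 + b1
    a1 = np.maximum(0.0, z1)                       # relu
    z2 = a1 @ W2 + b2
    # ln(exp(z2) . A), computed in a numerically stable way (log-sum-exp).
    z_max = np.max(z2, axis=1, keepdims=True)
    log_denominator = z_max + np.log(np.sum(np.exp(z2 - z_max), axis=1, keepdims=True))
    # (z2 * Y) . A picks out the score of the correct class in each row.
    correct_score = np.sum(z2 * Y, axis=1, keepdims=True)
    data_loss = float(-np.sum(correct_score - log_denominator) / N)
    reg_loss = float(0.5 * reg * np.sum(W1 ** 2) + 0.5 * reg * np.sum(W2 ** 2))
    return data_loss, reg_loss

# Hypothetical setup: D = 2, H = 3, K = 2, with W1 = 0 so that every score is 0.
a0 = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
W1, b1 = np.zeros((2, 3)), np.zeros((1, 3))
W2, b2 = np.ones((3, 2)), np.zeros((1, 2))
data_loss, reg_loss = two_net_loss(a0, Y, W1, b1, W2, b2, reg=0.1)
```

With all scores equal to 0 the softmax is uniform over $K=2$ classes, so `data_loss` is $\ln 2$, and `reg_loss` is $0.5\cdot 0.1\cdot 6=0.3$ because only the six entries of `W2` are non-zero.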

$$\begin{aligned}
d(dataloss) &= d(-\frac {1}{N} 1^{T}\cdot \ln \frac {exp(z^{(2)}* Y\cdot A)}{exp(z^{(2)})\cdot A})\\
&= tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot dz^{(2)})
\end{aligned}$$

$$\begin{aligned}
D_{z^{(2)}}f(z^{(2)}) &= \frac {1}{N} (probs^{T} - Y^{T})\\
\bigtriangledown_{z^{(2)}}f(z^{(2)}) &= \frac {1}{N} (probs - Y)
\end{aligned}$$

$$\begin{aligned}
z^{(2)} &= a^{(1)}\cdot W^{(2)}+b^{(2)}\\
dz^{(2)} &= da^{(1)}\cdot W^{(2)} + a^{(1)}\cdot dW^{(2)} + db^{(2)}
\end{aligned}$$

$$\begin{aligned}
d(dataloss) &= tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot dz^{(2)})\\
&= tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot (da^{(1)}\cdot W^{(2)} + a^{(1)}\cdot dW^{(2)} + db^{(2)}))\\
&= tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot da^{(1)}\cdot W^{(2)}) + tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot a^{(1)}\cdot dW^{(2)}) + tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot db^{(2)})
\end{aligned}$$

$$d(dataloss)=tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot a^{(1)}\cdot dW^{(2)})$$

$$\begin{aligned}
D_{W^{(2)}}f(W^{(2)}) &= \frac {1}{N} (probs^{T} - Y^{T})\cdot a^{(1)}\\
\bigtriangledown_{W^{(2)}}f(W^{(2)}) &= \frac {1}{N} (a^{(1)})^{T}\cdot (probs - Y)
\end{aligned}$$

$$d(dataloss)=tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot db^{(2)})$$

$$\begin{aligned}
D_{b^{(2)}}f(b^{(2)}) &= \frac {1}{N} \sum_{i=1}^{N}(probs_{i}^{T} - Y_{i}^{T})\\
\bigtriangledown_{b^{(2)}}f(b^{(2)}) &= \frac {1}{N} \sum_{i=1}^{N}(probs_{i} - Y_{i})
\end{aligned}$$

$$d(dataloss)=tr(\frac {1}{N} (probs^{T} - Y^{T})\cdot da^{(1)}\cdot W^{(2)}) =tr(\frac {1}{N} W^{(2)}\cdot (probs^{T} - Y^{T})\cdot da^{(1)})$$

$$\begin{aligned}
D_{a^{(1)}}f(a^{(1)}) &= \frac {1}{N} W^{(2)}\cdot (probs^{T} - Y^{T})\\
\bigtriangledown_{a^{(1)}}f(a^{(1)}) &= \frac {1}{N} (probs - Y)\cdot (W^{(2)})^{T}
\end{aligned}$$

$$\begin{aligned}
a^{(1)} &= relu(z^{(1)})\\
da^{(1)} &= 1(z^{(1)}\geq 0)* dz^{(1)}
\end{aligned}$$

$$\begin{aligned}
d(dataloss) &= tr(\frac {1}{N} W^{(2)}\cdot (probs^{T} - Y^{T})\cdot da^{(1)})\\
&= tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))\cdot (1(z^{(1)}\geq 0)* dz^{(1)}))\\
&= tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot dz^{(1)})
\end{aligned}$$

$$\begin{aligned}
D_{z^{(1)}}f(z^{(1)}) &= \frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\\
\bigtriangledown_{z^{(1)}}f(z^{(1)}) &= \frac {1}{N} ((probs - Y)\cdot (W^{(2)})^{T})* 1(z^{(1)}\geq 0)
\end{aligned}$$

$$\begin{aligned}
z^{(1)} &= a^{(0)}\cdot W^{(1)}+b^{(1)}\\
dz^{(1)} &= da^{(0)}\cdot W^{(1)} + a^{(0)}\cdot dW^{(1)} + db^{(1)}
\end{aligned}$$

$$\begin{aligned}
d(dataloss) &= tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot dz^{(1)})\\
&= tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot (da^{(0)}\cdot W^{(1)} + a^{(0)}\cdot dW^{(1)} + db^{(1)}))\\
&= tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot da^{(0)}\cdot W^{(1)})\\
&\quad +tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot a^{(0)}\cdot dW^{(1)})\\
&\quad +tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot db^{(1)})
\end{aligned}$$

$$d(dataloss)=tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot a^{(0)}\cdot dW^{(1)})$$

$$\begin{aligned}
D_{W^{(1)}}f(W^{(1)}) &= \frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot a^{(0)}\\
\bigtriangledown_{W^{(1)}}f(W^{(1)}) &= \frac {1}{N} (a^{(0)})^{T}\cdot (((probs - Y)\cdot (W^{(2)})^{T})* 1(z^{(1)}\geq 0))
\end{aligned}$$

$$d(dataloss)=tr(\frac {1}{N} (W^{(2)}\cdot (probs^{T} - Y^{T}))* 1(z^{(1)}\geq 0)^{T}\cdot db^{(1)})$$

$$\begin{aligned}
D_{b^{(1)}}f(b^{(1)}) &= \frac {1}{N} \sum_{i=1}^{N}(W^{(2)}\cdot (probs_{i}^{T} - Y_{i}^{T}))* 1(z_{i}^{(1)}\geq 0)^{T}\\
\bigtriangledown_{b^{(1)}}f(b^{(1)}) &= \frac {1}{N} \sum_{i=1}^{N}((probs_{i} - Y_{i})\cdot (W^{(2)})^{T})* 1(z_{i}^{(1)}\geq 0)
\end{aligned}$$
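The TwoNet gradients derived above can be collected into one backward pass and compared against finite differences. This is a sketch under assumptions, not the original code: the names `forward`, `data_loss`, `backward`, and `numeric_grad`, the layer sizes, and the random data are all hypothetical, and only $dataLoss$ (no regularization) is differentiated.

```python
import numpy as np

def forward(a0, W1, b1, W2, b2):
    z1 = a0 @ W1 + b1
    a1 = np.maximum(0.0, z1)                                  # relu
    z2 = a1 @ W2 + b2
    shifted = z2 - np.max(z2, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return z1, a1, log_probs

def data_loss(a0, Y, W1, b1, W2, b2):
    log_probs = forward(a0, W1, b1, W2, b2)[2]
    return float(-np.sum(Y * log_probs) / a0.shape[0])

def backward(a0, Y, W1, b1, W2, b2):
    N = a0.shape[0]
    z1, a1, log_probs = forward(a0, W1, b1, W2, b2)
    dz2 = (np.exp(log_probs) - Y) / N             # (1/N)(probs - Y)
    dW2 = a1.T @ dz2                              # (a1)^T . dz2
    db2 = np.sum(dz2, axis=0, keepdims=True)      # sum over the batch
    da1 = dz2 @ W2.T                              # dz2 . (W2)^T
    dz1 = da1 * (z1 >= 0)                         # element-wise relu mask
    dW1 = a0.T @ dz1                              # (a0)^T . dz1
    db1 = np.sum(dz1, axis=0, keepdims=True)
    return dW1, db1, dW2, db2

def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient of the scalar f() w.r.t. x (perturbed in place)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + eps
        f_plus = f()
        x[idx] = old - eps
        f_minus = f()
        x[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
N, D, H, K = 6, 2, 4, 2
a0 = rng.normal(size=(N, D))
Y = np.eye(K)[rng.integers(0, K, size=N)]          # one-hot labels
W1, b1 = rng.normal(size=(D, H)), rng.normal(size=(1, H))
W2, b2 = rng.normal(size=(H, K)), rng.normal(size=(1, K))

analytic = backward(a0, Y, W1, b1, W2, b2)
f = lambda: data_loss(a0, Y, W1, b1, W2, b2)
max_err = max(np.max(np.abs(g - numeric_grad(f, p)))
              for g, p in zip(analytic, (W1, b1, W2, b2)))
```

`max_err` is the largest absolute difference between the analytic and numerical gradients over all four parameters; it should be close to zero unless some $z^{(1)}$ entry happens to sit almost exactly on the relu kink.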

Logical XOR:

• (0,0) - 0
• (0,1) - 1
• (1,0) - 1
• (1,1) - 0
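A minimal training loop for TwoNet on the XOR table might look like the sketch below. The hidden size, learning rate, epoch count, seed, and the function name `train_two_net` are assumptions chosen for illustration, not the settings of the original experiment. Whether a given run reaches a perfect fit depends on the random initialization; if it stalls, a different seed or more hidden units may help.

```python
import numpy as np

def train_two_net(a0, Y, H=8, lr=0.1, reg=0.0, epochs=10000, seed=0):
    """Full-batch gradient descent for a relu hidden layer plus softmax output."""
    rng = np.random.default_rng(seed)
    N, D = a0.shape
    K = Y.shape[1]
    W1, b1 = rng.normal(size=(D, H)), np.zeros((1, H))
    W2, b2 = rng.normal(size=(H, K)), np.zeros((1, K))
    losses = []
    for _ in range(epochs):
        # Forward pass.
        z1 = a0 @ W1 + b1
        a1 = np.maximum(0.0, z1)
        z2 = a1 @ W2 + b2
        shifted = z2 - np.max(z2, axis=1, keepdims=True)
        log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
        data_loss = -np.sum(Y * log_probs) / N
        reg_loss = 0.5 * reg * np.sum(W1 ** 2) + 0.5 * reg * np.sum(W2 ** 2)
        losses.append(float(data_loss + reg_loss))
        # Backward pass (gradients of dataLoss plus the L2 term reg * W).
        dz2 = (np.exp(log_probs) - Y) / N
        dW2 = a1.T @ dz2 + reg * W2
        db2 = np.sum(dz2, axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (z1 >= 0)
        dW1 = a0.T @ dz1 + reg * W1
        db1 = np.sum(dz1, axis=0, keepdims=True)
        # Parameter update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict(a0, params):
    W1, b1, W2, b2 = params
    scores = np.maximum(0.0, a0 @ W1 + b1) @ W2 + b2
    return np.argmax(scores, axis=1)

a0 = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1, 1, 0])
Y = np.eye(2)[labels]                 # one-hot encoding of the XOR outputs

params, losses = train_two_net(a0, Y)
pred = predict(a0, params)
accuracy = float(np.mean(pred == labels))
```

Comparing `losses[0]` with `losses[-1]` shows how far training progressed, and `accuracy` reports the fraction of the four rows classified correctly.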

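## Using the 3-layer neural network ThreeNet to classify the iris and mnist datasets

• Number of network layers $L=3$
• Batch size $N$
• Number of input-layer neurons $D$
• Number of neurons in the first hidden layer $H1$
• Number of neurons in the second hidden layer $H2$
• Number of output-layer neurons $K$
• The activation function is relu
• The scoring function is softmax
• The loss function is the cross-entropy loss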

• $a^{(0)}\in R^{N\times D}$
• $W^{(1)}\in R^{D\times H1}$
• $b^{(1)}\in R^{1\times H1}$
• $W^{(2)}\in R^{H1\times H2}$
• $b^{(2)}\in R^{1\times H2}$
• $W^{(3)}\in R^{H2\times K}$
• $b^{(3)}\in R^{1\times K}$
• $Y\in R^{N\times K}$, where each row is 1 at the correct class and 0 elsewhere

$$\begin{aligned}
z^{(1)} &= a^{(0)}\cdot W^{(1)}+b^{(1)} \\
a^{(1)} &= relu(z^{(1)}) \\
z^{(2)} &= a^{(1)}\cdot W^{(2)}+b^{(2)} \\
a^{(2)} &= relu(z^{(2)}) \\
z^{(3)} &= a^{(2)}\cdot W^{(3)}+b^{(3)}
\end{aligned}$$

$$probs=h(z^{(3)})=\frac {exp(z^{(3)})}{exp(z^{(3)})\cdot A\cdot B^{T}}$$

$$dataLoss = -\frac {1}{N} 1^{T}\cdot \ln \frac {exp(z^{(3)}* Y\cdot A)}{exp(z^{(3)})\cdot A}$$

$$regLoss = 0.5\cdot reg\cdot ||W^{(1)}||^{2} + 0.5\cdot reg\cdot ||W^{(2)}||^{2} + 0.5\cdot reg\cdot ||W^{(3)}||^{2}$$

$$J(z^{(3)})=dataLoss + regLoss$$

$$\bigtriangledown_{z^{(3)}}f(z^{(3)})=\frac {1}{N} (probs - Y)$$

$$\bigtriangledown_{W^{(3)}}f(W^{(3)})=\frac {1}{N} (a^{(2)})^{T}\cdot (probs - Y)$$

$$\bigtriangledown_{b^{(3)}}f(b^{(3)})=\frac {1}{N} \sum_{i=1}^{N}(probs_{i} - Y_{i})$$

$$\bigtriangledown_{a^{(2)}}f(a^{(2)})=\frac {1}{N} (probs - Y)\cdot (W^{(3)})^{T}$$

$$\bigtriangledown_{z^{(2)}}f(z^{(2)})=\frac {1}{N} ((probs - Y)\cdot (W^{(3)})^{T})* 1(z^{(2)}\geq 0)$$

$$\bigtriangledown_{W^{(2)}}f(W^{(2)})=\frac {1}{N} (a^{(1)})^{T}\cdot (((probs - Y)\cdot (W^{(3)})^{T})* 1(z^{(2)}\geq 0))$$

$$\bigtriangledown_{b^{(2)}}f(b^{(2)})=\frac {1}{N} \sum_{i=1}^{N}((probs_{i} - Y_{i})\cdot (W^{(3)})^{T})* 1(z_{i}^{(2)}\geq 0)$$

$$\bigtriangledown_{a^{(1)}}f(a^{(1)})=\frac {1}{N} (((probs - Y)\cdot (W^{(3)})^{T})* 1(z^{(2)}\geq 0))\cdot (W^{(2)})^{T}$$

$$\bigtriangledown_{z^{(1)}}f(z^{(1)})=\frac {1}{N} ((((probs - Y)\cdot (W^{(3)})^{T})* 1(z^{(2)}\geq 0))\cdot (W^{(2)})^{T})* 1(z^{(1)}\geq 0)$$

$$\bigtriangledown_{W^{(1)}}f(W^{(1)})=\frac {1}{N} (a^{(0)})^{T}\cdot (((((probs - Y)\cdot (W^{(3)})^{T})* 1(z^{(2)}\geq 0))\cdot (W^{(2)})^{T})* 1(z^{(1)}\geq 0))$$

$$\bigtriangledown_{b^{(1)}}f(b^{(1)})=\frac {1}{N} \sum_{i=1}^{N}((((probs_{i} - Y_{i})\cdot (W^{(3)})^{T})* 1(z_{i}^{(2)}\geq 0))\cdot (W^{(2)})^{T})* 1(z_{i}^{(1)}\geq 0)$$
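The ThreeNet gradients follow the same pattern layer by layer, which the sketch below makes explicit and checks against finite differences. The dictionary layout of the parameters, the function names, the layer sizes (4 inputs and 3 classes, as in iris), and the random data are assumptions for illustration; regularization is left out.

```python
import numpy as np

def forward(a0, p):
    z1 = a0 @ p["W1"] + p["b1"]
    a1 = np.maximum(0.0, z1)
    z2 = a1 @ p["W2"] + p["b2"]
    a2 = np.maximum(0.0, z2)
    z3 = a2 @ p["W3"] + p["b3"]
    shifted = z3 - np.max(z3, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return z1, a1, z2, a2, log_probs

def data_loss(a0, Y, p):
    return float(-np.sum(Y * forward(a0, p)[-1]) / a0.shape[0])

def backward(a0, Y, p):
    N = a0.shape[0]
    z1, a1, z2, a2, log_probs = forward(a0, p)
    grads = {}
    dz3 = (np.exp(log_probs) - Y) / N                 # (1/N)(probs - Y)
    grads["W3"] = a2.T @ dz3
    grads["b3"] = np.sum(dz3, axis=0, keepdims=True)
    dz2 = (dz3 @ p["W3"].T) * (z2 >= 0)               # back through W3, then relu mask
    grads["W2"] = a1.T @ dz2
    grads["b2"] = np.sum(dz2, axis=0, keepdims=True)
    dz1 = (dz2 @ p["W2"].T) * (z1 >= 0)               # back through W2, then relu mask
    grads["W1"] = a0.T @ dz1
    grads["b1"] = np.sum(dz1, axis=0, keepdims=True)
    return grads

def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient of the scalar f() w.r.t. x (perturbed in place)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + eps
        f_plus = f()
        x[idx] = old - eps
        f_minus = f()
        x[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
N, D, H1, H2, K = 5, 4, 5, 3, 3
a0 = rng.normal(size=(N, D))
Y = np.eye(K)[rng.integers(0, K, size=N)]             # one-hot labels
p = {"W1": rng.normal(size=(D, H1)), "b1": rng.normal(size=(1, H1)),
     "W2": rng.normal(size=(H1, H2)), "b2": rng.normal(size=(1, H2)),
     "W3": rng.normal(size=(H2, K)), "b3": rng.normal(size=(1, K))}

grads = backward(a0, Y, p)
f = lambda: data_loss(a0, Y, p)
max_err = max(np.max(np.abs(grads[k] - numeric_grad(f, p[k]))) for k in p)
```

Each gradient has the same shape as its parameter, and `max_err` should be close to zero when the formulas are implemented correctly.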

### iris dataset

Four features:

• SepalLengthCm - sepal length
• SepalWidthCm - sepal width
• PetalLengthCm - petal length
• PetalWidthCm - petal width

Three classes:

• Iris-setosa
• Iris-versicolor
• Iris-virginica

| Dataset | softmax regression | neural network |
|---|---|---|
| iris | 96.67% | 98.33% |

### mnist dataset

The mnist dataset is a handwritten-digit dataset with 60000 training images and 10000 test images, covering the digits 0-9.

| Dataset | softmax regression | neural network |
|---|---|---|
| mnist | 92.15% | 97.92% |