# A Hand-Written Convolutional Neural Network in Python (Softmax Backpropagation)

## Training the Model

- Forward phase: the input is passed forward through the network layer by layer; this process is forward propagation.
- Backward phase: gradients are propagated backward through the network, updating each layer in turn.

- During the forward pass, each layer caches any data it will need (e.g., its input and intermediate values) for use in the backward pass. This means every backward pass must be preceded by a corresponding forward pass.
- During the backward pass, each layer receives a gradient and returns a gradient: it receives the gradient of the loss with respect to its output,

$\frac{\partial L}{\partial out}$

and returns the gradient of the loss with respect to its input,

$\frac{\partial L}{\partial in}$

Each training step therefore has two parts:

- Initialize the gradient of the loss
- Backpropagate it through the network to update the gradients
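These two steps can be sketched as a minimal training step. This is a sketch, not the article's exact training loop: it assumes a cross-entropy loss over softmax outputs `out`, with `label` the index of the correct class; all names and values are illustrative.

```python
import numpy as np

def softmax(totals):
    # Softmax over the pre-activation totals
    e = np.exp(totals)
    return e / np.sum(e)

# Forward pass (in the real network, totals come from the last layer)
totals = np.array([1.0, 2.0, 0.5])
out = softmax(totals)
label = 1  # index of the correct class

# Initialize the gradient: dL/d(out) for L = -ln(out[label])
# is zero everywhere except at the correct class
gradient = np.zeros(len(out))
gradient[label] = -1 / out[label]

# This initial gradient would then be fed to the softmax layer's
# backprop (e.g. softmax_layer.backprop(gradient)) to update weights
```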

Applying the chain rule to a composite function $f[g(h(x))]$: the inner composition $g[h(x)]$ has derivative $g^{\prime}[h(x)]h^{\prime}(x)$, so

$\frac{d}{dx}f[g(h(x))] = f^{\prime}[g(h(x))]\,g^{\prime}[h(x)]\,h^{\prime}(x)$
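The chain rule can be sanity-checked numerically. The concrete choices below ($f=\sin$, $g=\exp$, $h(x)=x^2$) are illustrative, not from the original:

```python
import numpy as np

f, df = np.sin, np.cos                  # f and f'
g, dg = np.exp, np.exp                  # g and g'
h, dh = lambda x: x**2, lambda x: 2*x   # h and h'

def composed(x):
    return f(g(h(x)))

def chain_rule(x):
    # f'[g(h(x))] * g'[h(x)] * h'(x)
    return df(g(h(x))) * dg(h(x)) * dh(x)

# Central finite difference should match the analytic derivative
x, eps = 0.7, 1e-6
numeric = (composed(x + eps) - composed(x - eps)) / (2 * eps)
```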

### Backpropagation: Softmax

The cross-entropy loss is

$L = -\ln(p_c)$

where $p_c$ is the predicted probability of the correct class $c$. The initial gradient passed backward is therefore

$\frac{\partial L}{\partial out_s(i)} = \begin{cases} 0 & \text{if } i \neq c \\ -\frac{1}{p_i} & \text{if } i = c \end{cases}$

The forward pass caches three things:

- the shape of `input` before flattening
- `input` after flattening
- `totals`, the values fed into the softmax activation

The output for the correct class, $out_s(c)$, is computed from the totals $t_i$:

$out_s(c) = \frac{e^{t_c}}{\sum_i e^{t_i}} = \frac{e^{t_c}}{S}, \quad S = \sum_i e^{t_i}$

For $k \neq c$, rewrite the output as

$out_s(c) = e^{t_c}S^{-1}$

and differentiate with respect to $t_k$:

$\frac{\partial out_s(c)}{\partial t_k} = \frac{\partial out_s(c)}{\partial S}\frac{\partial S}{\partial t_k} = -e^{t_c}S^{-2}\frac{\partial S}{\partial t_k} = -e^{t_c}S^{-2}e^{t_k} = \frac{- e^{t_c}e^{t_k}}{S^2}$

For $k = c$, the quotient rule gives

$\frac{\partial out_s(c)}{\partial t_c} = \frac{Se^{t_c} - e^{t_c}\frac{\partial S}{\partial t_c}}{S^2} = \frac{Se^{t_c} - e^{t_c}e^{t_c}}{S^2} = \frac{e^{t_c}(S - e^{t_c})}{S^2}$
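A quick finite-difference check of these two cases (a sketch; the `t` values are arbitrary illustrative numbers):

```python
import numpy as np

t = np.array([1.0, 0.5, -0.3])  # illustrative totals
c = 0                           # index of the correct class

def out_s(t):
    # Softmax output for class c
    return np.exp(t[c]) / np.sum(np.exp(t))

S = np.sum(np.exp(t))

# Analytic gradient from the two formulas above
analytic = -np.exp(t[c]) * np.exp(t) / S**2             # k != c case
analytic[c] = np.exp(t[c]) * (S - np.exp(t[c])) / S**2  # k == c case

# Central finite differences
eps = 1e-6
numeric = np.zeros_like(t)
for k in range(len(t)):
    tp, tm = t.copy(), t.copy()
    tp[k] += eps
    tm[k] -= eps
    numeric[k] = (out_s(tp) - out_s(tm)) / (2 * eps)
```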

As a concrete fully-connected example, consider the gradient of the loss with respect to a particular weight $w_{2,1}$. Since $z_1$ feeds into every softmax output, the chain rule sums over all paths through $w_{2,1}$:

$\frac{\partial L}{\partial w_{2,1}} = \frac{\partial L}{\partial a_1} \frac{\partial a_1}{\partial z_1} \frac{\partial z_1}{\partial w_{2,1}} + \frac{\partial L}{\partial a_2} \frac{\partial a_2}{\partial z_1} \frac{\partial z_1}{\partial w_{2,1}}$

Two factors to evaluate are $\frac{\partial L}{\partial a_1}$ and $\frac{\partial a_1}{\partial z_1}$. For the first, only the $j = 1$ term of the cross-entropy sum depends on $a_1$:

$\frac{\partial L}{\partial a_1} = \frac{\partial}{\partial a_1} \left[-\sum_j^h y_j \ln(a_j)\right] = \frac{\partial}{\partial a_1} \left[-y_1 \ln(a_1)\right] = -\frac{y_1}{a_1}$

Next, $\frac{\partial a_1}{\partial z_1}$ for the softmax activation:

$\frac{\partial a_1}{\partial z_1} = \frac{\partial}{\partial z_1} \left[\frac{e^{z_1}}{\sum_{j=1}^n e^{z_j}}\right]$

Let

$f(z_1) = e^{z_1}$

$g(z_1) = \sum_{j=1}^n e^{z_j}$

and apply the quotient rule

$\left[\frac{f(x)}{g(x)}\right]^{\prime} = \frac{g(x)f^{\prime}(x) - f(x)g^{\prime}(x)}{[g(x)]^2}$

which gives

$\frac{\partial a_1}{\partial z_1} = \frac{\sum_{j=1}^n e^{z_j}\, e^{z_1} - e^{z_1} e^{z_1}}{\left[\sum_{j=1}^n e^{z_j}\right]^2} = \frac{e^{z_1}\left(\sum_{j=1}^n e^{z_j} - e^{z_1}\right)}{\left[\sum_{j=1}^n e^{z_j}\right]^2} = \frac{e^{z_1}}{\sum_{j=1}^n e^{z_j}} \cdot \frac{\sum_{j=1}^n e^{z_j} - e^{z_1}}{\sum_{j=1}^n e^{z_j}} = a_1(1-a_1)$

Similarly, when the output index differs from the input index (e.g., $a_1$ with respect to $z_2$), the derivative is

$\frac{\partial a_1}{\partial z_2} = -a_1a_2$
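Both results, $a_1(1-a_1)$ on the diagonal and $-a_1a_2$ off it, describe the softmax Jacobian, which can be checked directly (the `z` values are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / np.sum(e)

z = np.array([0.2, -1.0, 0.7])  # illustrative pre-activations
a = softmax(z)

# Softmax Jacobian: a_i(1 - a_i) on the diagonal, -a_i a_j off it
J = np.diag(a) - np.outer(a, a)

# Finite-difference checks of d(a_1)/d(z_1) and d(a_1)/d(z_2)
eps = 1e-6
e1 = np.array([eps, 0.0, 0.0])
e2 = np.array([0.0, eps, 0.0])
num_d_a1_d_z1 = (softmax(z + e1)[0] - softmax(z - e1)[0]) / (2 * eps)
num_d_a1_d_z2 = (softmax(z + e2)[0] - softmax(z - e2)[0]) / (2 * eps)
```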

```python
class Softmax:
  # ...

  def backprop(self, d_L_d_out):
    '''
    Performs a backward pass of the softmax layer.
    Returns the loss gradient for this layer's inputs.
    - d_L_d_out is the loss gradient for this layer's outputs.
    '''
    # Only one element of d_L_d_out is nonzero (the correct class),
    # so we only compute the gradient for that element
    for i, gradient in enumerate(d_L_d_out):
      if gradient == 0:
        continue

      # e^totals
      t_exp = np.exp(self.last_totals)

      # Sum of all e^totals
      S = np.sum(t_exp)

      # Gradients of out[i] against totals,
      # following the two formulas derived above
      d_out_d_t = -t_exp[i] * t_exp / (S ** 2)
      d_out_d_t[i] = t_exp[i] * (S - t_exp[i]) / (S ** 2)
```

The last two lines implement the derivative of the softmax output with respect to the totals:

$\frac{\partial out_s(c)}{\partial t_k} = \begin{cases} \frac{- e^{t_c}e^{t_k}}{S^2} & \text{if } k \neq c \\ \frac{e^{t_c}(S - e^{t_c})}{S^2} & \text{if } k = c \end{cases}$

- Compute the weight gradient $\frac{\partial L}{\partial w}$ to update the layer's weights
- Compute the bias gradient $\frac{\partial L}{\partial b}$ to update the layer's biases
- Return the input gradient $\frac{\partial L}{\partial input}$, which becomes the output gradient for the previous layer

```python
      # Gradients of totals against weights / biases / input
      d_t_d_w = self.last_input
      d_t_d_b = 1
      d_t_d_inputs = self.weights

      # Gradients of loss against totals
      d_L_d_t = gradient * d_out_d_t

      # Gradients of loss against weights / biases / input
      d_L_d_w = d_t_d_w[np.newaxis].T @ d_L_d_t[np.newaxis]
      d_L_d_b = d_L_d_t * d_t_d_b
      d_L_d_inputs = d_t_d_inputs @ d_L_d_t
```

These follow from the linear relation

$t = w \cdot input + b$

whose partial derivatives are

$\frac{\partial t}{\partial w} = input,\quad \frac{\partial t}{\partial b} = 1,\quad \frac{\partial t}{\partial input} = w$

so by the chain rule

$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial out} \frac{\partial out}{\partial t} \frac{\partial t}{\partial w},\quad \frac{\partial L}{\partial b} = \frac{\partial L}{\partial out} \frac{\partial out}{\partial t} \frac{\partial t}{\partial b},\quad \frac{\partial L}{\partial input} = \frac{\partial L}{\partial out} \frac{\partial out}{\partial t} \frac{\partial t}{\partial input}$

- `d_L_d_w` should be a 2-d matrix with the same shape as the weights, $input \times nodes$. Reshaping with `np.newaxis` turns the two vectors into $(input, 1)$ and $(1, nodes)$ matrices, whose product has exactly that shape.
- `d_L_d_b` has the same shape as the biases; since `d_t_d_b` is 1, it is simply `d_L_d_t`, a vector of length $nodes$.
- `d_L_d_inputs`: the weights have shape $(input, nodes)$ and `d_L_d_t` acts as a $(nodes, 1)$ matrix, so their product is a vector of length $input$.
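The shape reasoning above can be verified with a small standalone example (the sizes `input_len = 4`, `nodes = 3` are illustrative):

```python
import numpy as np

input_len, nodes = 4, 3  # illustrative sizes

last_input = np.random.randn(input_len)      # flattened input, shape (input_len,)
weights = np.random.randn(input_len, nodes)  # shape (input_len, nodes)
d_L_d_t = np.random.randn(nodes)             # loss gradient w.r.t. totals, shape (nodes,)

# (input_len, 1) @ (1, nodes) -> (input_len, nodes), the shape of the weights
d_L_d_w = last_input[np.newaxis].T @ d_L_d_t[np.newaxis]

# d_t_d_b = 1, so the bias gradient is just d_L_d_t, shape (nodes,)
d_L_d_b = d_L_d_t * 1

# (input_len, nodes) @ (nodes,) -> (input_len,), matching the flattened input
d_L_d_inputs = weights @ d_L_d_t
```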

```python
class Softmax:
  # ...

  def backprop(self, d_L_d_out, learn_rate):
    '''
    Performs a backward pass of the softmax layer.
    Returns the loss gradient for this layer's inputs.
    - d_L_d_out is the loss gradient for this layer's outputs.
    - learn_rate is a float.
    '''
    # Only one element of d_L_d_out is nonzero
    for i, gradient in enumerate(d_L_d_out):
      if gradient == 0:
        continue

      # e^totals and their sum
      t_exp = np.exp(self.last_totals)
      S = np.sum(t_exp)

      # Gradients of out[i] against totals
      d_out_d_t = -t_exp[i] * t_exp / (S ** 2)
      d_out_d_t[i] = t_exp[i] * (S - t_exp[i]) / (S ** 2)

      # Gradients of totals against weights / biases / input
      d_t_d_w = self.last_input
      d_t_d_b = 1
      d_t_d_inputs = self.weights

      # Gradients of loss against totals
      d_L_d_t = gradient * d_out_d_t

      # Gradients of loss against weights / biases / input
      d_L_d_w = d_t_d_w[np.newaxis].T @ d_L_d_t[np.newaxis]
      d_L_d_b = d_L_d_t * d_t_d_b
      d_L_d_inputs = d_t_d_inputs @ d_L_d_t

      # Update weights / biases
      self.weights -= learn_rate * d_L_d_w
      self.biases -= learn_rate * d_L_d_b

      return d_L_d_inputs.reshape(self.last_input_shape)
```
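To see the backward pass work end to end, here is a self-contained sketch that adds a forward pass (reconstructed along the lines of part 1 of this series; the initialization scheme and the sizes `input_len=8`, `nodes=3` are assumptions) and trains on one fixed example, so the loss should fall:

```python
import numpy as np

class Softmax:
  # Fully-connected layer with softmax activation; the forward pass
  # here is a reconstruction, not necessarily the article's exact code

  def __init__(self, input_len, nodes):
    self.weights = np.random.randn(input_len, nodes) / input_len
    self.biases = np.zeros(nodes)

  def forward(self, input):
    # Cache shape, flattened input, and totals for backprop
    self.last_input_shape = input.shape
    input = input.flatten()
    self.last_input = input
    totals = input @ self.weights + self.biases
    self.last_totals = totals
    exp = np.exp(totals)
    return exp / np.sum(exp)

  def backprop(self, d_L_d_out, learn_rate):
    for i, gradient in enumerate(d_L_d_out):
      if gradient == 0:
        continue
      t_exp = np.exp(self.last_totals)
      S = np.sum(t_exp)
      d_out_d_t = -t_exp[i] * t_exp / (S ** 2)
      d_out_d_t[i] = t_exp[i] * (S - t_exp[i]) / (S ** 2)
      d_L_d_t = gradient * d_out_d_t
      d_L_d_w = self.last_input[np.newaxis].T @ d_L_d_t[np.newaxis]
      d_L_d_b = d_L_d_t
      d_L_d_inputs = self.weights @ d_L_d_t
      self.weights -= learn_rate * d_L_d_w
      self.biases -= learn_rate * d_L_d_b
      return d_L_d_inputs.reshape(self.last_input_shape)

# Train on a single fixed example; the loss should drop over iterations
np.random.seed(0)
layer = Softmax(input_len=8, nodes=3)
x = np.random.randn(2, 4)  # e.g. a small 2x4 "image"
label = 2

losses = []
for _ in range(50):
  out = layer.forward(x)
  losses.append(-np.log(out[label]))
  gradient = np.zeros(3)
  gradient[label] = -1 / out[label]
  layer.backprop(gradient, learn_rate=0.05)
```

Note the input gradient returned by `backprop` is reshaped back to `last_input_shape`, so the previous (unflattened) layer receives a gradient in its own shape.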