Neural network

Introduction

  • The network is composed of an input layer, an output layer, and optionally a series of hidden layers.
  • The input layer performs no computation; the hidden and output layers consist of functional neurons, i.e., neurons equipped with activation functions.

The McCulloch-Pitts (M-P) neuron model

\begin{align*}
y = f(\mathbf{w}^{\top}\mathbf{x} - \theta)
\end{align*}

where \(f\) is termed the activation function.
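As a sketch, the M-P neuron can be written in a few lines of NumPy. The AND-gate weights and threshold below are illustrative choices, not values from the text:

```python
import numpy as np

def mp_neuron(x, w, theta, f):
    """M-P neuron: weighted sum of inputs minus threshold, passed through f."""
    return f(np.dot(w, x) - theta)

# Illustrative example: a step activation and weights realizing logical AND.
step = lambda z: 1 if z >= 0 else 0
w = np.array([1.0, 1.0])
theta = 1.5

print(mp_neuron(np.array([1, 1]), w, theta, step))  # both inputs active -> 1
print(mp_neuron(np.array([1, 0]), w, theta, step))  # -> 0
```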

Activation function

The activation function introduces non-linearity into the network, compensating for the limited expressive power of purely linear models.

Sign

\begin{align*}
\text{sign}(x) = \begin{cases}
1, & x \ge 0; \\
0, & x < 0.
\end{cases}
\end{align*}

Sigmoid

\begin{align*}
\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}.
\end{align*}

ReLU

\begin{align*}
\text{ReLU}(x) = \max(x, 0)
\end{align*}
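The three activation functions above can be sketched directly in NumPy. Note that sign here follows the document's 0/1 convention (0 for negative inputs), not the usual -1/+1 sign function:

```python
import numpy as np

def sign(x):
    # Document's convention: 1 for x >= 0, 0 otherwise.
    return np.where(x >= 0, 1, 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0)

x = np.array([-2.0, 0.0, 3.0])
print(sign(x))       # [0 1 1]
print(sigmoid(0.0))  # 0.5
print(relu(x))       # [0. 0. 3.]
```

Unlike sign, both sigmoid and ReLU are (almost everywhere) differentiable, which is what makes them usable with gradient-based training.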

Softmax

The softmax function transforms a vector of scores (logits) into a valid probability distribution.

\begin{align*}
\text{softmax}(\mathbf{x})_{i} = \frac{\exp(x_{i})}{\sum_{j}\exp(x_{j})}
\end{align*}
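A minimal softmax sketch. Subtracting the maximum before exponentiating is a standard numerical-stability trick (an addition here, not from the text); it leaves the result mathematically unchanged:

```python
import numpy as np

def softmax(x):
    # Shift by max(x) to avoid overflow in exp; the ratio is unaffected.
    z = np.exp(x - np.max(x))
    return z / z.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
print(p.sum())  # components sum to 1, a valid probability distribution
```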

Cost function

Cross-entropy

\begin{align*}
\mathcal{H}_{p}(p^{\prime}) = \sum_{i} p_{i} \log\frac{1}{p^{\prime}_{i}}
\end{align*}

where

  • \(p\) is the true probability distribution.
  • \(p^{\prime}\) is the predicted probability distribution.
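The cross-entropy formula can be sketched as follows. The small `eps` guard against log(0) and the example distributions are illustrative additions, not part of the text:

```python
import numpy as np

def cross_entropy(p, p_pred, eps=1e-12):
    # H_p(p') = sum_i p_i * log(1 / p'_i); eps guards against log(0).
    return np.sum(p * np.log(1.0 / (p_pred + eps)))

p_true = np.array([0.0, 1.0, 0.0])  # one-hot true distribution (class 1)
p_hat  = np.array([0.1, 0.7, 0.2])  # predicted distribution

print(cross_entropy(p_true, p_hat))  # reduces to -log(0.7) for one-hot p
```

For a one-hot true distribution only the term of the correct class survives, so the loss is simply the negative log probability the model assigns to that class.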

Multi-layer feedforward neural network

  • Neurons in adjacent layers are fully connected.
  • There are no connections between neurons within the same layer.
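A minimal forward pass through such a fully connected network might look like this. The layer sizes (3 → 4 → 2), the sigmoid activation, and the random weights are illustrative assumptions; each layer applies the M-P rule \(f(\mathbf{W}\mathbf{x} - \boldsymbol{\theta})\) to the previous layer's output:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Each layer is (W, theta); full connection means every output
    neuron depends on every neuron of the previous layer."""
    for W, theta in layers:
        x = sigmoid(W @ x - theta)
    return x

# Hypothetical architecture: 3 inputs -> 4 hidden -> 2 outputs.
layers = [
    (rng.normal(size=(4, 3)), rng.normal(size=4)),  # hidden layer
    (rng.normal(size=(2, 4)), rng.normal(size=2)),  # output layer
]

y = forward(rng.normal(size=3), layers)
print(y.shape)  # (2,)
```

Within a layer there is no neuron-to-neuron interaction; the only coupling is through the weight matrices connecting adjacent layers, which is exactly the structure the bullets above describe.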