Derivatives of Common Neural Network Modules

Abstract

This post derives, step by step, the derivatives of common neural network modules: the linear transformation, the softmax cross-entropy loss, and several activation functions.

Notations

Techniques for Derivation

Linear Transformation

Forward

Get and
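
As a rough sketch of what the backward step of a linear layer computes, assume the forward pass is y = W x + b for a single input vector x, and that the upstream gradient dL/dy is already known. This notation and the function names below are assumptions made for illustration, not the post's own. Under those assumptions the chain rule gives dL/dW = (dL/dy) x^T, dL/db = dL/dy, and dL/dx = W^T (dL/dy).

```python
def linear_forward(W, b, x):
    """y = W x + b for one input vector x; W is a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def linear_backward(W, x, grad_y):
    """Given grad_y = dL/dy, return (dL/dW, dL/db, dL/dx)."""
    # dL/dW[i][j] = grad_y[i] * x[j], the outer product of grad_y and x.
    grad_W = [[g_i * x_j for x_j in x] for g_i in grad_y]
    # y depends on b through an identity map, so dL/db = dL/dy.
    grad_b = list(grad_y)
    # dL/dx[j] = sum_i W[i][j] * grad_y[i], i.e. W^T grad_y.
    grad_x = [sum(W[i][j] * grad_y[i] for i in range(len(W)))
              for j in range(len(x))]
    return grad_W, grad_b, grad_x


# Hypothetical example values; grad_y of all ones corresponds to L = sum(y).
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [0.5, -0.5, 0.0]
x = [1.0, -1.0]
grad_y = [1.0, 1.0, 1.0]

y = linear_forward(W, b, x)
grad_W, grad_b, grad_x = linear_backward(W, x, grad_y)
```

With L = sum(y), every row of grad_W equals x and grad_x is the column sums of W, which is an easy result to confirm by hand.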


Softmax and Cross Entropy Loss

Forward Process
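
The forward pass might be sketched as follows, assuming logits z, probabilities p = softmax(z), a target distribution y, and the loss L = -sum_i y_i log p_i. These symbols and function names are assumptions for illustration. Subtracting max(z) before exponentiating is a common stability trick that leaves the result unchanged.

```python
import math


def softmax(z):
    """p_i = exp(z_i) / sum_j exp(z_j), computed with the max subtracted."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, y):
    """L = -sum_i y_i * log(p_i); terms with y_i == 0 are skipped."""
    return -sum(y_i * math.log(p_i) for p_i, y_i in zip(p, y) if y_i > 0)


# Hypothetical example: three classes, the true class is index 0.
z = [2.0, 1.0, 0.1]
y = [1.0, 0.0, 0.0]

p = softmax(z)
loss = cross_entropy(p, y)
```

For a one-hot target the loss reduces to -log of the probability assigned to the true class.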

Get when and
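
Assuming p = softmax(z) and L = -sum_i y_i log p_i with the targets y summing to 1 (notation assumed here for illustration), the gradient with respect to the logits simplifies to dL/dz = p - y. The sketch below compares that closed form against a central finite difference; the helper names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def loss_fn(z, y):
    """Cross-entropy of softmax(z) against the target distribution y."""
    p = softmax(z)
    return -sum(y_i * math.log(p_i) for p_i, y_i in zip(p, y) if y_i > 0)


def softmax_ce_grad(z, y):
    """Analytic gradient dL/dz = softmax(z) - y, valid when sum(y) == 1."""
    return [p_i - y_i for p_i, y_i in zip(softmax(z), y)]


def numeric_grad(z, y, eps=1e-6):
    """Central finite-difference estimate of dL/dz, one coordinate at a time."""
    grad = []
    for k in range(len(z)):
        z_plus = z[:k] + [z[k] + eps] + z[k + 1:]
        z_minus = z[:k] + [z[k] - eps] + z[k + 1:]
        grad.append((loss_fn(z_plus, y) - loss_fn(z_minus, y)) / (2 * eps))
    return grad


# Hypothetical example: three classes, the true class is index 1.
z = [2.0, 1.0, 0.1]
y = [0.0, 1.0, 0.0]

analytic = softmax_ce_grad(z, y)
numeric = numeric_grad(z, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
```

The components of the analytic gradient sum to zero, since both p and y sum to 1.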

Activation Functions

Sigmoid
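
For sigmoid(x) = 1 / (1 + exp(-x)), the derivative can be written in terms of the output itself: sigmoid'(x) = sigmoid(x) (1 - sigmoid(x)). A minimal sketch, with function names chosen for illustration:

```python
import math


def sigmoid(x):
    """Logistic function, branching on sign so exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x):
    """d/dx sigmoid(x) = s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Compare against a central finite difference at an arbitrary point.
x0, eps = 0.7, 1e-6
numeric = (sigmoid(x0 + eps) - sigmoid(x0 - eps)) / (2 * eps)
diff = abs(sigmoid_grad(x0) - numeric)
```

Because the gradient only needs the forward output, an implementation can cache s from the forward pass instead of recomputing it.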

Tanh
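
For tanh, the derivative is tanh'(x) = 1 - tanh(x)^2, which again depends only on the forward output. A minimal sketch, with a hypothetical function name:

```python
import math


def tanh_grad(x):
    """d/dx tanh(x) = 1 - tanh(x)**2."""
    t = math.tanh(x)
    return 1.0 - t * t


# Compare against a central finite difference at an arbitrary point.
x0, eps = -0.3, 1e-6
numeric = (math.tanh(x0 + eps) - math.tanh(x0 - eps)) / (2 * eps)
diff = abs(tanh_grad(x0) - numeric)
```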


ReLU
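
ReLU(x) = max(0, x) has derivative 1 for x > 0 and 0 for x < 0. It is not differentiable at x = 0; the sketch below picks 0 there, which is a common convention and an assumption rather than something the post states.

```python
def relu(x):
    """ReLU(x) = max(0, x)."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """1 for x > 0, otherwise 0 (the value at x == 0 is a chosen convention)."""
    return 1.0 if x > 0 else 0.0


# Hypothetical inputs covering both branches and the kink at zero.
inputs = [-2.0, 0.0, 3.0]
outputs = [relu(v) for v in inputs]
grads = [relu_grad(v) for v in inputs]
```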