# gemmlowp

### Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

#### Quantization scheme

Equation (1):

$$
r = S(q - Z)
$$

where $S$ is the scale, $Z$ is the zero-point, $r$ is the real value, and $q$ is the quantized value.
Consider the multiplication of two square $N \times N$ matrices of real numbers, $r_1$ and $r_2$, with their product represented by $r_3 = r_1 r_2$. We denote the entries of each of these matrices $r_\alpha$ ($\alpha = 1, 2, 3$) as $r_\alpha^{(i,j)}$ for $1 \le i, j \le N$, and the quantization parameters with which they are quantized as $(S_\alpha, Z_\alpha)$. We denote the quantized entries by $q_\alpha^{(i,j)}$, so that:

Equation (2):

$$
r_{\alpha}^{(i,j)} = S_{\alpha}\left(q_{\alpha}^{(i,j)} - Z_{\alpha}\right)
$$
From the definition of matrix multiplication, we have:

Equation (3):

$$
S_3\left(q_3^{(i,k)} - Z_3\right) = \sum_{j=1}^{N} S_1\left(q_1^{(i,j)} - Z_1\right) S_2\left(q_2^{(j,k)} - Z_2\right)
$$
Solving for $q_3^{(i,k)}$, this can be rewritten as:

Equation (4):

$$
q_3^{(i,k)} = Z_3 + M \sum_{j=1}^{N} \left(q_1^{(i,j)} - Z_1\right)\left(q_2^{(j,k)} - Z_2\right), \qquad \text{where } M = \frac{S_1 S_2}{S_3}
$$
In Equation (4), the only non-integer is the multiplier $M$. As a constant depending only on the quantization scales $S_1$, $S_2$, $S_3$, it can be computed offline. It is empirically found to always lie in the interval $(0, 1)$, and can therefore be expressed in the normalized form:

$$
M = 2^{-n} M_0, \qquad \text{where } M_0 \in [0.5, 1)
$$
The gemmlowp repository ships a quantization example that computes the scale and zero-point, converts float data to uint8, and runs inference on the uint8 data. To try it, clone the gemmlowp repository and register the example in `{gemm_dir}/contrib/CMakeLists.txt`; then run `mkdir build; pushd build; cmake ../contrib/; make` to produce the `quantization_example` executable.

QQ: 329804334

Website: www.weaf.top

Mail: [email protected]

Github: https://github.com/Milittle