[TOC]
OpenTA
Quiz
Analysis: Taylor expansion
The Taylor expansion of an analytic function around $x_0$ is given by $f(x) = \sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0)\,(x - x_0)^n$, where $f^{(n)}(x_0)$ denotes the $n^{\text{th}}$ derivative of $f$ at $x_0$.
- Taylor-expand the function $f(x) = (1-x)^{-m}$, where $m \in \mathbb{R}$, up to second order ($n=2$) around $x_0 = 0$:
$1 + m x + \frac{m(m+1)}{2} x^2$
- $1 - m x + \frac{m^2}{2} x^2$
- $x - 1 - \frac{1}{2}(x-1)^2$
- $0$
Ref:https://math.stackexchange.com/questions/1246136/maclaurin-series-expansion-for-e-1-x2
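The first expansion can be checked symbolically; below is a small verification sketch (added here, using sympy, not part of the original quiz):

```python
# Verify the second-order Maclaurin expansion of (1 - x)**(-m) with sympy.
from sympy import symbols, series, simplify

x, m = symbols('x m')
expansion = series((1 - x)**(-m), x, 0, 3).removeO()
# Difference to the claimed result 1 + m*x + m*(m + 1)/2 * x**2 should vanish.
print(simplify(expansion - (1 + m*x + m*(m + 1)/2 * x**2)))   # -> 0
```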
Linear Algebra
- Matrix determinant:
$\det(A) = -1$ (note: $i^2 = -1$)
- Matrix exponential
$e^{i\theta A} =$
Solution: https://math.stackexchange.com/questions/773359/eulers-identity-in-matrix-form
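The identity behind the linked solution can be checked numerically. A minimal sketch (the matrix $A$ below is only an example satisfying $A^2 = I$; the quiz's actual matrix is not reproduced in these notes):

```python
# Numerical check of Euler's identity in matrix form:
# if A @ A = I, then e^{i*theta*A} = cos(theta) I + i sin(theta) A.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # example matrix with A @ A = I
theta = 0.7

lhs = expm(1j * theta * A)
rhs = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * A
print(np.allclose(lhs, rhs))    # -> True
```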
Homework
3D Boolean functions
In mathematics and logic, a (finitary) Boolean function (or switching function) is a function of the form $f: \mathbf{B}^k \rightarrow \mathbf{B}$, where $\mathbf{B} = \{0, 1\}$ is a Boolean domain and $k$ is a non-negative integer called the arity of the function. In the case where $k = 0$, the “function” is essentially a constant element of $\mathbf{B}$.
- How many 3-dimensional Boolean functions are there?
For $k$-dimensional inputs there are $2^{2^k}$ Boolean functions: there are $2^k$ possible input patterns, and for each pattern the output can independently be 0 or 1. For $k = 3$ this gives $2^{2^3} = 2^8 = 256$ functions; a quick enumeration check is sketched below.
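A quick enumeration check of this count for $k = 3$ (added sketch):

```python
# Enumerate every 3-input Boolean function as an assignment of 0/1 outputs
# to the 2**3 = 8 input patterns.
from itertools import product

k = 3
functions = list(product([0, 1], repeat=2**k))   # one tuple of 8 outputs per function
print(len(functions))                            # -> 256 = 2**(2**3)
```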
- Count the number of symmetries of 3-dimensional Boolean functions that map 4 of the possible input patterns to 1.
ANS: 6. The $\binom{8}{4} = 70$ such patterns fall into 6 equivalence classes: all patterns within a class can be mapped onto one another by rotation and/or reflection of the cube. Note that the reflections include not only the planes parallel to the cube's facets but also the diagonal planes. (A brute-force check is sketched below.)
Symmetry: two patterns are symmetric if they can be mapped to each other by rotation and/or reflection.
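A brute-force check of the six symmetry classes (added sketch; it represents the 48 rotations and reflections of the cube as signed coordinate permutations):

```python
from itertools import combinations, permutations, product

vertices = list(product([0, 1], repeat=3))        # the 8 corners of the cube
index = {v: i for i, v in enumerate(vertices)}

# The 48 symmetries of the cube: permute the 3 coordinates and optionally
# flip each one (0 <-> 1), giving 3! * 2**3 = 48 vertex permutations.
symmetries = []
for perm in permutations(range(3)):
    for flips in product([0, 1], repeat=3):
        mapping = tuple(index[tuple(v[perm[k]] ^ flips[k] for k in range(3))]
                        for v in vertices)
        symmetries.append(mapping)

# All patterns that map exactly 4 of the 8 inputs to 1: C(8, 4) = 70 of them.
patterns = [frozenset(c) for c in combinations(range(8), 4)]

classes, seen = 0, set()
for p in patterns:
    if p in seen:
        continue
    classes += 1
    for g in symmetries:                          # mark the whole orbit of p
        seen.add(frozenset(g[i] for i in p))
print(classes)                                    # -> 6
```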
- How many linearly separable 3-dimensional Boolean functions are there?
If a facet of the pattern has its diagonals crossed by the black and white balls (an XOR pattern on that facet), the function is not linearly separable. For example, the lower-left pattern in the figure above is not linearly separable.
By checking whether the pattern matches '0101xxxx' or one of the 15 other such masks (12 from the facets of the cube, 4 from the diagonals), we find that 104 of the 256 functions are linearly separable; the brute-force check sketched below confirms this count.
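A brute-force sketch of this count; it checks separability directly rather than via the facet masks described above, and assumes that integer weights in $[-3, 3]$ with half-integer thresholds suffice for three inputs:

```python
from itertools import product

inputs = list(product([0, 1], repeat=3))              # the 8 input patterns
weights = range(-3, 4)                                # candidate integer weights (assumed sufficient)
thresholds = [k + 0.5 for k in range(-10, 10)]        # half-integer thresholds avoid ties

def separable(outputs):
    """outputs: tuple of 8 desired values (0/1), one per input pattern."""
    for w1, w2, w3 in product(weights, repeat=3):
        for t in thresholds:
            if all((w1*x1 + w2*x2 + w3*x3 > t) == bool(y)
                   for (x1, x2, x3), y in zip(inputs, outputs)):
                return True
    return False

count = sum(separable(f) for f in product([0, 1], repeat=8))
print(count)    # -> 104
```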
Linear separability of 4-dimensional Boolean functions
Description: 4 inputs, 1 output. A simple perceptron is trained on each target function; a target is linearly separable if the training loss can be driven close to zero.
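A minimal sketch of a training loop for this task (assumptions: $\pm 1$ coding of inputs and targets, a single tanh unit trained by stochastic gradient descent on the quadratic loss, learning rate 0.02; the stand-in target below is the 4-input AND, while the actual targets A–F are specified in the assignment):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# All 16 input patterns, coded as -1/+1.
patterns = np.array(list(product([-1.0, 1.0], repeat=4)))
# Stand-in target (4-input AND); replace with target A, B, ... from the assignment.
targets = np.where(np.all(patterns == 1.0, axis=1), 1.0, -1.0)

w = rng.normal(0.0, 1.0, size=4)   # initial weights
theta = 0.0                        # threshold
eta = 0.02                         # learning rate (assumed)

for _ in range(10**5):
    i = rng.integers(len(patterns))
    x, t = patterns[i], targets[i]
    out = np.tanh(w @ x - theta)
    delta = (t - out) * (1.0 - out**2)   # gradient of 0.5*(t - out)**2 w.r.t. the local field
    w += eta * delta * x
    theta -= eta * delta

outputs = np.tanh(patterns @ w - theta)
print("loss:", 0.5 * np.sum((targets - outputs) ** 2))
print("outputs:", outputs)
```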
After $10^5$ updates, example results:
Target A: loss function: 4.000; output: [0.99933648,0.99933648,0.99933648,4.27588495e-05,0.99933648,0.99933648,4.27588495e-05,0.99933648,4.27588495e-05,0.99933648,4.27588495e-05,4.27588495e-05,0.99933648,4.27588495e-05,4.27588495e-05,4.27588495e-05]
Target D: loss function: 0.07137127; output:
[-0.9983045970305993,0.8467004934374702,-0.9999998804920586,-0.9999825727485746,-0.8473914883414136,-0.9983045970305993,-0.7797995354533719,0.99829629025566,-0.999999998772594,-0.999988364023615,-0.9983045970305993,-0.9999825727485746,-0.8473914883414136,0.8467004934374702,-0.9999998804920586,-0.9983045970305993]
True-False Questions
- Two hidden layers are necessary to approximate any real-valued function with N inputs and one output in terms of a perceptron.
False. In general, for $N$ inputs, two hidden layers are sufficient, with $2N$ units in the first layer and one unit per basis function in the second layer. Yet it is not always necessary to use two layers for real-valued functions: for continuous functions, one hidden layer is sufficient. This is ensured by the universal approximation theorem, which says that any continuous function can be approximated to arbitrary accuracy by a network with a single hidden layer, given sufficiently many neurons in that layer.
- All Boolean functions with N inputs can be represented by a perceptron with one hidden layer.
True. Any Boolean function with $N$ inputs can be written in disjunctive normal form: one hidden unit per input pattern that maps to 1 (at most $2^N$ hidden units), combined by an OR in the output unit.
- How the weights are initialised for backpropagation does not matter (provided that they are not all zero) because one usually iterates for many epochs so that the initial conditions are forgotten.
False. Initialisation does matter: if the initial weights are too large, the activation functions saturate and the gradients vanish, so training can stall no matter how many epochs one iterates.
- Nesterov’s accelerated gradient scheme is often more efficient than the simple momentum scheme because the weighting factor of the momentum term increases as a function of iteration number.
False. Nesterov’s accelerated-gradient method is often more efficient than the simple momentum method, but the reason is that it evaluates the gradient at an extrapolated (look-ahead) point rather than at the current point, not that the momentum weighting factor increases with the iteration number.
- $L_1$-regularisation reduces small weights more than $L_2$-regularisation.
True. The gradient of the $L_1$ penalty is constant in magnitude ($\propto \operatorname{sign}(w)$), so it keeps driving small weights towards zero, whereas the $L_2$ gradient is proportional to $w$ and becomes negligible for small weights.
- The number of $N$-dimensional Boolean functions is $2^N$
False. The answer should be $2^{2^N}$
Attempts (each graded incorrect): only #4 false; only #5 false; #4 and #5 false; #3, #4 and #5 false; #3 and #5 false.
Two-layer perceptron
First hidden layer:
Second hidden layer:
Output:
Classification error:
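A minimal backpropagation sketch for a perceptron with two hidden layers (assumptions: tanh units, layer sizes $M_1 = 10$ and $M_2 = 5$, learning rate 0.01, stochastic gradient descent, and a toy two-cluster dataset standing in for the assignment's data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two Gaussian clouds in 2-D with labels +1 / -1
# (a stand-in for the assignment's dataset).
X = np.vstack([rng.normal(+1.0, 0.5, (200, 2)), rng.normal(-1.0, 0.5, (200, 2))])
t = np.hstack([np.ones(200), -np.ones(200)])

M1, M2, eta = 10, 5, 0.01                      # layer sizes and learning rate (assumed)
W1 = rng.normal(0, 1, (M1, 2)); th1 = np.zeros(M1)
W2 = rng.normal(0, 1, (M2, M1)); th2 = np.zeros(M2)
W3 = rng.normal(0, 1, (1, M2));  th3 = np.zeros(1)

def forward(x):
    v1 = np.tanh(W1 @ x - th1)                 # first hidden layer
    v2 = np.tanh(W2 @ v1 - th2)                # second hidden layer
    o = np.tanh(W3 @ v2 - th3)                 # output
    return v1, v2, o

for _ in range(10**5):
    i = rng.integers(len(X))
    x, target = X[i], t[i]
    v1, v2, o = forward(x)
    d3 = (target - o) * (1 - o**2)             # output-layer error
    d2 = (W3.T @ d3) * (1 - v2**2)             # backpropagated to second hidden layer
    d1 = (W2.T @ d2) * (1 - v1**2)             # backpropagated to first hidden layer
    W3 += eta * np.outer(d3, v2); th3 -= eta * d3
    W2 += eta * np.outer(d2, v1); th2 -= eta * d2
    W1 += eta * np.outer(d1, x);  th1 -= eta * d1

# Classification error C = (1/2p) * sum_mu |sgn(O_mu) - t_mu|
outputs = np.array([forward(x)[2][0] for x in X])
C = 0.5 * np.mean(np.abs(np.sign(outputs) - t))
print("classification error:", C)
```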