Matrix Norms and Derivatives

Matrix Norms

A matrix norm is a norm on the vector space $\mathbb{F}^{m \times n}$, where $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$ denotes the underlying field. That is, it is a mapping from the vector space to $\mathbb{R}$ that satisfies the following properties of norms:

For all scalars $\alpha \in \mathbb{F}$ and for all matrices $\boldsymbol{A}, \boldsymbol{B} \in \mathbb{F}^{m \times n}$, a norm

  • is absolutely homogeneous: $\lVert\alpha\boldsymbol{A}\rVert = \lvert\alpha\rvert\,\lVert\boldsymbol{A}\rVert$;

  • is sub-additive or satisfies the triangle inequality: $\lVert\boldsymbol{A} + \boldsymbol{B}\rVert \le \lVert\boldsymbol{A}\rVert + \lVert\boldsymbol{B}\rVert$;

  • and is positive-definite: $\lVert\boldsymbol{A}\rVert \ge 0$, and $\lVert\boldsymbol{A}\rVert = 0$ iff $\boldsymbol{A} = \boldsymbol{0}$.
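These three axioms can be sanity-checked numerically for any concrete norm. A minimal sketch using NumPy's Frobenius norm (discussed later in these notes) on randomly generated matrices; the specific matrices and scalar are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
alpha = -2.5

# np.linalg.norm on a matrix defaults to the Frobenius norm.
norm = lambda M: np.linalg.norm(M)

# Absolute homogeneity: scaling A by alpha scales the norm by |alpha|.
assert np.isclose(norm(alpha * A), abs(alpha) * norm(A))

# Triangle inequality (small tolerance for floating-point roundoff).
assert norm(A + B) <= norm(A) + norm(B) + 1e-12

# Positive-definiteness: positive for nonzero A, zero only for the zero matrix.
assert norm(A) > 0 and norm(np.zeros((3, 4))) == 0.0
```

A numerical check like this cannot prove the axioms, but it is a quick way to catch a wrong implementation.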

There are several types of matrix norms that satisfy the properties above, and we name a few as follows.

Induced by Vector Norms

The most general vector-induced $(p, q)$-norm of an $m \times n$ matrix $\boldsymbol{A}$ is defined as

$$\lVert\boldsymbol{A}\rVert_{p,q} = \max_{\boldsymbol{x} \neq \boldsymbol{0}} \frac{\lVert\boldsymbol{A}\boldsymbol{x}\rVert_q}{\lVert\boldsymbol{x}\rVert_p},$$

where $\lVert\cdot\rVert_p$ and $\lVert\cdot\rVert_q$ are vector norms. When $q = p$, the resulting matrix norm is called the $\ell_p$ norm for simplicity.

Three noteworthy and widely used examples are $p = 1, 2$ and $\infty$.

  • $p = 1$: $\lVert\boldsymbol{A}\rVert_1 = \max_j \sum_i \lvert a_{ij}\rvert$, which is the maximum absolute column sum.

  • $p = \infty$: $\lVert\boldsymbol{A}\rVert_\infty = \max_i \sum_j \lvert a_{ij}\rvert$, which is the maximum absolute row sum.

  • $p = 2$: $\lVert\boldsymbol{A}\rVert_2 = \sigma_{\max}(\boldsymbol{A})$, which is the largest singular value.

Entrywise

The most famous entrywise matrix norm is the Frobenius norm: $\lVert\boldsymbol{A}\rVert_F = (\sum_i \sum_j \lvert a_{ij}\rvert^2)^{1/2}$. An important inequality relating the $\ell_2$ norm to the Frobenius norm states that $\lVert\boldsymbol{A}\rVert_2 \le \lVert\boldsymbol{A}\rVert_F$.
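The inequality follows because the Frobenius norm equals the $\ell_2$ norm of the whole vector of singular values, while $\lVert\boldsymbol{A}\rVert_2$ is only the largest one. A quick numerical sketch of both facts (random matrix chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Frobenius norm: square root of the sum of squared entries.
fro = np.sqrt((np.abs(A) ** 2).sum())
assert np.isclose(fro, np.linalg.norm(A, 'fro'))

# It also equals the l2 norm of the vector of singular values.
sigma = np.linalg.svd(A, compute_uv=False)
assert np.isclose(fro, np.linalg.norm(sigma))

# The spectral (l2) norm is just the largest singular value,
# so it can never exceed the Frobenius norm.
spectral = np.linalg.norm(A, 2)
assert spectral <= fro + 1e-12
```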


Matrix Derivatives

We first introduce some direct extensions of the scalar derivative.

Then, following the right-side formula above, we have

where $\boldsymbol{f}, \boldsymbol{g} \in \mathbb{R}^{m}$ and $\boldsymbol{x} \in \mathbb{R}^n$. This uses the definition

It is weird that some materials directly define

The right way is to define

Also, the notation for taking the derivative of a matrix w.r.t. a vector,
like $\frac{\partial \boldsymbol{A}}{\partial \boldsymbol{x}}$, is weird. I don't understand
why it keeps showing up in different materials.
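One way to see the awkwardness: if every entry of $\boldsymbol{A}$ depends on $\boldsymbol{x}$, the full collection of partial derivatives is a third-order tensor, which no matrix-shaped layout convention can hold without flattening. A minimal numerical sketch, where the particular map `A_of_x` is a hypothetical example invented for illustration:

```python
import numpy as np

def A_of_x(x):
    # A hypothetical 2x2 matrix whose entries depend on x in R^3.
    return np.array([[x[0] * x[1], x[2]],
                     [np.sin(x[0]), x[1] ** 2]])

x = np.array([0.5, -1.0, 2.0])
eps = 1e-6

# Central finite differences: one 2x2 slice per component of x.
# Stacking the slices gives shape (3, 2, 2) -- a third-order tensor,
# not a matrix, which is why the notation dA/dx resists a clean layout.
dA_dx = np.stack([
    (A_of_x(x + eps * e) - A_of_x(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(dA_dx.shape)  # (3, 2, 2)
```

Each slice `dA_dx[k]` is the ordinary matrix of partials $\partial a_{ij} / \partial x_k$; it is only the stack of all slices that fails to be a matrix.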

References

  • Wikipedia: https://en.wikipedia.org/wiki/Matrix_calculus.

  • Appendix C of Pattern Recognition and Machine Learning.

  • Lecture notes of Introduction to System Engineering, Prof. Jianming Hu, Department of Automation, Tsinghua University, Spring 2018.