In my previous post I illustrated why it is not possible to compute the Jordan canonical form numerically (i.e. in floating point numbers). The simple reason: For every matrix {A} and every {\epsilon>0} there is a matrix {A_{\epsilon}} which differs from {A} by at most {\epsilon} (e.g. in every entry – but all norms for matrices are equivalent, so this does not really play a role) such that {A_{\epsilon}} is diagonalizable. So why should you bother about computing the Jordan canonical form anyway? Or even learning or teaching it? Well, the prime application of the Jordan canonical form is to calculate solutions of linear systems of ODEs. The equation

\displaystyle y'(t) = Ay(t),\quad y(0) = y_{0}

with matrix {A\in {\mathbb R}^{n\times n}} and initial value {y_{0}\in{\mathbb R}^{n}} (both could also be complex). This system has a unique solution which can be given explicitly with the help of the matrix exponential as

\displaystyle y(t) = \exp(At)y_{0}

where the matrix exponential is

\displaystyle \exp(At) = \sum_{k=0}^{\infty}\frac{A^{k}t^{k}}{k!}.

It is not always simple to work out the matrix exponential by hand. The straightforward way would be to calculate all the powers of {A}, weight them by {1/k!} and sum the series. This may be a challenge, even for simple matrices. My favorite example is the matrix

\displaystyle A = \begin{bmatrix} 0 & 1\\ 1 & 1 \end{bmatrix}.

Its first powers are

\displaystyle A^{2} = \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix},\quad A^{3} = \begin{bmatrix} 1 & 2\\ 2 & 3 \end{bmatrix}

\displaystyle A^{4} = \begin{bmatrix} 2 & 3\\ 3 & 5 \end{bmatrix},\quad A^{5} = \begin{bmatrix} 3 & 5\\ 5 & 8 \end{bmatrix}.

You may notice that the Fibonicci numbers appear (and this is pretty clear on a second thought). So, finding a explicit form for {\exp(A)} leads us to finding an explicit form for the {k}-th Fibonacci number (which is possible, but I will not treat this here).

Another way is diagonalization: If {A} is diagonalizable, i.e. there is an invertible matrix {S} and a diagonal matrix {D} such that

\displaystyle S^{-1}AS = D\quad\text{or, equivalently}\quad A = SDS^{-1},

you see that

\displaystyle \exp(At) = S\exp(Dt)S^{-1}

and the matrix exponential of a diagonal matrix is simply the exponential function applied to the diagonal entries.

But not all matrices are diagonalizable! The solution that is usually presented in the classroom is to use the Jordan canonical form instead and to compute the matrix exponential of Jordan blocks (using that you can split a Jordan block {J = D+N} into the sum of a diagonal matrix {D} and a nil-potent matrix {N} and since {D} and {N} commute one can calculate {\exp(J) = \exp(D)\exp(N)} and both matrix exponentials are quite easy to compute).

But in light of the fact that there are a diagonalizable matrices arbitrarily close to any matrix, on may ask: What about replacing a non-diagonalizable matrix {A} with a diagonalizable one (with a small error) and then use this one?

Let’s try this on a simple example:

We consider

\displaystyle A = \begin{bmatrix} -1 & 1\\ 0 & -1 \end{bmatrix}

which is not diagonalizable. The linear initial value problem

\displaystyle y' = Ay,\quad y(0) = y_{0}

has the solution

\displaystyle y(t) = \exp( \begin{bmatrix} -t & t\\ 0 & -t \end{bmatrix}) y_{0}

and the matrix exponential is

\displaystyle \begin{array}{rcl} \exp( \begin{bmatrix} -t & t\\ 0 & -t \end{bmatrix}) & = &\exp(\begin{bmatrix} -t & 0\\ 0 & -t \end{bmatrix})\exp(\begin{bmatrix} 0 & t\\ 0 & 0 \end{bmatrix})\\& = &\begin{bmatrix} \exp(-t) & 0\\ 0 & \exp(-t) \end{bmatrix}\begin{bmatrix} 1 & t\\ 0 & 1 \end{bmatrix}\\ &=& \begin{bmatrix} \exp(-t) & t\exp(-t)\\ 0 & \exp(-t) \end{bmatrix}. \end{array}

So we get the solution

\displaystyle y(t) = \begin{bmatrix} e^{-t}(y^{0}_{1} + ty^{0}_{2})\\ e^{-t}y^{0}_{2} \end{bmatrix}.

Let us take a close-by matrix which is diagonalizable. For some small {\epsilon} we choose

\displaystyle A_{\epsilon} = \begin{bmatrix} -1 & 1\\ 0 & -1+\epsilon \end{bmatrix}.

Since {A_{\epsilon}} is upper triangular, it has its eigenvalues on the diagonal. Since {\epsilon\neq 0}, there are two distinct eigenvalues and hence, {A_{\epsilon}} is diagonalizable. Indeed, with

\displaystyle S = \begin{bmatrix} 1 & 1\\ 0 & \epsilon \end{bmatrix},\quad S^{-1}= \begin{bmatrix} 1 & -\tfrac1\epsilon\\ 0 & \tfrac1\epsilon \end{bmatrix}

we get

\displaystyle A = S \begin{bmatrix} -1 & 0 \\ 0 & -1+\epsilon \end{bmatrix}S^{-1}.

The matrix exponential of {A_{\epsilon}t} is

\displaystyle \begin{array}{rcl} \exp(A_{\epsilon}t) &=& S\exp( \begin{bmatrix} -t & 0\\ 0 & -t(1-\epsilon) \end{bmatrix} )S^{-1}\\ &=& \begin{bmatrix} e^{-t} & \tfrac{e^{-(1-\epsilon)t} - e^{-t}}{\epsilon}\\ 0 & e^{-(1-\epsilon)t} \end{bmatrix}. \end{array}

Hence, the solution of {y' = Ay}, {y(0) = y_{0}} is

\displaystyle y(t) = \begin{bmatrix} e^{-t}y^{0}_{1} + \tfrac{e^{-(1-\epsilon)t} - e^{-t}}{\epsilon}y^{0}_{2}\\ e^{-(1-\epsilon)t}y^{0}_{2} \end{bmatrix}.

How is this related to the solution of {y'=Ay}? How far is it away?

Of course, the lower right entry of {\exp(A_{\epsilon}t)} converges to {e^{-t}} for {\epsilon \rightarrow 0}, but what about the upper right entry? Note that the entry

\displaystyle \tfrac{e^{-(1-\epsilon)t} - e^{-t}}{\epsilon}

is nothing else that the (negative) difference quotient for the derivative of the function {f(a) = e^{-at}} at {a=1}. Hence

\displaystyle \tfrac{e^{-(1-\epsilon)t} - e^{-t}}{\epsilon} \stackrel{\epsilon\rightarrow 0}{\longrightarrow} -f'(1) = te^{-t}

and we get

\displaystyle \exp(A_{\epsilon}t) \stackrel{\epsilon\rightarrow 0}{\longrightarrow} \begin{bmatrix} e^{-t} & te^{-t}\\ 0 & e^{-t} \end{bmatrix} = \exp(At)

as expected.

It turns out that a fairly big \epsilon is already enough to get a quite good approximation and even the correct asymptotics: The blue curve it first component of the exact solution (initialized with the second standard basis vector), the red one corresponds \epsilon = 0.1 and the yellow on (pretty close to the blue on) is for \epsilon = 0.01.

 

 

to  $\e102_jordan_ode

 

Advertisements

I remember from my introductory class in linear algebra that my instructor said

It is impossible to calculate the Jordan canonical form of a matrix numerically.

Another quote I remember is

The Jordan canonical form does not depend continuously on the matrix.

For both quotes I did not remember the underlying reasons and since I do teach an introductory class on linear algebra this year, I got thinking about these issues again.

Here is a very simple example for the fact in the second quote:

Consider the matrix

\displaystyle A_{\varepsilon} = \begin{pmatrix}1 & \varepsilon\\ 0 & 1\end{pmatrix}

for {\varepsilon>0}. This matrix has {1} as a double eigenvalue, but the corresponding eigenspace is one-dimensional and spanned by {v_{1} = e_{1}}. To extend this vector to a basis we calculate a principle vector by solving

\displaystyle (A_{\varepsilon}-I)v_{2} = v_{1}

which leads to

\displaystyle v_{2} = \begin{pmatrix} 0\\\varepsilon^{-1} \end{pmatrix}.

Defining {S = [v_{1}\, v_{2}]} we get the Jordan canonical form of {A_{\varepsilon}} as

\displaystyle J_{\varepsilon} = S^{-1}A_{\varepsilon}S = \begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix}

So we have

\displaystyle A_{\varepsilon}\rightarrow A = I\quad\text{and}\quad J_{\varepsilon} \rightarrow J = \begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix},

but {J} is not the Jordan canonical form of {A}. So, in short: The Jordan canonical form of the limit of {A_{\varepsilon}} is not the limit of the Jordan canonical form of {A_{\varepsilon}}. In other words: Taking limits does not commute with forming the Jordan canonical form.

A side note: Of course, the Jordan canonical form is not even unique in general, so speaking of “dependence on the matrix” is an issue. What we have shown is, that there is no way to get continuous dependence on the matrix even if non-uniqueness is not an issue (like in the example above).

What about the first quote? Why is it impossible to compute the Jordan canonical form numerically? Let’s just try! We start with the simplest non-diagonalizable matrix

\displaystyle A = \begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix}

If we ask MATLAB or Octave to do the eigenvalue decomposition we get

>> [V,D] = diag(A)
V =

1.00000 -1.00000
0.00000  0.00000

D =

1 0
0 1

We see that {V} does not seem to be invertible and indeed we get

>> rank(V)
ans = 1

What is happening? MATLAB did not promise the produce an invertible {V} and it does not promise that the putcome would fulfill {V^{-1}AV = D} (which is my definition of diagonalizability). It does promise that {AV = VD} and in fact

>> A*V
ans =

1.00000  1.00000
0.00000 -0.00000

>> V*D
ans =

1.00000 -1.00000
0.00000  0.00000

>> A*V-V*D
ans =

0.0000e+00 2.2204e-16
0.0000e+00 0.0000e+00

so the promised identity is fulfilled up to machine precision (which is actually equal \texttt{2.2204e-16} and we denote it by {\epsilon} from here on).

How did MATLAB diagonalize this matrix? Here is the thing: The diagonalizable matrices are dense in {{\mathbb C}^{n\times n}}! (You probably have heard that before…) What does that mean numerically? Any matrix that you represent in floating point numbers is actually a representative of a whole bunch of matrices. Each entry is only known up to a certain precision. But this bunch of matrices does contain some matrix which is diagonalizable! This is exactly, what it means to be a dense set! So it is impossible to say if a matrix given in floating point numbers is actually diagonalizable or not. So, what matrix was diagonalized by MATLAB? Let us have closer look at the matrix {V}: The entries in the first row are in fact {1} and {-1}:

>> V(1,:)
ans =
1 -1

In the second row we have

>> V(2,1)
ans =
0
>> V(2,2)
ans = 2.2204e-16

and there we have it. The inverse of {V} does exist (although the matrix has numerical rank {1}) and it is

>> inv(V) warning: matrix singular to machine precision, rcond = 1.11022e-16
ans =

1.0000e+00 4.5036e+15
0.0000e+00 4.5036e+15

and note that \texttt{4.5036e+15} is indeed just the inverse of the machine precision, so this inverse is actually 100% accurate. Recombining gives

>> inv(V)*D*V warning: matrix singular to machine precision, rcond = 1.11022e-16
ans =

1 0
0 1

which is not even close to our original {A}.
How is that? Here is a solution. The matrix

\displaystyle \tilde A = \begin{pmatrix} 1 & 1\\ 0 & 1+\epsilon^{-2} \end{pmatrix}

is indistinguishable from {A} numerically. However, it has two distinct eigenvalues, so it is diagonalizable. Indeed, a basis of eigenvectors is

\displaystyle \tilde V = \begin{pmatrix} 1 & -1\\ 0 & \epsilon^{-2} \end{pmatrix}

which is indistinguishable from {V} above and it holds that

\displaystyle \tilde V^{-1}\tilde A\tilde V = \begin{pmatrix} 1 & 0\\ 0 & 1+\epsilon^{-2}. \end{pmatrix}

which is indistinguishable from D.

Im Wintersemster 2018/19 habe ich die Vorlesung “Lineare Algebra 1” gehalten. Hier die lecture notes dazu:

This is a short note to self: Let A be a symmetric positive semidefinite matrix with one-dimensional kernel spanned by v. How to solve Ax=b (if you know that b is in the range of A)? Just typing

x = A\b

should give a warning in a reasonable software (but also should produce some correct result, if it returns anything at all).

If you don’t want that warning and also want to get the solution that is orthogonal to the kernel you should do

x = (A+v*v')\b.

Note that A + vv^T has full rank (and v is still an eigenvector, but now for the eigenvalue \|v\|^2).

Surely, the solution of Ax=b which is orthogonal to the kernel of A  also solves this (A+vv^T)x = b since (A+vv^T)x = Ax + vv^Tx = Ax = b. Conversely, if x solves (A + vv^T)x = b, then taking the inner product with v gives (Ax)^Tv + (v^Tx)^2 = b^Tv and since b^Tv = 0 and (Ax)^T v = x^TAv = 0 it follows that v^T x = 0 which shows that both Ax=b and that x is orthogonal to the kernel.

Also, if you want the solution which is orthogonal to some z (and not to the kernel of A) you can solve (A + zz^T)x=b. By taking the inner product with v, you get that v^T z\, x^T z=0 and you get x\bot z as soon as v^Tz\neq 0.