Efficient Matrix Operations through Diagonalizability

In this blog post, I'll talk about diagonalizability: what it is, and why it can be useful to diagonalize matrices (when possible) to compute operations on matrices efficiently. I won't go into detail about when a matrix is diagonalizable, but the question will come up briefly in an example.

If $A$ is similar to a diagonal matrix $D$, then $A = PDP^{-1}$ for some invertible matrix $P$ and some diagonal matrix $D$, and we say $A$ is diagonalizable. Only square matrices are (possibly) diagonalizable. The matrix $P$ has columns given by $n$ linearly independent eigenvectors of the linear transformation given by $A$, and $D$ is a diagonal matrix whose diagonal entries are the eigenvalues corresponding to those eigenvectors.
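
As a quick numerical illustration (a minimal sketch: NumPy's `np.linalg.eig` returns the eigenvalues and a matrix whose columns are the corresponding eigenvectors, playing the roles of $D$ and $P$; the example matrix is my own):

```python
import numpy as np

# An arbitrary diagonalizable example matrix (eigenvalues 5 and 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of P are eigenvectors; the eigenvalues fill the diagonal of D.
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)

# Verify A = P D P^{-1}, up to floating-point error.
assert np.allclose(A, P @ D @ np.linalg.inv(P))
```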

Being able to diagonalize matrices this way is useful, since it allows for easy and fast computation of functions of $A$.

For example, if we wanted to compute $A^k$, then the naive matrix multiplication algorithm takes $O(kn^3)$ time, and the best known algorithm is still $O(kn^{2.373})$. However, if we know that $A$ is diagonalizable, then we have:

$$A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}$$

An induction on the power we raise $A$ to shows that in general $A^k = PD^kP^{-1}$.

Next, we show that this makes our life easier, since if $D = \operatorname{diag}(d_1, \ldots, d_n)$ (the diagonal matrix with diagonal entries $d_1, \ldots, d_n$), then $D^k = \operatorname{diag}(d_1^k, \ldots, d_n^k)$.

First, let's consider $D^2$. We have $D^2 = DD$.

In general, for a matrix product $C = AB$ we have $C_{ij} = \sum_{\ell=1}^{n} A_{i\ell}B_{\ell j}$ - the $(i, j)$ entry of $C$ is the dot product of the $i$th row of $A$ and the $j$th column of $B$. Therefore, we have $(D^2)_{ij} = \sum_{\ell=1}^{n} D_{i\ell}D_{\ell j}$, and since $D$ is diagonal, the term $D_{i\ell}D_{\ell j}$ is nonzero only when $i = \ell = j$; so if $i = j$ then $(D^2)_{ii} = d_i^2$, otherwise $(D^2)_{ij} = 0$.

Thus, we have $D^2 = \operatorname{diag}(d_1^2, \ldots, d_n^2)$, and an inductive argument gives our above result for $D^k$.

This drastically reduces the complexity of computing $A^k$: instead of $k - 1$ matrix multiplications, taking $O(kn^{2.373})$ with a good algorithm, we have an $O(n)$ pass to exponentiate across the diagonal (treating scalar exponentiation as constant-time), followed by two matrix multiplications (the first multiplication, with the diagonal matrix, is also more efficient, since it just scales the rows of $P^{-1}$), for a complexity of $O(n^{2.373})$ - and the latter complexity does not grow with $k$.
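
Here's a rough sketch of how this looks in code (assuming $A$ is diagonalizable; `np.linalg.eig` may return complex eigenvalues in general, but this example keeps things real):

```python
import numpy as np

def power_via_diagonalization(A, k):
    """Compute A^k as P D^k P^{-1}, exponentiating only the n diagonal entries."""
    eigenvalues, P = np.linalg.eig(A)
    D_k = np.diag(eigenvalues ** k)  # O(n) scalar exponentiations
    return P @ D_k @ np.linalg.inv(P)

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Agrees with repeated multiplication, up to floating-point error.
assert np.allclose(power_via_diagonalization(A, 10),
                   np.linalg.matrix_power(A, 10))
```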

We can actually make a similar argument for computing any function $f$ with a matrix argument: $f(A) = Pf(D)P^{-1}$ when $A$ is diagonalizable.

To show why this is the case, let's consider the Taylor series expansion of $f(x)$, where $x \in \mathbb{R}$, around $0$ (we are assuming here that $f$ is infinitely differentiable around $0$):

$$f(x) = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n$$

To bring in the matrix, we can substitute $A$ in place of $x$; sums now become matrix sums, and the scalar coefficients scale the matrix powers:

$$f(A) = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}A^n$$

Next, since $A$ is diagonalizable and $A^n = PD^nP^{-1}$, we can substitute:

$$f(A) = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}PD^nP^{-1}$$

Factoring $P$ out on the left and $P^{-1}$ out on the right of each term yields the following:

$$f(A) = P\left(\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}D^n\right)P^{-1}$$

The term $\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}D^n$ should be familiar - we know from above that that is just $f(D)$, the Taylor series with $D$ substituted in. Thus, we're left with $f(A) = Pf(D)P^{-1}$.

But $f(D) = \operatorname{diag}(f(d_1), \ldots, f(d_n))$, so $f(A) = P\operatorname{diag}(f(d_1), \ldots, f(d_n))P^{-1}$.

Next, all that's left to show is that $f(D) = \operatorname{diag}(f(d_1), \ldots, f(d_n))$. We use a similar approach with the Taylor series expansion around $0$, together with $D^n = \operatorname{diag}(d_1^n, \ldots, d_n^n)$ from before:

$$f(D) = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}D^n = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}\operatorname{diag}(d_1^n, \ldots, d_n^n)$$

This is just a bunch of matrix scaling and summation, so let's move the scalars inside the matrices:

$$f(D) = \sum_{n=0}^{\infty}\operatorname{diag}\left(\frac{f^{(n)}(0)}{n!}d_1^n, \ldots, \frac{f^{(n)}(0)}{n!}d_n^n\right)$$

Now, adding up the matrices, we get:

$$f(D) = \operatorname{diag}\left(\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}d_1^n, \ldots, \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}d_n^n\right)$$

But each nonzero entry is the Taylor series expansion for the corresponding $f(d_i)$, so this is just

$$f(D) = \operatorname{diag}(f(d_1), \ldots, f(d_n)),$$

and the proof is complete.
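
In code, the recipe $f(A) = P\operatorname{diag}(f(d_1), \ldots, f(d_n))P^{-1}$ looks like the following (a minimal sketch; the helper name is my own, and we assume $A$ is diagonalizable):

```python
import numpy as np

def apply_to_matrix(f, A):
    """Compute f(A) = P diag(f(d_1), ..., f(d_n)) P^{-1} for diagonalizable A."""
    eigenvalues, P = np.linalg.eig(A)
    return P @ np.diag(f(eigenvalues)) @ np.linalg.inv(P)

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Example: a "matrix cosine", obtained by applying cos to each eigenvalue.
cos_A = apply_to_matrix(np.cos, A)
```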

This is useful since it allows us to compute functions over matrices much more easily, as we saw above with exponentiation. Consider for example $f(x) = e^x$. A valid notion of exponentiating matrices can be given by defining $e^A$ similarly to the Taylor series for $e^x$:

$$e^A = \sum_{n=0}^{\infty}\frac{A^n}{n!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$$

(As an aside, it turns out that the above sum is a much more natural way to think about the function $e^x$, instead of thinking of it as raising $e$ to some power. Read this fascinating Quora answer to find out why.)

Clearly, computing matrix powers like this to approximate $e^A$ is very expensive. But if $A$ is diagonalizable, we have a much less expensive approach to computing $e^A$:

$$e^A = Pe^DP^{-1} = P\operatorname{diag}(e^{d_1}, \ldots, e^{d_n})P^{-1}$$
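
For the exponential specifically, we can sanity-check the diagonalization approach against SciPy's general-purpose `scipy.linalg.expm` (a sketch, reusing the example matrix from above):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# e^A = P diag(e^{d_1}, ..., e^{d_n}) P^{-1}.
eigenvalues, P = np.linalg.eig(A)
exp_A = P @ np.diag(np.exp(eigenvalues)) @ np.linalg.inv(P)

# Matches SciPy's matrix exponential, up to floating-point error.
assert np.allclose(exp_A, expm(A))
```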

An application: predicting the growth of a portfolio through solving linear differential equations

Consider a simple portfolio given by $u = (u_1, u_2)$ dollars invested into two investments (say, IBM stock and a money market fund, so that $u_1$ is the dollar amount in the stock and $u_2$ the dollar amount in the fund). Then, this portfolio lives in a two-dimensional vector space $\mathbb{R}^2$.

Assume further that we expect IBM stock to have a long-term growth rate of 3% (indeed an egregious assumption for any individual stock) and to issue a 2% dividend every year, which flows into the money market fund. We will also assume that the money market fund provides a consistent 2% return on investment. Let's say that we initially have \$100 invested in IBM stock and \$500 invested in the money market fund.

With these assumptions, the growth of our portfolio every year can be given by a linear transformation $T$ whose matrix is $A = \begin{pmatrix} T(e_1) & T(e_2) \end{pmatrix}$, where $e_1, e_2$ are the standard basis vectors.

We have $T(e_1) = (1.03, 0.02)$, since one dollar of stock returns 1.03 in the stock and 0.02 as a dividend, and $T(e_2) = (0, 1.02)$, since one dollar in the money market fund does not return anything in the stock and returns 1.02 in the fund.

Thus, the matrix of the linear transformation is given by

$$A = \begin{pmatrix} 1.03 & 0 \\ 0.02 & 1.02 \end{pmatrix},$$

and we can model the growth of our portfolio as a first-order linear differential equation:

$$\frac{du}{dt} = Au$$

If $A$ were diagonalizable, we'd have a simple, exact solution, so we attempt to diagonalize $A$. Since $A$ is (lower) triangular, the eigenvalues of $A$ are its diagonal entries $\lambda_1 = 1.03$ and $\lambda_2 = 1.02$. Since $A$'s eigenvalues both have algebraic multiplicity one, they must have geometric multiplicity one, and thus $A$ is diagonalizable.

We compute $\ker(A - 1.03I)$ and $\ker(A - 1.02I)$ to see that the eigenspaces corresponding to $\lambda_1 = 1.03$ and $\lambda_2 = 1.02$ are $\operatorname{span}\{(1, 2)\}$ and $\operatorname{span}\{(0, 1)\}$, respectively.

Thus, we have $A = PDP^{-1}$ with

$$P = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 1.03 & 0 \\ 0 & 1.02 \end{pmatrix}, \quad \text{and} \quad P^{-1} = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}.$$
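
(A quick numerical check of this decomposition, as a sketch:)

```python
import numpy as np

A = np.array([[1.03, 0.00],
              [0.02, 1.02]])
P = np.array([[1.0, 0.0],
              [2.0, 1.0]])
D = np.diag([1.03, 1.02])
P_inv = np.array([[ 1.0, 0.0],
                  [-2.0, 1.0]])

# Verify A = P D P^{-1}, up to floating-point error.
assert np.allclose(A, P @ D @ P_inv)
```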

To simplify things, we can multiply both sides of $\frac{du}{dt} = PDP^{-1}u$ by $P^{-1}$ to get $P^{-1}\frac{du}{dt} = DP^{-1}u$. Now, we let $v = P^{-1}u$, so that $\frac{dv}{dt} = Dv$. Expanding this, we have

$$\begin{pmatrix} dv_1/dt \\ dv_2/dt \end{pmatrix} = \begin{pmatrix} 1.03 & 0 \\ 0 & 1.02 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix},$$

and we get the uncoupled differential equations

$$\frac{dv_1}{dt} = 1.03\,v_1, \qquad \frac{dv_2}{dt} = 1.02\,v_2,$$

which we solve separately to get $v_1 = c_1e^{1.03t}$ and $v_2 = c_2e^{1.02t}$ for some constants $c_1$ and $c_2$. Now, since $v = P^{-1}u$, we recover $u$ by applying $P$:

$$u = Pv = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} c_1e^{1.03t} \\ c_2e^{1.02t} \end{pmatrix}.$$

Multiplying, we have that

$$u_1 = c_1e^{1.03t} \quad \text{and} \quad u_2 = 2c_1e^{1.03t} + c_2e^{1.02t}.$$

Taking into account our initial conditions, we have that $u_1(0) = 100$ and $u_2(0) = 500$, so $c_1 = 100$ and $c_2 = 300$. Thus, our final solution is

$$u_1 = 100e^{1.03t} \quad \text{and} \quad u_2 = 200e^{1.03t} + 300e^{1.02t},$$

into which we can plug a value of $t$ to estimate the value of our portfolio at that time.
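
As a final sanity check (a sketch), the closed-form solution agrees with propagating the initial conditions through the matrix exponential, since the solution of $\frac{du}{dt} = Au$ is $u(t) = e^{At}u(0)$:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.03, 0.00],
              [0.02, 1.02]])
u0 = np.array([100.0, 500.0])  # initial portfolio: $100 stock, $500 fund

t = 10.0
u_numeric = expm(A * t) @ u0  # u(t) = e^{At} u(0)

# Closed-form solution derived above.
u_closed_form = np.array([
    100 * np.exp(1.03 * t),
    200 * np.exp(1.03 * t) + 300 * np.exp(1.02 * t),
])

assert np.allclose(u_numeric, u_closed_form)
```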

Written on July 21, 2019