A few months ago, I had the pleasure of reading Eugenia Cheng’s book How to Bake Pi. Each chapter starts with a recipe which Cheng links to the mathematical concepts contained in the chapter. The book is full of interesting connections between mathematics and the rest of the world.
One of my favourite ideas in the book is something Cheng writes about equations and the humble equals sign, $=$. She explains that when an equation says two things are equal, we very rarely mean that they are exactly the same thing. What we really mean is that the two things are the same in some ways, even though they may be different in others.
One example that Cheng gives is the equation $x + y = y + x$. This is such a familiar statement that you might really think that $x + y$ and $y + x$ are the same thing. Indeed, if $x$ and $y$ are any numbers, then the number you get when you calculate $x + y$ is the same as the number you get when you calculate $y + x$. But calculating $x + y$ could be very different from calculating $y + x$. A young child might calculate $x + y$ by starting with $x$ and then counting one-by-one from $x$ to $x + y$. If $x$ is $2$ and $y$ is $10$, then calculating $x + y$ requires counting from $2$ to $12$, but calculating $y + x$ simply amounts to counting from $10$ to $12$. The first process takes way longer than the second, and the child might disagree that $x + y$ is the same as $y + x$.
In How to Bake Pi, Cheng explains that a crucial idea behind equality is context. When someone says that two things are equal, we really mean that they are equal in the context we care about. Cheng talks about how context is crucial throughout mathematics and introduces a little bit of category theory as a tool for moving between different contexts. I think that this idea of context is really illuminating, and I wanted to share some examples where “$=$” doesn’t mean “exactly the same as”.
The Sherman-Morrison formula
The Sherman-Morrison formula is a result from linear algebra that says: for any invertible matrix $A$ and any pair of vectors $u, v$, if $1 + v^\top A^{-1} u \neq 0$, then $A + uv^\top$ is invertible and

$$(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}.$$

Here “$=$” means the following:
- You can take any natural number $n$, any matrix $A$ of size $n$ by $n$, and any length-$n$ vectors $u$ and $v$ that satisfy the above condition.
- If you take all those things and carry out all the matrix multiplications, additions and inversions on the left and all the matrix multiplications, additions and inversions on the right, then you will end up with exactly the same matrix in both cases.
But depending on the context, computing one side of “$=$” may be much easier than computing the other. Although the right hand side looks a lot more complicated, it is much easier to compute in one important context. This context is when we have already calculated $A^{-1}$ and now want the inverse of $A + uv^\top$. The left hand side naively computes $(A + uv^\top)^{-1}$ from scratch, which takes $O(n^3)$ computations since we have to invert an $n \times n$ matrix. On the right hand side, we only need to compute a small number of matrix-vector products and then add two matrices together. This brings the computational cost down to $O(n^2)$.
These cost-saving measures come up a lot when studying linear regression. The Sherman-Morrison formula can be used to update regression coefficients when a new data point is added. Similarly, the Sherman-Morrison formula can be used to quickly calculate the fitted values in leave-one-out cross validation.
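As a quick sanity check, here is a minimal NumPy sketch (random test data and variable names are my own) comparing the two sides of the formula: the direct inverse of $A + uv^\top$ and the Sherman-Morrison update built from $A^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n)) + n * np.eye(n)  # shifted to keep A well-conditioned
u = rng.normal(size=n)
v = rng.normal(size=n)

A_inv = np.linalg.inv(A)  # assume this has already been computed

# Left hand side: invert A + u v^T directly, an O(n^3) operation.
lhs = np.linalg.inv(A + np.outer(u, v))

# Right hand side: Sherman-Morrison update, O(n^2) given A_inv.
Ainv_u = A_inv @ u           # matrix-vector product
vT_Ainv = v @ A_inv          # vector-matrix product
denom = 1.0 + v @ Ainv_u     # the scalar that must be non-zero
rhs = A_inv - np.outer(Ainv_u, vT_Ainv) / denom

print(np.allclose(lhs, rhs))  # True: same matrix, very different computations
```

For a single update of a small matrix the difference is negligible, but when $n$ is large, or when updates arrive one data point at a time, the $O(n^2)$ route is the only practical one.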
log-sum-exp
This second example also has connections to statistics. In a mixture model, we assume that each data point $y$ comes from a distribution of the form:

$$p(y \mid \pi, \theta) = \sum_{k=1}^{K} \pi_k f(y \mid \theta_k),$$

where $\pi = (\pi_1, \ldots, \pi_K)$ is a vector of class probabilities and $\pi_k$ is equal to the probability that $y$ came from class $k$. The parameters $\theta_k$ are the parameters for the $k$th group. The log-likelihood is thus

$$\log p(y \mid \pi, \theta) = \log\left(\sum_{k=1}^{K} \exp(b_k)\right),$$

where $b_k = \log(\pi_k f(y \mid \theta_k))$. We can see that the log-likelihood is of the form log-sum-exp. Calculating a log-sum-exp can cause issues with numerical stability. For instance, if $K = 2$ and $b_k = -1000$ for all $k$, then the final answer is simply $-1000 + \log(2)$. However, as soon as we try to calculate $\exp(-1000)$ on a computer, we'll be in trouble.
The solution is to use the following equality: for any number $a$,

$$\log\left(\sum_{k=1}^{K} \exp(b_k)\right) = a + \log\left(\sum_{k=1}^{K} \exp(b_k - a)\right).$$

Proving the above identity is a nice exercise in the laws of logarithms and exponentials, and with a clever choice of $a$ we can more safely compute the log-sum-exp expression. For instance, in the documentation for PyTorch's implementation of logsumexp() they take $a$ to be the maximum of $b_1, \ldots, b_K$. This (hopefully) makes each of the terms $\exp(b_k - a)$ a reasonable size and avoids any numerical issues.
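The shifted identity can be sketched in a few lines of Python (a toy implementation for illustration, not PyTorch's actual code), using the maximum of the $b_k$ as the shift $a$:

```python
import math

def logsumexp(bs):
    """Numerically stable log(sum(exp(b) for b in bs))."""
    a = max(bs)  # shift by the maximum, as in PyTorch's logsumexp
    # After the shift, the largest term is exp(0) = 1, so nothing overflows,
    # and the sum is at least 1, so the log is safe to take.
    return a + math.log(sum(math.exp(b - a) for b in bs))

# The naive version fails here: exp(-1000) underflows to 0.0 in double
# precision, and log(0.0 + 0.0) is undefined.
print(logsumexp([-1000.0, -1000.0]))  # -1000 + log(2), approximately -999.307
```

The equality guarantees both routes name the same number; only one of them survives floating-point arithmetic.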
Again, the left and right hand sides of the above equation might be the same number, but in the context of having to use computers with limited precision, they represent very different calculations.
Beyond How to Bake Pi
Eugenia Cheng has recently published a new book called The Joy of Abstraction. I’m just over halfway through, and it’s been a really engaging and interesting introduction to category theory. I’m looking forward to reading the rest of it and getting more insight from Eugenia Cheng’s great mathematical writing.