## How to Bake Pi, Sherman-Morrison and log-sum-exp

A few months ago, I had the pleasure of reading Eugenia Cheng‘s book How to Bake Pi. Each chapter starts with a recipe which Cheng links to the mathematical concepts contained in the chapter. The book is full of interesting connections between mathematics and the rest of the world.

One of my favourite ideas in the book is something Cheng writes about equations and the humble equals sign: $=$. She explains that when an equation says two things are equal we very rarely mean that they are exactly the same thing. What we really mean is that the two things are the same in some ways even though they may be different in others.

One example that Cheng gives is the equation $a + b = b+a$. This is such a familiar statement that you might really think that $a+b$ and $b+a$ are the same thing. Indeed, if $a$ and $b$ are any numbers, then the number you get when you calculate $a + b$ is the same as the number you get when you calculate $b + a$. But calculating $a+b$ could be very different from calculating $b+a$. A young child might calculate $a+b$ by starting with $a$ and then counting one-by-one from $a$ to $a + b$. If $a$ is $1$ and $b$ is $20$, then calculating $a + b$ requires counting from $1$ to $21$ but calculating $b+a$ simply amounts to counting from $20$ to $21$. The first process takes way longer than the second and the child might disagree that $1 + 20$ is the same as $20 + 1$.

In How to Bake Pi, Cheng explains that a crucial idea behind equality is context. When someone says that two things are equal we really mean that they are equal in the context we care about. Cheng talks about how context is crucial throughout mathematics and introduces a little bit of category theory as a tool for moving between different contexts. I think that this idea of context is really illuminating and I wanted to share some examples where “$=$” doesn’t mean “exactly the same as”.

### The Sherman-Morrison formula

The Sherman-Morrison formula is a result from linear algebra that says for any invertible matrix $A \in \mathbb{R}^{n\times n}$ and any pair of vectors $u,v \in \mathbb{R}^n$, if $v^TA^{-1}u \neq -1$, then $A + uv^T$ is invertible and

$(A + uv^T)^{-1} = A^{-1} - \frac{A^{-1}uv^TA^{-1}}{1 + v^TA^{-1}u}$

Here “$=$” means the following:

1. You can take any natural number $n$, any matrix $A$ of size $n$ by $n$, and any length-$n$ vectors $u$ and $v$ that satisfy the above condition.
2. If you take all those things and carry out all the matrix multiplications, additions and inversions on the left and all the matrix multiplications, additions and inversions on the right, then you will end up with exactly the same matrix in both cases.

But depending on the context, computing one side of “$=$” may be much easier than the other. Although the right hand side looks a lot more complicated, it is much easier to compute in one important context. This context is when we have already calculated the matrix $A^{-1}$ and now want the inverse of $A + uv^T$. The left hand side naively inverts $A + uv^T$, which takes $O(n^3)$ operations since we have to invert an $n \times n$ matrix. On the right hand side, we only need to compute a small number of matrix-vector products and then combine the results. This brings the computational cost down to $O(n^2)$.

These cost savings come up a lot when studying linear regression. The Sherman-Morrison formula can be used to update regression coefficients when a new data point is added. Similarly, the Sherman-Morrison formula can be used to quickly calculate the fitted values in leave-one-out cross validation.
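As a quick numerical check of this context, here is a minimal numpy sketch (the setup and variable names are my own invention) that applies the Sherman-Morrison update and compares it against a direct inversion:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n)) + n * np.eye(n)  # a well-conditioned test matrix
u, v = rng.normal(size=n), rng.normal(size=n)

A_inv = np.linalg.inv(A)  # assume this O(n^3) work was already done

# Sherman-Morrison update: only matrix-vector products, so O(n^2)
Ainv_u = A_inv @ u
vT_Ainv = v @ A_inv
updated_inv = A_inv - np.outer(Ainv_u, vT_Ainv) / (1.0 + v @ Ainv_u)

# Direct O(n^3) computation for comparison
direct_inv = np.linalg.inv(A + np.outer(u, v))
print(np.allclose(updated_inv, direct_inv))  # True
```

The update needs only matrix-vector products and one outer product, which is where the $O(n^2)$ cost comes from.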

### log-sum-exp

This second example also has connections to statistics. In a mixture model, we assume that each data point $Y$ comes from a distribution of the form:

$p(y|\pi,\theta) = \sum_{k=1}^K \pi_k p(y | \theta_k)$,

where $\pi$ is a vector and $\pi_k$ is equal to the probability that $Y$ came from class $k$. The parameters $\theta_k \in\mathbb{R}^p$ are the parameters for the $k^{th}$ group. The log-likelihood is thus,

$\log\left(\sum_{k=1}^K \pi_k p(y | \theta_k)\right) = \log\left(\sum_{k=1}^K \exp(\eta_{k})\right)$,

where $\eta_{k} = \log(\pi_k p(y| \theta_k))$. We can see that the log-likelihood is of the form log-sum-exp. Calculating a log-sum-exp can cause issues with numerical stability. For instance, if $K = 3$ and $\eta_k = 1000$ for all $k=1,2,3$, then the final answer is simply $\log(3)+1000$. However, as soon as we try to calculate $\exp(1000)$ on a computer, we’ll be in trouble.

The solution is to use the following equality, for any $\beta \in \mathbb{R}$,

$\log\left(\sum_{k=1}^K \exp(\eta_k) \right) = \beta + \log\left(\sum_{k=1}^K \exp(\eta_k - \beta)\right)$.

Proving the above identity is a nice exercise in the laws of logarithms and exponentials, and with a clever choice of $\beta$ we can more safely compute the log-sum-exp expression. For instance, in the documentation for pytorch’s implementation of logsumexp() they take $\beta$ to be the maximum of the $\eta_k$. This makes each exponent $\eta_k - \beta$ at most zero, so none of the terms $\exp(\eta_k - \beta)$ can overflow.
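As a rough sketch of this max-shift trick (the function below is my own illustration, not pytorch’s actual implementation):

```python
import math

def log_sum_exp(etas):
    """Numerically stable log(sum(exp(eta_k))) via the max-shift trick."""
    beta = max(etas)  # shifting by the max keeps every exponent <= 0
    return beta + math.log(sum(math.exp(e - beta) for e in etas))

# The naive sum exp(1000) + exp(1000) + exp(1000) would overflow,
# but the shifted version is fine:
print(log_sum_exp([1000.0, 1000.0, 1000.0]))  # ~1001.0986, i.e. log(3) + 1000
```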

Again, the left and right hand sides of the above equation might be the same number, but in the context of having to use computers with limited precision, they represent very different calculations.

### Beyond How to Bake Pi

Eugenia Cheng has recently published a new book called The Joy of Abstraction. I’m just over half way through and it’s been a really engaging and interesting introduction to category theory. I’m looking forward to reading the rest of it and getting more insight from Eugenia Cheng’s great mathematical writing.

## Looking back on the blog

Next week I’ll be starting the second year of my statistics PhD. I’ve learnt a lot from taking the first year classes and studying for the qualifying exams. Some of what I’ve learnt has given me a new perspective on some of my old blog posts. Here are three things that I’ve written about before and I now understand better.

## 1. The Pi-Lambda Theorem

An early post on this blog was titled “A minimal counterexample in probability theory“. The post was about a theorem from the probability course offered at the Australian National University. The theorem states that if two probability measures agree on a collection of subsets and the collection is closed under finite intersections, then the two probability measures agree on the $\sigma$-algebra generated by the collection. In my post I give an example which shows that you need the collection to be closed under finite intersections. I also show that you need to have at least four points in the space to find such an example.

What I didn’t know then is that the above theorem is really a corollary of Dynkin’s $\pi - \lambda$ theorem. This theorem was proved in my first graduate probability course which was taught by Persi Diaconis. Professor Diaconis kept a running tally of how many times we used the $\pi - \lambda$ theorem in his course and we got up to at least 10. (For more appearances by Professor Diaconis on this blog see here, here and here).

If I were to write the above post again, I would talk about the $\pi - \lambda$ theorem and rename the post “The smallest $\lambda$-system”. The example given in my post is really about needing at least four points to find a $\lambda$-system that is not a $\sigma$-algebra.

## 2. Mallows’ Cp statistic

The very first blog post I wrote was called “Complexity Penalties in Statistical Learning“. I wasn’t sure if I would write a second and so I didn’t set up a WordPress account. I instead put the post on the website LessWrong. I no longer associate myself with the rationality community but posting to LessWrong was straightforward and helped me reach more people.

The post was inspired in two ways by the 2019 AMSI summer school. First, the content is from the statistical learning course I took at the summer school. Second, at the career fair many employers advised us to work on our writing skills. I don’t know if I would have started blogging if not for the AMSI Summer School.

I didn’t know it at the time but the blog post is really about Mallows’ Cp statistic. Mallows’ Cp statistic is an estimate of the test error of a regression model fit using ordinary least squares. Mallows’ Cp is equal to the training error plus a “complexity penalty” which takes into account the number of parameters. In the blog post I talk about model complexity and over-fitting. I also write down and explain Mallows’ Cp in the special case of polynomial regression.
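As a hedged sketch of the idea (toy data and names of my own choosing, using the prediction-error form $C_p = \mathrm{RSS}/n + 2d\hat\sigma^2/n$ and assuming the noise variance is known):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100, 0.5
x = np.linspace(-1, 1, n)
y = np.sin(2 * x) + rng.normal(scale=sigma, size=n)  # toy regression data

def cp_estimate(degree, noise_var):
    # Fit a polynomial of the given degree by ordinary least squares
    X = np.vander(x, degree + 1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    d = degree + 1  # number of fitted parameters
    # training error plus a complexity penalty for the d parameters
    return rss / n + 2 * d * noise_var / n

for degree in range(1, 8):
    print(degree, cp_estimate(degree, sigma ** 2))
```

Minimising this estimate over the degree trades off training error against model complexity.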

In the summer school course I took, I don’t remember the name Mallows’ Cp being used but I thought it was a great idea and enjoyed writing about it. The next time I encountered Mallows’ Cp was in the linear models course I took last fall. I was delighted to see it again and learn how it fit into a bigger picture. More recently, I read Brad Efron’s paper “How Biased is the Apparent Error Rate of a Prediction Rule?“. The paper introduces the idea of “effective degrees of freedom” and expands on the ideas behind the Cp statistic.

Incidentally, enrolment is now open for the 2023 AMSI Summer School! This summer it will be hosted at the University of Melbourne. I encourage any Australian mathematics or statistics students reading this to take a look and apply. I really enjoyed going in both 2019 and 2020. (Also if you click on the above link you can try to spot me in the group photo of everyone wearing red shirts!)

## 3. Finitely additive probability measures

In “Finitely additive measures” I talk about how hard it is to define a finitely additive measure on the set of natural numbers that is not countably additive. In particular, I talk about needing to use the Hahn–Banach extension theorem to extend the natural density from the collection of sets with density to the collection of all subsets of the natural numbers.

There were a number of homework problems in my first graduate probability course that relate to this post. We proved that the sets with density are not closed under finite unions and we showed that the square-free numbers have density $6/\pi^2$.

We also proved that any finite measure defined on an algebra of subsets can be extended to the collection of all subsets. This proof used Zorn’s lemma and the resulting measure is far from unique. The use of Zorn’s lemma relates to the main idea in my blog post, that defining such a finitely additive probability measure is in some sense non-constructive.

## Other posts

Going forward, I hope to continue publishing at least one new post every month. I look forward to one day writing another post like this when I can look back and reflect on how much I have learnt.

## A clock is a one-dimensional subgroup of the torus

Recently, my partner and I installed a clock in our home. The clock previously belonged to my grandparents and we have owned it for a while. We hadn’t put it up earlier because the original clock movement ticked and the sound would disrupt our small studio apartment. After much procrastinating, I bought a new clock movement, replaced the old one and proudly hung up our clock.

When I first put on the clock hands I made the mistake of not putting them both on at exactly 12 o’clock. This meant that the minute and hour hands were not synchronised. The hands were in an impossible position. At times, the minute hand was at 12 and the hour hand was between 3 and 4. It took some time for me to register my mistake as at some times of the day it can be hard to tell that the hands are out of sync (how often do you look at a clock at 12:00 exactly?). Fortunately, I did notice the mistake and we have a correct clock. Now I can’t help noticing when others make the same mistake such as in this piece of clip art.

After fixing the clock, I was still thinking about how only some clock hand positions correspond to actual times. This led me to think “a clock is a one-dimensional subgroup of the torus”. Let me explain why.

### The torus

The minute and hour hands on a clock can be thought of as two points on two different circles. For instance, if the time is 9:30, then the minute hand corresponds to a point at the very bottom of the circle and the hour hand corresponds to a point 15 degrees clockwise of the leftmost point of the circle. As a clock goes through a 12 hour cycle the minute-hand-point goes around the circle 12 times and the hour-hand-point goes around the circle once. This is shown below.

If you take the collection of all pairs of points on two circles you get what mathematicians call a torus. The torus is a geometric shape that looks like the surface of a donut. The torus is defined as the Cartesian product of two circles. That is, a single point on the torus corresponds to two points on two different circles. A torus is plotted below.

To understand the torus, it’s helpful to consider a more familiar example, the 2-dimensional plane. If we have points $x$ and $y$ on two different lines, then we can produce the point $(x,y)$ in the two dimensional plane. Likewise, if we have a point $p$ and a point $q$ on two different circles, then we can produce a point $(p,q)$ on the torus. Both of these concepts are illustrated below. I have added two circles to the torus which are analogous to the x and y axes of the plane. The blue and red points on the blue and red circle produce the black point on the torus.

### Mapping the clock to the torus

The points on the torus are in one-to-one correspondence with possible arrangements of the two clock hands. However, as I learnt putting up our clock, not all arrangements of clock hands correspond to an actual time. This means that only some points on the torus correspond to an actual time but how can we identify these points?

Keeping with our previous convention, let’s use the blue circle to represent the position of the minute hand and the red circle to represent the position of the hour hand. This means that the point where the two circles meet corresponds to 12 o’clock.

There are eleven other points on the red circle that correspond to the other times when the minute hand is at 12. That is, there’s a point for 1 o’clock, 2 o’clock, 3 o’clock and so on. Once we add in those points, our torus looks like this:

Finally, we have to join these points together. We know that when the hour hand moves from 12 to 1, the minute hand does one full rotation. This means that we have to join the black points by making one full rotation in the direction of the blue circle. The result is the black curve below that snakes around the torus.

The picture above should explain most of this blog’s title – “a clock is a one-dimensional subgroup of the torus”. We now know what the torus is and why certain points on the torus correspond to positions of the hands on a clock. We can see that these “clock points” correspond to a line that snakes around the torus. While the torus is a surface and hence two dimensional, the line is one-dimensional. The last missing part is the word “subgroup”. I won’t go into the details here but the torus has some extra structure that makes it something called a group. Our map from the clock to the torus interacts nicely with this structure and this makes the black line a “subgroup”.

### Another perspective

While the above pictures of the torus are pretty, they can be a bit hard to understand and hard to draw. Mathematicians have another perspective on the torus that is often easier to work with. Imagine that you have a square sheet of rubber. If you rolled up the rubber and joined a pair of opposite sides, you would get a rubber tube. If you then bent the tube to join the opposite sides again, you would get a torus! The gif below illustrates this idea.

This means that we can simply view the torus as a square. We just have to remember that the opposite sides of the squares have been glued together. So like a game of snake on a phone, if you leave the top of the square, you come out at the same place on the bottom of the square. If we use this idea to redraw our torus it now looks like this:

As before we can draw in the other points when the minute hand is at 12. These points correspond to 1 o’clock, 2 o’clock, 3 o’clock…

Finally we can draw in all the other times on the clock. This is the result:

One nice thing about this picture is that it can help us answer a classic riddle. In a 12-hour cycle, how many times are the minute and hour hands on top of each other? We can answer this riddle by adding a second line to the above square. The bottom-left to top-right diagonal is the collection of all hand positions where the two hands are on top of each other. Let’s add that line in green and add the points where this new line intersects the black line.

The points where the green and black lines intersect are hand positions where the clock hands are directly on top of each other and which correspond to actual times. Thus we can count that there are exactly 11 times when the hands are on top of each other in a 12-hour cycle. It might look like there are 12 such times but we have to remember that the corners of the square are all the same point on the torus.
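The count of 11 can also be checked with a tiny brute-force computation (a sketch of my own): parametrise the 12-hour cycle by the hour hand’s fraction $t$ of a full turn, so the minute hand sits at $12t \bmod 1$, and the hands coincide exactly when $11t$ is an integer.

```python
from fractions import Fraction

# Scan a grid of rational times t in [0, 1); the hands coincide
# exactly when 12*t and t agree mod 1, i.e. when 11*t is an integer.
overlap_times = [t for t in (Fraction(k, 110) for k in range(110))
                 if (11 * t) % 1 == 0]
print(len(overlap_times))  # 11 overlaps per 12-hour cycle
```

Using exact fractions avoids any floating-point near-misses in the mod-1 test.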

So far I have ignored the second hand on the clock. If we included the second hand, we would have three points on three different circles. The corresponding geometric object is a 3-dimensional torus. The 3-dimensional torus is what you get when you take a cube and glue together the three pairs of opposite faces (don’t worry if you have trouble visualising such a shape!).

The points on the 3-dimensional torus which correspond to actual times will again be a line that wraps around the 3-dimensional torus. You could use this line to find out how many times the three hands are all on top of each other! Let me know if you work it out.

I hope that if you’re ever asked to define a clock, you’d at least consider saying “a clock is a one-dimensional subgroup of the torus” and you could even tell them which subgroup!

## Sports climbing at the 2020 Olympics

Last year sports climbing made its Olympic debut. My partner Claire is an avid climber and we watched some of the competition together. The method of ranking the competitors had some surprising consequences which are the subject of this blog.

In the finals, the climbers had to compete in three disciplines – speed climbing, bouldering and lead climbing. Each climber was given a rank for each discipline. For speed climbing, the rank was given based on how quickly the climbers scaled a set route. For bouldering and lead climbing, the rank was based on how far they climbed on routes none of them had seen before.

To get a final score, the ranks from the three disciplines are multiplied together. The best climber is the one with the lowest final score. For example, if a climber was 5th in speed climbing, 2nd in bouldering and 3rd in lead climbing, then they would have a final score of $5\times 2 \times 3 = 30$. A climber who came 8th in speed, 1st in bouldering and 2nd in lead climbing would have a lower (and thus better) final score of $8 \times 1 \times 2 = 16$.
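The scoring rule is simple enough to write down directly; here is a small sketch (the function name is my own) reproducing the two examples above:

```python
from math import prod

# Combined score: multiply the ranks from the three disciplines.
# The climber with the lowest product wins.
def final_score(speed_rank, boulder_rank, lead_rank):
    return prod((speed_rank, boulder_rank, lead_rank))

print(final_score(5, 2, 3))  # 30
print(final_score(8, 1, 2))  # 16
```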

One thing that makes this scoring system interesting is that your final score is very dependent on how well your competitors perform. This was most evident during Jakob Schubert’s lead climb in the men’s final.

Lead climbing is the final discipline and Schubert was the last climber. This meant that ranks for speed climbing and bouldering were already decided. However, the final rankings of the climbers fluctuated hugely as Schubert scaled the 15 metre lead wall. This was because if a climber had done really well in the boulder and speed rounds, being overtaken by Schubert didn’t increase their final score by that much. However, if a climber did poorly in the previous rounds, then being overtaken by Schubert meant their score increased by a lot and they plummeted down the rankings. This can be visualised in the following plot:

Along the x-axis we have how far up the lead wall Schubert climbed (measured by the “hold” he had reached). On the y-axis we have the finalists’ ranking at different stages of Schubert’s climb. We can see that as Schubert climbed and overtook the other climbers the whole ranking fluctuated. Here is a similar plot which also shows when Schubert overtook each competitor on the lead wall:

The volatility of the rankings can really be seen by following Adam Ondra’s ranking (in purple). When Schubert started his climb, Ondra was ranked second. After Schubert passed Alberto Ginés López, Ondra was ranked first. But then Schubert passed Ondra, Ondra finished the event in 6th place and Ginés López came first. If you go to the 4 hour and 27 minute mark here you can watch Schubert’s climb and hear the commentator explain how both Ginés López and Ondra are in the running to win gold.

Similar things happened in the women’s finals. Janja Garnbret was the last lead climber in the women’s final. Here is the same plot which shows the climbers’ final rankings and lead ranking.

Garnbret was a favourite at the Olympics and had come fifth in the speed climbing and first in bouldering. This meant that as long as she didn’t come last in the lead climbing she would win gold, and otherwise she would still take home at least silver. Garnbret ended up coming first in the lead climbing and we can see that as she overtook the last few climbers, their rankings fluctuated wildly.

Here is one more plot which shows each competitor’s final score at different points of Garnbret’s climb.

In the plot you can really see that, depending on how they performed in the previous two events, each climber’s score changed by a different amount once they were overtaken. It also shows how far ahead of the competition Garnbret is – especially when you compare the men’s and women’s finals. Here is the same plot for the men. You can see that the men’s final scores were a lot closer together.

Before I end this post, I would like to make one comment about the culture of sport climbing. In this post I wanted to highlight how tumultuous and complicated the sport climbing rankings were but if you watched the athletes you’d have no idea the stakes were so high. The climbers celebrate their personal bests as if no one was watching, they trade ideas on how to tackle the lead wall and the day after the final, they returned to the bouldering wall to try the routes together. Sports climbing is such a friendly and welcoming sport and I would hate for my analysis to give anyone the wrong idea.

## An art and maths collaboration

Over the course of the past year I have had the pleasure of working with the artist Sanne Carroll on her honours project at the Australian National University. I was one of two mathematics students who collaborated with Sanne; the other was Ciaran. Over the course of the project Sanne drew patterns and would ask Ciaran and me to recreate them using some mathematical or algorithmic ideas. You can see the final version of the project here: https://www.sannecarroll.com/ (best viewed on a computer).

I always loved the patterns Sanne drew and the final project is so well put together. Sanne does a great job of incorporating her drawings, the mathematical descriptions and the communication between her, Ciaran and me. Her website building skills also far surpass anything I’ve done on this blog!

It was also a lot of fun to work with Sanne. Hearing about her patterns and talking about maths with her was always fun. I also learnt a few things about GeoGebra which made the animations in my previous post a lot quicker to make. Sanne has told me that she’ll be starting a PhD soon and I’m looking forward to any future collaborations that might arise.

## Cayley Graphs and Cakes

Over the past month I have been studying at the AMSI Summer School at La Trobe University in Melbourne. Eight courses are offered at the AMSI Summer School and I took the one on geometric group theory. Geometric group theory is also the subject of my honours thesis and a great area of mathematics. I previously wrote about some ideas from geometric group theory here.

One of the main ideas in geometric group theory is to take a finitely generated group and turn it into a geometric object by constructing a Cayley graph (or, in German, a Dehn Gruppenbild or Dehn group picture).

If $G$ is a group with a generating set $A$, then the Cayley graph of $(G,A)$ has the elements of the group as vertices and for each group element $g \in G$ and generator $a \in A$, there is an edge from $g$ to $ga$.
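This definition translates directly into a breadth-first search: starting from the identity, repeatedly multiply by each generator. Here is a sketch (the helper names are mine) for a small finite example, the symmetric group $S_3$ generated by a transposition and a 3-cycle:

```python
from collections import deque

def cayley_graph(generators, identity):
    """Vertices are group elements; each g gets an edge g -> g*a per generator a."""
    def mult(g, a):
        # compose permutations written as tuples: (g*a)[i] = g[a[i]]
        return tuple(g[a[i]] for i in range(len(a)))
    vertices, edges = {identity}, []
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for a in generators:
            h = mult(g, a)
            edges.append((g, h))
            if h not in vertices:
                vertices.add(h)
                queue.append(h)
    return vertices, edges

swap, cycle = (1, 0, 2), (1, 2, 0)  # a transposition and a 3-cycle in S3
V, E = cayley_graph([swap, cycle], identity=(0, 1, 2))
print(len(V), len(E))  # 6 vertices, 12 directed edges
```

For an infinite group like $C_2 * C_3$ the same search works if you cap it at a fixed word length.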

Cayley graphs can be very pretty and geometrically interesting. In the final week of the course, our homework was to creatively make a Cayley graph. Here’s a sample of the Cayley graphs we made.

With a friend I baked a cake and decorated it with the Cayley graph of the group $C_2 * C_3 \cong \langle a ,b \mid a^2,b^3 \rangle$ with respect to the generating set $\{a,b\}$. We were really happy with how it looked and tasted and are proud to say that the whole thing got eaten at a BBQ for the summer school students.

Staying with the food theme, a friend used grapes and skewers to make their Cayley graph. It’s a graph of the discrete Heisenberg group $\langle a,b,c \mid ac=ca, bc=cb, ba=abc \rangle$. I was amazed at the structural integrity of the grapes. There’s a video about this Cayley graph that you can watch here (alternatively it’s the first result if you search “Heisenberg group grapes”).

This Cayley graph is made out of paper and shows how picking a different generating set $A$ can change the appearance of the Cayley graph. It is a Cayley graph of $\mathbb{Z}^2$. Normally this Cayley graph is drawn with respect to the generating set $\{(1,0),(0,1)\}$ and the Cayley graph looks like the square lattice on the right. The Cayley graph on the left was made with respect to the generating set $\{(1,0),(0,1),(1,1)\}$ and the result is a triangular tiling of the plane. Note that while the two Cayley graphs of $\mathbb{Z}^2$ look quite different, they have some important shared properties. In particular, both of them are two dimensional and flat. (The second image is taken from here)

Someone also made a Cayley graph of the Baumslag–Solitar group $\text{BS}(1,2) \cong \langle a,t \mid tat^{-1}=a^2 \rangle$ with respect to $\{a,t\}$. This was a group that came up frequently in the course as it can be used to construct a number of surprising counterexamples.

Finally, my favourite Cayley graph of the day was the incredibly pretty Cayley graph of the Coxeter group

$\langle a,b,c,d \mid a^2, b^2, c^2, d^2, (ab)^3, (bc)^3, (ac)^2, (ad)^2, (cd)^2 \rangle$

with respect to the set $\{a,b,c,d\}$.

I’d like to thank both AMSI and La Trobe for putting on the Summer School and the three geometric group theory lecturers Alejandra Garrido, Murray Elder and Lawrence Reeves for teaching a great course. A huge thanks to the other students for making some great Cayley graphs and letting me share some of them.

## Maths and Computer Science in the Pub

Earlier this week I spoke at Maths and Computer Science in the Pub. The event was hosted by Phil Dooley and sponsored by the Mathematical Sciences Institute. I had a great time talking and hearing from the other presenters. Below is a photo of me presenting and a transcript of my talk.

Right now the number of people with an odd number of Facebook friends is an even number. In fact, at any point in time the number of people with an odd number of Facebook friends is an even number. Why is this true? We could check this claim by counting exactly how many people have an odd number of friends, but this is much too complicated. An easier answer can be found by using a common mathematical approach called counting in two different ways.

For each Facebook user, imagine looking at how many Facebook friends they have and add up all those numbers. We can write this quantity as an intimidating sum:

$\sum\limits_{\text{Facebook users}} \text{number of friends}.$

Another way of counting the same thing is to look at the number of friendships and multiply that number by 2. This is because each friendship is between two people. When we counted the first number we double counted every friendship. We can write this in an equation:

$\sum\limits_{\text{Facebook users}} \text{number of friends} = 2\cdot(\text{number of friendships}).$

The important thing that this tells us is that the sum on the left is an even number. We can split this sum into two sums. We could first add up the friend counts of all the people who have an even number of Facebook friends. Then we could add up the friend counts of all the people who have an odd number of Facebook friends. These two sums added together give us the original sum, which is an even number.

$\sum\limits_{\text{even Facebook users}} \text{number of friends}+\sum\limits_{\text{odd Facebook users}} \text{number of friends} = 2\cdot(\text{number of friendships}).$

In the first sum, all of the people have an even number of Facebook friends so all of the numbers are even. Thus the first sum is an even number. If we subtract this first sum from both sides, then we can see that the second sum is equal to an even number minus an even number. Hence the second sum is an even number.

$\sum\limits_{\text{odd Facebook users}} \text{number of friends} = 2\cdot(\text{number of friendships})-\sum\limits_{\text{even Facebook users}} \text{number of friends}.$

In the second sum, all of the people have an odd number of friends. Thus all of the numbers must be odd. So we are adding up a bunch of odd numbers and getting an even number. The only way this is possible is if we have an even number of terms. That is, if we have an even number of people with an odd number of Facebook friends.

So there you have it, the number of people with an odd number of Facebook friends is an even number. You can be sure of this fact thanks to a mathematical proof, but why should we stop there? One thing mathematicians love to do is generalise their results. The only things we needed to know about Facebook were that there are a bunch of people with Facebook accounts and that there are some pairs of people whose Facebook accounts are linked by being friends. If we take this abstract view of Facebook we arrive at the definition of a graph. A graph consists of a set of points called vertices and a set of connections between some pairs of vertices called edges.

Facebook is an example of a graph but there are many other examples. For each graph we get a fact analogous to the number of people with an odd number of Facebook friends being even. We learn that the number of vertices with an odd number of neighbours is an even number. The exact same proof works if at every point we replace “person” with “vertex”, “Facebook friend” with “neighbour” and “friendship” with “edge”.
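The generalised statement is easy to check empirically; here is a small sketch on a random graph of my own construction:

```python
import random

# Empirical check of the handshake lemma: in any graph, the number of
# vertices with odd degree is even.
random.seed(0)
n = 50
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if random.random() < 0.1]  # each possible edge included with prob 0.1

degree = [0] * n
for i, j in edges:
    degree[i] += 1
    degree[j] += 1

num_odd = sum(1 for d in degree if d % 2 == 1)
print(num_odd % 2)  # 0: an even number of odd-degree vertices
```

Whatever random graph is drawn, the count of odd-degree vertices always comes out even, exactly as the proof guarantees.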

For example if there’s a big party, we can make a graph where the vertices are people who attended the party and there is an edge between two people if they hugged at the party. Then the theorem tells us that the number of people who hugged an odd number of people is an even number.

Unfortunately not everything is a graph. In particular Instagram and Twitter are not graphs. On these social media platforms, you can follow someone without them following you back. This breaks the proof I gave before and it’s not always true that the number of people with an odd number of Instagram followers is always even. In fact it fluctuates between even and odd whenever someone follows or unfollows someone else.

So if someone asks you what the difference between Facebook and Instagram is, now you know. On Facebook the number of people with an odd number of Facebook friends is always even.