Blog Feed

Facebook and Graph Theory

Earlier this week I spoke at Maths and Computer Science in the Pub. The event was hosted by Phil Dooley and sponsored by the Mathematical Sciences Institute. I had a great time talking and hearing from the other presenters. Below is a photo of me presenting and a transcript of my talk.

Right now the number of people with an odd number of Facebook friends is an even number. In fact, at any point in time the number of people with an odd number of Facebook friends is an even number. Why is this true? We could check this claim by counting exactly how many people have an odd number of friends, but this is much too complicated. An easier answer can be found by using a common mathematical technique called counting in two different ways.

For each Facebook user, imagine looking at how many Facebook friends they have, and then adding up all those numbers. We can write this quantity as an intimidating sum:

\sum\limits_{\text{Facebook users}} \text{number of friends}.

Another way of counting the same thing is to look at the number of friendships and multiply that number by 2. This is because each friendship is between two people. When we counted the first number we double counted every friendship. We can write this in an equation:

\sum\limits_{\text{Facebook users}} \text{number of friends} = 2\cdot(\text{number of friendships}).
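As a quick sanity check of this identity, here is a minimal sketch (the people and friendships are invented for illustration): summing everyone's friend count double counts each friendship.

```python
# A quick sanity check (the names and network are made up):
# summing everyone's friend count double counts each friendship.
friendships = {("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")}

# Count each person's friends by scanning every friendship once.
friend_counts = {}
for a, b in friendships:
    friend_counts[a] = friend_counts.get(a, 0) + 1
    friend_counts[b] = friend_counts.get(b, 0) + 1

# The sum of all friend counts equals twice the number of friendships.
assert sum(friend_counts.values()) == 2 * len(friendships)
```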

The important thing this tells us is that the sum on the left is an even number. We can split this sum into two sums. We could first add up how many Facebook friends all of the people with an even number of friends have. Then we can add up how many Facebook friends all of the people with an odd number of friends have. These two sums added together give us the original sum, which is an even number.

\sum\limits_{\text{even Facebook users}} \text{number of friends}+\sum\limits_{\text{odd Facebook users}} \text{number of friends} = 2\cdot(\text{number of friendships}).

In the first sum, all of the people have an even number of Facebook friends so all of the numbers are even. Thus the first sum is an even number. If we subtract this first sum from both sides, then we can see that the second sum is equal to an even number minus an even number. Hence the second sum is an even number.

\sum\limits_{\text{odd Facebook users}} \text{number of friends} = 2\cdot(\text{number of friendships})-\sum\limits_{\text{even Facebook users}} \text{number of friends}.

In the second sum, all of the people have an odd number of friends. Thus all of the numbers must be odd. So we are adding up a bunch of odd numbers and getting an even number. The only way this is possible is if we have an even number of terms. That is, if we have an even number of people with an odd number of Facebook friends.

So there you have it, the number of people with an odd number of Facebook friends is an even number. You can be sure of this fact thanks to a mathematical proof, but why should we stop there? One thing mathematicians love to do is generalise their results. The only things we needed to know about Facebook were that there are a bunch of people with Facebook accounts and that some pairs of people are linked by being friends. If we take this abstract view of Facebook we arrive at the definition of a graph. A graph consists of a set of points called vertices and a set of connections between some pairs of vertices called edges.

Facebook is an example of a graph but there are many other examples. For each graph we get a fact analogous to the number of people with an odd number of Facebook friends being even. We learn that the number of vertices with an odd number of neighbours is an even number. The exact same proof works if at every point we replace “person” with “vertex”, “Facebook friend” with “neighbour” and “friendship” with “edge”.

For example if there’s a big party, we can make a graph where the vertices are people who attended the party and there is an edge between two people if they hugged at the party. Then the theorem tells us that the number of people who hugged an odd number of people is an even number.
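The graph version of the theorem is also easy to test by brute force. A sketch using randomly generated graphs rather than any real network:

```python
import random

# A sketch on random graphs: however the edges fall, the number of
# odd-degree vertices always comes out even.
random.seed(0)
for trial in range(100):
    n = random.randint(1, 12)
    # a random subset of the possible edges (unordered pairs of distinct vertices)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if random.random() < 0.5]
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    odd_vertices = sum(1 for deg in degree if deg % 2 == 1)
    assert odd_vertices % 2 == 0
```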

Unfortunately not everything is a graph. In particular, Instagram and Twitter are not graphs. On these social media platforms, you can follow someone without them following you back. This breaks the proof I gave before, and it's no longer true that the number of people with an odd number of Instagram followers is always even. In fact, the parity fluctuates between even and odd as people follow and unfollow one another.
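A small sketch (with made-up accounts) shows how the parity argument fails for directed "follow" relationships:

```python
# Sketch with made-up accounts: follows are ordered pairs, so each follow
# changes exactly one follower count and the double-counting argument breaks.
follows = {("ana", "ben"), ("cal", "ben"), ("dee", "ben")}

follower_count = {}
for follower, followed in follows:
    follower_count[followed] = follower_count.get(followed, 0) + 1

# ben has 3 followers, so exactly one account has an odd follower count:
# an odd number of odd-count accounts, which a friendship graph cannot produce.
odd_accounts = sum(1 for c in follower_count.values() if c % 2 == 1)
assert odd_accounts == 1
```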

So if someone asks you what the difference between Facebook and Instagram is, now you know. On Facebook the number of people with an odd number of Facebook friends is always even.

A minimal counterexample in probability theory

Last semester I tutored the course Probability Modelling with Applications. In this course the main objects of study are probability spaces. A probability space is a triple (\Omega, \mathcal{F}, \mathbb{P}) where:

  1. \Omega is a set.
  2. \mathcal{F} is a \sigma-algebra on \Omega. That is, \mathcal{F} is a collection of subsets of \Omega such that \Omega \in \mathcal{F} and \mathcal{F} is closed under set complements and countable unions. The elements of \mathcal{F} are called events and they are precisely the subsets of \Omega that we can assign probabilities to. We will denote the power set of \Omega by 2^\Omega, and hence \mathcal{F} \subseteq 2^\Omega.
  3. \mathbb{P} is a probability measure. That is, \mathbb{P} is a function \mathbb{P}: \mathcal{F} \rightarrow [0,1] such that \mathbb{P}(\Omega)=1 and, for every countable collection \{A_i\}_{i=1}^\infty \subseteq \mathcal{F} of mutually disjoint events, \mathbb{P} \left(\bigcup_{i=1}^\infty A_i \right) = \sum_{i=1}^\infty \mathbb{P}(A_i).
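For a finite \Omega these axioms are easy to verify directly. A minimal sketch, taking \mathcal{F} to be the full power set of \Omega and \mathbb{P} the uniform measure (on a finite set, countable unions reduce to finite ones):

```python
from fractions import Fraction
from itertools import chain, combinations

# A sketch of the probability space axioms on a small finite set.
omega = frozenset({1, 2, 3})
F = [frozenset(s) for s in chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))]

def P(event):
    # the uniform measure, with exact rational arithmetic
    return Fraction(len(event), len(omega))

assert omega in F
assert all(omega - e in F for e in F)         # closed under complements
assert all(a | b in F for a in F for b in F)  # closed under (finite) unions
assert P(omega) == 1
# additivity for disjoint events
assert all(P(a | b) == P(a) + P(b) for a in F for b in F if not a & b)
```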

It’s common for students to find probability spaces, and in particular \sigma-algebras, confusing. Unfortunately Vitali showed that \sigma-algebras can’t be avoided if we want to study probability spaces such as \mathbb{R} or an infinite sequence of coin tosses. One of the main reasons why \sigma-algebras can be so confusing is that it can be very hard to give a concrete description of all the elements of a \sigma-algebra.

We often have a collection \mathcal{G} of subsets of \Omega that we are interested in but this collection fails to be a \sigma-algebra. For example, we might have \Omega = \mathbb{R}^n and \mathcal{G} is the collection of open subsets. In this situation we take our \sigma-algebra \mathcal{F} to be \sigma(\mathcal{G}) which is the smallest \sigma-algebra containing \mathcal{G}. That is

\sigma(\mathcal{G}) = \bigcap \mathcal{F}'

where the above intersection is taken over all \sigma-algebras \mathcal{F}' that contain \mathcal{G}. In this setting we say that \mathcal{G} generates \sigma(\mathcal{G}). When we have such a collection of generators, we might have an idea of what probability we would like to assign to the sets in \mathcal{G}. That is, we have a function \mathbb{P}_0 : \mathcal{G} \rightarrow [0,1] and we want to extend this function to a probability measure \mathbb{P} : \sigma(\mathcal{G}) \rightarrow [0,1]. A famous theorem due to Carathéodory shows that we can do this in many cases.

An interesting question is whether the extension \mathbb{P} is unique. That is, does there exist a probability measure \mathbb{P}' on \sigma(\mathcal{G}) such that \mathbb{P} \neq \mathbb{P}' but \mathbb{P}_{\mid \mathcal{G}} = \mathbb{P}'_{\mid \mathcal{G}}? The following theorem gives a criterion that guarantees no such \mathbb{P}' exists.

Theorem: Let \Omega be a set and let \mathcal{G} be a collection of subsets of \Omega that is closed under finite intersections. Then if \mathbb{P},\mathbb{P}' : \sigma(\mathcal{G}) \rightarrow [0,1] are two probability measures such that \mathbb{P}_{\mid \mathcal{G}} = \mathbb{P}'_{\mid \mathcal{G}}, then \mathbb{P} = \mathbb{P}'.

The above theorem is very useful for two reasons. Firstly, it can be combined with Carathéodory’s extension theorem to uniquely define probability measures on a \sigma-algebra by specifying their values on a collection of simple subsets \mathcal{G}. Secondly, if we ever want to show that two probability measures are equal, the above theorem tells us we can reduce the problem to checking equality on the simpler subsets in \mathcal{G}.

The condition that \mathcal{G} must be closed under finite intersections is somewhat intuitive. Suppose we had A,B \in \mathcal{G} but A \cap B \notin \mathcal{G}. We would still have A \cap B \in \sigma(\mathcal{G}), and thus we might be able to find two probability measures \mathbb{P},\mathbb{P}' : \sigma(\mathcal{G}) \rightarrow [0,1] such that \mathbb{P}(A) = \mathbb{P}'(A) and \mathbb{P}(B)=\mathbb{P}'(B) but \mathbb{P}(A \cap B) \neq \mathbb{P}'(A \cap B). The following counterexample shows that this intuition is indeed well-founded.

When looking for examples and counterexamples, it’s good to try to keep things as simple as possible. With that in mind we will try to find a counterexample where \Omega is a finite set with as few elements as possible and \sigma(\mathcal{G}) is equal to the power set of \Omega. In this setting, a probability measure \mathbb{P}: \sigma(\mathcal{G}) \rightarrow [0,1] can be defined by specifying the values \mathbb{P}(\{\omega\}) for each \omega \in \Omega.

We will now try to find a counterexample when \Omega is as small as possible. Unfortunately we won’t be able to find a counterexample when \Omega contains only one or two elements. This is because we want to find A,B \subseteq \Omega such that A \cap B is not equal to A, B or \emptyset.

Thus we will start our search with a three element set \Omega = \{a,b,c\}. Up to relabelling the elements of \Omega, the only interesting choice we have for \mathcal{G} is \{  \{a,b\} , \{b,c\} \}. This has a chance of working since \mathcal{G} is not closed under intersection. However any probability measure \mathbb{P} on \sigma(\mathcal{G}) = 2^{\{a,b,c\}} must satisfy the equations

  1. \mathbb{P}(\{a\})+\mathbb{P}(\{b\})+\mathbb{P}(\{c\}) = 1,
  2. \mathbb{P}(\{a\})+\mathbb{P} (\{b\}) = \mathbb{P}(\{a,b\}),
  3. \mathbb{P}(\{b\})+\mathbb{P}(\{c\}) = \mathbb{P}(\{b,c\}).

Thus \mathbb{P}(\{a\}) = 1- \mathbb{P}(\{b,c\}), \mathbb{P}(\{c\}) = 1-\mathbb{P}(\{a,b\}) and \mathbb{P}(\{b\})=\mathbb{P}(\{a,b\})+\mathbb{P}(\{b,c\})-1. Hence \mathbb{P} is determined by its values on \{a,b\} and \{b,c\}, and no counterexample exists on a three element set.

However, a four element set \{a,b,c,d\} is sufficient for our counterexample! We can let \mathcal{G} = \{\{a,b\},\{b,c\}\}. Then \sigma(\mathcal{G})=2^{\{a,b,c,d\}} and we can define \mathbb{P} , \mathbb{P}' : \sigma (\mathcal{G}) \rightarrow [0,1] by

  • \mathbb{P}(\{a\}) = 0, \mathbb{P}(\{b\})=0.5, \mathbb{P}(\{c\})=0 and \mathbb{P}(\{d\})=0.5.
  • \mathbb{P}'(\{a\})=0.5, \mathbb{P}'(\{b\})=0, \mathbb{P}'(\{c\})=0.5 and \mathbb{P}'(\{d\})=0.

Clearly \mathbb{P} \neq \mathbb{P}' however \mathbb{P}(\{a,b\})=\mathbb{P}'(\{a,b\})=0.5 and \mathbb{P}(\{b,c\})=\mathbb{P}'(\{b,c\})=0.5. Thus we have our counterexample! More generally, for any \lambda \in [0,1) we can define the probability measure \mathbb{P}_\lambda = \lambda\mathbb{P}+(1-\lambda)\mathbb{P}'. The measure \mathbb{P}_\lambda is not equal to \mathbb{P} but agrees with \mathbb{P} on \mathcal{G}. Indeed, whenever we have two probability measures that agree on \mathcal{G} but not on \sigma(\mathcal{G}), we can produce uncountably many such measures by taking convex combinations as above.
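The counterexample can be verified by brute force over all sixteen events. A sketch, storing each measure via its point masses:

```python
from fractions import Fraction
from itertools import chain, combinations

# Verify the counterexample: P and P' agree on G = {{a,b},{b,c}} but differ
# on the power set of {a,b,c,d}.
half = Fraction(1, 2)
P_point  = {"a": 0, "b": half, "c": 0, "d": half}
Pp_point = {"a": half, "b": 0, "c": half, "d": 0}

def measure(point_masses, event):
    return sum(point_masses[x] for x in event)

omega = "abcd"
events = [frozenset(s) for s in chain.from_iterable(combinations(omega, r) for r in range(5))]

assert measure(P_point, omega) == measure(Pp_point, omega) == 1
assert measure(P_point, {"a", "b"}) == measure(Pp_point, {"a", "b"}) == half
assert measure(P_point, {"b", "c"}) == measure(Pp_point, {"b", "c"}) == half
# ...but the two measures disagree somewhere, e.g. on the singleton {a}
assert any(measure(P_point, e) != measure(Pp_point, e) for e in events)
```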

[Link Post] Complexity Penalties in Statistical Machine Learning

Earlier in the year I wrote a post on Less Wrong about some of the material I learnt at the 2019 AMSI Summer School. You can read it here. On a related note, applications are open for the 2020 AMSI Summer School at La Trobe University. I highly recommend attending!

Combing groups

Two weeks ago I gave a talk titled “Two Combings of \mathbb{Z}^2”. The talk was about some material I have been discussing lately with my honours supervisor. The talk went well and I thought it would be worth sharing a written version of what I said.

Geometric Group Theory

Combings are a tool used in a branch of mathematics called geometric group theory. Geometric group theory is a relatively new area of mathematics, only around 30 years old. The main idea behind geometric group theory is to use tools and ideas from geometry and low-dimensional topology to study and understand groups. It turns out that some of the simplest questions one can ask about groups have interesting geometric answers. For instance, the Dehn function of a group gives a natural geometric answer to the solvability of the word problem.


Before we can define what a combing is, we’ll need to set up some notation. If A is a set, then we will write A^* for the set of words written using elements of A and inverses of elements of A. For instance, if A = \{a,b\}, then A^* = \{\varepsilon, a, b, a^{-1}, b^{-1}, a^2, abb^{-1}a^{-3},\ldots\} (here \varepsilon denotes the empty word and a^2 abbreviates aa). If w is a word in A^*, we will write l(w) for the length of w. Thus l(\varepsilon)=0, l(a)=1, l(abb^{-1}a^{-3})=6 and so on.

If G is a group and A is a subset of G, then we have a natural map \pi : A^* \rightarrow G given by:

\pi(a_1^{\pm 1}a_2^{\pm 1}\ldots a_n^{\pm 1}) = a_1^{\pm 1}\cdot a_2^{\pm 1} \cdot \ldots \cdot a_n^{\pm 1}.

We will say that A generates G if the above map is surjective. In this case we will write \overline{w} for \pi(w) when w is a word in A^*.

The Word Metric

The geometry in “geometric group theory” often arises when studying how a group acts on different geometric spaces. A group always acts on itself by left multiplication. The following definition adds a geometric aspect to this action. If G is a group with generators A, then the word metric on G with respect to A is the function d_G : G \times G \rightarrow \mathbb{R} given by

d_G(g,h) = \min \{ l(w) \mid w \in A^*, \overline{w} = g^{-1}h \}.

That is, the distance between two group elements g,h \in G is the length of the shortest word in A^* we can use to represent g^{-1}h. Equivalently, the distance between g and h is the length of the shortest word we have to append to g to produce h. This metric is invariant under left-multiplication by G (ie d_G(g\cdot h,g\cdot h') =d_G(h,h') for all g,h,h' \in G). Thus G acts on (G,d_G) by isometries.
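To make this concrete, here is a sketch of the word metric on \mathbb{Z}^2 with generating set \{(1,0),(0,1)\}, computed by breadth-first search over short words (in this example the metric has the closed form |\Delta x| + |\Delta y|, which the code checks against):

```python
from collections import deque

# A sketch: the word metric on Z^2 with generators (1,0) and (0,1),
# computed by breadth-first search out from the identity.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # the generators and their inverses

def word_metric(g, h, radius=12):
    # length of the shortest word representing g^{-1} h, searched up to `radius`
    target = (h[0] - g[0], h[1] - g[1])
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        if dist[p] >= radius:
            continue
        for dx, dy in MOVES:
            q = (p[0] + dx, p[1] + dy)
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return None

# BFS agrees with the closed form |dx| + |dy|
assert word_metric((2, 3), (5, 1)) == abs(5 - 2) + abs(1 - 3)
```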

Words are Paths

Now that we are viewing the group G as a geometric space, we can also change how we think of words w \in A^*. Such a word can be thought of as a discrete path in G. That is, we can think of w as a function from \mathbb{N} to G. This way of thinking of w as a discrete path is best illuminated with an example. Suppose we have the word w = ab^2a^{-1}b, then

w(0) = e,
w(1) = \overline{a},
w(2) = \overline{ab},
w(3) = \overline{ab^2},
w(4) = \overline{ab^2a^{-1}},
w(5) = \overline{ab^2a^{-1}b},
w(t) = \overline{ab^2a^{-1}b} \text{ for } t \ge 5.

Thus the path w : \mathbb{N} \rightarrow G is given by taking the first t letters of w and mapping this word to the group element it represents. With this interpretation of words in A^* in mind we can now define combings.
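The path interpretation can be made concrete in code. A sketch that realises words in \{a,b\}^* as discrete paths in \mathbb{Z}^2, taking a = (1,0) and b = (0,1) (this choice of group is just for illustration; capital letters stand for inverse generators):

```python
# A sketch: a word as a discrete path, mirroring the w(t) table above.
# a = (1,0), b = (0,1); capital letters denote the inverse generators.
STEP = {"a": (1, 0), "A": (-1, 0), "b": (0, 1), "B": (0, -1)}

def path(word, t):
    # w(t): the group element spelled by the first t letters of the word
    x, y = 0, 0
    for letter in word[:t]:  # for t beyond the word's length, the path is constant
        dx, dy = STEP[letter]
        x, y = x + dx, y + dy
    return (x, y)

w = "abbAb"  # the word a b^2 a^{-1} b from the example
assert [path(w, t) for t in range(7)] == [
    (0, 0), (1, 0), (1, 1), (1, 2), (0, 2), (0, 3), (0, 3)
]
```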


Let G be a group with a finite set of generators A. Then a combing of G with respect to A is a function \sigma : G \rightarrow A^* such that

  1. For all g \in G, \overline{\sigma_g} = g (we will write \sigma_g for \sigma(g)).
  2. There exists k >0 such that for all g, h \in G with g \cdot \overline{a} = h for some a \in A, we have that d_G(\sigma_g(t),\sigma_h(t)) \le k for all t \in \mathbb{N}.

The first condition says that we can think of \sigma as a way of picking a normal form \sigma_g \in A^* for each g \in G. The second condition is a bit more involved. It states that if the group elements g, h \in G are distance 1 from each other in the word metric, then the paths \sigma_g,\sigma_h  :  \mathbb{N} \rightarrow G are within distance k of each other at any point in time.

An Example

Not all groups can be given a combing. Indeed if we have a combing of G, then the word problem in G is solvable and the Dehn function of G is at most exponential. One group that does admit a combing is \mathbb{Z}^2 = \{(m,n) \mid m,n \in \mathbb{Z}\}. This group is generated by A = \{(1,0),(0,1)\} = \{\beta,\gamma\} and one combing of \mathbb{Z}^2 with respect to this generating set is

\sigma_{(m,n)} = \beta^m\gamma^n.

The first condition of being a combing is clearly satisfied and the following picture shows that the second condition can be satisfied with k = 2.
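The second condition can also be checked by direct computation. A sketch (not from the talk) that tests the bound k = 2 on a finite patch of \mathbb{Z}^2, with each \sigma-path realised as a staircase that reads the \beta‘s before the \gamma‘s:

```python
# A sketch: test the fellow-traveller condition with k = 2 for the combing
# sigma_{(m,n)} = beta^m gamma^n of Z^2, where beta = (1,0) and gamma = (0,1).
def sign(z):
    return (z > 0) - (z < 0)

def sigma_path(m, n, t):
    # position after reading the first t letters of beta^m gamma^n
    betas = min(t, abs(m))
    gammas = min(max(t - abs(m), 0), abs(n))
    return (sign(m) * betas, sign(n) * gammas)

def d(p, q):
    # the word metric on Z^2 with this generating set is the L1 distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

for m in range(-6, 7):
    for n in range(-6, 7):
        for dm, dn in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            # g = (m, n) and h = (m + dm, n + dn) differ by one generator
            for t in range(20):
                assert d(sigma_path(m, n, t), sigma_path(m + dm, n + dn, t)) <= 2
```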

A Non-Example

The discrete Heisenberg group, H_3, can be given by the following presentation

H_3 = \langle \alpha,\beta,\gamma \mid \alpha\gamma = \gamma\alpha, \beta\gamma = \gamma\beta, \beta\alpha = \alpha\beta\gamma \rangle.

That is, the group H_3 has three generators \alpha,\beta and \gamma. The generator \gamma commutes with both \alpha and \beta. The generators \alpha and \beta almost commute but don’t quite as seen in the relation \beta\alpha = \alpha\beta\gamma.

Any h \in H_3 can be represented uniquely as \sigma_h = \alpha^p\beta^m\gamma^n for p,m,n \in \mathbb{Z}. To see why such a representation exists, it’s best to consider an example. Suppose that h = \overline{\gamma\beta\alpha\gamma\alpha\beta\gamma}. Then we can use the fact that \gamma commutes with \alpha and \beta to push all \gamma‘s to the right, and we get that h = \overline{\beta\alpha\alpha\beta\gamma^3}. We can then apply the third relation to switch the order of \alpha and \beta on the right. This gives us that h = \overline{\alpha\beta\gamma\alpha\beta\gamma^3}=\overline{\alpha\beta\alpha\beta\gamma^4}. If we apply this relation once more we get that h = \overline{\alpha^2\beta^2\gamma^5} and thus \sigma_h = \alpha^2\beta^2\gamma^5. The procedure used to write h in the form \alpha^p\beta^m\gamma^n can be generalized to any word written using \alpha^{\pm 1}, \beta^{\pm 1}, \gamma^{\pm 1}.
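The rewriting procedure can be encoded as arithmetic on triples (p, m, n) standing for \alpha^p\beta^m\gamma^n: pushing \beta^m past \alpha^q produces \gamma^{mq}, which gives the multiplication rule in this sketch, used to recheck the worked example.

```python
# A sketch: store alpha^p beta^m gamma^n as the triple (p, m, n).
# From beta^m alpha^q = alpha^q beta^m gamma^{m*q} we get the rule below.
def mult(u, v):
    p, m, n = u
    q, r, s = v
    return (p + q, m + r, n + s + m * q)

ALPHA, BETA, GAMMA = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# recheck the worked example: h = gamma beta alpha gamma alpha beta gamma
h = (0, 0, 0)
for letter in [GAMMA, BETA, ALPHA, GAMMA, ALPHA, BETA, GAMMA]:
    h = mult(h, letter)

assert h == (2, 2, 5)  # i.e. h = alpha^2 beta^2 gamma^5
```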

The fact that such a representation is unique (that is if \overline{\alpha^p\beta^m\gamma^n} = \overline{\alpha^{p'}\beta^{m'}\gamma^{n'}}, then (p,m,n) = (p',m',n')) is harder to justify but can be proved by defining an action of H_3 on \mathbb{Z}^3. Thus we can define a function \sigma : H_3 \rightarrow \{\alpha,\beta,\gamma\}^* by setting \sigma_h to be the unique word of the form \alpha^p \beta^m\gamma^n that represents h. This map satisfies the first condition of being a combing and has many nice properties. These include that it is easy to check whether or not a word in \{\alpha,\beta,\gamma\}^* is equal to \sigma_h for some h \in H_3, and that there are fast algorithms for putting a word in \{\alpha,\beta,\gamma\}^* into its normal form. Unfortunately this map fails to be a combing.

The reason why \sigma : H_3 \rightarrow \{\alpha,\beta,\gamma\}^* fails to be a combing can be seen back when we turned \overline{  \gamma\beta\alpha\gamma\alpha\beta\gamma } into \overline{\alpha^2\beta^2\gamma^5}. To move \alpha‘s on the right to the left, we had to move past \beta‘s, producing \gamma‘s in the process. More concretely, fix m \in \mathbb{N} and let h = \overline{\beta^m} and g = \overline{\beta^m \alpha} = \overline{\alpha \beta^m\gamma^m}. We have \sigma_h = \beta^m and \sigma_g = \alpha \beta^m \gamma^m. The group elements g and h differ by a generator. Thus, if \sigma were a combing, we should be able to uniformly bound d_{H_3}(\sigma_g(t),\sigma_h(t)) for all t \in \mathbb{N} and all m \in \mathbb{N}.

If we then let t = m+1, we can recall that

d_{H_3}(\sigma_g(t),\sigma_h(t)) = \min\{l(w) \mid w \in \{\alpha,\beta,\gamma\}^*, \overline{w} = \sigma_g(t)^{-1}\sigma_h(t)\}.

We have that \sigma_h(t) = \overline{\beta^m} and \sigma_g(t) = \overline{\alpha \beta^m} and thus

\sigma_g(t)^{-1}\sigma_h(t)  = (\overline{\alpha\beta^m})^{-1}\overline{\beta^m} = \overline{\beta^{-m}\alpha^{-1}\beta^m} = \overline{\alpha^{-1}\beta^{-m}\gamma^{m}\beta^m}=\overline{\alpha^{-1}\gamma^{m}}.

The group elements \overline{\alpha^{-1}\gamma^{m}} are pairwise distinct as m varies, and any ball of fixed radius k in the word metric contains only finitely many group elements. Hence the distances d_{H_3}(\sigma_g(t),\sigma_h(t)) cannot be uniformly bounded over all m, and the map \sigma is not a combing.
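The computation of \sigma_g(t)^{-1}\sigma_h(t) can be rechecked with the same triple representation (p, m, n) for \alpha^p\beta^m\gamma^n. A sketch:

```python
# A sketch: redo the computation of sigma_g(t)^{-1} sigma_h(t) using
# triples (p, m, n) for alpha^p beta^m gamma^n.
def mult(u, v):
    p, m, n = u
    q, r, s = v
    return (p + q, m + r, n + s + m * q)

def inv(u):
    # (alpha^p beta^m gamma^n)^{-1} = alpha^{-p} beta^{-m} gamma^{mp - n}
    p, m, n = u
    return (-p, -m, m * p - n)

ALPHA = (1, 0, 0)

for m in range(1, 50):
    sigma_g_t = mult(ALPHA, (0, m, 0))  # alpha beta^m: the first m + 1 letters of sigma_g
    sigma_h_t = (0, m, 0)               # beta^m
    # the difference alpha^{-1} gamma^m is a different group element for every m
    assert mult(inv(sigma_g_t), sigma_h_t) == (-1, 0, m)
```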

Can we comb the Heisenberg group?

This leaves us with a question: can we comb the group H_3? It turns out that we can, but the answer actually lies in finding a better combing of \mathbb{Z}^2. This is because H_3 contains the subgroup \mathbb{Z}^2 \cong \langle \beta, \gamma \rangle \subseteq H_3. Rather than using the normal form \sigma_h = \alpha^p \beta^m \gamma^n, we will use \sigma'_h = \alpha^p \tau_{(m,n)} where \tau : \mathbb{Z}^2 \rightarrow \{\beta,\gamma \}^* is a more symmetric combing of \mathbb{Z}^2. The word \tau_{(m,n)} is defined to be the sequence of m \beta‘s and n \gamma‘s that stays closest to the straight line in \mathbb{R}^2 that joins (0,0) to (m,n) (when we view \beta and \gamma as representing (1,0) and (0,1) respectively). Below is an illustration:

This new function isn’t quite a combing of H_3, but it is the next best thing: an asynchronous combing. An asynchronous combing again requires that the paths \sigma_h,\sigma_g stay close to each other whenever h and g are close to each other, but it allows the two paths to travel at different speeds. Many of the results that can be proved for combable groups extend to asynchronously combable groups.
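One way to compute \tau is sketched below. This is an assumed greedy implementation, not taken from the talk: at each step it takes whichever move stays closest to the straight line, and it is shown only for m, n \ge 0 (here "b" stands for \beta and "c" for \gamma).

```python
# A sketch of the symmetric combing tau of Z^2, first quadrant only:
# greedily follow the staircase closest to the line from (0,0) to (m,n).
def tau(m, n):
    word, x, y = "", 0, 0
    while (x, y) != (m, n):
        # distance of (X, Y) from the line n*X - m*Y = 0, up to a constant factor
        off = lambda X, Y: abs(n * X - m * Y)
        take_beta = x < m and (y >= n or off(x + 1, y) <= off(x, y + 1))
        if take_beta:
            word, x = word + "b", x + 1
        else:
            word, y = word + "c", y + 1
    return word

# the word always uses exactly m b's and n c's, interleaved near the diagonal
w = tau(3, 3)
assert w.count("b") == 3 and w.count("c") == 3
assert tau(4, 0) == "bbbb"
```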


Hairdressing in Groups by Sarah Rees is a survey paper that includes lots of examples of groups that do or do not admit combings. It also talks about the language complexity of a combing, something I didn’t have time to touch on in my talk.

Combings of Semidirect Products and 3-Manifold Groups by Martin Bridson contains a proof that H_3 is asynchronously combable. He actually proves the more general result that any group of the form \mathbb{Z}^n \rtimes \mathbb{Z}^m is asynchronously combable.

Thank you to my supervisor, Tony Licata, for suggesting I give my talk on combing H_3 and for all the support he has given me so far.