Sums of Squares – Part 1: Optimization

About a year ago I had the pleasure of having a number of conversations with Abhishek Bhardwaj about the area of mathematics his PhD is on. This pair of posts is based on the fond memories I have of these conversations. Part two can be found here.

Optimization and Summing Squares

Optimization is a huge area of both pure and applied mathematics. Many questions and problems can be reduced to finding the minimum or maximum value of some function. That is we have a function f : \mathbb{R}^n \to \mathbb{R} and we wish to find the number \lambda such that

\lambda = \min \{ f(x_1,\ldots, x_n) \mid (x_1,\ldots, x_n) \in \mathbb{R}^n \},


\lambda = \max \{ f(x_1,\ldots, x_n) \mid (x_1,\ldots, x_n) \in \mathbb{R}^n \}.

By considering the function -f we can reduce finding the maximum of f to finding the minimum of another function. Minimizing a function isn’t always easy. Even when the function f is a relative simple function such as a polynomial, it can be a very difficult problem.

A different way of thinking about the problem of minimizing the function f, is to instead think about the function f - \lambda were \lambda is a real number. The minimum of f is the largest value of \lambda such that the function f - \lambda is non-negative for all values x \in \mathbb{R}^n. Thus we have a reduced our optimization problem into the problem of working out for which values of \lambda is the function f-\lambda non-negative.

An example

Suppose our function f was the following function that takes in two variables

f(x,y) = 2x^2 + 4y^2-4xy-4x+4y+7.

To minimize this function we can look at the function g = f - \lambda which is equal to

g(x,y) = 2x^2 + 4y^2-4xy-4x+4y+7 - \lambda.

We wish to work for which values of \lambda is this function g non-negative. By doing the following algebraic manipulations we can rewrite g like so

g(x,y) = (x^2-2x+1)+(x^2+4y^2+1-4xy-2x+4y)+5-\lambda ,

which is in turn equal to

g(x,y) = (x-1)^2+(-x+2y+1)^2 +(5-\lambda)

Since (x-1)^2 and (-x+2y+1)^2 are both squares of real numbers, they are both non-negative. Furthermore at the point (x,y) = (1,0), we have that (x-1)^2+(-x+2y+1)^2=0. Thus g is non-negative if and only if (5-\lambda) \ge 0, that is if and only if \lambda \le 5. Thus we can conclude that the minimum of f is 5.

Sums of Squares

Note that the number 5-\lambda can be written as (\sqrt{5-\lambda})^2 if and only if \lambda \ge 5. Thus, in the above example we have that g is non-negative if and only if g can be written as a sum of squares. That is g is non-negative if and only if g can be written as \sum_{i=1}^n h_i^2 for some polynomials h_1, h_2, \ldots, h_n.

In general, if a polynomial can be written as a sum of squares of other polynomials, then the polynomial must be non-negative. A natural question to ask is if every non-negative polynomial can be written as a sum of squares. This would make our optimization problem a lot easier. To minimize f, we can check for which values of \lambda can f-\lambda be written as a sum of squares.

This question of whether every non-negative polynomial can be written as a sum of squares was answered in the negative by David Hilbert in 1888. However, Hilbert’s proof was non-constructive, it didn’t give an explicit example of such a polynomial. The proof only showed that such polynomials exist out there somewhere. It took nearly nearly 80 years for the first explicit example to be given by the mathematician Theodore Motzkin in 1967. We will take a look at Motzkin’s polynomial in the next blog post.

Finitely Additive Measures

I am again tutoring the course MATH3029. The content is largely the same but the name has changed from “Probability Modelling and Applications” to “Probability Theory and Applications” to better reflect the material taught. There was a good question on the first assignment that leads to some interesting mathematics.

The Question

The assignment question is as follows. Let \Omega be a set and let \mathcal{F} \subseteq \mathcal{P}(\Omega) be a \sigma-algebra on \Omega. Let \mathbb{P} : \mathcal{F} \to [0,\infty) be a function with the following properties

  1. \mathbb{P}(\Omega) = 1.
  2. For any finite sequence of pairwise disjoint sets (A_k)_{k=1}^n in \mathcal{F}, we have \mathbb{P}\left(\bigcup_{k=1}^n A_k \right) = \sum_{k=1}^n \mathbb{P}(A_k).
  3. If (B_n)_{n=1}^\infty is a sequence of sets in \mathcal{F} such that B_{n+1} \subseteq B_n for all n \in \mathbb{N} and \bigcap_{n=1}^\infty A_n = \emptyset, then, as n tends to infinity, \mathbb{P}(A_n) \to 0.

Students were then asked to show that the function \mathbb{P} is a probability measure on (\Omega, \mathcal{F}). This amounts to showing that \mathbb{P} is countably additive. That is if (A_k)_{k=1}^\infty is a sequence of pairwise disjoint sets, then \mathbb{P}\left(\cup_{k=1}^\infty A_k\right) = \sum_{k=1}^\infty \mathbb{P}(A_k). One way to do this is define B_n = \bigcup_{k=n+1}^\infty A_k. Since the sets (A_k)_{k=1}^\infty are pairwise disjoint, the sets (B_n)_{n=1}^\infty satisfy the assumptions of the third property of \mathbb{P}. Thus we can conclude that \mathbb{P}(B_n) \to 0 as n \to \infty.

We also have that for every n \in \mathbb{N} we have \bigcup_{k=1}^\infty A_k = \left(\bigcup_{k=1}^n A_k\right) \cup B_n. Thus by applying the second property of \mathbb{P} twice we get

\mathbb{P}\left(\bigcup_{k=1}^\infty A_k \right) = \mathbb{P}\left( \bigcup_{k=1}^n A_k\right) + \mathbb{P}(B_n) = \sum_{k=1}^n \mathbb{P}(A_k) + \mathbb{P}(B_n).

If we let n tend to infinity, then we get the desired result.

A Follow Up Question

A natural follow up question is whether all three of the assumptions in the question are necessary. It is particularly interesting to ask if there is an example of a function \mathbb{P} that satisfies the first two properties but is not a probability measure. It turns out the answer is yes but coming up with an example involves some serious mathematics.

Let \Omega be the set of natural numbers \mathbb{N} = \{1,2,3,\ldots\} and let \mathcal{F} be the power set of \mathbb{N}.

One way in which people talk about the size of a subset of natural numbers A \subseteq \mathbb{N} is to look at the proportion of elements in A and take a limit. That is we could define

\mathbb{P}(A) = \lim_{n \to \infty}\frac{|\{k \in A \mid k \le n \}|}{n}.

This function \mathbb{P} has some nice properties for instance if A is the set of all even numbers than \mathbb{P}(A) = 1/2. More generally if A is the set of all numbers divisible by k, then \mathbb{P}(A) = 1/k. The function \mathbb{P} gets used a lot. When people say that almost all natural numbers satisfy a property, they normally mean that if A is the subset of all numbers satisfying the property, then \mathbb{P}(A)=1.

However the function \mathbb{P} is not a probability measure. The function \mathbb{P} is finitely additive. To see this, let (A_i)_{i=1}^m be a finite collection of disjoint subsets of \mathbb{N} and let A = \bigcup_{i=1}^m A_i. Then for every natural number n,

\{k \in A \mid k \le n \} = \bigcup_{i=1}^m \{k \in A_i \mid k \le n\}.

Since the sets (A_i)_{i=1}^m are disjoint, the union on the right is a disjoint union. Thus we have

\frac{|\{k \in A \mid k \le n \}|}{n} = \sum_{i=1}^m \frac{|\{k \in A_i \mid k \le n \}|}{n}.

Taking limits on both sides gives \mathbb{P}(A)=\sum_{i=1}^m \mathbb{P}(A_i), as required. Furthermore, the function \mathbb{P} is not countably additive. For instance if we let A_i = \{i\} for each i \in \mathbb{N}. Then \bigcup_{i=1}^\infty A_i = \mathbb{N} and \mathbb{P}(\mathbb{N})=1. On the other hand \mathbb{P}(A_i)=0 for every i \in \mathbb{N} and hence \sum_{i=1}^\infty \mathbb{P}(A_i)=0\neq \mathbb{P}(\mathbb{N}).

Thus it would appear that we have an example of a finitely additive measure that is not countably additive. However there is a big problem with the above definition of \mathbb{P}. Namely the limit of \frac{|\{k \in A \mid k \le n \}|}{n} does not always exist. Consider the set A = \{3,4,9,10,11,12,13,14,15,16,33,\ldots\}, ie a number k \in \mathbb{N} is in A if and only if 2^{m} < k \le 2^{m+1} for some odd number m\ge 1. The idea with the set A is that it looks a little bit like this:

There are chunks of numbers that alternate between being in A and not being in A and as we move further along, these chunks double in size. Let a_n represent the sequence of numbers \frac{|\{k \in A \mid k \le n \}|}{n}. We can see that a_n increases while n is in a chunk that belongs to A and decreases when n is in a chunk not in A. More specifically if 2^{2m-1} < n \le 2^{2m}, then a_n is increasing but if 2^{2m} < n \le 2^{2m+1}, then a_n is decreasing.

At the turning points n = 2^{2m} or n = 2^{2m+1} we can calculate exactly what a_n is equal to. Note that

\{k \in A \mid k \le 2^{2m} \} = 2+8+32+\ldots+2\cdot 4^{m-1} = 2\cdot \frac{4^m-1}{4-1} = \frac{2}{3}(4^m-1).

Furthermore since there are no elements of A between 2^{2m} and 2^{2m+1} we have

\{k \in A \mid k \le 2^{2m+1}\} = \frac{2}{3}(4^m-1).

Thus we have

a_{2^m} = \frac{\frac{2}{3}(4^m-1)}{2^{2m}}=\frac{2}{3}\frac{4^m-1}{4^m} and a_{2m+1} = \frac{\frac{2}{3}(4^m-1)}{2^{2m+1}}=\frac{1}{3}\frac{4^m-1}{4^m}.

Hence the values a_n fluctuate between approaching \frac{1}{3} and \frac{2}{3}. Thus the limit of a_n does not exist and hence \mathbb{P} is not well-defined.

There is a work around using something called a Banach limit. Banach limits are a way of extending the notion of a limit from the space of convergent sequences to the space of bounded sequences. Banach limits aren’t uniquely defined and don’t have a formula describing them. Indeed to prove the existence of Banach limits one has to rely on non-constructive mathematics such as the Hanh-Banach extension theorem. So if we take for granted the existence of Banach limits we can define

\mathbb{P}(A) = L\left( \left(\frac{|\{k \in A \mid k \le n \}|}{n}\right)_{n=1}^\infty \right),

where L is now a Banach limit. This new definition of \mathbb{P} is defined on all subsets of \mathbb{N} and is an example of measure that is finitely additive but not countably additive. However the definition of \mathbb{P} is very non-constructive. Indeed there are models of ZF set theory where the Hanh-Banach theorem does not hold and we cannot prove the existence of Banach limits.

This begs the question of whether or not there exist constructible examples of measures that are finitely additive but not countably additive. A bit of Googling reveals that non-principal ultrafilters provide another way of defining non-countably additive measures. However the existence of a non-principal ultrafilter on \mathbb{N} is again equivalent to a weak form of the axiom of choice. Thus it seems that the existence of a non-countably additive measure may be inherently non-constructive. This discussion on Math Overflow goes into more detail.

Why is the fundamental theorem of arithmetic a theorem?

The fundamental theorem of arithmetic states that every natural number can be factorized uniquely as a product of prime numbers. The word “uniquely” here means unique up to rearranging. The theorem means that if you and I take the same number n and I write n = p_1p_2\ldots p_k and you write n = q_1q_2\ldots q_l where each p_i and q_i is a prime number, then in fact k=l and we wrote the same prime numbers (but maybe in a different order).

Most people happily accept this theorem as self evident and believe it without proof. Indeed some people take it to be so self evident they feel it doesn’t really deserve the name “theorem” – hence the title of this blog post. In this post I want to highlight two situations where an analogous theorem fails.

Situation One: The Even Numbers

Imagine a world where everything comes in twos. In this world nobody knows of the number one or indeed any odd number. Their counting numbers are the even numbers \mathbb{E} = \{2,4,6,8,\ldots\}. People in this world can add numbers and multiply numbers just like we can. They can even talk about divisibility, for example 2 divides 8 since 8 = 4\cdot 2. Note that things are already getting a bit strange in this world. Since there is no number one, numbers in this world do not divide themselves.

Once people can talk about divisibility, they can talk about prime numbers. A number is prime in this world if it is not divisible by any other number. For example 2 is prime but as we saw 8 is not prime. Surprisingly the number 6 is also prime in this world. This is because there are no two even numbers that multiply together to make 6.

If a number is not prime in this world, we can reduce it to a product of primes. This is because if n is not prime, then there are two number a and b such that n = ab. Since a and b are both smaller than n, we can apply the same argument and recursively write n as a product of primes.

Now we can ask whether or not the fundamental theorem of arthimetic holds in this world. Namely we want to know if their is a unique way to factorize each number in this world. To get an idea we can start with some small even numbers.

  • 2 is prime.
  • 4 = 2 \cdot 2 can be factorized uniquely.
  • 6 is prime.
  • 8  = 2\cdot 2 \cdot 2 can be factorized uniquely.
  • 10 is prime.
  • 12 = 2 \cdot 6 can be factorized uniquely.
  • 14 is prime.
  • 16 = 2\cdot 2 \cdot 2 \cdot 2 can be factorized uniquely.
  • 18 is prime.
  • 20 = 2 \cdot 10 can be factorized uniquely.

Thus it seems as though there might be some hope for this theorem. It at least holds for the first handful of numbers. Unfortunately we eventually get to 36 and we have:

36 = 2 \cdot 18 and 36 = 6 \cdot 6.

Thus there are two distinct ways of writing 36 as a product of primes in this world and thus the fundamental theorem of arithmetic does not hold.

Situtation Two: A Number Ring

While the first example is fun and interesting, it is somewhat artificial. We are unlikely to encounter a situation where we only have the even numbers. It is however common and natural for mathematicians to be lead into certain worlds called number rings. We will see one example here and see what an effect the fundamental theorem of arithmetic can have.

Consider wanting to solve the equation x^2+19=y^3 where x and y are both integers. One way to try to solve this is by rewriting the equation as (x+\sqrt{-19})(x-\sqrt{-19}) = y^3. With this rewriting we have left the familiar world of the whole numbers and entered the number ring \mathbb{Z}[\sqrt{-19}].

In \mathbb{Z}[\sqrt{-19}] all numbers have the form a + b \sqrt{-19}, where a and b are integers. Addition of two such numbers is defined like so

(a+b\sqrt{-19}) + (c + d \sqrt{-19}) = (a+c) + (b+d)\sqrt{-19}.

Multiplication is define by using the distributive law and the fact that \sqrt{-19}^2 = -19. Thus

(a+b\sqrt{-19})(c+d\sqrt{-19}) = (ac-19bd) + (ad+bc)\sqrt{-19}.

Since we have multiplication we can talk about when a number in \mathbb{Z}[\sqrt{-19}] divides another and hence define primes in \mathbb{Z}[\sqrt{-19}]. One can show that if x^2 + 19 = y^3, then x+\sqrt{-19} and x-\sqrt{-19} are coprime in \mathbb{Z}[\sqrt{-19}] (see the references at the end of this post).

This means that there are no primes in \mathbb{Z}[\sqrt{-19}] that divides both x+\sqrt{-19} and x-\sqrt{-19}. If we assume that the fundamental theorem of arthimetic holds in \mathbb{Z}[\sqrt{-19}], then this implies that x+\sqrt{-19} must itself be a cube. This is because (x+\sqrt{-19})(x-\sqrt{-19})=y^3 is a cube and if two coprime numbers multiply to be a cube, then both of those coprime numbers must be cubes.

Thus we can conclude that there are integers a and b such that x+\sqrt{-19} = (a+b\sqrt{-19})^3 . If we expand out this cube we can conclude that

x+\sqrt{-19} = (a^3-57ab^2)+(3a^2b-19b^3)\sqrt{-19}.

Thus in particular we have 1=3a^2b-19b^3=(3a^2-19b^2)b. This implies that b = \pm 1 and 3a^2-19b^2=\pm 1. Hence b^2=1 and 3a^2-19 = \pm 1. Now if 3a^2 -19 =-1, then a^2=6 – a contradiction. Similarly if 3a^2-19=1, then 3a^2=20 – another contradiction. Thus we can conclude there are no integer solutions to the equation x^2+19=y^3!

Unfortunately however, a bit of searching reveals that 18^2+19=343=7^3. Thus simply assuming that that the ring \mathbb{Z}[\sqrt{-19}] has unique factorization led us to incorrectly conclude that an equation had no solutions. The question of unique factorization in number rings such as \mathbb{Z}[\sqrt{-19}] is a subtle and important one. Some of the flawed proofs of Fermat’s Last Theorem incorrectly assume that certain number rings have unique factorization – like we did above.


The lecturer David Smyth showed us that the even integers do not have unique factorization during a lecture of the great course MATH2222.

The example of \mathbb{Z}[\sqrt{-19}] failing to have unique factorization and the consequences of this was shown in a lecture for a course on algebraic number theory by James Borger. In this class we followed the (freely available) textbook “Number Rings” by P. Stevenhagen. Problem 1.4 on page 8 is the example I used in this post. By viewing the textbook you can see a complete solution to the problem.

Cayley Graphs and Cakes

Over the past month I have been studying at the AMSI Summer School at La Trobe University in Melbourne. Eight courses are offered at the AMSI Summer School and I took the one on geometric group theory. Geometric group theory is also the subject of my honours thesis and a great area of mathematics. I previously wrote about some ideas from geometric group theory here.

One of the main ideas in geometric group theory is to take a finitely generated group and turn it into a geometric object by constructing a Cayley graph (or in Germany, a Dehn gruppenbild or Dehn group picture).

If G is a group with a generating set A, then the Cayley graph of (G,A) has the elements of the group as vertices and for each group element g \in G and generator a \in A, there is an edge from g to ga.

Cayley graphs can be very pretty and geometrical interesting. In the final week of the course, our homework was to creatively make a Cayley graph. Here’s a sample of the Cayley graphs we made.

With a friend I baked a cake and decorated it with the Cayley graph of the group C_2 * C_3 \cong \langle a ,b \mid a^2,b^3 \rangle with respect to the generating set \{a,b\}. We were really happy with how it looked and tasted and are proud to say that the whole thing got eaten at a BBQ for the summer school students.

Staying with the food theme, a friend use grapes and skewers to make their Cayley graph. It’s a graph of the discrete Heisenberg group \langle a,b,c \mid ac=ca, bc=cb, ba=abc \rangle. I was amazed at the structural integrity of the grapes. There’s a video about this Cayley graph that you can watch here (alternatively it’s the first result if you seach “Heisenberg group grapes”).

This Cayley graph is made out of paper and shows how picking a different generating set A can change the appearance of the Cayley graph. It is a Cayley graph of \mathbb{Z}^2. Normally this Cayley graph is drawn with respect to the generating set \{(1,0),(0,1)\} and Cayley graph looks like the square lattice on the right. The Cayley graph on the left was made with respect to the generating set \{(1,0),(0,1),(1,1)\} and the result is a triangular tiling of the plane. Note that while the two Cayley graphs of \mathbb{Z}^2 look quite different, they have some important shared properties. In particular, both of them are two dimensional and flat. (The second image is taken from here)

Someone also made a Cayley graph of the Baumslag Solitar group \text{BS}(1,2) \cong \langle a,t \mid tat^{-1}=a^2 \rangle with respect to \{a,t\}. This was a group that came up frequently in the course as it can be used to construct a number of surprising counter examples.

Finally, my favourite Cayley graph of the day was the incredibly pretty Cayley graph of the Coxeter group

\langle a,b,c,d \mid a^2, b^2, c^2, d^2, (ab)^3, (bc)^3, (ac)^2, (ad)^2, (cd)^2 \rangle

with respect to the set \{a,b,c,d\}.

I’d like to thank both AMSI and La Trobe for putting on the Summer School and the three geometric group theory lecturers Alejandra Garrido, Murray Elder and Lawrence Reeves for teaching a great course. A huge thanks to the other students for making some great Cayley graphs and letting me share some of them.

A Nifty Proof of the Cauchy-Schwarz Inequality

This blog post is entirely based on the start of this blog post by Terry Tao. I highly recommend reading the post. It gives an interesting insight into how Terry sometimes thinks about proving inequalities. He also gives a number of cool and more substantial examples.

The main idea in the blog post is that Terry likes to do “arbitrage” on an inequality to improve it. By starting with a weak inequality he exploits the symmetry of the environment he is working in to get better and better inequalities. He first illustrates this with a proof of the Cauchy-Schwarz inequality. The proof given is really nifty and much more memorable than previous proofs I’ve seen. I felt that just had to write it up and share it.

Let (V,\langle, \rangle) be an inner product space. The Cauchy-Schwarz inequality states that for all v,w \in V, |\langle v, w \rangle | \le \|v\| \|w\|. It’s an important result that leads, among other things, to a proof that \| \cdot \| satisfies the triangle inequality. There are many proofs of the Cauchy-Schwarz inequality but here is the one Terry presents.

Since \langle \cdot, \cdot \rangle is positive definite we have 0 \le \langle v-w,v-w\rangle. Now using the fact that \langle \cdot, \cdot \rangle is additive in each coordinate we have

0 \le \langle v,v \rangle -\langle v, w \rangle -\langle w,v\rangle+ \langle w,w \rangle.

Since \langle w,v\rangle = \overline{\langle v,w\rangle}, we can rearrange the above expression to get the inequality

\text{Re}(\langle v,w \rangle) \le \frac{1}{2}\left(\| v \|^2 + \|w\|^2\right).

And now it is time to exploit the symmetry of the above expression and turn this inequality into the Cauchy-Schwarz inequality. The above inequality is worse than the Cauchy Schwarz inequality for two reasons. Firstly, unless \langle v, w \rangle is a positive real number, the left hand side is smaller than |\langle v,w \rangle|. Secondly, unless \|v\|=\|w\|, the right hand side is larger than the quantity \|v\|\|w\| that we want. Indeed we want the geometric mean of \|v\|^2 and \|w\|^2 whereas we currently have the arithmetic mean on the right.

Note that the right hand side is invariant under the symmetry v \mapsto e^{i \theta} v for any real number \theta. Thus choose \theta to be the negative of the argument of \langle v,w \rangle. This turns the left hand side into |\langle v,w \rangle | while the right hand side remains invariant. Thus we have done our first bit of arbitrage and now have the improved inequality

| \langle v,w \rangle | \le \frac{1}{2}\left(\|v\|^2 + \|w\|^2\right)

We now turn our attention to the right hand side and observe that the left hand side is invariant under the map (v,w) \mapsto \left(c\cdot v, \frac{1}{c} \cdot w\right) for any c > 0. Thus by choosing c we can minimize the right hand side. A little bit of calculus shows that the best choice is c = \sqrt{\frac{\|w\|}{\|v\|}} (this is valid provided that v,w \neq 0, the case when v=0 or w=0 is easy since we would then have \langle v,w \rangle=0). If we substitute in this optimal value of c, the right hand side of the above inequality becomes

\frac{1}{2}\left( \left\| \sqrt{\frac{\|w\|}{\|v\|}}\cdot v \right \|^2 +\left \| \sqrt{\frac{\|v\|}{\|w\|}} \cdot w\right \|^2 \right)=\frac{1}{2}\left(\frac{\|w\|}{\|v\|}\|v\|^2+\frac{\|v\|}{\|w\|}\|w\|^2 \right) = \|v\|\|w\|.

Thus we have turned our weak starting inequality into the Cauchy-Schwarz inequality! Again I recommend reading Terry’s original post to see many more examples of this sort of arbitrage and symmetry exploitation.

The Outer Automorphism of S6

This blog post is about why the number 6 is my favourite number – because the symmetric group on six elements is the only finite permutation group that contains an outer automorphism. In the post we will first look at why there are no other symmetric groups with outer automorphisms and then we will define an outer automorphism of S_6.

Inner and Outer Automorphisms

If G is any group, then we can construct a new group \text{Aut}(G) consisting of all bijective group homomorphism \phi : G \to G. This set forms a group under function composition. There is a natural map from G to \text{Aut}(G). A group element g \in G is taken to the function \gamma_g : G \to G given by \gamma_g(h) = ghg^{-1}. Thus \gamma_g is conjugation by g.

The kernel of g \mapsto \gamma_g is called the center of G and is denoted by Z(G). That is Z(G) is the subgroup consisting of group elements that commute with every other group element. The image of this map is denoted by \text{Inn}(G) \subseteq \text{Aut}(G). An automorphism which is in \text{Inn}(G) is called an inner automorphism. One can check that the subgroup of inner automorphisms is a normal subgroup. Indeed if we have a group element g \in G and an automorphism \phi \in \text{Aut}(G), then \phi \circ \gamma_g \circ \phi^{-1} = \gamma_{\phi(g)}. Thus we can form the quotient \text{Aut}(G)/\text{Inn}(G) which we will call the outer automorphisms of G and denote by \text{Out}(G).

Thus the inner automorphisms of G are those which come from G in the form of conjugation. The outer automorphisms of G are equivalence classes of all the automorphisms of G. Two automorphisms \phi, \phi' are equivalent in \text{Out}(G) if there is an inner automorphism \gamma_g such that \gamma_g \circ \phi = \phi'.

The Symmetric Group

For a natural number n \in \mathbb{N}, S_n will denote the group of permutations of the set \{1,2,\ldots,n\}. Any permutation \sigma \in S_n can be written as product of disjoint cycles which we will call its cycle type. For instance, if n=5 and \sigma(1)=3, \sigma(2)=5, \sigma(3)=4, \sigma(4)=1 and \sigma(5)=2, then \sigma can be written in cycle notation as (1,3,4)(2,5). Cycle notation is very useful for describing conjugation in S_n. If we have two permutations \sigma, \tau \in S_n and we know the cycle notation for \sigma, then the cycle notation for \tau \sigma \tau^{-1} can be derived by applying \tau to every entry in the cycle notation of \sigma. For instance, with \sigma = (1,3,4)(2,5), we have

\tau(1,3,4)(2,5)\tau^{-1} = (\tau(1),\tau(3),\tau(4))(\tau(2),\tau(5)).

To see why this result about conjugation is true, note the following. If \sigma(i)=j for some i,j \in \{1,2,\ldots,n\}, then


Thus two permutations \sigma, \sigma' are conjugate to one another if and only if \sigma and \sigma' have the same cycle type (ie they have the same number of cycles of any given size). This implies that for n \ge 3, the centre of S_n must be trivial as for any non-identity \sigma \in S_n, there is at least one other element of the same cycle type. Hence the map \sigma \mapsto \gamma_\sigma from S_n to \text{Aut}(S_n) is injective for n \ge 3.

Counting Conjugacy Classes

Before constructing an outer automorphism for S_6, we will see why such a map is special. In particular we will learn that \text{Aut}(S_n) consists entirely of inner automorphism for n \neq 6. The argument is based off studying the size of different conjugacy classes in S_n. This is important because if \phi is an automorphism of S_n and \sigma,\sigma' \in S_n are two permutations, then \sigma is conjugate to \sigma' if and only if \phi(\sigma) is conjugate to \phi(\sigma').

In S_n we know that two elements are conjugate if and only if they have the same cycle type. We can use this to count the size of different conjugacy classes of S_n. In particular the number of permutations with n_k cycles of size k for k = 1,2,\ldots, n is

\displaystyle \frac{\displaystyle n!}{\displaystyle \prod\limits_{k=1}^n n_k! k^{n_k}}.

The transpositions (j,k) for 1 \le j < k\le n generate S_n and form a conjugacy class which we call T. For any automorphism \phi \in \text{Aut}(S_n) , the set \phi(T) must be a conjugacy class that is also of size | T | = n(n-1)/2. Since automorphism preserve order, the elements of \phi(T) must all be of order two. Thus the cycle type of the conjugacy class \phi(T) must contain only cycles of size two or size one. Also there must be an odd number of two cycles. If there was an even number of two cycles, then we would have that \phi(T) is contained in the subgroup A_n \subseteq S_n. However this would imply that \phi(S_n) \subseteq \phi(A_n) which contradicts the fact that \phi : S_n \rightarrow S_n is a bijection. Thus the following lemma implies that \phi(T)=T when n \neq 6.

Lemma: Fix n \neq 6 and an odd number k such that k > 1 and 2k \le n. Let m_k denote the number of permutations \sigma \in S_n which have k cycles of size two and n-2k cycles of size one, then m_k \neq n(n-1)/2.

Proof: This claim can be manually checked for n < 6. For n > 6, assume in order to get a contradiction that m_k = \frac{n(n-1)}{2}. Note that m_k is equal to \frac{n!}{k! 2^k(n-2k)! 1^{n-2k}}=\frac{n!}{k! (n-2k)! 2^k}. Thus we have \frac{n(n-1)}{2} = \frac{n!}{k!(n-2k)!2^k} and hence

2^{k-1} = \frac{(n-2)(n-3)\ldots(n-2k+1)}{k!}.

Now if k \ge 5, then (n-4)(n-5)\ldots(n-2k+1) is a product of at least k consecutive integers and hence divisible by k! (see here). Thus

2^{k-1} = \frac{(n-2)(n-3)\ldots(n-2k+1)}{k!} = (n-2)(n-3)\frac{(n-4)(n-5)\ldots(n-2k+1)}{k!}

and hence \frac{(n-2)(n-3)\ldots(n-2k+1)}{k!} is divisible by an odd number other than one (either n-2 or n-3). Thus \frac{(n-2)(n-3)\ldots(n-2k+1)}{k!} cannot be a power of 2, contradicting the above equation.

When k = 3, the above equation becomes:

4= \frac{(n-2)(n-3)(n-4)(n-5)}{3!} = (n-2)\frac{(n-3)(n-4)(n-5)}{3!} = \frac{(n-2)(n-3)(n-4)}{3!}(n-5).

Thus if n is odd, n-2 is an odd number other than 1 that divides 4, a contradiction. If n is even, then since n > 6, n-5 is an odd number other than 1 that divides 4, which is again a contradiction. Thus we cannot have any case when m_k = \frac{n(n-1)}{2}.

S_n does not have an outer automorphism if n \neq 6

With the above lemma we can now prove that all the automorphism of S_n for n \neq 6 are inner. First note that the only automorphism of S_2 = \{ \text{id}, (12)\} is the identity which is an inner automorphism. When n \ge 3, the map \sigma \mapsto \gamma_\sigma from S_n to \text{Aut}(S_n) is injective. Thus there are exactly n! inner automorphisms of S_n. We will show that for n \neq 6 there are exactly n! automorphisms. Hence there can be no outer automorphisms.

Let \phi be an automorphism of S_n. The symmetric group S_n is also generated by the n-1 transpositions \sigma_i = (i,i+1) (for i =1, \ldots, n-1). By the previous lemma we know that if n \neq 6, then each \phi(\sigma_i) must be a transposition (j_i,k_i) for some j_i, k_i \in \{1,\ldots, n\} such that j_i \neq k_i. As the collection \sigma_1,\ldots,\sigma_{n-1} generates S_n, the automorphism \phi is determined by the transpositions \phi(\sigma_i)= (j_i,k_i). We will now show that there are exactly n! choices for these transpositions.

Since \sigma_1\sigma_{2} = (1,2,3), we know that \phi(\sigma_1)\phi(\sigma_2) = (j_1,k_1)(j_2,k_2) must be a three cycle. Thus one of j_2,k_2 must be equal to one of j_1,k_1 and the other must be distinct. By relabeling if necessary, we will require that k_1=j_2 and j_1 \neq k_2. Thus there are n choices for j_1, n-1 choices for k_1=j_2 and n-2 choices for k_2. This gives us n(n-1)(n-2) choices for the first two transpositions. Continuing in this manner, we note that \sigma_1\sigma_3 = (1,2)(3,4) and \sigma_2\sigma_3 = (2,3,4), we get that one of j_3,k_3 is equal to one of k_2 and the other one is distinct from j_1,k_1=j_2,k_2, thus giving us n-3 choices for the third transposition. In general we will have that \sigma_{i-1}\sigma_i is a three cycle and \sigma_l\sigma_i is a pair of disjoint two cycles if l < i-1. Thus one can show inductively that there are n-i choices for the transposition \phi(\sigma_i). Hence there are n(n-1)(n-2)\ldots 1 = n! choices for all the transpositions \phi(\sigma_i) and hence n! choices for \phi. Thus, if n \neq 6, all automorphisms of S_n are inner.

The outer automorphism of S_6.

The key part of showing that S_n didn’t have any outer automorphisms when n \neq 6 was showing that any automorphism must map transpositions to transpositions. Thus to find the outer automorphism of S_6 we will want to find an automorphism that does not preserve cycle type. In particular we will find one which maps transpositions to the cycle type (a,b)(c,d)(e,f). There are \frac{6!}{3!2^3} = 15 such triple transpositions which is exactly \frac{6\cdot 5}{2} – the number of single transpositions.

The outer automorphism, \psi, will be determined by the values it takes on \sigma_1,\sigma_2,\sigma_3,\sigma_4 and \sigma_5 (where again \sigma_i = (i,i+1)). Each of these transpositions will get to a different triple transposition (a,b)(c,d)(e,f). To extend \psi to the rest of S_6 it suffices to check the Coxeter relations \psi(\sigma_k)^2 = (\psi(\sigma_i)\psi(\sigma_{i+1}))^3 = (\psi(\sigma_i)\psi(\sigma_j))^2=\text{id} for k = 1,2,3,4,5, i =1,2,3,4 and j \ge i+2.

The relation \psi(\sigma_k)^2=\text{id} will hold for any triple transposition \psi(\sigma_k). The other relations are a bit trickier. If we have two triple transpositions \tau = (a,b)(c,d)(e,f) and \tau' = (a',b')(c',d')(e',f'), then \tau\tau' will be a product of two three cycles if none of the transpositions of \tau is a transposition of \tau'. If \tau and \tau' share exactly one transposition, then \tau\tau' is a product of two two cycles. If \tau and \tau' share two or more transpositions they must actually be equal and hence \tau\tau' is the identity.

With this in mind, one can verify that following assignment extends to a valid group homomorphism from S_6 to S_6:

\psi(\sigma_1) = (1,2)(3,4)(5,6),
\psi(\sigma_2) = (1,3)(2,5)(4,6),
\psi(\sigma_3) = (1,5)(2,6)(3,4),
\psi(\sigma_4) = (1,3)(2,4)(5,6),
\psi(\sigma_5) = (1,6)(2,5)(3,4).

We now just need to justify that the group homomorphism \psi is an automorphism. Since S_6 is a finite group, \psi is a bijection if and only if \psi is an injection. Recall that only normal subgroups of S_6 are the trivial subgroup, A_6 and S_6. Thus to prove injectivity of \psi, it suffices to show that \psi(\tau) \neq \text{id} for some \tau \in A_6. Indeed \psi(\sigma_1\sigma_3) = (1,6)(2,5) \neq \text{id} as required. Thus we have our outer automorphism!

This isn’t the most satisfying way of defining the outer automorphism as it requires using the generators of the symmetric group. There are many more intrinsic ways to define this outer automorphism which will hopefully be the topic of a later blog post!


The proof I give that for n \neq 6, the group S_n has no outer automorphisms is based off the one given in this paper On the Completeness of Symmetric Group.

Polynomial Pairing Functions

One of the great results of the 19th century German mathematician Georg Cantor is that the sets \mathbb{N} and \mathbb{N} \times \mathbb{N} have the same cardinality. That is the set of all non-negative integers \mathbb{N} = \{0,1,2,3,\ldots\} has the same size as the set of all pairs on non-negative integers \mathbb{N} \times \mathbb{N} = \{(m,n) \mid m,n \in \mathbb{N} \}, or put less precisely “infinity times infinity equals infinity”.

Proving this result amounts to finding a bijection p : \mathbb{N} \times \mathbb{N} \to \mathbb{N}. We will call such a function a pairing function since it takes in two numbers and pairs them together to create a single number. An example of one such function is

p(m,n) = 2^m(2n+1)-1

This function is a bijection because every positive whole number can be written uniquely as the product of a power of two and an odd number. Such functions are of practical as well as theoretical interest. Computers use pairing functions to store matrices and higher dimensional arrays. The entries of the matrix are actually stored in a list. When the user gives two numbers corresponding to a matrix entry, the computer uses a pairing function to get just one number which gives the index of the corresponding entry in the list. Thus having efficiently computable pairing functions is of practical importance.

Representing Pairing Functions

Our previous example of a pairing function was given by a formula. Another way to define pairing functions is by first representing the set \mathbb{N} \times \mathbb{N} as a grid with an infinite number of rows and columns like so:

We can then represent a pairing function as a path through this grid that passes through each square exactly once. Here are two examples:

The way to go from one of these paths to a function from \mathbb{N} \times \mathbb{N} to \mathbb{N} is as follows. Given as input a pair of integers (m,n), first find the dot that (m,n) represents in the grid. Next count the number of backwards steps that need to be taken to get to the start of the path and then output this number.

We can also do the reverse of the above procedure. That is, given a pairing function p : \mathbb{N} \times \mathbb{N} \to \mathbb{N}, we can represent p as a path in the grid. This is done by starting at p^{-1}(0) and joining p^{-1}(n) to p^{-1}(n+1). It’s a fun exercise to work out what the path corresponding to p(m,n) = 2^m(2n+1)-1 looks like.

Cantor’s Pairing Function

The pairing function that Cantor used is not any of the ones we have seen so far. Cantor used a pairing function which we will call q. When represented as a path, this is what q looks like:

Surprisingly there’s a simple formula that represents this pairing function q which we will now derive. First note that if we are at a point (k,0), then the value of q(k,0) is 1+2+3+\ldots+(k-1)+k= \frac{1}{2}k(k+1). This is because to get to (k,0) from (0,0) = q^{-1}(0), we have to go along k diagonals which each increase in length.

Now let (m,n) be an arbitrary pair of integers and let k = m+n. The above path first goes through (k,0) and then takes n steps to get to (m,n). Thus

q(m,n)= q(k,0)+n=\frac{1}{2}k(k+1)+n=\frac{1}{2}(n+m)(n+m+1)+n

And so Cantor’s pairing function is actually a quadratic polynomial in two variables!

Other Polynomial Pairing Functions?

Whenever we have a pairing function p, we can switch the order of the inputs and get a new pairing function \widetilde{p}. That is the function \widetilde{p} is given by \widetilde{p}(m,n)=p(n,m). When thinking of pairing functions as paths in a grid, this transformation amounts to reflecting the picture along the diagonal m = n.

Thus there are at least two quadratic pairing functions, Cantor’s function q and its switched cousin \widetilde{q}. The Fueter–Pólya theorem states these two are actually the only quadratic pairing functions! In fact it is conjectured that these two quadratics are the only polynomial pairing functions but this is still an open question.

Thank you to Wikipedia!

I first learnt that the sets \mathbb{N} and \mathbb{N} \times \mathbb{N} have the same cardinality in class a number of years ago. I only recently learnt about Cantor’s polynomial pairing function and the Fueter–Pólya theorem by stumbling across the Wikipedia page for pairing functions. Wikipedia is a great source for discovering new mathematics and for checking results. I use Wikipedia all the time. Many of these blog posts were initially inspired by Wikipedia entries.

Currently, Wikipedia is doing their annual fundraiser. If you are a frequent user of Wikipedia like me, I’d encourage you to join me in donating a couple of dollars to them:

The Stone-Čech Compactification – Part 3

In the first blog post of this series we discussed two compactifications of \mathbb{R}. We had the circle S^1 and the interval [-1,1]. In the second post of this series we saw that there is a correspondence between compactifications of \mathbb{R} and sub algebras of C_b(\mathbb{R}). In this blog post we will use this correspondence to uncover another compactification of \mathbb{R}.

Since S^1 is the one point compactification of \mathbb{R} we know that it corresponds to the subalgebra C_\infty(\mathbb{R}). This can be seen by noting that a continuous function on S^1 is equivalent to a continuous function, f, on \mathbb{R} such that \lim_{x \to +\infty} f(x) and \lim_{x \to -\infty} f(x) both exist and are equal. On the other hand the compactification [-1,1] corresponds to the space of functions f \in C_b(\mathbb{R}) such that \lim_{x \to +\infty} f(x) and \lim_{x \to - \infty}f(x) both exist (but these limits need not be equal).

We can also play this game in reverse. We can start with an algebra A \subseteq C_b(\mathbb{R}) and ask what compactification of \mathbb{R} it corresponds to. For example we may take A to be the following sub algebra

\{g+h \mid g,h \in C(\mathbb{R}), g(x+2\pi)=g(x) \text{ for all } x \in \mathbb{R} \text{ and } \lim_{x \to \pm \infty} h(x)=0 \}.

That is A contains precisely those functions in C_b(\mathbb{R}) that are the perturbation of a 2\pi periodic function by a function that vanishes at both \infty and - \infty. Since any constant function is 2 \pi periodic, we know that C_\infty(\mathbb{R}) is contained in A. Thus, as explained in the previous blog post, we know that \sigma(A) corresponds to a compactification of \mathbb{R}.

Recall that \sigma(A) consists of all the non-zero continuous C*-homeomorphisms from A to \mathbb{C}. The space \sigma(A) contains a copy o \mathbb{R} as a subspace. A point t \in \mathbb{R} corresponds to the homeomorphism \omega_t given by \omega_t(f)=f(t). There are also a circle’s worth of homeomorphisms \nu_p given by \nu_p(f) = \lim_{k \to \infty} f(2 \pi k+p) for p \in [0,2\pi). The homeomorphism \nu_p isolates the value of the 2\pi periodic part of f at the point p. This is because if f = g+h with g a 2 \pi periodic function and h a function that vanishes at infinity. Then

\nu_p(f)  = \lim\limits_{k \to \infty} (g(2 \pi k + p)+h(2 \pi k + p)) = g(p)+\lim\limits_{k \to \infty} h(2 \pi k + p) = g(p).

Thus we know that the topological space \sigma(A) is the union of a line and a circle. We now just need to work out the topology of how these to spaces are put together to make \sigma(A). We need to work out which points on the line are “close” to our points on the circle. Suppose we have a sequence of real numbers (t_n)_{n \in \mathbb{N}} and a point p \in [0,2\pi) such that the following two conditions are satisfied. Firstly | t_n | \rightarrow \infty and secondly there exists a sequence of integers (k_n)_{n \in \mathbb{N}} such that \lim\limits_{n \to \infty} (t_n - 2\pi k_n) = p. Then we would have that \omega_{t_n} converges to \nu_p in \sigma(A).

Thus we know that the copy of \mathbb{R} in must spiral towards the copy of S^1 in \sigma(A) and that this spiraling must happen as we approach either positive infinity or negative infinity. Thus we can realise \sigma(A) as the following subset of \mathbb{R}^3 that looks a bit like Christmas tree decoration:

Here we have the black circle sitting in the x,y plane of \mathbb{R}^3. The red line is a copy of \mathbb{R} that spirals towards the circle. Negative numbers sit below the circle and positive numbers sit above. On left is a sideways view of this space and on the right is the view from above. I made these images in geogebra. If you follow this link, you can see the equations that define the above space and move the space around.

This example shows just how complicated the Stone-Čech compactification \beta \mathbb{R} of \mathbb{R} must be. Our relatively simple algebra A gave this quite complicated compactification shown above. The Stone-Čech compactification surjects onto the above compactification and corresponds to the huge algebra of all bounded and continuous function from \mathbb{R} to \mathbb{C}.


The Wikipedia page on the Stone-Čech compactification and these notes by Terrence Tao were where I first learned of the Stone-Čech compactification. I learnt about C*-algebras in a great course on operator algebras run by James Tenner at ANU. We used the textbook A Short Course on Spectral Theory by William Averson which has some exercises at the end of chapter 2 about the Stone-Čech compactification of \mathbb{R}. The example of the algebra A \subseteq C_b(\mathbb{R}) used in this blog post came from an assignment question set by James.

Parts one and two of this series can be found here.

The Stone-Čech Compactification – Part 2

This is the second post in a series about the Stone-Čech compactification. In the previous post we discussed compactifications and defined the Stone-Čech compactification. In this blog post we will show the existence of the Stone-Čech compactification of an arbitrary space. To do this we will use a surprising tool, C*-algebras. In the final blog post we take a closer look at what’s going on when our space is \mathbb{R}.

The C*-algebra of operators on a Hilbert space

Before I define what a C*-algebra is, it is good to see a few examples of C*-algebras. If H is a Hilbert space over the complex numbers, then we define B(H) the space of bounded linear operators from H to H. The space B(H) is a Banach space under the operator norm. The space B(H) is also a unital algebra since we can compose operators in B(H) and the identity operator acts as a unit. This composition satisfies the inequality \Vert ST \Vert \le  \Vert S  \Vert   \Vert  T \Vert for all S,T \in B(H). Thus B(H) is a Banach algebra. Finally we have an involution * : B(H) \rightarrow B(H) given by the adjoint. That is if T is a bounded operator on H, then T^* is the unique bounded operator satisfying

\langle T h, g \rangle = \langle h, T^* g \rangle,

for every h,g \in H. This involution is conjugate linear and satisfies (ST)^* = T^*S^* for all S, T \in B(H). This involution also satisfies the C*-property that \Vert T^*T\Vert = \Vert T \Vert^2 for all T \in B(H).

The C*-algebra of continuous functions on a compact set

If K is a compact topological space, then the Banach space C(K) of continuous functions from K to \mathbb{C} is a unital Banach algebra. The norm on this space is the supremum norm

\Vert f \Vert = \sup_{x \in K} \vert f(x) \vert

and multiplication is defined pointwise. This algebra has a unit which is the function that is constantly one. This space also has an involution * : C(K) \rightarrow C(K) given by f^*(x) = \overline{f(x)}. This involution is also conjugate linear and it satisfies (fg)^* = f^*g^* = g^*f^* and the C*-property \Vert f^*f \Vert = \Vert f \Vert^2.

Both B(H) and C(K) are examples of unital C*-algebras. We will define a unital C*-algebra to be a unital Banach algebra A with an involution * : A \rightarrow A such that

  1. The involution is conjugate linear.
  2. (ab)^* = b^*a^* for all a, b \in A.
  3. \Vert a^*a \Vert = \Vert a \Vert^2 for all a \in A.

Our two examples B(H) and C(K) are different in one important way. The C*-algebra C(K) is commutative whereas in general B(H) is not. In some sense the C*-algebras C(K) are the only commutative unital C*-algebras. It is the precise statement of this fact that will let us define the Stone-Čech compactification of a space.

The Gelfand spectrum

If A is any unital C*-algebra we can define it’s Gelfand spectrum \sigma(A) to be the set of all continuous, non-zero C*-homomorphisms from A to \mathbb{C}. That is every \omega \in \sigma(A) is a non-zero continuous linear functional from A to \mathbb{C} such that \omega (ab) =  \omega (a) \omega (b) and \omega (a^*) = \overline{ \omega (a)}. It can be shown that \sigma(A) is a weak*-closed subset of the unit ball in A', the dual of A. Thus by the Banach-Alaoglu theorem, \sigma(A) is compact in the relative weak*-topology.

For example, take A = C(K) for some compact Hausdorff set K. In this case we have a map from K to \sigma(C(K)) given by p \mapsto \ \omega _p where \omega _p : C(K) \rightarrow  \mathbb{C} is the evaluation map given by \omega _p(f) = f(p). This gives a continuous injection from K into \sigma(C(K)). It turns out that this map is in fact also surjective and hence a homeomorphism between K and \sigma(C(K)). Thus every continuous non-zero homeomorphism on C(K) is of the form \omega_p for some p \in K. Thus we may simply regard \sigma(C(K)) as being equal to K.

The Gelfand spectrum \sigma(A) contains essentially all of the information about A when A is a commutative C*-algebra. This claim is made precise by the following theorem.

Theorem: If A is a unital commutative C*-algebra, then A is C*-isometric to C(\sigma(A)), the space of continuous functions f : \sigma(A) \to \mathbb{C}. This isomorphism is given by the map a\in A \mapsto f_a where f_a : \sigma(A) \rightarrow \mathbb{C} is given by f_a( \omega) =  \omega (a) for all \omega \in \sigma(A).

This powerful theorem tells us that every unital commutative C*-algebra is of the form C(K) for some compact space K. Furthermore this theorem tells us that we can take K to be the Gelfand spectrum of our C*-algebra.

The Gelfand spectrum and compactifications

We will now turn back to our original goal of constructing compactifications. If X is a locally compact Hausdorff space then we can define C_\infty(X) to be the space of continuous functions f : X \rightarrow \mathbb{C} that “have a limit at infinity”. By this we mean that for every f \in C_\infty(X) there exists a constant c \in \mathbb{C} such that for all \varepsilon > 0, there exists a compact set K \subseteq X such that |f(x)-c| < \varepsilon for all x \in X \setminus K. If we equip C_\infty(X) with the supremum norm and define f^*(x) = \overline{f(x)}, then C_\infty(X) becomes a commutative unital C*-algebra under point-wise addition and multiplication.

We have a map from X to \sigma(C_\infty(X)) given by evaluation. This map is still an homeomorphism onto its image but the map is not surjective if X is not compact. In the case when X is not compact, we have an extra element of \omega_\infty given by \omega_\infty(f) = \lim f. Thus we have that \sigma(C_\infty(X)) \cong X \cup \{\omega_\infty\} and hence we have rediscovered the one-point compactification of X.

A similar approach can be used to construct the Stone-Čech compactification. Rather than using the C*-algebra C_\infty(X), we will use the C*-algebra C_b(X) of all continuous and bounded functions from X to \mathbb{C}. This is a C*-algebra under the supremum norm. We will show that the space \beta X := \sigma(C_b(X)) satisfies the universal property of the Stone-Čech compactification. The map \phi : X \rightarrow \beta X is the same one given above. For any p \in X, \phi(p) \in \beta X = \sigma(C_b(X)) is defined to be the evaluation at p homomorphism \omega_p. This map is a homeomorphism between X and an open dense subset of \beta X. As in the case of the one point compactification, this map is not surjective. There are heaps of elements of \beta X \setminus \phi(X) as can be seen by the fact that \beta X surjects onto any other compactification of X. However it is very hard to give an explicit definition of an element of \beta X \setminus X.

We will now show that \beta X = \sigma(C_b(X)) satisfies the universal property of the Stone-Čech compactification. Let (K,\psi) be a compactification of X. We wish to construct a morphism from (\beta X,\phi) to (K,\psi). That is we wish to find a map f : \beta X \rightarrow K such that f \circ \phi = \psi. Note that such a map is automatically surjective as are all morphisms between compactifications. We can embed C(K) in C_b(X) by the map f \mapsto f \circ \psi. Since \psi(X) is dense in K, we have that this map is a C*-isometry from C(K) to its image in C_b(X). Above we argued that \sigma(C(K)) \cong K. The compactification (K,\psi) is in fact isomorphic to (\sigma(C(K)), \widetilde {\psi}) where \widetilde {\psi}(p)=\omega_{\psi(p)}. Thus we will construct our morphism from (\beta X, \phi) to (\sigma(C(K)), \widetilde {\psi}).

Now elements of \beta X are homeomorphism on C_b(X) and elements of \sigma(C(K)) are homeomorphism on C(K). Since we can think of C(K) as being a subspace of C_b(K) we can define the map f : \beta X \rightarrow \sigma(C(K)) to be restriction to C(K). That is f(\omega) = \omega_{\mid C(K)}. Note that since C(K) contains the unit of C_b(X), the above map is well defined (in particular \omega \neq 0 implies \omega_{\mid C(K)} \neq 0). One can check that the relation f \circ \phi = \widetilde{\psi} does indeed hold since both \phi and \widetilde{\psi} correspond to point evaluation. Thus we have realised the Stone-Čech compactification of X as the Gelfand spectrum of C_b(X).

The above argument can be modified to give a correspondence between compactifications of X and sub C*-algebras of C_b(X) that contain C_\infty(X). This correspondence is given by sending the sub C*-algebra A to \sigma(A) and the point evaluation map. This correspondence is order reversing in the sense that if we have A_1 \subseteq A_2 for two sub C*-algebras, then we have a morphism from \sigma(A_2) to \sigma(A_1).

In the final blog post of the series we will further explore this correspondence between compactifications and subalgebras in the case when X = \mathbb{R}. Part one of this series can be found here.

The Stone-Čech Compactification – Part 1

Mathematics is full of surprising connections between two seemingly unrelated topics. It’s one of the things I like most about maths. Over the next few blog posts I hope to explain one such connection which I’ve been thinking about a lot recently.

The Stone-Čech compactification connects the study of C*-algebras with topology. This first blog post will set the scene by explaining the topological notion of a compactification. In the next blog post, I’ll define and discuss C*-algebras and we’ll see how they can be used to study compactifications. In the final post we will look at a particular example.


Let X be a locally compact Hausdorff space. A compactification of X is a compact Hausdorff space K and a continuous function \phi : X \rightarrow K such that \phi is a homeomorphism between X and \phi(X) and \phi(X) is an open, dense subset of K. Thus a compactification is a way of nicely embedding the space X into a compact space K.

For example if X is the real line \mathbb{R}, then the circle S^1 and the stereographic projection is a compactification of X. In this case the image of X is all of the circle apart from one point. Since the circle is compact, this is indeed a compactification. This is an example of a one-point compactification. An idea which we will return to later.

Comparing Compactifications

A space X will in general have many compactifications and we would like to compare these different compactifications. Suppose that (K_1,\phi_1) and (K_2, \phi_2) are two compactifications. Then a morphism from (K_1,\phi_1) to (K_2,\phi_2) is a continuous map f : K_1 \rightarrow K_2 such that \phi_2 = f \circ \phi_1. That is, the below diagram commutes:

Hence we have a morphism from (K_1,\phi_1) to (K_2,\phi_2) precisely when the embedding \phi_2 : X \rightarrow K_2 extends to a map f : K_1 \rightarrow K_2. Since \phi_1(X) is dense in K_1, if such a function f exists, it is unique.

Note that the map f must be surjective since \phi_2(X) is dense in K_2 and g(K_1) is a closed set containing \phi_2(X). We can think of the compactification (K_1,\phi_1) as being “bigger” or “more general” than the compactification (K_2, \phi_2) as f is a surjection onto K_2. More formally we will say that (K_1,\phi_1) is finer than (K_2, \phi _2) and equivalently that (K_2, \phi _2) is coarser than (K_1, \phi _1) whenever there is a morphism from (K_1, \phi_1) to (K_2,\phi_2). Note that the composition of two morphism of compactifications is again a morphism of compactifications. Thus we can talk about the category of compactifications of a space. The compactifications (K_1, \phi _1) and (K_2, \phi _2) are isomorphic if there exists a morphism g between (K_1, \phi _1) and (K_2, \phi _2) such that g is a homeomorphism between K_1 and K_2.

For example again let X = \mathbb{R}. Then the closed interval [-1,1] is again a compactification of \mathbb{R} with the map (x,y) \mapsto \frac{x}{1+|x|} which maps \mathbb{R} onto the open interval (-1,1). We can then create a morphism from [-1,1] to the circle by sending the endpoints of [-1,1] to the top of the circle and sending the interior of [-1,1] to the rest of the circle. We can perform this map in such a way that the following diagram commutes:

Thus we have a morphism from the compactification [-1,1] to the compactification S^1. Thus the compactification [-1,1] is finer than the compactification S^1.

Now that we have a way of comparing compactifications of X we can ask about the existence of extremal compactifications of X. Does there exists a compactification of X that is the coarser than any other compactification? Or one which is finer than any other? From a category-theory perspective, we are interested in the existence of terminal and initial objects in the category of compactifications of X. We will first show the existence of a coarsest or “least general” compactification.

The one point compactification

A coarsest compactification would be a terminal object in the category of compactifications. That is a compactification (\alpha X, i) with the property that for all compactification (K, \phi) there is a unique extension g : K \rightarrow \alpha X of i : X \rightarrow \alpha X. If such a coarsest compactification exists, then it is unique up to isomorphism. Thus we can safely refer to the coarsest compactification.

The one point compactification of X is constructed by adding a single point denoted by \infty to X. It is defined to be the set \alpha X = X \sqcup \{ \infty\} with the topology given by the collection of open sets in X and sets of the form \alpha X \setminus K for K \subseteq X a compact subset. The map i : X \rightarrow \alpha X is given by simply including X into \alpha X = X \sqcup \{\infty\}.

The one point compactification is the coarsest compactification of X. Let (K,\phi) be another compactification of X. Then the map g : K \rightarrow \alpha X given by

g(y)  = \begin{cases} i(\phi^{-1}(y)) & \text{if } y \in \phi(X),\\  \infty & \text{if } y \notin \phi(X), \end{cases}

is the unique morphism from (K,f) to (\alpha X, i).

The Stone-Čech compactification

A Stone-Čech compactification of X is a compactification (\beta X,j) which is the finest compactification of X. That is (\beta X,j) is an initial object in the category of a compactifications of X and so for every compactification (K,f) there exists a unique morphism from (\beta X,j) to (K,f). Thus any embedding f : X \rightarrow K, has a unique extension g : \beta X \rightarrow K. As with coarest compactification of X, the Stone-Čech compactification of X is unique up to isomorphism and thus we will talk of the Stone-Čech compactification.

Unlike the one point compactification of X, there is no simple description of \beta X even when X is a very simple space such as \mathbb{N} or \mathbb{R}. To show the existence of a Stone-Čech compactification of any space X we will need to make a detour and develop some tools from the study of C*-algebras which will be the topic of the next blog post.