Braids and the Yang-Baxter equation

I recently gave a talk on the Yang-Baxter equation. The focus of the talk was to state the connection between the Yang-Baxter equation and the braid relation. This connection comes from a system of interacting particles. In this post, I’ll go over part of my talk. You can access the full set of notes here.

Interacting particles

Imagine two particles on a line, each with a state that can be any element of a set \mathcal{X}. Suppose that the only way particles can change their states is by interacting with each other. An interaction occurs when two particles pass by each other. We could define a function F : \mathcal{X} \times \mathcal{X} \to \mathcal{X} \times \mathcal{X} that describes how the states change after interaction. Specifically, if the first particle is in state x and the second particle is in state y, then their states after interacting will be

F(x,y) = (F_a(x,y), F_b(x,y)) = (\text{new state of particle 1}, \text{new state of particle 2}),

where F_a,F_b : \mathcal{X} \times \mathcal{X} \to \mathcal{X} are the components of F. Recall that the particles move past each other when they interact. Thus, to keep track of the whole system we need an element of \mathcal{X} \times \mathcal{X} to keep track of the states and a permutation \sigma \in S_2 to keep track of the positions.

Three particles

Now suppose that we have 3 particles labelled 1,2,3. As before, each particle has a state in \mathcal{X}. We can thus keep track of the state of each particle with an element of \mathcal{X}^3. The particles also have a position which is described by a permutation \sigma \in S_3. The order the entries of (x,y,z) \in \mathcal{X}^3 corresponds to the labels of the particles not their positions. A possible configuration is shown below:

Three dots. The dots are labelled above 1, 3, 2 and labelled below x, z, y.
A possible configuration of the three particles. The above configuration is escribed as having states (x,y,z) \in \mathcal{X}^3 and positions 132 \in S_3.

As before, the particles can interact with each other. However, we’ll now add the restriction that the particles can only interact two at a time and interacting particles must have adjacent positions. When two particles interact, they swap positions and their states change according to F. The state and position of the remaining particle is unchanged. For example, in the above picture we could interact particles 1 and 3. This will produce the below configuration:

The new configuration after interacting particles 1 and 3 in the first diagram. The configuration is now described by the states F_{13}(x,y,z) \in \mathcal{X}^3 and the permutation 312 \in S_3.

To keep track of how the states of the particles change over time we will introduce three functions from \mathcal{X}^3 to \mathcal{X}^3. These functions are F_{12},F_{13},F_{23}. The function F_{ij} is given by applying F to the i,j coordinates of (x,y,z) and acting by the identity on the remaining coordinate. In symbols,

F_{12}(x,y,z) = (F_a(x,y), F_b(x,y), z),
F_{13}(x,y,z) = (F_a(x,z), y, F_b(x,z)),
F_{23}(x,y,z) = (x, F_a(y,z), F_b(y,z)).

The function F_{ij} exactly describes how the states of the three particles change when particles i and j interact. Now suppose that three particles begin in position 123 and states (x,y,z). We cannot directly interact particles 1 and 3 since they are not adjacent. We have to pass first pass one of the particles through particle 2. This means that there are two ways we can interact particles 1 and 3. These are illustrated below.

The two ways of passing through particle 2 to interact particles 2 and 3.

In the top chain of interactions, we first interact particles 2 and 3. In this chain of interactions, the states evolve as follows:

(x,y,z) \to F_{23}(x,y,z) \to   F_{13}(F_{23}(x,y,z)) \to F_{12}(F_{13}(F_{23}(x,y,z))).

In the bottom chain of interactions, we first interact particles 1 and 2. On this chain, the states evolve in a different way:

(x,y,z) \to F_{12}(x,y,z) \to   F_{13}(F_{12}(x,y,z)) \to F_{23}(F_{13}(F_{12}(x,y,z))).

Note that after both of these chains of interactions the particles are in position 321. The function F is said to solve the Yang–Baxter equation if two chains of interactions also result in the same states.

Definition: A function F : \mathcal{X} \times \mathcal{X} \to \mathcal{X} \times \mathcal{X} is a solution to the set theoretic Yang–Baxter equation if,

    F_{12}\circ F_{13} \circ F_{23} = F_{23} \circ F_{13} \circ F_{12}.

This equation can be visualized as the “braid relation” shown below. Here the strings represent the three particles and interacting two particles corresponds to crossing one string over the other.

The braid relation.

Here are some examples of solutions to the set theoretic Yang-Baxter equation,

  • The identity on \mathcal{X} \times \mathcal{X}.
  • The swap map, (x,y) \mapsto (y,x).
  • If f,g : \mathcal{X} \to \mathcal{X} commute, then the function (x,y) \to (f(x), g(y)) is a solution the Yang-Baxter equation.

In the full set of notes I talk about a number of extensions and variations of the Yang-Baxter equation. These include having more than three particles, allowing for the particle states to be entangle and the parametric Yang-Baxter equation.

Double cosets and contingency tables

Like my previous post, this blog is also motivated by a comment by Professor Persi Diaconis in his recent Stanford probability seminar. The seminar was about a way of “collapsing” a random walk on a group to a random walk on the set of double cosets. In this post, I’ll first define double cosets and then go over the example Professor Diaconis used to make us probabilists and statisticians more comfortable with all the group theory he was discussing.

Double cosets

Let G be a group and let H and K be two subgroups of G. For each g \in G, the (H,K)-double coset containing g is defined to be the set

HgK = \{hgk : h \in H, k \in K\}

To simplify notation, we will simply write double coset instead of (H,K)-double coset. The double coset of g can also be defined as the equivalence class of g under the relation

g \sim g' \Longleftrightarrow g' = hgk for some h \in H and g \in G

Like regular cosets, the above relation is indeed an equivalence relation. Thus, the group G can be written as a disjoint union of double cosets. The set of all double cosets of G is denoted by H\backslash G / K. That is,

H\backslash G /K = \{HgK : g \in G\}

Note that if we take H = \{e\}, the trivial subgroup, then the double cosets are simply the left cosets of K, G / K. Likewise if K = \{e\}, then the double cosets are the right cosets of H, H \backslash G. Thus, double cosets generalise both left and right cosets.

Double cosets in S_n

Fix a natural number n > 0. A partition of n is a finite sequence a = (a_1,a_2,\ldots, a_I) such that a_1,a_2,\ldots, a_I \in \mathbb{N}, a_1 \ge a_2 \ge \ldots a_I > 0 and \sum_{i=1}^I a_i = n. For each partition of n, a, we can form a subgroup S_a of the symmetric group S_n. The subgroup S_a contains all permutations \sigma \in S_n such that \sigma fixes the sets A_1 = \{1,\ldots, a_1\}, A_2 = \{a_1+1,\ldots, a_1 + a_2\},\ldots, A_I = \{n - a_I +1,\ldots, n\}. Meaning that \sigma(A_i) = A_i for all 1 \le i \le I. Thus, a permutation \sigma \in S_a must individually permute the elements of A_1, the elements of A_2 and so on. This means that, in a natural way,

S_a \cong S_{a_1} \times S_{a_2} \times \ldots \times S_{a_I}

If we have two partitions a = (a_1,a_2,\ldots, a_I) and b = (b_1,b_2,\ldots,b_J), then we can form two subgroups H = S_a and K = S_b and consider the double cosets H \backslash G / K = S_a \backslash S_n / S_b. The claim made in the seminar was that the double cosets are in one-to-one correspondence with I \times J contingency tables with row sums equal to a and column sums equal to b. Before we explain this correspondence and properly define contingency tables, let’s first consider the cases when either H or K is the trivial subgroup.

Left cosets in S_n

Note that if a = (1,1,\ldots,1), then S_a is the trivial subgroup and, as noted above, S_a \backslash S_n / S_b is simply equal to S_n / S_b. We will see that the cosets in S_n/S_b can be described by forgetting something about the permutations in S_n.

We can think of the permutations in S_n as all the ways of drawing without replacement n balls labelled 1,2,\ldots, n. We can think of the partition b = (b_1,b_2,\ldots,b_J) as a colouring of the n balls by J colours. We colour balls 1,2,\ldots, b_1 by the first colour c_1, then we colour b_1+1,b_1+2,\ldots, b_1+b_2 the second colour c_2 and so on until we colour n-b_J+1, n-b_J+2,\ldots, n the final colour c_J. Below is an example when n is equal to 6 and b = (3,2,1).

The first three balls are coloured green, the next two are coloured red and the last ball is coloured blue.

Note that a permutation \sigma \in S_n is in S_b if and only if we draw the balls by colour groups, i.e. we first draw all the balls with colour c_1, then we draw all the balls with colour c_2 and so on. Thus, continuing with the previous example, the permutation \sigma_1 below is in S_b but \sigma_2 is not in S_b.

The permutation \sigma_1 is in S_b because the colours are in their original order but \sigma_1 is not in S_b because the colours are rearranged.

It turns out that we can think of the cosets in S_n \setminus S_b as what happens when we “forget” the labels 1,2,\ldots,n and only remember the colours of the balls. By “forgetting” the labels we mean only paying attention to the list of colours. That is for all \sigma_1,\sigma_2 \in S_n, \sigma_1 S_b = \sigma_2 S_b if and only if the list of colours from the draw \sigma_1 is the same as the list of colours from the draw \sigma_2. Thus, the below two permutations define the same coset of S_b

When we forget the labels and only remember the colours, the permutations \sigma_1 and \sigma_2 look the same and thus are in the same left coset of S_b.

To see why this is true, note that \sigma_1 S_b = \sigma_2 S_b if and only if \sigma_2 = \sigma_1 \circ \tau for some \tau \in S_b. Furthermore, \tau \in S_b if and only if \tau maps each colour group to itself. Recall that function composition is read right to left. Thus, the equation \sigma_2 = \sigma_1 \circ \tau means that if we first relabel the balls according to \tau and then draw the balls according to \sigma_1, then we get the result as just drawing by \sigma_2. That is, \sigma_2 = \sigma_1 \circ \tau for some \tau \in S_b if and only if drawing by \sigma_2 is the same as first relabelling the balls within each colour group and then drawing the balls according to \sigma_1. Thus, \sigma_1 S_b = \sigma_2 S_b, if and only if when we forget the labels of the balls and only look at the colours, \sigma_1 and \sigma_2 give the same list of colours. This is illustrated with our running example below.

If we permute the balls according to \tau \in S_b and the draw according to \sigma_1, then the resulting draw is the same as if we had not permuted and drawn according to \sigma_2. That is, \sigma_2 = \sigma_1 \circ \tau.

Right cosets of S_n

Typically, the subgroup S_a is not a normal subgroup of S_n. This means that the right coset S_a \sigma will not equal the left coset \sigma S_a. Thus, colouring the balls and forgetting the labelling won’t describe the right cosets S_a \backslash S_n. We’ll see that a different type of forgetting can be used to describe S_a \backslash S_n = \{S_a\sigma : \sigma \in S_n\}.

Fix a partition a = (a_1,a_2,\ldots,a_I) and now, instead of considering I colours, think of I different people p_1,p_2,\ldots,p_I. As before, a permutation \sigma \in S_n can be thought of drawing n balls labelled 1,\ldots,n without replacement. We can imagine giving the first a_1 balls drawn to person p_1, then giving the next a_2 balls to the person p_2 and so on until we give the last a_I balls to person p_I. An example with n = 6 and a = (2,2,2) is drawn below.

Person p_1 receives the ball labelled by 6 followed by the ball labelled 3, person p_2 receives ball 2 and then ball 1 and finally person p_3 receives ball 4 followed by ball 5.

Note that \sigma \in S_a if and only if person p_i receives the balls with labels \sum_{k=0}^{i-1}a_k+1,\sum_{k=0}^{i-1}a_k+2,\ldots, \sum_{k=0}^i a_k in any order. Thus, in the below example \sigma_1 \in S_a but \sigma_2 \notin S_a.

When the balls are drawn according to \sigma_1, person p_i receives the balls with labels 2i-1 and 2i, and thus \sigma_1 \in S_a. On the other hand, if the balls are drawn according to \sigma_2, the people receive different balls and thus \sigma_2 \notin S_a.

It turns out the cosets S_a \backslash S_n are exactly determined by “forgetting” the order in which each person p_i received their balls and only remembering which balls they received. Thus, the two permutation below belong to the same coset in S_a \backslash S_n.

When we forget the order in which each person receive their balls, the permutations \sigma_1 and \sigma_2 become the same and thus S_a \sigma_1 = S_a \sigma_2. Note that if we coloured the balls according to the permutation a = (a_1,\ldots,a_I), then we could see that \sigma_1 S_a \neq \sigma_2 S_a.

To see why this is true in general, consider two permutations \sigma_1,\sigma_2. The permutations \sigma_1,\sigma_2 result in each person p_i receiving the same balls if and only if after \sigma_1 we can apply a permutation \tau that fixes each subset A_i  =  \left\{\sum_{k=0}^{i-1}a_k+1,\ldots, \sum_{k=0}^i a_k\right\} and get \sigma_2. That is, \sigma_1 and \sigma_2 result in each person p_i receiving the same balls if and only if \sigma_2 = \tau \circ \sigma_1 for some \tau \in S_a. Thus, \sigma_1,\sigma_2 are the same after forgetting the order in which each person received their balls if and only if S_a \sigma_1 = S_a \sigma_2. This is illustrated below,

If we first draw the balls according to \sigma_1 and then permute the balls according to \tau, then the resulting draw is the same as if we had drawn according to \sigma_2 and not permuted afterwards. That is, \sigma_2 = \tau \circ \sigma_1 .

We can thus see why S_a \backslash S_n \neq S_n / S_a. A left coset \sigma S_a correspond to pre-composing \sigma with elements of S_a and a right cosets S_a\sigma correspond to post-composing \sigma with elements of S_a.

Contingency tables

With the last two sections under our belts, describing the double cosets S_a \backslash S_n / S_b is straight forward. We simply have to combine our two types of forgetting. That is, we first colour the n balls with J colours according to b = (b_1,b_2,\ldots,b_J). We then draw the balls without replace and give the balls to I different people according a = (a_1,a_2,\ldots,a_I). We then forget both the original labels and the order in which each person received their balls. That is, we only remember the number of balls of each colour each person receives. Describing the double cosets by “double forgetting” is illustrated below with a = (2,2,2) and b = (3,2,1).

The permutations \sigma_1 and \sigma_2 both result in person p_1 receiving one green ball and one blue ball. The two permutations also results in p_2 and p_3 both receiving one green ball and one red ball. Thus, \sigma_1 and \sigma_2 are both in the same (S_a,S_b)-double coset. Note however that \sigma_1 S_b \neq \sigma_2 S_b and S_a\sigma_1 \neq S_a \sigma_2.

The proof that double forgetting does indeed describe the double cosets is simply a combination of the two arguments given above. After double forgetting, the number of balls given to each person can be recorded in an I \times J table. The N_{ij} entry of the table is simply the number of balls person p_i receives of colour c_j. Two permutations are the same after double forgetting if and only if they produce the same table. For example, \sigma_1 and \sigma_2 above both produce the following table

Green (c_1) Red (c_2)Blue (c_3)Total
Person p_11012
Person p_21102
Person p_31102

By the definition of how the balls are coloured and distributed to each person we must have for all 1 \le i \le I and 1 \le j \le J

\sum_{j=1}^J N_{ij} = a_i and \sum_{i=1}^I N_{ij} = b_j

An I \times J table with entries N_{ij} \in \{0,1,2,\ldots\} satisfying the above conditions is called a contingency table. Given such a contingency table with entries N_{ij} where the rows sum to a and the columns sum to b, there always exists at least one permutation \sigma such that N_{ij} is the number of balls received by person p_i of colour c_i. We have already seen that two permutations produce the same table if and only if they are in the same double coset. Thus, the double cosets S_a \backslash S_n /S_b are in one-to-one correspondence with such contingency tables.

The hypergeometric distribution

I would like to end this blog post with a little bit of probability and relate the contingency tables above to the hyper geometric distribution. If a = (m, n-m) for some m \in \{0,1,\ldots,n\}, then the contingency tables described above have two rows and are determined by the values N_{11}, N_{12},\ldots, N_{1J} in the first row. The numbers N_{1j} are the number of balls of colour c_j the first person receives. Since the balls are drawn without replacement, this means that if we put the uniform distribution on S_n, then the vector Y=(N_{11}, N_{12},\ldots, N_{1J}) follows the multivariate hypergeometric distribution. Thus, if we have a random walk on S_n that quickly converges to the uniform distribution on S_n, then we could use the double cosets to get a random walk that converges to the multivariate hypergeometric distribution (although there are smarter ways to do such sampling).

The field with one element in a probability seminar

Something very exciting this afternoon. Professor Persi Diaconis was presenting at the Stanford probability seminar and the field with one element made an appearance. The talk was about joint work with Mackenzie Simper and Arun Ram. They had developed a way of “collapsing” a random walk on a group to a random walk on the set of double cosets. As an illustrative example, Persi discussed a random walk on GL_n(\mathbb{F}_q) given by multiplication by a random transvection (a map of the form v \mapsto v + ab^Tv, where a^Tb = 0).

The Bruhat decomposition can be used to match double cosets of GL_n(\mathbb{F}_q) with elements of the symmetric group S_n. So by collapsing the random walk on GL_n(\mathbb{F}_q) we get a random walk on S_n for all prime powers q. As Professor Diaconis said, you can’t stop him from taking q \to 1 and asking what the resulting random walk on S_n is. The answer? Multiplication by a random transposition. As pointed sets are vector spaces over the field with one element and the symmetric groups are the matrix groups, this all fits with what’s expected of the field with one element.

This was just one small part of a very enjoyable seminar. There was plenty of group theory, probability, some general theory and engaging examples.

Update: I have written another post about some of the group theory from the seminar! You can read it here: Double cosets and contingency tables.

Commuting in Groups

In July 2020 I submitted my honours thesis which was titled “Commuting in Groups” and supervised by Professor Anthony Licata. You can read the abstract below and the full thesis here.


In this thesis, we study four different classes of groups coming from geometric group theory. Each of these classes are defined in terms of fellow traveller conditions. First we study automatic groups and show that such groups have a solvable word problem. We then study hyperbolic groups and show that a group is hyperbolic
if and only if it is strongly geodesically automatic. We also show that a group is hyperbolic if and only if it has a divergence function. We next study combable groups and examine some examples of groups that are combable but not automatic. Finally we introduce biautomatic groups. We show that biautomatic groups have a solvable conjugacy problem and that many nilpotent groups cannot be subgroups of biautomatic groups. Finally we introduce Artin groups of finite type and show that all such groups are biautomatic.

Sums of Squares – Part 2: Motzkin

In the previous blog post we observed that if a polynomial g could be written as a sum \sum_{i=1}^n h_i^2 where each h_i is a polynomial, then g must be non-negative. We then asked if the converse was true. That is, if g is a non-negative polynomial, can g be written a sum of squares of polynomials? As we saw, a positive answer to this question would allow us to optimize a polynomial function f by looking at the values of \lambda such that g = f - \lambda can be written as a sum of squares.

Unfortunately, as we shall see, not all non-negative polynomials can be written as sums of squares.

The Motzkin Polynomial

The Motzkin polynomial is a two variable polynomial defined by

g(x,y) = x^4y^2 +x^2y^4-3x^2y^2+1.

We wish to prove two things about this polynomial. The first claim is that g(x,y) \ge 0 for all x,y \in \mathbb{R} and the second claim is that g cannot be written as a sum of squares of polynomials. To prove the first claim we will use the arithmetic mean geometric mean inequality. This inequality states that for all n \in \mathbb{N} and all a_1,a_2,\ldots, a_n \ge 0, we have that

\left(a_1a_2\ldots a_n\right)^{1/n} \le \frac{a_1+a_2+\ldots+a_n}{n}.

We will apply this inequality with n = 3, a_1 = x^4 y^2, a_2 = x^2 y^4 and a_3 = 1. This gives that

\left(x^4 y^2 x^2 y^4 1\right)^\frac{1}{3} \le \frac{x^4 y^2 + y^2 x^4 +1 }{3}.

Simplifying the left hand side and multiplying both sides by 3 gives that

3x^2 y^2 \le x^4 y^2 + y^2 x^4 + 1.

Since our Motzkin polynomial g is given by g(x,y) = x^4 y^2 + y^2 x^2 - 3x^2 y^2 +1, the above inequality is equivalent to g(x,y) \ge 0.

Newton Polytopes

Thus we have shown that the Motzkin polynomial is non-negative. We now wish to show that it is not a sum of squares. To do this we will first have to define the Newton polytope of a polynomial. If our polynomial f has n variables, then the Newton polytope of f, denoted N(f) is a convex subset of \mathbb{R}^n. To define N(f) we associate a point in \mathbb{R}^n to each term of the polynomial f based on the degree of each variable. For instance the term 3x^2y^4 is assigned the point (2,4) \in \mathbb{R}^2 and the term 4xyz^3 is assigned the point (1,1,3) \in \mathbb{R}^2. Note that the coefficients of the term don’t affect this assignment.

We then define N(f) to be the convex hull of all points assigned to terms of the polynomial f. For instance if f(x,y) = x^4y^3+5x^3y - 7y^3+8, then the Newton polytope of f is the following subset of \mathbb{R}^2.

Note again that changing the non-zero coefficients of the polynomial f does not change N(f).

Now suppose that our polynomial is a sum of squares, that is f = \sum_{i=1}^n h_i^2. It turns out the the Newton polytope of f can be defined in terms of the Newton polytopes of h_i. In particular we have that N(f) =2X where X is the convex hull of the union of N(h_i) for i = 1,\ldots, n.

To see why this is true, note that if a and b are the points corresponding to two terms p and q, then a+b corresponds to pq. Thus we can see that every term of f corresponds to a point that can be written as a+b for some a,b \in N(h_i). It follows that N(f) \subseteq 2X.

Conversely note that if we have terms p, q, r corresponding to points a,b, c and pq = r^2, then a+b = 2c. It follows that if c is a vertex in X corresponding to the term r in some polynomial h_i and p,q are two terms in a polynomial h_j such that pq = r^2, then p=q=r. This is because if r was not equal to either p or q, then the point c would not be a vertex and would instead lie on the line between the points assigned to p and q.

It follows that if c is a vertex of X with corresponding term r, then r^2 appears with a positive coefficient in f = \sum_{i=1}^n h_i. It follows that 2X \subseteq N(f) and so 2X = N(f).

Not a sum of squares

Let’s now take another look at the Motzkin polynomial which is defined to be

g(x,y) = x^4y^2 + x^2 y^4 - 3x^2y^2 + 1.

Thus the Newton polytope of g has corners (4,2), (2,4), (0,0) and looks like this

Thus if the Motzkin polynomial was a sum of squares g = \sum_{i=1}^n h_i^2, then the Newton polytope of each h_i must be contained in \frac{1}{2}N(g). Now \frac{1}{2}N(g) looks like this

The only integer points contained in \frac{1}{2}N(g) are (0,0), (1,1), (1,2) and (2,1). Thus each h_i must be equal to h_i(x,y) = a_i x^2 y+b_i xy^2 + c_i xy + d_i. If we square this polynomial we see that the coefficient of x^2 y^2 is c_i^2. Thus the coefficient of x^2y^2 in \sum_{i=1}^n h_i^2 must be c_1^2+c_2^2+\ldots +c_n^2 \ge 0. But in the Motzkin polynomial, x^2y^2 has coefficient -3.

Thus the Motzkin polynomial is a concrete example of a non-negative polynomial that is not a sum of squares.


As stated in the first part of this series, these two blog posts are based off conversations I had with Abhishek Bhardwaj last year. I also used these slides to remind myself why the Motzkin polynomial is not a sum of squares.

Sums of Squares – Part 1: Optimization

About a year ago I had the pleasure of having a number of conversations with Abhishek Bhardwaj about the area of mathematics his PhD is on. This pair of posts is based on the fond memories I have of these conversations. Part two can be found here.

Optimization and Summing Squares

Optimization is a huge area of both pure and applied mathematics. Many questions and problems can be reduced to finding the minimum or maximum value of some function. That is we have a function f : \mathbb{R}^n \to \mathbb{R} and we wish to find the number \lambda such that

\lambda = \min \{ f(x_1,\ldots, x_n) \mid (x_1,\ldots, x_n) \in \mathbb{R}^n \},


\lambda = \max \{ f(x_1,\ldots, x_n) \mid (x_1,\ldots, x_n) \in \mathbb{R}^n \}.

By considering the function -f we can reduce finding the maximum of f to finding the minimum of another function. Minimizing a function isn’t always easy. Even when the function f is a relative simple function such as a polynomial, it can be a very difficult problem.

A different way of thinking about the problem of minimizing the function f, is to instead think about the function f - \lambda were \lambda is a real number. The minimum of f is the largest value of \lambda such that the function f - \lambda is non-negative for all values x \in \mathbb{R}^n. Thus we have a reduced our optimization problem into the problem of working out for which values of \lambda is the function f-\lambda non-negative.

An example

Suppose our function f was the following function that takes in two variables

f(x,y) = 2x^2 + 4y^2-4xy-4x+4y+7.

To minimize this function we can look at the function g = f - \lambda which is equal to

g(x,y) = 2x^2 + 4y^2-4xy-4x+4y+7 - \lambda.

We wish to work for which values of \lambda is this function g non-negative. By doing the following algebraic manipulations we can rewrite g like so

g(x,y) = (x^2-2x+1)+(x^2+4y^2+1-4xy-2x+4y)+5-\lambda ,

which is in turn equal to

g(x,y) = (x-1)^2+(-x+2y+1)^2 +(5-\lambda)

Since (x-1)^2 and (-x+2y+1)^2 are both squares of real numbers, they are both non-negative. Furthermore at the point (x,y) = (1,0), we have that (x-1)^2+(-x+2y+1)^2=0. Thus g is non-negative if and only if (5-\lambda) \ge 0, that is if and only if \lambda \le 5. Thus we can conclude that the minimum of f is 5.

Sums of Squares

Note that the number 5-\lambda can be written as (\sqrt{5-\lambda})^2 if and only if \lambda \ge 5. Thus, in the above example we have that g is non-negative if and only if g can be written as a sum of squares. That is g is non-negative if and only if g can be written as \sum_{i=1}^n h_i^2 for some polynomials h_1, h_2, \ldots, h_n.

In general, if a polynomial can be written as a sum of squares of other polynomials, then the polynomial must be non-negative. A natural question to ask is if every non-negative polynomial can be written as a sum of squares. This would make our optimization problem a lot easier. To minimize f, we can check for which values of \lambda can f-\lambda be written as a sum of squares.

This question of whether every non-negative polynomial can be written as a sum of squares was answered in the negative by David Hilbert in 1888. However, Hilbert’s proof was non-constructive, it didn’t give an explicit example of such a polynomial. The proof only showed that such polynomials exist out there somewhere. It took nearly nearly 80 years for the first explicit example to be given by the mathematician Theodore Motzkin in 1967. We will take a look at Motzkin’s polynomial in the next blog post.

The Outer Automorphism of S6

This blog post is about why the number 6 is my favourite number – because the symmetric group on six elements is the only finite permutation group that contains an outer automorphism. In the post we will first look at why there are no other symmetric groups with outer automorphisms and then we will define an outer automorphism of S_6.

Inner and Outer Automorphisms

If G is any group, then we can construct a new group \text{Aut}(G) consisting of all bijective group homomorphism \phi : G \to G. This set forms a group under function composition. There is a natural map from G to \text{Aut}(G). A group element g \in G is taken to the function \gamma_g : G \to G given by \gamma_g(h) = ghg^{-1}. Thus \gamma_g is conjugation by g.

The kernel of g \mapsto \gamma_g is called the center of G and is denoted by Z(G). That is Z(G) is the subgroup consisting of group elements that commute with every other group element. The image of this map is denoted by \text{Inn}(G) \subseteq \text{Aut}(G). An automorphism which is in \text{Inn}(G) is called an inner automorphism. One can check that the subgroup of inner automorphisms is a normal subgroup. Indeed if we have a group element g \in G and an automorphism \phi \in \text{Aut}(G), then \phi \circ \gamma_g \circ \phi^{-1} = \gamma_{\phi(g)}. Thus we can form the quotient \text{Aut}(G)/\text{Inn}(G) which we will call the outer automorphisms of G and denote by \text{Out}(G).

Thus the inner automorphisms of G are those which come from G in the form of conjugation. The outer automorphisms of G are equivalence classes of all the automorphisms of G. Two automorphisms \phi, \phi' are equivalent in \text{Out}(G) if there is an inner automorphism \gamma_g such that \gamma_g \circ \phi = \phi'.

The Symmetric Group

For a natural number n \in \mathbb{N}, S_n will denote the group of permutations of the set \{1,2,\ldots,n\}. Any permutation \sigma \in S_n can be written as product of disjoint cycles which we will call its cycle type. For instance, if n=5 and \sigma(1)=3, \sigma(2)=5, \sigma(3)=4, \sigma(4)=1 and \sigma(5)=2, then \sigma can be written in cycle notation as (1,3,4)(2,5). Cycle notation is very useful for describing conjugation in S_n. If we have two permutations \sigma, \tau \in S_n and we know the cycle notation for \sigma, then the cycle notation for \tau \sigma \tau^{-1} can be derived by applying \tau to every entry in the cycle notation of \sigma. For instance, with \sigma = (1,3,4)(2,5), we have

\tau(1,3,4)(2,5)\tau^{-1} = (\tau(1),\tau(3),\tau(4))(\tau(2),\tau(5)).

To see why this result about conjugation is true, note the following. If \sigma(i)=j for some i,j \in \{1,2,\ldots,n\}, then


Thus two permutations \sigma, \sigma' are conjugate to one another if and only if \sigma and \sigma' have the same cycle type (ie they have the same number of cycles of any given size). This implies that for n \ge 3, the centre of S_n must be trivial as for any non-identity \sigma \in S_n, there is at least one other element of the same cycle type. Hence the map \sigma \mapsto \gamma_\sigma from S_n to \text{Aut}(S_n) is injective for n \ge 3.

Counting Conjugacy Classes

Before constructing an outer automorphism for S_6, we will see why such a map is special. In particular we will learn that \text{Aut}(S_n) consists entirely of inner automorphism for n \neq 6. The argument is based off studying the size of different conjugacy classes in S_n. This is important because if \phi is an automorphism of S_n and \sigma,\sigma' \in S_n are two permutations, then \sigma is conjugate to \sigma' if and only if \phi(\sigma) is conjugate to \phi(\sigma').

In S_n we know that two elements are conjugate if and only if they have the same cycle type. We can use this to count the size of different conjugacy classes of S_n. In particular the number of permutations with n_k cycles of size k for k = 1,2,\ldots, n is

\displaystyle \frac{\displaystyle n!}{\displaystyle \prod\limits_{k=1}^n n_k! k^{n_k}}.

The transpositions (j,k) for 1 \le j < k\le n generate S_n and form a conjugacy class which we call T. For any automorphism \phi \in \text{Aut}(S_n) , the set \phi(T) must be a conjugacy class that is also of size | T | = n(n-1)/2. Since automorphism preserve order, the elements of \phi(T) must all be of order two. Thus the cycle type of the conjugacy class \phi(T) must contain only cycles of size two or size one. Also there must be an odd number of two cycles. If there was an even number of two cycles, then we would have that \phi(T) is contained in the subgroup A_n \subseteq S_n. However this would imply that \phi(S_n) \subseteq \phi(A_n) which contradicts the fact that \phi : S_n \rightarrow S_n is a bijection. Thus the following lemma implies that \phi(T)=T when n \neq 6.

Lemma: Fix n \neq 6 and an odd number k such that k > 1 and 2k \le n. Let m_k denote the number of permutations \sigma \in S_n which have k cycles of size two and n-2k cycles of size one, then m_k \neq n(n-1)/2.

Proof: This claim can be manually checked for n < 6. For n > 6, assume in order to get a contradiction that m_k = \frac{n(n-1)}{2}. Note that m_k is equal to \frac{n!}{k! 2^k(n-2k)! 1^{n-2k}}=\frac{n!}{k! (n-2k)! 2^k}. Thus we have \frac{n(n-1)}{2} = \frac{n!}{k!(n-2k)!2^k} and hence

2^{k-1} = \frac{(n-2)(n-3)\ldots(n-2k+1)}{k!}.

Now if k \ge 5, then (n-4)(n-5)\ldots(n-2k+1) is a product of at least k consecutive integers and hence divisible by k! (see here). Thus

2^{k-1} = \frac{(n-2)(n-3)\ldots(n-2k+1)}{k!} = (n-2)(n-3)\frac{(n-4)(n-5)\ldots(n-2k+1)}{k!}

and hence \frac{(n-2)(n-3)\ldots(n-2k+1)}{k!} is divisible by an odd number other than one (either n-2 or n-3). Thus \frac{(n-2)(n-3)\ldots(n-2k+1)}{k!} cannot be a power of 2, contradicting the above equation.

When k = 3, the above equation becomes:

4= \frac{(n-2)(n-3)(n-4)(n-5)}{3!} = (n-2)\frac{(n-3)(n-4)(n-5)}{3!} = \frac{(n-2)(n-3)(n-4)}{3!}(n-5).

Thus if n is odd, n-2 is an odd number other than 1 that divides 4, a contradiction. If n is even, then since n > 6, n-5 is an odd number other than 1 that divides 4, which is again a contradiction. Thus we cannot have any case when m_k = \frac{n(n-1)}{2}.

S_n does not have an outer automorphism if n \neq 6

With the above lemma we can now prove that all the automorphism of S_n for n \neq 6 are inner. First note that the only automorphism of S_2 = \{ \text{id}, (12)\} is the identity which is an inner automorphism. When n \ge 3, the map \sigma \mapsto \gamma_\sigma from S_n to \text{Aut}(S_n) is injective. Thus there are exactly n! inner automorphisms of S_n. We will show that for n \neq 6 there are exactly n! automorphisms. Hence there can be no outer automorphisms.

Let \phi be an automorphism of S_n. The symmetric group S_n is also generated by the n-1 transpositions \sigma_i = (i,i+1) (for i =1, \ldots, n-1). By the previous lemma we know that if n \neq 6, then each \phi(\sigma_i) must be a transposition (j_i,k_i) for some j_i, k_i \in \{1,\ldots, n\} such that j_i \neq k_i. As the collection \sigma_1,\ldots,\sigma_{n-1} generates S_n, the automorphism \phi is determined by the transpositions \phi(\sigma_i)= (j_i,k_i). We will now show that there are exactly n! choices for these transpositions.

Since \sigma_1\sigma_{2} = (1,2,3), we know that \phi(\sigma_1)\phi(\sigma_2) = (j_1,k_1)(j_2,k_2) must be a three cycle. Thus one of j_2,k_2 must be equal to one of j_1,k_1 and the other must be distinct. By relabeling if necessary, we will require that k_1=j_2 and j_1 \neq k_2. Thus there are n choices for j_1, n-1 choices for k_1=j_2 and n-2 choices for k_2. This gives us n(n-1)(n-2) choices for the first two transpositions. Continuing in this manner, we note that \sigma_1\sigma_3 = (1,2)(3,4) and \sigma_2\sigma_3 = (2,3,4), we get that one of j_3,k_3 is equal to one of k_2 and the other one is distinct from j_1,k_1=j_2,k_2, thus giving us n-3 choices for the third transposition. In general we will have that \sigma_{i-1}\sigma_i is a three cycle and \sigma_l\sigma_i is a pair of disjoint two cycles if l < i-1. Thus one can show inductively that there are n-i choices for the transposition \phi(\sigma_i). Hence there are n(n-1)(n-2)\ldots 1 = n! choices for all the transpositions \phi(\sigma_i) and hence n! choices for \phi. Thus, if n \neq 6, all automorphisms of S_n are inner.

The outer automorphism of S_6.

The key part of showing that S_n didn’t have any outer automorphisms when n \neq 6 was showing that any automorphism must map transpositions to transpositions. Thus to find the outer automorphism of S_6 we will want to find an automorphism that does not preserve cycle type. In particular we will find one which maps transpositions to the cycle type (a,b)(c,d)(e,f). There are \frac{6!}{3!2^3} = 15 such triple transpositions which is exactly \frac{6\cdot 5}{2} – the number of single transpositions.

The outer automorphism, \psi, will be determined by the values it takes on \sigma_1,\sigma_2,\sigma_3,\sigma_4 and \sigma_5 (where again \sigma_i = (i,i+1)). Each of these transpositions will get to a different triple transposition (a,b)(c,d)(e,f). To extend \psi to the rest of S_6 it suffices to check the Coxeter relations \psi(\sigma_k)^2 = (\psi(\sigma_i)\psi(\sigma_{i+1}))^3 = (\psi(\sigma_i)\psi(\sigma_j))^2=\text{id} for k = 1,2,3,4,5, i =1,2,3,4 and j \ge i+2.

The relation \psi(\sigma_k)^2=\text{id} will hold for any triple transposition \psi(\sigma_k). The other relations are a bit trickier. If we have two triple transpositions \tau = (a,b)(c,d)(e,f) and \tau' = (a',b')(c',d')(e',f'), then \tau\tau' will be a product of two three cycles if none of the transpositions of \tau is a transposition of \tau'. If \tau and \tau' share exactly one transposition, then \tau\tau' is a product of two two cycles. If \tau and \tau' share two or more transpositions they must actually be equal and hence \tau\tau' is the identity.

With this in mind, one can verify that following assignment extends to a valid group homomorphism from S_6 to S_6:

\psi(\sigma_1) = (1,2)(3,4)(5,6),
\psi(\sigma_2) = (1,3)(2,5)(4,6),
\psi(\sigma_3) = (1,5)(2,6)(3,4),
\psi(\sigma_4) = (1,3)(2,4)(5,6),
\psi(\sigma_5) = (1,6)(2,5)(3,4).

We now just need to justify that the group homomorphism \psi is an automorphism. Since S_6 is a finite group, \psi is a bijection if and only if \psi is an injection. Recall that only normal subgroups of S_6 are the trivial subgroup, A_6 and S_6. Thus to prove injectivity of \psi, it suffices to show that \psi(\tau) \neq \text{id} for some \tau \in A_6. Indeed \psi(\sigma_1\sigma_3) = (1,6)(2,5) \neq \text{id} as required. Thus we have our outer automorphism!

This isn’t the most satisfying way of defining the outer automorphism as it requires using the generators of the symmetric group. There are many more intrinsic ways to define this outer automorphism which will hopefully be the topic of a later blog post!


The proof I give that for n \neq 6, the group S_n has no outer automorphisms is based off the one given in this paper On the Completeness of Symmetric Group.

Combing groups

Two weeks ago I gave talk titled “Two Combings of \mathbb{Z}^2“. The talk was about some material I have been discussing lately with my honours supervisor. The talk went well and I thought it would be worth sharing a written version of what I said.

Geometric Group Theory

Combings are a tool that gets used in a branch of mathematics called geometric group theory. Geometric group theory is a relatively new area of mathematics and is only around 30 years old. The main idea behind geometric group theory is to use tools and ideas from geometry and low dimensional topology to study and understand groups. It turns out that some of the simplest questions one can ask about groups have interesting geometric answers. For instance, the Dehn function of a group gives a natural geometric answer to the solvability of the word problem.


Before we can define what a combing is we’ll need to set up some notation. If A is a set then we will write A^* for the set of words written using elements of A and inverses of elements of A. For instance if A = \{a,b\}, then A^* = \{\varepsilon, a,b,a^{-1},b^{-1},aa=a^2,abb^{-1}a^{-3},\ldots\} (here \varepsilon refers to the empty word). If w is a word in A^*, we will write l(w) for the length of w. Thus l(\varepsilon)=0, l(a)=1, l(abb^{-1}a^{-3})=6 and so on.

If G is a group and A is a subset of G, then we have a natural map \pi : A^* \rightarrow G given by:

\pi(a_1^{\pm 1}a_2^{\pm 1}\ldots a_n^{\pm 1}) = a_1^{\pm 1}\cdot a_2^{\pm 1} \cdot \ldots \cdot a_n^{\pm 1}.

We will say that A generates G if the above map is surjective. In this case we will write \overline{w} for \pi(w) when w is a word in A^*.

The Word Metric

The geometry in “geometric group theory” often arises when studying how a group acts on different geometric spaces. A group always acts on itself by left multiplication. The following definition adds a geometric aspect to this action. If G is a group with generators A, then the word metric on G with respect to A is the function d_G : G \times G \rightarrow \mathbb{R} given by

d_G(g,h) = \min \{ l(w) \mid w \in A^*, \overline{w} = g^{-1}h \}.

That, is the distance between two group elements g,h \in G is the length of the shortest word in A^* we can use to represent g^{-1}h. Equivalently the distance between g and h is the length of the shortest word we have to append to g to produce h. This metric is invariant under left-multiplication by G (ie d_G(g\cdot h,g\cdot h') =d_G(h,h') for all g,h,h' \in G). Thus G acts on (G,d_G) by isometries.

Words are Paths

Now that we are viewing the group G as a geometric space, we can also change how we think of words w \in A^*. Such a word can be thought of as discrete path in G. That is we can think of w as a function from \mathbb{N} to G. This way of thinking of w as a discrete path is best illuminated with an example. Suppose we have the word w = ab^2a^{-1}b, then

w(0) = e,
w(1) = \overline{a}
w(2) = \overline{ab}
w(3) = \overline{ab^2}
w(4) = \overline{ab^2a^{-1}}
w(5) = \overline{ab^2a^{-1}b}
w(t) = \overline{ab^2a^{-1}b}, t \ge 5.

Thus the path w : \mathbb{N} \rightarrow G is given by taking the first t letters of w and mapping this word to the group element it represents. With this interpretation of word in A^* in mind we can now define combings.


Let G be a group with a finite set of generators A. Then a combing of G with respect to A is a function \sigma : G \rightarrow A^* such that

  1. For all g \in G, \overline{\sigma_g} = g (we will write \sigma_g for \sigma(g)).
  2. There exists k >0 such that for all g, h \in G with g \cdot \overline{a} = h for some a \in A, we have that d_G(\sigma_g(t),\sigma_h(t)) \le k for all t \in \mathbb{N}.

The first condition says that we can think of \sigma as a way of picking a normal form \sigma_g \in A^* for each g \in G. The second condition is a bit more involved. It states that if the group elements g, h \in G are distance 1 from each other in the word metric, then the paths \sigma_g,\sigma_h  :  \mathbb{N} \rightarrow G are within distance k of each other at any point in time.

An Example

Not all groups can be given a combing. Indeed if we have a combing of G, then the word problem in G is solvable and the Dehn function of G is at most exponential. One group that does admit a combing is \mathbb{Z}^2 = \{(m,n) \mid m,n \in \mathbb{Z}\}. This group is generated by A = \{(1,0),(0,1)\} = \{\beta,\gamma\} and one combing of \mathbb{Z}^2 with respect to this generating set is

\sigma_{(m,n)} = \beta^m\gamma^n.

The first condition of being a combing is clearly satisfied and the following picture shows that the second condition can be satisfied with k = 2.

A Non-Example

The discrete Heisenberg group, H_3, can be given by the following presentation

H_3 = \langle \alpha,\beta,\gamma \mid \alpha\gamma = \gamma\alpha, \beta\gamma = \gamma\beta, \beta\alpha = \alpha\beta\gamma \rangle.

That is, the group H_3 has three generators \alpha,\beta and \gamma. The generator \gamma commutes with both \alpha and \beta. The generators \alpha and \beta almost commute but don’t quite as seen in the relation \beta\alpha = \alpha\beta\gamma.

Any h \in H_3 can be represented uniquely as \sigma_h = \alpha^p\beta^m\gamma^n for p,m,n \in \mathbb{Z}. To see why such a representation exists it’s best to consider an example. Suppose that h = \overline{\gamma\beta\alpha\gamma\alpha\beta\gamma}. Then we can use the fact that \gamma commutes with \alpha and \beta to push all \gamma‘s to the right and we get that h = \overline{\beta\alpha\alpha\beta\gamma^3}. We can then apply the third relation to switch the order of \alpha and \beta on the right. This gives us that that h = \overline{\alpha\beta\gamma\alpha\beta\gamma^3}=\overline{\alpha\beta\alpha\beta\gamma^4}. If we apply this relation once more we get that h = \overline{\alpha^2\beta^2\gamma^5} and thus \sigma_h = \alpha^2\beta^2\gamma^5. The procedure used to write h in the form \alpha^p\beta^m\gamma^n can be generalized to any word written using \alpha^{\pm 1}, \beta^{\pm 1}, \gamma^{\pm 1}.

The fact the such a representation is unique (that is if \overline{\alpha^p\beta^m\gamma^n} = \overline{\alpha^{p'}\beta^{m'}\gamma^{n'}}, then (p,m,n) = (p',m',n')) is harder to justify but can be proved by defining an action of H_3 on \mathbb{Z}^3. Thus we can define a function \sigma : H_3 \rightarrow \{\alpha,\beta,\gamma\}^* by setting \sigma_h to be the unique word of the form \alpha^p \beta^m\gamma^n that represents h. This map satisfies the first condition of being a combing and has many nice properties. These include that it is easy to check whether or not a word in \{\alpha,\beta,\gamma\}^* is equal to \sigma_h for some h \in H_3 and there are fast algorithms for putting a word in \{\alpha,\beta,\gamma\}^* into its normal form. Unfortunately this map fails to be a combing.

The reason why \sigma : H_3 \rightarrow \{\alpha,\beta,\gamma\}^* fails to be a combing can be seen back when we turned \overline{  \gamma\beta\alpha\gamma\alpha\beta\gamma } into \overline{\alpha^2\beta^2\gamma^5} . To move \alpha‘s on the right to the left we had to move past \beta‘s and produce \gamma‘s in the process. More concretely fix m \in \mathbb{N} and let h = \overline{\beta^m} and g = \overline{\beta^m \alpha} = \overline{\alpha \beta^m\gamma^m}. We have \sigma_h = \beta^m and \sigma_g = \alpha \beta^m \gamma^m. The group elements g and h differ by a generator. Thus, if \sigma was a combing we should be able to uniformly bound d_{H_3}(\sigma_g(t),\sigma_h(t)) for all t \in \mathbb{N} and all m \in \mathbb{N}.

If we then let t = m+1, we can recall that

d_{H_3}(\sigma_g(t),\sigma_h(t)) = \min\{l(w) \mid w \in \{\alpha,\beta,\gamma\}^*, \overline{w} = \sigma_g(t)^{-1}\sigma_h(t)\}.

We have that \sigma_h(t) = \overline{\beta^m} and \sigma_g(t) = \overline{\alpha \beta^m} and thus

\sigma_g(t)^{-1}\sigma_h(t)  = (\overline{\alpha\beta^m})^{-1}\overline{\beta^m} = \overline{\beta^{-m}\alpha^{-1}\beta^m} = \overline{\alpha^{-1}\beta^{-m}\gamma^{m}\beta^m}=\overline{\alpha^{-1}\gamma^{m}}.

The group element \overline{\alpha^{-1}\gamma^{m}} cannot be represented as a shorter element of \{\alpha,\beta,\gamma\}^* and thus d_{H_3}(\sigma_g(t),\sigma_h(t)) = m+1 and the map \sigma is not a combing.

Can we comb the Heisenberg group?

This leaves us with a question, can we comb the group H_3? It turns out that we can but the answer actually lies in finding a better combing of \mathbb{Z}^2. This is because H_3 contains the subgroup \mathbb{Z}^2 \cong \langle \beta, \gamma \rangle \subseteq H_3. Rather than using the normal form \sigma_h = \alpha^p \beta^m \gamma^n, we will use \sigma'_h = \alpha^p \tau_{(m,n)} where \tau : \mathbb{Z}^2 \rightarrow \{\beta,\gamma \}^* is a combing of \mathbb{Z}^2 that is more symmetric. The word \tau_{(m,n)} is defined to be the sequence of m \beta‘s and n \gamma‘s that stays closest to the straight line in \mathbb{R}^2 that joins (0,0) to (n,p) (when we view \beta and \gamma as representing (1,0) and (0,1) respectively). Below is an illustration:

This new function isn’t quite a combing of H_3 but it is the next best thing! It is an asynchronous combing. An asynchronous combing is one where we again require that the paths \sigma_h,\sigma_g stay close to each other whenever h and g are close to each other. However we allow the paths \sigma_h and \sigma_g to travel at different speeds. Many of the results that can be proved for combable groups extend to asynchronously combable groups.


Hairdressing in Groups by Sarah Rees is a survey paper that includes lots examples of groups that do or do not admit combings. It also talks about the language complexity of a combing, something I didn’t have time to touch on in my talk.

Combings of Semidirect Products and 3-Manifold Groups by Martin Bridson contains a proof that H_3 is asynchornously combable. He actually proves the more general result that any group of the form \mathbb{Z}^n \rtimes \mathbb{Z}^m is asynchronously combable.

Thank you to my supervisor, Tony Licata, for suggesting I give my talk on combing H_3 and for all the support he has given me so far.