Non-measurable sets, cardinality and the axiom of choice

The following post is based on a talk I gave at the 2022 Stanford statistics retreat. The talk was titled “Another non-measurable monster”.

The opening slide from my presentation. The test reads “another non-measurable monster”. There are spooky pumpkins in the background. Image from

The material was based on the discussion and references given in this stackexchange post. The title is a reference to a Halloween lecture on measurability given by Professor Persi Diaconis.

What’s scarier than a non-measurable set?

Making every set measurable. Or rather one particular consequence of making every set measurable.

In my talk, I argued that if you make every set measurable, then there exists a set \Omega and an equivalence relation \sim on \Omega such that |\Omega| < |\Omega / \sim|. That is, the set \Omega has strictly smaller cardinality than the set of equivalence classes \Omega/\sim. The contradictory nature of this statement is illustrated in the picture below

We can think of the set \Omega as the collection of crosses drawn above. The equivalence relation \sim divides \Omega into the regions drawn above. The statement |\Omega|<|\Omega /\sim| means that in some sense there are more regions than crosses.

To make sense of this we’ll first have to be a bit more precise about what we mean by cardinality.

What do we mean by bigger and smaller?

Let A and B be two sets. We say that A and B have the same cardinality and write |A| = |B| if there exists a bijection function f:A \to B. We can think of the function f as a way of pairing each element of A with a unique element of B such that every element of B is paired with an element of A.

We next want to define |A|\le |B| which means A has cardinality at most the cardinality of B. There are two reasonable ways in which we could try to define this relationship

  1. We could say |A|\le |B| means that there exists an injective function f : A \to B.
  2. Alternatively, we could |A|\le |B| means that there exists a surjective function g:B \to A.

Definitions 1 and 2 say similar things and, in the presence of the axiom of choice, they are equivalent. Since we are going to be making every set measurable in this talk, we won’t be assuming the axiom of choice. Definitions 1 and 2 are thus no longer equivalent and we have a decision to make. We will use definition . in this talk. For justification, note that definition 1 implies that there exists a subset B' \subseteq B such that |A|=|B|. We simply take B' to be the range of f. This is a desirable property of the relation |A|\le |B| and it’s not clear how this could be done using definition 2.

Infinite binary sequences

It’s time to introduce the set \Omega and the equivalence relation we will be working with. The set \Omega is the set \{0,1\}^\mathbb{Z} the set of all function \omega : \mathbb{Z} \to \{0,1\}. We can think of each elements \omega \in \Omega as an infinite sequence of zeros and ones stretching off in both directions. For example

\omega = \ldots 1110110100111\ldots.

But this analogy hides something important. Each \omega \in \Omega has a “middle” which is the point \omega_0. For instance, the two sequences below look the same but when we make \omega_0 bold we see that they are different.

\omega = \ldots 111011\mathbf{0}100111\ldots,

\omega' = \ldots 1110110\mathbf{1}00111\ldots.

The equivalence relation \sim on \Omega can be thought of as forgetting the location \omega_0. More formally we have \omega \sim \omega' if and only if there exists n \in \mathbb{Z} such that \omega_{n+k} = \omega_{k}' for all k \in \mathbb{Z}. That is, if we shift the sequence \omega by n we get the sequence \omega'. We will use [\omega] to denote the equivalence class of \omega and \Omega/\sim for the set of all equivalences classes.

Some probability

Associated with the space \Omega are functions X_k : \Omega \to \{0,1\}, one for each integer k \in \mathbb{Z}. These functions simply evaluate \omega at k. That is X_k(\omega)=\omega_k. A probabilist or statistician would think of X_k as reporting the result of one of infinitely many independent coin tosses. Normally to make this formal we would have to first define a \sigma-algebra on \Omega and then define a probability on this \sigma-algebra. Today we’re working in a world where every set is measurable and so don’t have to worry about \sigma-algebras. Indeed we have the following result:

(Solovay, 1970)1 There exists a model of the Zermelo Fraenkel axioms of set theory such that there exists a probability \mathbb{P} defined on all subsets of \Omega such that X_k are i.i.d. \mathrm{Bernoulli}(0.5).

This result is saying that there is world in which, other than the axiom of choice, all the regular axioms of set theory holds. And in this world, we can assign a probability to every subset A \subseteq \Omega in a way so that the events \{X_k=1\} are all independent and have probability 0.5. It’s important to note that this is a true countably additive probability and we can apply all our familiar probability results to \mathbb{P}. We are now ready to state and prove the spooky result claimed at the start of this talk.

Proposition: Given the existence of such a probability \mathbb{P}, |\Omega | < |\Omega /\sim|.

Proof: Let f:\Omega/\sim \to \Omega be any function. To show that |\Omega|<|\Omega /\sim| we need to show that f is not injective. To do this, we’ll first define another function g:\Omega \to \Omega given by g(\omega)=f([\omega]). That is, g first maps \omega to \omega‘s equivalence class and then applies f to this equivalence class. This is illustrated below.

A commutative diagram showing the definition of g as g(\omega)=f([\omega]).

We will show that g : \Omega \to \Omega is almost surely constant with respect to \mathbb{P}. That is, there exists \omega^\star \in \Omega such that \mathbb{P}(g(\omega)=\omega^\star)=1. Each equivalence class [\omega] is finite or countable and thus has probability zero under \mathbb{P}. This means that if g is almost surely constant, then f cannot be injective and must map multiple (in fact infinitely many) equivalence classes to \omega^\star.

It thus remains to show that g:\Omega  \to \Omega is almost surely constant. To do this we will introduce a third function \varphi : \Omega \to \Omega. The map \varphi is simply the shift map and is given by \varphi(\omega)_k = \omega_{k+1}. Note that \omega and \varphi(\omega) are in the same equivalence class for every \omega\in \Omega. Thus, the map g satisfies g\circ \varphi = g. That is g is \varphi-invariant.

The map \varphi is ergodic. This means that if A \subseteq \Omega satisfies \varphi(A)=A, then \mathbb{P}(A) equals 0 or 1. For example if A is the event that 10110 appears at some point in \omega, then \varphi(A)=A and \mathbb{P}(A)=`1. Likewise if A is the event that the relative frequency of heads converges to a number strictly greater than 0.5, then \varphi(A)=A and \mathbb{P}(A)=0. The general claim that all \varphi-invariant events have probability 0 or 1 can be proved using the independence of X_k.

For each k, define an event A_k by A_k = \{\omega : g(\omega)_k = 1\}. Since g is \varphi-invariant we have that \varphi(A_k)=A_k. Thus, \mathbb{P}(A_k)=0 or 1. This gives us a function \omega^\star :\mathbb{Z} \to \{0,1\} given by \omega^\star_k = \mathbb{P}(A_k). Note that for every k, \mathbb{P}(\{\omega : g(\omega)_k = \omega_k^\star\}) = 1. This is because if w_{k}^\star=1, then \mathbb{P}(\{\omega: g(\omega)_k = 1\})=1, by definition of w_k^\star. Likewise if \omega_k^\star =0, then \mathbb{P}(\{\omega:g(\omega)_k=1\})=0 and hence \mathbb{P}(\{\omega:g(\omega)_k=0\})=1. Thus, in both cases, \mathbb{P}(\{\omega : g(\omega)_k = \omega_k^*\})= 1.

Since \mathbb{P} is a probability measure, we can conclude that

\mathbb{P}(\{\omega : g(\omega)=\omega^\star\}) = \mathbb{P}\left(\bigcap_{k \in \mathbb{Z}} \{\omega : g(\omega)_k = \omega_k^\star\}\right)=1.

Thus, g map \Omega to \omega^\star with probability one. Showing that g is almost surely constant and hence that f is not injective. \square

There’s a catch!

So we have proved that there cannot be an injective map f : \Omega/\sim \to \Omega. Does this mean we have proved |\Omega| < |\Omega/\sim|? Technically no. We have proved the negation of |\Omega/\sim|\le |\Omega| which does not imply |\Omega| \le |\Omega/\sim|. To argue that |\Omega| < |\Omega/\sim| we need to produce a map g: \Omega \to \Omega/\sim that is injective. Surprising this is possible and not too difficult. The idea is to find a map g : \Omega \to \Omega such that g(\omega)\sim g(\omega') implies that \omega = \omega'. This can be done by somehow encoding in g(\omega) where the centre of \omega is.

A simpler proof and other examples

Our proof was nice because we explicitly calculated the value \omega^\star where g sent almost all of \Omega. We could have been less explicit and simply noted that the function g:\Omega \to \Omega was measurable with respect to the invariant \sigma-algebra of \varphi and hence almost surely constant by the ergodicity of \varphi.

This quicker proof allows us to generalise our “spooky result” to other sets. Below are two examples where \Omega = [0,1)

  • Fix \theta \in [0,1)\setminus \mathbb{Q} and define \omega \sim \omega' if and only if \omega + n \theta= \omega' for some n \in \mathbb{Z}.
  • \omega \sim \omega' if and only if \omega - \omega' \in \mathbb{Q}.

A similar argument can be used to show that in Solovay’s world |\Omega| < |\Omega/\sim|. The exact same argument follows from the ergodicity of the corresponding actions on \Omega under the uniform measure.

Three takeaways

I hope you agree that this example is good fun and surprising. I’d like to end with some remarks.

  • The first remark is some mathematical context. This argument given today is linked to some interesting mathematics called descriptive set theory. This field studies the properties of well behaved subsets (such as Borel subsets) of topological spaces. Descriptive set theory incorporates logic, topology and ergodic theory. I don’t know much about the field but in Persi’s Halloween talk he said that one “monster” was that few people are interested in the subject.
  • The next remark is a better way to think about our “spooky result”. The result is really saying something about cardinality. When we no longer use the axiom of choice, cardinality becomes a subtle concept. The statement |A|\le |B| no longer corresponds to A being “smaller” than B but rather that A is “less complex” than B. This is perhaps analogous to some statistical models which may be “large” but do not overfit due to subtle constraints on the model complexity.
  • In light of the previous remark, I would invite you to think about whether the example I gave is truly spookier than non-measurable sets. It might seem to you that it is simply a reasonable consequence of removing the axiom of choice and restricting ourselves to functions we could actually write down or understand. I’ll let you decide


  1. Technically Solovay proved that there exists a model of set theory such that every subset of \mathbb{R} is Borel measurable. To get the result for binary sequences we have to restrict to [0,1) and use the binary expansion of x \in [0,1) to define a function [0,1) \to \Omega. Solvay’s paper is available here