A minimal counterexample in probability theory

Last semester I tutored the course Probability Modelling with Applications. In this course the main objects of study are probability spaces. A probability space is a triple (\Omega, \mathcal{F}, \mathbb{P}) where:

  1. \Omega is a set.
  2. \mathcal{F} is a \sigma-algebra on \Omega. That is, \mathcal{F} is a collection of subsets of \Omega such that \Omega \in \mathcal{F} and \mathcal{F} is closed under set complements and countable unions. The elements of \mathcal{F} are called events and they are precisely the subsets of \Omega that we can assign probabilities to. We will denote the power set of \Omega by 2^\Omega and hence \mathcal{F} \subseteq 2^\Omega.
  3. \mathbb{P} is a probability measure. That is, it is a function \mathbb{P}: \mathcal{F} \rightarrow [0,1] such that \mathbb{P}(\Omega)=1 and for every countable collection \{A_i\}_{i=1}^\infty \subseteq \mathcal{F} of pairwise disjoint events we have that \mathbb{P} \left(\bigcup_{i=1}^\infty A_i \right) = \sum_{i=1}^\infty \mathbb{P}(A_i).
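To make the definition concrete, here is a minimal sketch in Python of perhaps the simplest probability space: a fair coin toss, where \mathcal{F} is the full power set of \Omega and the probability of an event is the sum of the point masses of the outcomes it contains. The helper names power_set and prob are my own choices, not standard library functions.

```python
from itertools import chain, combinations

# A fair coin toss: Omega = {"H", "T"} and F is the power set of Omega.
Omega = frozenset({"H", "T"})

def power_set(s):
    """All subsets of s, each returned as a frozenset."""
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

F = power_set(Omega)

# Point masses on the outcomes; the probability of an event is the
# sum of the masses of the outcomes it contains.
mass = {"H": 0.5, "T": 0.5}

def prob(event):
    return sum(mass[w] for w in event)

# Sanity checks: P(Omega) = 1 and additivity on a disjoint pair.
assert prob(Omega) == 1
assert prob(frozenset({"H"})) + prob(frozenset({"T"})) == prob(Omega)
print({tuple(sorted(e)): prob(e) for e in F})
```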

It’s common for students to find probability spaces, and in particular \sigma-algebras, confusing. Unfortunately, Vitali showed that \sigma-algebras can’t be avoided if we want to study probability on sample spaces such as \mathbb{R} or the space of infinitely many coin tosses. One of the main reasons why \sigma-algebras can be so confusing is that it can be very hard to give a concrete description of all the elements of a \sigma-algebra.

We often have a collection \mathcal{G} of subsets of \Omega that we are interested in but this collection fails to be a \sigma-algebra. For example, we might have \Omega = \mathbb{R}^n and \mathcal{G} is the collection of open subsets. In this situation we take our \sigma-algebra \mathcal{F} to be \sigma(\mathcal{G}) which is the smallest \sigma-algebra containing \mathcal{G}. That is

\sigma(\mathcal{G}) = \bigcap \mathcal{F}'

where the above intersection is taken over all \sigma-algebras \mathcal{F}' that contain \mathcal{G}. In this setting we say that \mathcal{G} generates \sigma(\mathcal{G}). When we have such a collection of generators, we might have an idea of what probability we would like to assign to the sets in \mathcal{G}. That is, we have a function \mathbb{P}_0 : \mathcal{G} \rightarrow [0,1] and we want to extend this function to a probability measure \mathbb{P} : \sigma(\mathcal{G}) \rightarrow [0,1]. A famous theorem due to Carathéodory shows that we can do this in many cases.
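When \Omega is finite, \sigma(\mathcal{G}) can actually be computed by brute force: start from \mathcal{G} together with \emptyset and \Omega, then keep adding complements and unions until nothing new appears. The sketch below is an illustrative helper of my own, not anything from a library; note how a single generator on a four point set produces only four events, far fewer than the sixteen subsets of \Omega.

```python
from itertools import combinations

def generated_sigma_algebra(omega, generators):
    """Close the generators under complements and pairwise unions;
    for a finite omega this is exactly the sigma-algebra they generate."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in sigma}                   # complements
        new |= {a | b for a, b in combinations(sigma, 2)}  # unions
        if new <= sigma:
            return sigma
        sigma |= new

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1, 2}])
print(sorted(tuple(sorted(a)) for a in sigma))
# [(), (1, 2), (1, 2, 3, 4), (3, 4)]
```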

An interesting question is whether the extension \mathbb{P} is unique. That is, does there exist a probability measure \mathbb{P}' on \sigma(\mathcal{G}) such that \mathbb{P} \neq \mathbb{P}' but \mathbb{P}_{\mid \mathcal{G}} = \mathbb{P}'_{\mid \mathcal{G}}? The following theorem gives a criterion that guarantees no such \mathbb{P}' exists.

Theorem: Let \Omega be a set and let \mathcal{G} be a collection of subsets of \Omega that is closed under finite intersections. If \mathbb{P},\mathbb{P}' : \sigma(\mathcal{G}) \rightarrow [0,1] are two probability measures such that \mathbb{P}_{\mid \mathcal{G}} = \mathbb{P}'_{\mid \mathcal{G}}, then \mathbb{P} = \mathbb{P}'.

The above theorem is very useful for two reasons. Firstly, it can be combined with Carathéodory’s extension theorem to uniquely define a probability measure on a \sigma-algebra by specifying its values on a collection of simple subsets \mathcal{G}. Secondly, if we ever want to show that two probability measures are equal, the theorem tells us we can reduce the problem to checking equality on the simpler subsets in \mathcal{G}.
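The theorem of course needs an honest proof, but we can at least sanity-check it on a small finite case by brute force: take a collection \mathcal{G} on a three element set that is closed under intersections (so that \sigma(\mathcal{G}) is the whole power set), enumerate a grid of probability measures, and confirm that any two which agree on \mathcal{G} agree on every event. The grid resolution and the particular \mathcal{G} below are arbitrary illustrative choices of mine.

```python
from itertools import chain, combinations, product

Omega = ("a", "b", "c")
# G is closed under intersection: {a,b} n {b} = {b}.
G = [frozenset({"a", "b"}), frozenset({"b"})]

def events(omega):
    return [frozenset(s) for s in chain.from_iterable(
        combinations(omega, r) for r in range(len(omega) + 1))]

def prob(mass, event):
    return sum(mass[w] for w in event)

# All probability masses on Omega with weights from a coarse grid.
grid = [i / 10 for i in range(11)]
measures = [dict(zip(Omega, m)) for m in product(grid, repeat=3)
            if abs(sum(m) - 1) < 1e-9]

for p in measures:
    for q in measures:
        if all(abs(prob(p, g) - prob(q, g)) < 1e-9 for g in G):
            # Agreement on G forces agreement on every event.
            assert all(abs(prob(p, e) - prob(q, e)) < 1e-9
                       for e in events(Omega))
print("checked", len(measures), "grid measures, no disagreement found")
```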

The condition that \mathcal{G} must be closed under finite intersections is somewhat intuitive. Suppose we had A,B \in \mathcal{G} but A \cap B \notin \mathcal{G}. We will still have A \cap B \in \sigma(\mathcal{G}), and thus we might be able to find two probability measures \mathbb{P},\mathbb{P}' : \sigma(\mathcal{G}) \rightarrow [0,1] such that \mathbb{P}(A) = \mathbb{P}'(A) and \mathbb{P}(B)=\mathbb{P}'(B) but \mathbb{P}(A \cap B) \neq \mathbb{P}'(A \cap B). The following counterexample shows that this intuition is indeed well-founded.

When looking for examples and counterexamples, it’s good to keep things as simple as possible. With that in mind we will try to find a counterexample where \Omega is a finite set with as few elements as possible and \sigma(\mathcal{G}) is equal to the power set of \Omega. In this setting, a probability measure \mathbb{P}: \sigma(\mathcal{G}) \rightarrow [0,1] can be defined by specifying the values \mathbb{P}(\{\omega\}) for each \omega \in \Omega.

We will now try to find a counterexample with \Omega as small as possible. Unfortunately we won’t be able to find a counterexample when \Omega contains only one or two elements. This is because we want to find A,B \subseteq \Omega such that A \cap B is not equal to A, B or \emptyset, which means that A \cap B, A \setminus B and B \setminus A must all be non-empty, forcing \Omega to have at least three elements.

Thus we will start our search with a three element set \Omega = \{a,b,c\}. Up to relabelling the elements of \Omega, the only interesting choice we have for \mathcal{G} is \{ \{a,b\} , \{b,c\} \}. This has a chance of working since \mathcal{G} is not closed under intersection. However, any probability measure \mathbb{P} on \sigma(\mathcal{G}) = 2^{\{a,b,c\}} must satisfy the equations

  1. \mathbb{P}(\{a\})+\mathbb{P}(\{b\})+\mathbb{P}(\{c\}) = 1,
  2. \mathbb{P}(\{a\})+\mathbb{P} (\{b\}) = \mathbb{P}(\{a,b\}),
  3. \mathbb{P}(\{b\})+\mathbb{P}(\{c\}) = \mathbb{P}(\{b,c\}).

Thus \mathbb{P}(\{a\}) = 1- \mathbb{P}(\{b,c\}), \mathbb{P}(\{c\}) = 1-\mathbb{P}(\{a,b\}) and \mathbb{P}(\{b\})=\mathbb{P}(\{a,b\})+\mathbb{P}(\{b,c\})-1. So \mathbb{P} is determined by its values on \{a,b\} and \{b,c\}, and any two probability measures that agree on \mathcal{G} must agree on all of \sigma(\mathcal{G}): there is no counterexample on three elements. The short symbolic check below confirms the calculation.
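Here is that calculation done symbolically, as a small sketch assuming sympy is installed; the variable names pab and pbc, standing for \mathbb{P}(\{a,b\}) and \mathbb{P}(\{b,c\}), are my own.

```python
from sympy import symbols, solve

pa, pb, pc, pab, pbc = symbols("pa pb pc pab pbc")

# The three constraints above, with pab = P({a,b}) and pbc = P({b,c}).
solution = solve(
    [pa + pb + pc - 1, pa + pb - pab, pb + pc - pbc],
    [pa, pb, pc],
)
print(solution)
# {pa: 1 - pbc, pb: pab + pbc - 1, pc: 1 - pab}  (up to term ordering)
```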

However, a four element set \{a,b,c,d\} is sufficient for our counterexample! We can let \mathcal{G} = \{\{a,b\},\{b,c\}\}. Then \sigma(\mathcal{G})=2^{\{a,b,c,d\}} and we can define \mathbb{P} , \mathbb{P}' : \sigma (\mathcal{G}) \rightarrow [0,1] by

  • \mathbb{P}(\{a\}) = 0, \mathbb{P}(\{b\})=0.5, \mathbb{P}(\{c\})=0 and \mathbb{P}(\{d\})=0.5.
  • \mathbb{P}'(\{a\})=0.5, \mathbb{P}'(\{b\})=0, \mathbb{P}'(\{c\})=0.5 and \mathbb{P}'(\{d\})=0.

Clearly \mathbb{P} \neq \mathbb{P}', however \mathbb{P}(\{a,b\})=\mathbb{P}'(\{a,b\})=0.5 and \mathbb{P}(\{b,c\})=\mathbb{P}'(\{b,c\})=0.5. Thus we have our counterexample! In general, for any \lambda \in [0,1) we can define the probability measure \mathbb{P}_\lambda = \lambda\mathbb{P}+(1-\lambda)\mathbb{P}'. The measure \mathbb{P}_\lambda is not equal to \mathbb{P} but agrees with \mathbb{P} on \mathcal{G}. So whenever we have two probability measures that agree on \mathcal{G} but not on \sigma(\mathcal{G}), we can produce uncountably many such measures by taking convex combinations as above.
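Finally, here is a short check of the counterexample and of the convex combination family, again representing each measure by a dictionary of point masses (the helper names are my own choices):

```python
from itertools import chain, combinations

Omega = ("a", "b", "c", "d")
G = [frozenset({"a", "b"}), frozenset({"b", "c"})]

P  = {"a": 0.0, "b": 0.5, "c": 0.0, "d": 0.5}
Pp = {"a": 0.5, "b": 0.0, "c": 0.5, "d": 0.0}

def prob(mass, event):
    return sum(mass[w] for w in event)

def events(omega):
    return [frozenset(s) for s in chain.from_iterable(
        combinations(omega, r) for r in range(len(omega) + 1))]

# The two measures agree on every generator ...
assert all(prob(P, g) == prob(Pp, g) for g in G)
# ... but disagree on sigma(G) = 2^Omega, e.g. on the singleton {a}.
assert prob(P, frozenset({"a"})) != prob(Pp, frozenset({"a"}))

# Convex combinations give a family of distinct measures, all of
# which still agree with P on G.
for lam in (0.0, 0.25, 0.5, 0.75):
    P_lam = {w: lam * P[w] + (1 - lam) * Pp[w] for w in Omega}
    assert all(prob(P_lam, g) == prob(P, g) for g in G)
    assert any(prob(P_lam, e) != prob(P, e) for e in events(Omega))
print("counterexample verified")
```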