# Extremal couplings

This post is inspired by an assignment question I had to answer for STATS 310A, a probability course at Stanford for first-year students in the statistics PhD program. In the question we had to derive a few results about couplings. I found myself thinking and talking about the question long after submitting the assignment and decided to put my thoughts on paper. I would like to thank our lecturer Prof. Diaconis for answering my questions and pointing me in the right direction.

## What are couplings?

Given two distribution functions $F$ and $G$ on $\mathbb{R}$, a coupling of $F$ and $G$ is a distribution function $H$ on $\mathbb{R}^2$ such that the marginals of $H$ are $F$ and $G$. Couplings can be used to give probabilistic proofs of analytic statements about $F$ and $G$ (see here). Couplings are also studied in their own right in the theory of optimal transport.

We can think of $F$ and $G$ as being the cumulative distribution functions of some random variables $X$ and $Y$. A coupling $H$ of $F$ and $G$ thus corresponds to a random vector $(\widetilde{X},\widetilde{Y})$ where $\widetilde{X}$ has the same distribution as $X$, $\widetilde{Y}$ has the same distribution as $Y$ and $(\widetilde{X},\widetilde{Y}) \sim H$.

## The independent coupling

For two given distribution functions $F$ and $G$ there exist many possible couplings. For example, we could take $H = H_I$ where $H_I(x,y) = F(x)G(y)$. This coupling corresponds to a random vector $(\widetilde{X}_I,\widetilde{Y}_I)$ where $\widetilde{X}_I$ and $\widetilde{Y}_I$ are independent and (as is required for all couplings) $\widetilde{X}_I \stackrel{\text{dist}}{=} X$, $\widetilde{Y}_I \stackrel{\text{dist}}{=} Y$.

In some sense the coupling $H_I$ is in the “middle” of all couplings. This is because $\widetilde{X}_I$ and $\widetilde{Y}_I$ are independent and so $\widetilde{X}_I$ doesn’t carry any information about $\widetilde{Y}_I$. As the title of the post suggests, there are couplings where this isn’t the case and $\widetilde{X}$ carries “as much information as possible” about $\widetilde{Y}$.

## The two extremal couplings

Define two functions $H_L, H_U :\mathbb{R}^2 \to [0,1]$ by

$H_U(x,y) = \min\{F(x), G(y)\}$ and $H_L(x,y) = \max\{F(x)+G(y) - 1, 0\}$.

With some work, one can show that $H_L$ and $H_U$ are distribution functions on $\mathbb{R}^2$ and that they have the correct marginals. In this post I would like to talk about how to construct random vectors $(\widetilde{X}_U, \widetilde{Y}_U) \sim H_U$ and $(\widetilde{X}_L, \widetilde{Y}_L) \sim H_L$.
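
The marginal part of this claim is the quicker of the two to check, and a short sketch may help. Since $G(y) \to 1$ as $y \to \infty$ and $0 \le F(x) \le 1$, continuity of $\min$ and $\max$ gives

$\lim_{y \to \infty} H_U(x,y) = \min\{F(x), 1\} = F(x)$ and $\lim_{y \to \infty} H_L(x,y) = \max\{F(x) + 1 - 1, 0\} = F(x)$.

Swapping the roles of $x$ and $y$ gives $G(y)$ as the other marginal in both cases. This sketch only covers the marginals; it does not show that $H_U$ and $H_L$ are distribution functions.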

Let $F^{-1}$ and $G^{-1}$ be the quantile functions of $F$ and $G$. That is,

$F^{-1}(c) = \inf\{ x \in \mathbb{R} : F(x) \ge c\}$ and $G^{-1}(c) = \inf\{ x \in \mathbb{R} : G(x) \ge c\}$.
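
To make the definition concrete, here is a minimal Python sketch of the quantile function for a discrete distribution. The distribution (atoms at $0, 1, 2$ with probabilities $0.25, 0.5, 0.25$) and the names `F` and `F_inv` are assumptions chosen for illustration. The sketch also checks, on a small grid, that $F^{-1}(c) \le x$ exactly when $c \le F(x)$.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

# Hypothetical example distribution: atoms at 0, 1, 2 with these probabilities.
values = [0.0, 1.0, 2.0]
probs = [0.25, 0.5, 0.25]
cum = list(accumulate(probs))  # cumulative probabilities at each atom


def F(x):
    """CDF: F(x) = P(X <= x)."""
    k = bisect_right(values, x)  # number of atoms that are <= x
    return cum[k - 1] if k > 0 else 0.0


def F_inv(c):
    """Quantile function inf{x : F(x) >= c}, for 0 < c <= 1."""
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    # Index of the first atom whose cumulative probability reaches c.
    return values[bisect_left(cum, c)]


# Check that F_inv(c) <= x holds exactly when c <= F(x).
equivalence_holds = all(
    (F_inv(k / 100) <= x) == (k / 100 <= F(x))
    for k in range(1, 101)
    for x in [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
)
print(F_inv(0.25), F_inv(0.26), F_inv(1.0), equivalence_holds)
```

Note that $F^{-1}$ jumps from one atom to the next as soon as $c$ exceeds the cumulative probability of the current atom, which is why the infimum (rather than an ordinary inverse) is needed.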

Now let $V$ be a random variable that is uniformly distributed on $[0,1]$ and define

$\widetilde{X}_U = F^{-1}(V)$ and $\widetilde{Y}_U = G^{-1}(V)$.

Since $F^{-1}(V) \le x$ if and only if $V \le F(x)$, we have $\mathbb{P}(\widetilde{X}_U \le x) = \mathbb{P}(V \le F(x)) = F(x)$, so $\widetilde{X}_U \stackrel{\text{dist}}{=} X$, and likewise $\widetilde{Y}_U \stackrel{\text{dist}}{=} Y$. Furthermore, $\widetilde{X}_U \le x, \widetilde{Y}_U \le y$ occurs if and only if $V \le F(x), V \le G(y)$, which is equivalent to $V \le \min\{F(x),G(y)\}$. Thus

$\mathbb{P}(\widetilde{X}_U \le x, \widetilde{Y}_U \le y) = \mathbb{P}(V \le \min\{F(x),G(y)\})= \min\{F(x),G(y)\}.$

Thus $(\widetilde{X}_U,\widetilde{Y}_U)$ is distributed according to $H_U$. We see that under the coupling $H_U$, $\widetilde{X}_U$ and $\widetilde{Y}_U$ are closely related as they are both increasing functions of a common random variable $V$.
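
The construction is easy to simulate. The sketch below is only an illustration under assumed marginals: it takes $F$ to be the Exponential(1) distribution function and $G$ to be the Uniform(0, 2) distribution function (neither choice comes from the argument above), draws a single uniform $V$ per sample, and compares the empirical joint distribution function with $\min\{F(x), G(y)\}$ at a few points.

```python
import math
import random


def exp_cdf(x):
    """F: CDF of Exponential(1) (an assumed example marginal)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def exp_quantile(v):
    """F^{-1}(v) = -log(1 - v) for Exponential(1), 0 <= v < 1."""
    return -math.log(1.0 - v)


def unif_cdf(y):
    """G: CDF of Uniform(0, 2) (an assumed example marginal)."""
    return min(max(y / 2.0, 0.0), 1.0)


def unif_quantile(v):
    """G^{-1}(v) = 2v for Uniform(0, 2)."""
    return 2.0 * v


def sample_upper(n, seed=0):
    """Draw n pairs (F^{-1}(V), G^{-1}(V)) using one uniform V per pair."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        v = rng.random()  # V is uniform on [0, 1)
        pairs.append((exp_quantile(v), unif_quantile(v)))
    return pairs


def empirical_cdf(pairs, x, y):
    """Fraction of sampled pairs (a, b) with a <= x and b <= y."""
    return sum(1 for a, b in pairs if a <= x and b <= y) / len(pairs)


pairs_U = sample_upper(100_000)
points = [(0.5, 0.5), (1.0, 1.5), (2.0, 0.4)]
for x, y in points:
    print(x, y, empirical_cdf(pairs_U, x, y), min(exp_cdf(x), unif_cdf(y)))
```

With this many samples the two printed columns should agree to roughly two decimal places, up to Monte Carlo error.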

We can follow a similar construction for $H_L$. Define

$\widetilde{X}_L = F^{-1}(V)$ and $\widetilde{Y}_L = G^{-1}(1-V)$.

Thus $\widetilde{X}_L$ and $\widetilde{Y}_L$ are again functions of a common random variable $V$ but $\widetilde{X}_L$ is an increasing function of $V$ and $\widetilde{Y}_L$ is a decreasing function of $V$. Note that $1-V$ is also uniformly distributed on $[0,1]$. Thus $\widetilde{X}_L \stackrel{\text{dist}}{=} X$ and $\widetilde{Y}_L \stackrel{\text{dist}}{=} Y$.

Now $\widetilde{X}_L \le x, \widetilde{Y}_L \le y$ occurs if and only if $V \le F(x)$ and $1-V \le G(y)$ which occurs if and only if $1-G(y) \le V \le F(x)$. If $1-G(y) \le F(x)$, then $F(x)+G(y)-1 \ge 0$ and $\mathbb{P}(1-G(y) \le V \le F(x)) =F(x)+G(y)-1$. On the other hand, if $1 - G(y) > F(x)$, then $F(x)+G(y)-1< 0$ and $\mathbb{P}(1-G(y) \le V \le F(x))=0$. Thus

$\mathbb{P}(\widetilde{X}_L \le x, \widetilde{Y}_L \le y) = \mathbb{P}(1-G(y) \le V \le F(x)) = \max\{F(x)+G(y)-1,0\}$,

and so $(\widetilde{X}_L,\widetilde{Y}_L)$ is distributed according to $H_L$.
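
The same kind of simulation works for the lower coupling. As before, the marginals are assumptions made purely for illustration ($F$ is Exponential(1), $G$ is Uniform(0, 2)), and the helper names are hypothetical. The only change in the construction is that $G^{-1}$ is evaluated at $1 - V$.

```python
import math
import random


def exp_cdf(x):
    """F: CDF of Exponential(1) (an assumed example marginal)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def exp_quantile(v):
    """F^{-1}(v) = -log(1 - v) for Exponential(1), 0 <= v < 1."""
    return -math.log(1.0 - v)


def unif_cdf(y):
    """G: CDF of Uniform(0, 2) (an assumed example marginal)."""
    return min(max(y / 2.0, 0.0), 1.0)


def unif_quantile(v):
    """G^{-1}(v) = 2v for Uniform(0, 2)."""
    return 2.0 * v


def sample_lower(n, seed=0):
    """Draw n pairs (F^{-1}(V), G^{-1}(1 - V)) using one uniform V per pair."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        v = rng.random()  # V is uniform on [0, 1)
        pairs.append((exp_quantile(v), unif_quantile(1.0 - v)))
    return pairs


def empirical_cdf(pairs, x, y):
    """Fraction of sampled pairs (a, b) with a <= x and b <= y."""
    return sum(1 for a, b in pairs if a <= x and b <= y) / len(pairs)


def H_L(x, y):
    """Lower coupling: max{F(x) + G(y) - 1, 0}."""
    return max(exp_cdf(x) + unif_cdf(y) - 1.0, 0.0)


pairs_L = sample_lower(100_000)
points = [(0.5, 0.5), (1.0, 1.5), (2.0, 1.0)]
for x, y in points:
    print(x, y, empirical_cdf(pairs_L, x, y), H_L(x, y))
```

At the point $(0.5, 0.5)$ we have $F(x) + G(y) < 1$, so no sample can fall in that lower-left quadrant, which matches the second case of the argument above.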

## What makes $H_U$ and $H_L$ extreme?

Now that we know that $H_U$ and $H_L$ are indeed couplings, it is natural to ask what makes them “extreme”. What we would like to say is that $\widetilde{Y}_U$ is an increasing function of $\widetilde{X}_U$ and $\widetilde{Y}_L$ is a decreasing function of $\widetilde{X}_L$. Unfortunately this isn’t always the case, as can be seen by taking $X$ to be constant and $Y$ to be continuous. In that case $\widetilde{X}_U$ is constant while $\widetilde{Y}_U$ is not, so $\widetilde{Y}_U$ cannot be written as a function of $\widetilde{X}_U$.

However, the intuition that $\widetilde{Y}_U$ is increasing in $\widetilde{X}_U$ and $\widetilde{Y}_L$ is decreasing in $\widetilde{X}_L$ is close to correct. Given a coupling $(\widetilde{X},\widetilde{Y}) \sim H$ and a point $x$ with $F(x) > 0$, we can look at the quantity

$C(x,y) = \mathbb{P}(\widetilde{Y} \le y | \widetilde{X} \le x) -\mathbb{P}(\widetilde{Y} \le y) = \frac{H(x,y)}{F(x)}-G(y).$

This quantity tells us something about how $\widetilde{Y}$ changes with $\widetilde{X}$. For instance, if small values of $\widetilde{X}$ tend to occur together with small values of $\widetilde{Y}$ (positive dependence), then we would expect $C(x,y)$ to be positive, and if small values of $\widetilde{X}$ tend to occur together with large values of $\widetilde{Y}$ (negative dependence), then we would expect $C(x,y)$ to be negative.

For the independent coupling $(\widetilde{X}_I,\widetilde{Y}_I) \sim H_I$, the quantity $C(x,y)$ is constantly $0$. It turns out that, for every $x$ and $y$, the quantity $C(x,y)$ is maximised by the coupling $(\widetilde{X}_U, \widetilde{Y}_U) \sim H_U$ and minimised by $(\widetilde{X}_L,\widetilde{Y}_L) \sim H_L$, and it is in this sense that they are extremal. This final claim is the two-dimensional version of the Fréchet-Hoeffding Theorem and checking it is a good exercise.
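
A numerical sanity check can make the claim more tangible, although it is no substitute for the exercise. The sketch below again assumes illustrative marginals ($F$ is Exponential(1), $G$ is Uniform(0, 2)) and evaluates $C(x,y)$ on a grid for four couplings: $H_L$, $H_I$, $H_U$ and the mixture $\frac{1}{2}H_U + \frac{1}{2}H_L$, which is also a coupling because a mixture of couplings has the same marginals. It only tests these four couplings, not all of them.

```python
import math


def F(x):
    """CDF of Exponential(1) (an assumed example marginal)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def G(y):
    """CDF of Uniform(0, 2) (an assumed example marginal)."""
    return min(max(y / 2.0, 0.0), 1.0)


def H_I(x, y):
    """Independent coupling."""
    return F(x) * G(y)


def H_U(x, y):
    """Upper extremal coupling."""
    return min(F(x), G(y))


def H_L(x, y):
    """Lower extremal coupling."""
    return max(F(x) + G(y) - 1.0, 0.0)


def H_mix(x, y):
    """Equal mixture of the two extremal couplings (also a coupling)."""
    return 0.5 * H_U(x, y) + 0.5 * H_L(x, y)


def C(H, x, y):
    """C(x, y) = H(x, y) / F(x) - G(y), defined when F(x) > 0."""
    return H(x, y) / F(x) - G(y)


# Grid of points with x > 0, so that F(x) > 0 everywhere on the grid.
grid = [(0.1 * i, 0.1 * j) for i in range(1, 40) for j in range(1, 25)]
tol = 1e-12  # allowance for floating-point rounding

ordering_holds = all(
    C(H_L, x, y) - tol <= C(H, x, y) <= C(H_U, x, y) + tol
    for (x, y) in grid
    for H in (H_L, H_I, H_mix, H_U)
)
print(ordering_holds)
```

The check passes because $H_L \le H \le H_U$ pointwise for each of these couplings, and dividing by $F(x) > 0$ and subtracting $G(y)$ preserves the ordering.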