# You could have proved the Neyman-Pearson lemma

The Neyman-Pearson lemma is foundational and important result in the theory of hypothesis testing. When presented in class the proof seemed magical and I had no idea where the ideas came from. I even drew a face like this 😲 next to the usual $\square$ in my book when the proof was finished. Later in class we learnt the method of undetermined multipliers and suddenly I saw where the Neyman-Pearson lemma came from.

In this blog post, I’ll first give some background and set up notation for the Neyman-Pearson lemma. Then I’ll talk about the method of undetermined multipliers and show how it can be used to derive and prove the Neyman-Pearson lemma. Finally, I’ll write about why I still think the Neyman-Pearson lemma is magical despite the demystified proof.

## Background

In the set up of the Neyman-Pearson lemma we have data $x$ which is a realisation of some random variable $X$. We wish to conclude something about the distribution of $X$ from our data $x$ by doing a hypothesis test.

In the Neyman-Pearson lemma we have simple hypotheses. That is our data either comes from the distribution $\mathbb{P}_0$ or from the distribution $\mathbb{P}_1$. Thus, our null hypothesis is $H_0 : X \sim \mathbb{P}_0$ and our alternative hypothesis is $H_1 : X \sim \mathbb{P}_1$.

A test of $H_0$ against $H_1$ is a function $\phi$ that takes in data $x$ and returns a number $\phi(x) \in [0,1]$. The value $\phi(x)$ is the probability under the test $\phi$ of rejecting $H_0$ given the observed data $x$. That is, if $\phi(x) = 1$, we always reject $H_0$ and if $\phi(x)=0$ we never reject $H_0$. For in-between values $\phi(x) = \gamma \in (0,1)$, we reject $H_0$ with probability $\gamma$.

An ideal test would have two desirable properties. We would like a test that rejects $H_0$ with a low probability when $H_0$ is true but we would also like the test to reject $H_0$ with a high probability when $H_1$ is true. To state this more formally, let $\mathbb{E}_0[\phi(X)]$ and $\mathbb{E}_1[\phi(X)]$ be the expectation of $\phi(X)$ under $H_0$ and $H_1$ respectively. The quantity $\mathbb{E}_0[\phi(X)]$ is the probability we reject $H_0$ when $H_0$ is true. Likewise, the quantity $\mathbb{E}_1[\phi(X)]$ is the probability we reject $H_0$ when $H_1$ is true. An optimal test would be one that minimises $\mathbb{E}_0[\phi(X)]$ and maximises $\mathbb{E}_1[\phi(X)]$.

Unfortunately the goals of minimising $\mathbb{E}_0[\phi(X)]$ and maximising $\mathbb{E}_1[\phi(X)]$ are at odds with one another. This is because we want $\phi$ to be small in order to minimise $\mathbb{E}_0[\phi(X)]$ and we want $\phi$ to be large to maximise $\mathbb{E}_1[\phi(X)]$. In nearly all cases we have to trade off between these two goals and there is no single test that simultaneously achieves both.

To work around this, a standard approach is to focus on maximising $\mathbb{E}_1[\phi(X)]$ while requiring that $\mathbb{E}_0[\phi(X)]$ remains below some threshold. The quantity $\mathbb{E}_1[\phi(X)]$ is called the power of the test $\phi$. If $\alpha$ is a number between $0$ and $1$, we will say that $\phi$ has level$\alpha$ if $\mathbb{E}_1[\phi(X)] \le \alpha$. A test $\phi$ is said to be most powerful at level-$\alpha$, if

• The test $\phi$ is level-$\alpha$.
• For all level-$\alpha$ tests $\phi'$, the test $\phi$ is more powerful than $\phi'$. That is,

$\mathbb{E}_1[\phi'(X)] \le \mathbb{E}_1[\phi(X)]$.

Thus we can see that finding a most powerful level-$\alpha$ test is a constrained optimisation problem. We wish to maximise the quantity

$\mathbb{E}_1[\phi(X)]$

subject to the constraint

$\mathbb{E}_0[\phi(X)] \le \alpha$

With this in mind, we turn to the method of undetermined multipliers.

## The method of undetermined multipliers

The method of undetermined multipliers (also called the method of Lagrange multipliers) is a very general optimisation tool. Suppose that we have a set $U$ and two function $f,g : U \to \mathbb{R}$ and we wish to maximise $f(u)$ subject to the constraint $g(u) \le 0$.

In the context of hypothesis testing, the set $U$ is the set of all tests $\phi$. The objective function $f$ is given by $f(\phi) = \mathbb{E}_1[\phi(X)]$. That is, $f(\phi)$ is the power of the test $\phi$. The constraint function $g$ is given by $g(\phi)=\mathbb{E}_1[\phi(X)]-\alpha$ so that $g(\phi) \le 0$ if and only if $\phi$ has level-$\alpha$.

The method of undetermined multipliers allows us to reduce this constrained optimisation problem to a (hopefully easier) unconstrained optimisation problem. More specifically we have the following result:

Proposition: Suppose that $u^* \in U$ is such that:

• $g(u^*) = 0$,
• There exists $k \ge 0$, such that $u^*$ maximises $f(u)-kg(u)$ over all $u \in U$.

Then $u$ maximises $f(u)$ under the constraint $g(u) \le 0$.

Proof: Suppose that $u^*$ satisfies the above two dot points. We need to show that for all $u \in U$, if $g(u) \le 0$, then $f(u) \le f(u^*)$. By assumption we know that $f(u^*)=0$ and $u^*$ maximises $f(u)-kg(u)$. Thus, for all $u \in U$,

$f(u^*)=f(u^*)-kg(u^*) \ge f(u)-kg(u)$.

Now suppose $g(u) \le 0$. Then, $kg(u) \le 0$ and so $f(u)-kg(u)\ge f(u)$ and so $f(u^*) \ge f(u)$ as needed. $\square$

The constant $k$ is the undetermined multiplier. The way to use the method of undetermined is to find values $u_k^*$ that maximise $h_k(u) = f(u)-kg(u)$ for each $k \ge 0$. The multiplier $k$ is then varied so that the constraint $g(u^*_k) = 0$ is satisfied.

## Proving the Neyman-Pearson lemma

Now let’s use the method of undetermined multipliers to find most powerful tests. Recall the set $U$ which we are optimising over is the set of all tests $\phi$. Recall also that we wish to optimise $f(\phi) = \mathbb{E}_1[\phi(X)]$ subject to the constraint $g(\phi) = \mathbb{E}_0[\phi(X)] - \alpha \le 0$. The method of undetermined multipliers says that we should consider maximising the function

$h_k(\phi) = \mathbb{E}_1[\phi(X)] - k\mathbb{E}_0[\phi(X)]$,

where $k \ge 0$. Suppose that both $\mathbb{P}_0$ and $\mathbb{P}_1$ have densities1 $p_0$ and $p_1$ with respect to some measure $\mu$. We can we can write the above expectations as integrals. That is,

$\mathbb{E}_0[\phi(X)] = \int \phi(x)p_0(x)\mu(dx)$ and $\mathbb{E}_1[\phi(X)] = \int \phi(x)p_1(x)\mu(dx)$.

Thus the function $h_k$ is equal to

$h_k(\phi) = \int \phi(x)p_1(x)\mu(dx)- k\int \phi(x)p_0(x)\mu(dx)=\int \phi(x)(p_1(x)-kp_0(x))\mu(dx)$.

We now wish to maximise the function $h_k(\phi)$. Recall that $\phi$ is a function that take values in $[0,1]$. Thus, the integral

$\int \phi(x)(p_1(x)-kp_0(x))\mu(dx)$,

is maximised if and only if $\phi(x)=1$ when $p_1(x)-kp_0(x) > 0$ and $\phi(x)=0$ when $p_1(x)-kp_0(x) < 0$. Note that $p_1(x)-kp_0(x) > 0$ if and only if $\frac{p_1(x)}{p_0(x)} > k$. Thus for each $k \ge 0$, a test $\phi_k$ maximises $h_k(\phi)$ if and only if

$\phi_k(x) = \begin{cases} 1 & \text{if } \frac{p_1(x)}{p_0(x)} > k,\\ 0 &\text{if } \frac{p_1(x)}{p_0(x)} < k. \end{cases}$

The method of undetermined multipliers says that if we can find $k$ so that the above is satisfied and $g(\phi_k) = 0$, then $\phi_k$ is a most powerful test. Recall that $g(\phi_k)=0$ is equivalent to $\mathbb{E}_1[\phi(X)]=\alpha$. By summarising the above argument, we arrive at the Neyman-Pearson lemma,

Neyman-Pearson Lemma2: Suppose that $\phi$ is a test such that

• $\mathbb{E}_0[\phi(X)] = \alpha$, and
• For some $k \ge 0$, $\phi(x) = \begin{cases} 1 & \text{if } \frac{p_1(x)}{p_0(x)} > k,\\ 0 & \text{if } \frac{p_1(x)}{p_0(x)}< k.\end{cases}$

then $\phi$ is most powerful at level-$\alpha$ for testing $H_0 : X \sim \mathbb{P}_0$ against $H_1 : X \sim \mathbb{P}_1$.

## The magic of Neyman-Pearson

By learning about undetermined multipliers I’ve been able to better understand the proof of the Neyman-Pearson lemma. I now view it is as clever solution to a constrained optimisation problem rather than something that comes out of nowhere.

There is, however, a different aspect of Neyman-Pearson that continues to surprise me. This aspect is the usefulness of the lemma. At first glance the Neyman-Pearson lemma seems to be a very specialised result because it is about simple hypothesis testing. In reality most interesting hypothesis tests have composite nulls or composite alternatives or both. It turns out that Neyman-Pearson is still incredibly useful for composite testing. Through ideas like monotone likelihood ratios, least favourable distributions and unbiasedness, the Neyman-Pearson lemma or similar ideas can be used to find optimal tests in a variety of settings.

Thus I must admit that the title of this blog post is a little inaccurate and deceptive. I do believe that, given the tools of undetermined multipliers and the set up of simple hypothesis testing, one is naturally led to the Neyman-Pearson lemma. However, I don’t believe that many could have realised how useful and interesting simple hypothesis testing would be.

## Footnotes

1. The assumption that $\mathbb{P}_0$ and $\mathbb{P}_1$ have densities with respect to a common measure $\mu$ is not a restrictive assumption since one can always take $\mu = \mathbb{P}_0+\mathbb{P}_1$ and the apply Radon-Nikodym. However there is often a more natural choice of $\mu$ such as Lebesgue measure on $\mathbb{R}^d$ or the counting measure on $\mathbb{N}$.
2. What I call the Neyman-Pearson lemma is really only a third of the Neyman-Pearson lemma. There are two other parts. One that guarantees the existence of a most powerful test and one that is a partial converse to the statement in this post.