The beta-binomial distribution

The beta-binomial model is a Bayesian model used to analyze rates. For a great derivation and explanation of this model, I highly recommend watching the second lecture from Richard McElreath’s course Statistical Rethinking. In this model, the data, X, is assumed to be binomially distributed with a fixed number of trail N but an unknown rate \rho \in [0,1]. The rate \rho is given a \text{Beta}(a,b) prior. That is the prior distribution of \rho has a density

p(\rho) = \frac{1}{B(a,b)} \rho^{a-1}(1-\rho)^{b-1},

where B(a,b) =\int_0^1 \rho^{a-1}(1-\rho)^{b-1}d\rho is a normalizing constant. The model can thus be written as

\rho \sim \text{Beta}(a,b),
X | \rho \sim \text{Binom}(N,\rho).

This is a conjugate model, meaning that the posterior distribution of \rho is again a beta distribution. This can be seen by using Bayes rule

p(\rho | X) \propto p(X| \rho)p(\rho) \propto \rho^X(1-\rho)^{N-X}\rho^{a-1}(1-\rho)^{b-1}=\rho^{X+a-1}(1-\rho)^{(N-X)+b-1}.

The last expression is proportional to a beta density., specifically \rho | X \sim \text{Beta}(X+a, N-X+b).

The marginal distribution of X

In the above model we are given the distribution of \rho and the conditional distribution of X|\rho. To calculate the distribution of X, we thus need to marginalize over \rho. Specifically,

\displaystyle{p(X) = \int_0^1 p(X,\rho)d\rho = \int_0^1 p(X| \rho)p(\rho)d\rho.}

The term inside the above integral is

\displaystyle{p(X| \rho)p(\rho) = \binom{N}{X}\rho^X(1-\rho)^{N-X}\frac{1}{B(a,b)}\rho^{a-1}(1-\rho)^{b-1} = \frac{\binom{N}{X}}{B(a,b)}\rho^{X+a-1}(1-\rho)^{N-X+b-1} }.


\displaystyle{p(X) = \frac{\binom{N}{X}}{B(a,b)} \int_0^1 \rho^{X+a-1}(1-\rho)^{N-X+b-1}d\rho = \binom{N}{X}\frac{B(X+a, N-X+a)}{B(a,b)}}.

This distribution is called the beta-binomial distribution. Below is an image from Wikipedia showing a graph of p(X) for N=10 and a number of different values of a and b. You can see that, especially for small value of a and b the distribution is a lot more spread out than the binomial distribution. This is because there is randomness coming from both \rho and the binomial conditional distribution.

A plot of the beta-binomial distribution for different values of the parameters a and b. For small values of a and b, the distribution is very spread out.