Maximum likelihood and the method of moments are two ways to estimate parameters from data. In general, the two methods can give different estimates, but for one-dimensional exponential families they produce the same estimates.
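Suppose that $\{P_\theta : \theta \in \Omega\}$ is a one-dimensional exponential family written in canonical form. That is, $\Omega \subseteq \mathbb{R}$ and there exists a reference measure $\mu$ such that each distribution $P_\theta$ has a density $p_\theta$ with respect to $\mu$ and,
$$p_\theta(x) = h(x)\exp\left(\theta T(x) - A(\theta)\right).$$
The random variable $T(X)$ is a sufficient statistic for the model $X \sim P_\theta$. The function $A(\theta)$ is the log-partition function for the family $\{P_\theta : \theta \in \Omega\}$. The condition, $\int p_\theta(x)\,\mu(dx) = 1$, implies that
$$A(\theta) = \log\left(\int h(x)\exp(\theta T(x))\,\mu(dx)\right).$$
It turns out that the function $A(\theta)$ is differentiable and that differentiation and integration are exchangeable. This implies that
$$A'(\theta) = \frac{\int T(x)h(x)\exp(\theta T(x))\,\mu(dx)}{\int h(x)\exp(\theta T(x))\,\mu(dx)}.$$
Note that $\int h(x)\exp(\theta T(x))\,\mu(dx) = \exp(A(\theta))$. Thus,
$$A'(\theta) = \int T(x)h(x)\exp\left(\theta T(x) - A(\theta)\right)\mu(dx) = \int T(x)p_\theta(x)\,\mu(dx).$$
This means that $A'(\theta) = \mathbb{E}_\theta[T(X)]$, the expectation of $T(X)$ under $X \sim P_\theta$.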
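
The identity $A'(\theta) = \mathbb{E}_\theta[T(X)]$ can be checked numerically on a concrete family. The sketch below assumes the Poisson family in canonical form, which is not discussed above: $\theta = \log \lambda$, $T(x) = x$, $h(x) = 1/x!$ and $A(\theta) = e^\theta$. The function names are illustrative only.

```python
import math

def poisson_log_partition(theta):
    """A(theta) = exp(theta) for the Poisson family with theta = log(rate)."""
    return math.exp(theta)

def expected_T(theta, terms=200):
    """E_theta[T(X)] with T(x) = x, summing x * p_theta(x) over x = 0..terms-1."""
    total = 0.0
    for x in range(terms):
        # log p_theta(x) = log h(x) + theta * T(x) - A(theta), with h(x) = 1 / x!
        log_p = -math.lgamma(x + 1) + theta * x - poisson_log_partition(theta)
        total += x * math.exp(log_p)
    return total

def derivative(f, theta, eps=1e-6):
    """Central finite-difference approximation to f'(theta)."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta = math.log(3.0)  # canonical parameter for a Poisson with rate 3
lhs = derivative(poisson_log_partition, theta)  # A'(theta)
rhs = expected_T(theta)                         # E_theta[T(X)]
print(lhs, rhs)  # the two numbers should agree up to numerical error
```

Here the sum is truncated at 200 terms, which is an approximation, although the neglected tail is negligible for a rate of 3.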
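Now suppose that we have an i.i.d. sample $X_1,\ldots,X_n \sim P_\theta$ and we want to use this sample to estimate $\theta$. One way to estimate $\theta$ is by maximum likelihood. That is, we choose the value of $\theta$ that maximises the likelihood,
$$L(\theta \mid X_1,\ldots,X_n) = \prod_{i=1}^n p_\theta(X_i).$$
When using the maximum likelihood estimator, it is often easier to work with the log-likelihood. The log-likelihood is,
$$\ell(\theta) = \sum_{i=1}^n \log p_\theta(X_i) = \sum_{i=1}^n \left(\log h(X_i) + \theta T(X_i) - A(\theta)\right).$$
Maximising the likelihood is equivalent to maximising the log-likelihood. For exponential families, the log-likelihood is a concave function of $\theta$, because the log-partition function $A$ is convex. Thus the maximisers can be found by differentiation and solving the first order equations. Note that,
$$\ell'(\theta) = \sum_{i=1}^n \left(T(X_i) - A'(\theta)\right) = -nA'(\theta) + \sum_{i=1}^n T(X_i).$$
Thus the maximum likelihood estimate (MLE) $\widehat{\theta}$ solves the equation,
$$-nA'(\widehat{\theta}) + \sum_{i=1}^n T(X_i) = 0.$$
But recall that $A'(\widehat{\theta}) = \mathbb{E}_{\widehat{\theta}}[T(X)]$. Thus the MLE is the solution to the equation,
$$\mathbb{E}_{\widehat{\theta}}[T(X)] = \frac{1}{n}\sum_{i=1}^n T(X_i).$$
Thus the MLE is the value of $\theta$ for which the expectation of $T(X)$ matches the empirical average from our sample. That is, the maximum likelihood estimator for an exponential family is a method of moments estimator. Specifically, the maximum likelihood estimator matches the moments of the sufficient statistic $T(X)$.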
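
As a rough illustration of the moment-matching equation, the sketch below solves $A'(\theta) = \frac{1}{n}\sum_i T(X_i)$ by Newton's method. It assumes the Poisson family in canonical form ($T(x) = x$, $A(\theta) = e^\theta$, so $A' = A'' = \exp$), and the sample is made up for the example. The helper names are hypothetical.

```python
import math

def mle_canonical(t_values, A_prime, A_double_prime, theta0=0.0, tol=1e-12, max_iter=100):
    """Solve A'(theta) = mean(T(X_i)) for theta by Newton's method."""
    t_bar = sum(t_values) / len(t_values)
    theta = theta0
    for _ in range(max_iter):
        # Newton step for the equation A'(theta) - t_bar = 0
        step = (A_prime(theta) - t_bar) / A_double_prime(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

def log_likelihood(theta, t_values, A):
    """Log-likelihood up to the additive constant sum(log h(X_i))."""
    return theta * sum(t_values) - len(t_values) * A(theta)

sample = [2, 4, 3, 0, 5, 1, 3, 2]  # made-up Poisson counts, T(x) = x
theta_hat = mle_canonical(sample, math.exp, math.exp)

# The fitted mean A'(theta_hat) should equal the sample average of T.
print(math.exp(theta_hat), sum(sample) / len(sample))
```

For the Poisson family the equation has the closed-form solution $\widehat{\theta} = \log \bar{X}$, so Newton's method is not needed here. It is used only to show how the equation could be solved when $A'$ has no convenient inverse.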
A counterexample
It is a special property of exponential families that the MLE is a method of moments estimator for the sufficient statistic. When we leave the nice world of exponential families, the two estimators may differ.
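Suppose that we have data $X_1,\ldots,X_n \stackrel{\text{iid}}{\sim} P_\theta$ where $P_\theta$ is the uniform distribution on $[0,\theta]$. A minimal sufficient statistic for this model is $X_{(n)}$, the maximum of $X_1,\ldots,X_n$. Given what we saw before, we might imagine that the MLE for this model would be a method of moments estimator for $X_{(n)}$ but this isn’t the case.
The likelihood for $X_1,\ldots,X_n$ is,
$$L(\theta) = \prod_{i=1}^n \frac{1}{\theta}\mathbf{1}\{0 \le X_i \le \theta\} = \frac{1}{\theta^n}\mathbf{1}\{X_{(n)} \le \theta\}.$$
The likelihood is zero for $\theta < X_{(n)}$ and decreasing in $\theta$ for $\theta \ge X_{(n)}$. Thus the MLE is $\widehat{\theta} = X_{(n)}$. However, under $P_\theta$, $\frac{1}{\theta}X_{(n)}$ has a $\text{Beta}(n,1)$ distribution. Thus,
$$\mathbb{E}_\theta[X_{(n)}] = \frac{n}{n+1}\theta,$$
so the method of moments estimator would be $\widehat{\theta}' = \frac{n+1}{n}X_{(n)}$.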
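
A small simulation can make the gap between the two estimators concrete. The sketch below draws repeated samples from the uniform distribution on $[0,\theta]$ and averages the MLE $X_{(n)}$ and the method of moments estimator $\frac{n+1}{n}X_{(n)}$. The values of $\theta$, $n$, the number of repetitions and the seed are arbitrary choices for the example, and the function names are illustrative.

```python
import random

def uniform_estimators(sample):
    """Return (MLE, method of moments estimator) of theta for Uniform[0, theta] data."""
    n = len(sample)
    mle = max(sample)          # maximiser of the likelihood
    mom = (n + 1) / n * mle    # solves E_theta[X_(n)] = observed maximum
    return mle, mom

def average_estimates(theta, n, reps, seed=0):
    """Average both estimators over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    mle_total = 0.0
    mom_total = 0.0
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        mle, mom = uniform_estimators(sample)
        mle_total += mle
        mom_total += mom
    return mle_total / reps, mom_total / reps

avg_mle, avg_mom = average_estimates(theta=2.0, n=5, reps=20000)
# avg_mle should be near theta * n / (n + 1); avg_mom should be near theta
print(avg_mle, avg_mom)
```

The MLE never exceeds $\theta$, so it underestimates on average, while the method of moments estimator rescales the maximum to remove that bias.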