I am very excited to be writing a blog post again – it has been nearly a year! This post marks a new era for the blog. In September I started a statistics PhD at Stanford University. I am really enjoying my classes and I am learning a lot. I might have to change the name of the blog soon but for now let’s stick with “Maths to Share” although you will undoubtedly see more and more statistics here.
Today I would like to talk about leverage scores. Leverage scores are a way to quantify how sensitive a model is, and they can be used to explain the different behaviour in these two animations:
[Animations: on the left, a red point with low leverage moves up and down and the fitted line barely changes; on the right, a red point with high leverage moves and the line swings with it.]
Linear Models
I recently learnt about leverage scores in the applied statistics course STATS 305A. This course is all about the linear model. In the linear model we assume we have data points $(x_1, y_1), \ldots, (x_n, y_n)$, where each $x_i$ is a vector in $\mathbb{R}^p$ and each $y_i$ is a number in $\mathbb{R}$. We model $y_i$ as a linear function of $x_i$ plus noise. That is, we assume $y_i = x_i^T \beta + \varepsilon_i$, where $\beta \in \mathbb{R}^p$ is an unknown vector of coefficients and $\varepsilon_i$ is a random variable with mean $0$ and variance $\sigma^2$. We also require that for $i \neq j$, the random variable $\varepsilon_i$ is uncorrelated with $\varepsilon_j$.
We can also write this as a matrix equation. Define $Y$ to be the vector with entries $y_i$ and define $X$ to be the matrix with rows $x_i^T$, that is

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}.$$

Then our model can be rewritten as

$$Y = X\beta + \varepsilon,$$

where $\varepsilon$ is a random vector with mean $0$ and covariance matrix $\sigma^2 I$. To simplify calculations we will assume that the model contains an intercept term. This means that the first column of $X$ consists of all 1’s.
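To make the setup concrete, here is a small Python sketch that simulates data from this model. All the specifics (the sample size, the true coefficients, the noise level) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 2                  # sample size and number of coefficients (illustrative)
beta = np.array([1.0, 2.0])   # a made-up "true" coefficient vector

# Design matrix X: a first column of all 1's (the intercept) and a random second column.
z = rng.normal(loc=2.0, scale=1.0, size=n)
X = np.column_stack([np.ones(n), z])

# Y = X beta + noise, where the noise has mean 0 and variance sigma^2.
sigma = 0.5
Y = X @ beta + rng.normal(scale=sigma, size=n)
```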
In the two animations at the start of this post we have two nearly identical data sets. The data sets are an example of simple regression, where each vector $x_i$ is of the form $x_i = (1, z_i)^T$ for a number $z_i$. The values $z_i$ are on the horizontal axis and the values $y_i$ are on the vertical axis.
Estimating the coefficients
In the linear model we wish to estimate the parameter $\beta$ which contains the coefficients of our model. That is, given a sample $(x_1, y_1), \ldots, (x_n, y_n)$, we wish to construct a vector $\widehat{\beta}$ which approximates the true parameter $\beta$. In ordinary least squares regression we choose $\widehat{\beta}$ to be the vector that minimizes the quantity

$$\lVert Y - X\beta \rVert_2^2 = \sum_{i=1}^n (y_i - x_i^T\beta)^2.$$
Differentiating with respect to $\beta$ and setting the derivative equal to $0$ shows that $\widehat{\beta}$ is a solution to the normal equations:

$$X^TX\widehat{\beta} = X^TY.$$

We will assume that the matrix $X^TX$ is invertible. In this case the normal equations have a unique solution

$$\widehat{\beta} = (X^TX)^{-1}X^TY.$$
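Continuing the sketch above, we can compute $\widehat{\beta}$ in a couple of lines. Solving the normal equations directly works fine at this scale, although a dedicated least-squares routine is numerically preferable.

```python
# Solve the normal equations X^T X beta_hat = X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Equivalent, and numerically preferable: a dedicated least-squares solver.
beta_hat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_hat_lstsq)
```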
Now that we have our estimate $\widehat{\beta}$, we can do prediction. If we are given a new value $x_{n+1}$, we would use $\widehat{y}_{n+1} = x_{n+1}^T\widehat{\beta}$ to predict the corresponding value of $y_{n+1}$. This was how the straight lines in the two animations were calculated.
We can also calculate the model’s predicted values for the data that we used to fit the model. These are denoted by $\widehat{Y} = X\widehat{\beta}$. Note that

$$\widehat{Y} = X\widehat{\beta} = X(X^TX)^{-1}X^TY = HY,$$

where $H = X(X^TX)^{-1}X^T$ is called the hat matrix for the model (since it puts the hat $\widehat{\cdot}$ on $Y$).
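In the same sketch, the hat matrix is one line of NumPy. (Forming the full $n \times n$ matrix is fine for a demonstration; for large $n$ you would only ever compute its diagonal.)

```python
# Hat matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.solve(X.T @ X, X.T)

Y_hat = H @ Y
assert np.allclose(Y_hat, X @ beta_hat)   # H puts the hat on Y
```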
Leverage scores
We are now ready to talk about leverage scores and the two animations. For reference, here they are again:
[The two animations again: low leverage on the left, high leverage on the right.]
In both animations the stationary line corresponds to an estimator $\widehat{\beta}$ that was calculated using only the black data points. The red points are new data points with different $x$ values and varying $y$ values. The moving line corresponds to an estimator calculated using the red data point as well as all the black points. We can see immediately that if the red point is far away from the “bulk” of the other $x$ values, then the moving line is a lot more sensitive to the $y$ value of the red point.
The leverage score of a data point $(x_i, y_i)$ is defined to be

$$h_i = \frac{\partial \widehat{y}_i}{\partial y_i}.$$

That is, the leverage score tells us how much the prediction $\widehat{y}_i$ changes if we change $y_i$.
Since $\widehat{Y} = HY$, the leverage score of $(x_i, y_i)$ is $h_i = H_{ii}$, the $i$-th diagonal element of the hat matrix $H$. (Indeed, $\widehat{y}_i = \sum_{j=1}^n H_{ij} y_j$, and so the partial derivative of $\widehat{y}_i$ with respect to $y_i$ is $H_{ii}$.) The idea is that if a data point $(x_i, y_i)$ has a large leverage score, then the model is more sensitive to changes in that value of $y_i$.
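We can check this interpretation numerically in the sketch above: because the fitted values are linear in $Y$, perturbing a single $y_i$ moves $\widehat{y}_i$ by exactly $H_{ii}$ times the perturbation.

```python
i, delta = 0, 1.0            # perturb the first response by 1 (arbitrary choices)

Y_perturbed = Y.copy()
Y_perturbed[i] += delta
Y_hat_perturbed = H @ Y_perturbed

# The i-th fitted value moves by exactly H[i, i] * delta.
assert np.isclose(Y_hat_perturbed[i] - Y_hat[i], H[i, i] * delta)
```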
This can be seen in a leave-one-out calculation. This calculation tells us what we should expect from a leave-one-out model – a model fit using all the data points apart from one. In our animations, this corresponds to the stationary line.
The leave-one-out calculation says that the predicted value using all the data is always between the true value and the predicted value from the leave-one-out model. In our animations this can be seen by noting that the moving line (the full model) is always between the red point (the true value) and the stationary line (the leave-one-out model).
Furthermore, the leverage score tells us exactly how close the predicted value is to the true value. We can see that the moving line is much closer to the red dot in the high leverage example on the right than in the low leverage example on the left.
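The calculation in question is the standard leave-one-out identity for least squares, which I will state here without proof:

$$\widehat{y}_i = (1 - h_i)\,\widehat{y}_i^{(-i)} + h_i\, y_i,$$

where $\widehat{y}_i^{(-i)}$ denotes the prediction at $x_i$ from the model fit without the $i$-th data point. Since $0 \le h_i \le 1$, the prediction $\widehat{y}_i$ is a convex combination of the leave-one-out prediction and the true value $y_i$, and the weight on the true value is exactly the leverage score.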
Mahalanobis distance
We now know that the two animations are showing the sensitivity of a model to two different data points. We know that a model is more sensitive to points with high leverage than to points with low leverage. We still haven’t spoken about why some points have higher leverage than others, and in particular why the point on the right has higher leverage.
It turns out that leverage scores measure how far away a data point is from the “bulk” of the other $x_i$’s. More specifically, in a one dimensional example like the one in the animations,

$$h_i = \frac{1}{n}\left(1 + \frac{(z_i - \bar{z})^2}{s^2}\right),$$

where $n$ is the number of data points, $\bar{z} = \frac{1}{n}\sum_{j=1}^n z_j$ is the sample mean and $s^2 = \frac{1}{n}\sum_{j=1}^n (z_j - \bar{z})^2$ is the sample variance. Thus high leverage scores correspond to points $z_i$ that are far away from the centre $\bar{z}$ of our data. In higher dimensions a similar result holds if we measure distance using the Mahalanobis distance.
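As a final check, the one dimensional formula agrees with the diagonal of the hat matrix in the sketch above (note the $1/n$ convention for the sample variance):

```python
# Leverage scores from the hat matrix...
leverages = np.diag(H)

# ...and from the one dimensional formula h_i = (1/n) * (1 + (z_i - z_bar)^2 / s^2).
z_bar = z.mean()
s2 = ((z - z_bar) ** 2).mean()    # sample variance with the 1/n convention
formula = (1 + (z - z_bar) ** 2 / s2) / n

assert np.allclose(leverages, formula)
```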
The mean of the black data points is approximately 2, and so we can now see why the second point has higher leverage. The two animations were made in GeoGebra. You can play around with them here and here.