I am very excited to be writing a blog post again – it has been nearly a year! This post marks a new era for the blog. In September I started a statistics PhD at Stanford University. I am really enjoying my classes and I am learning a lot. I might have to change the name of the blog soon but for now let’s stick with “Maths to Share” although you will undoubtedly see more and more statistics here.
Today I would like to talk about leverage scores. Leverage scores are a way to quantify how sensitive a model is, and they can be used to explain the different behaviour in these two animations:
I recently learnt about leverage scores in the applied statistics course STATS 305A. This course is all about the linear model. In the linear model we assume we have $n$ data points $(x_i, y_i)_{i=1}^n$ where $x_i$ is a vector in $\mathbb{R}^d$ and $y_i$ is a number in $\mathbb{R}$. We model $y_i$ as a linear function of $x_i$ plus noise. That is, we assume $y_i = x_i^T\beta + \varepsilon_i$, where $\beta \in \mathbb{R}^d$ is an unknown vector of coefficients and $\varepsilon_i$ is a random variable with mean $0$ and variance $\sigma^2$. We also require that for $i \neq j$, the random variable $\varepsilon_i$ is uncorrelated with $\varepsilon_j$.
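We can also write this as a matrix equation. Define $y$ to be the vector with entries $y_i$ and define $X$ to be the matrix with rows $x_i^T$, that is
$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \in \mathbb{R}^n \quad \text{and} \quad X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix} \in \mathbb{R}^{n \times d}.$$
Then our model can be rewritten as
$$y = X\beta + \varepsilon,$$
where $\varepsilon$ is a random vector with mean $0$ and covariance matrix $\sigma^2 I_n$. To simplify calculations we will assume that $X$ contains an intercept term. This means that the first column of $X$ consists of all 1’s.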
In the two animations at the start of this post we have two nearly identical data sets. The data sets are an example of simple regression, where each vector $x_i$ is of the form $(1, z_i)$ and $z_i$ is a number. The $z_i$ values are on the horizontal axis and the $y_i$ values are on the vertical axis.
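To make the setup concrete, here is a minimal sketch of how a data set like this could be simulated. The sample size, coefficients, noise level and the range of the $z_i$ are all values I made up for illustration; they are not the ones used in the animations.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 20                        # number of data points (arbitrary choice)
beta = np.array([1.0, 0.5])   # hypothetical true coefficients (intercept, slope)
sigma = 0.3                   # hypothetical noise standard deviation

z = rng.uniform(0.0, 4.0, size=n)       # horizontal-axis values z_i
X = np.column_stack([np.ones(n), z])    # rows are x_i^T = (1, z_i)
eps = rng.normal(0.0, sigma, size=n)    # independent noise, mean 0, variance sigma^2
y = X @ beta + eps                      # the linear model y = X beta + eps
```

The column of ones in `X` is the intercept term, so the first entry of `beta` plays the role of the intercept and the second is the slope.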
Estimating the coefficients
In the linear model we wish to estimate the parameter $\beta$ which contains the coefficients of our model. That is, given a sample $(x_i, y_i)_{i=1}^n$, we wish to construct a vector $\hat\beta$ which approximates the true parameter $\beta$. In ordinary least squares regression we choose $\hat\beta$ to be the vector $b \in \mathbb{R}^d$ that minimizes the quantity
$$\sum_{i=1}^n (y_i - x_i^T b)^2 = \|y - Xb\|_2^2.$$
Differentiating with respect to $b$ and setting the derivative equal to $0$ shows that $\hat\beta$ is a solution to the normal equations:
$$X^TX\hat\beta = X^Ty.$$
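The differentiation step can be spelled out as follows. This is the usual matrix calculus computation, written with $b$ for a generic candidate coefficient vector (notation I am assuming for this sketch):
$$\|y - Xb\|_2^2 = y^Ty - 2b^TX^Ty + b^TX^TXb, \qquad \nabla_b \|y - Xb\|_2^2 = -2X^Ty + 2X^TXb.$$
Setting the gradient to zero at $b = \hat\beta$ and dividing by $2$ gives the normal equations.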
We will assume that the matrix $X^TX$ is invertible. In this case the normal equations have a unique solution $\hat\beta = (X^TX)^{-1}X^Ty$.
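Here is a small sketch of this calculation in NumPy on simulated data (the data and all variable names are made up for illustration). It solves the normal equations directly, then checks the answer against NumPy's built-in least squares routine and against the zero-gradient condition.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
z = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), z])              # intercept column plus z
y = 1.0 + 0.5 * z + rng.normal(0.0, 0.3, size=n)  # hypothetical true line plus noise

# Solve the normal equations X^T X b = X^T y for b.
# np.linalg.solve avoids forming the inverse of X^T X explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The gradient of ||y - Xb||^2 at b = beta_hat should be (numerically) zero.
gradient = -2.0 * X.T @ (y - X @ beta_hat)
```

In practice a solver like `np.linalg.lstsq` is usually the safer choice when $X^TX$ is close to singular, but the direct solve matches the formula above more closely.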
Now that we have our estimate $\hat\beta$, we can do prediction. If we are given a new value $x_{\text{new}}$ we would use $x_{\text{new}}^T\hat\beta$ to predict the corresponding value of $y_{\text{new}}$. This was how the straight lines in the two animations were calculated.
We can also calculate the model’s predicted values for the data $X$ that we used to fit the model. These are denoted by $\hat y$. Note that
$$\hat y = X\hat\beta = X(X^TX)^{-1}X^Ty = Hy,$$
where $H = X(X^TX)^{-1}X^T$ is called the hat matrix for the model (since it puts the hat on $y$).
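As a quick sanity check, the hat matrix can be computed directly on simulated data (again, the data and names here are invented for illustration). The sketch confirms that $Hy$ reproduces the fitted values $X\hat\beta$, and that $H$ is symmetric and satisfies $H^2 = H$, which both follow from the formula $H = X(X^TX)^{-1}X^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15
z = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), z])
y = 1.0 + 0.5 * z + rng.normal(0.0, 0.3, size=n)

# H = X (X^T X)^{-1} X^T, computed with a linear solve instead of an inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

# Fitted values computed the "long way" through beta_hat.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

puts_hat_on_y = np.allclose(H @ y, y_hat)  # H y equals X beta_hat
is_symmetric = np.allclose(H, H.T)         # H^T = H
is_idempotent = np.allclose(H @ H, H)      # H H = H, so H is a projection
```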
We are now ready to talk about leverage scores and the two animations. For reference, here they are again:
In both animations the stationary line corresponds to an estimator $\hat\beta$ that was calculated using only the black data points. The red points are new data points with different $z$ values and varying $y$ values. The moving line corresponds to an estimator $\hat\beta$ calculated using the red data point as well as all the black points. We can see immediately that if the red point is far away from the “bulk” of the other points, then the moving line is a lot more sensitive to the $y$ value of the red point.
The leverage score of a data point $(x_i, y_i)$ is defined to be $\frac{\partial \hat y_i}{\partial y_i}$. That is, the leverage score tells us how much the prediction $\hat y_i$ changes if we change $y_i$.
Since $\hat y = Hy$, the leverage score of $(x_i, y_i)$ is $H_{ii}$, the $i^{th}$ diagonal element of the hat matrix $H$. The idea is that if a data point $(x_i, y_i)$ has a large leverage score, then the model is more sensitive to changes in that value of $y_i$.
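The derivative definition can be checked numerically. In this sketch (simulated data, hypothetical names) I bump a single $y_i$ by a fixed amount, refit the model, and compare the change in $\hat y_i$ per unit change in $y_i$ with the corresponding diagonal entry of the hat matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
z = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), z])
y = 1.0 + 0.5 * z + rng.normal(0.0, 0.3, size=n)

def fitted_values(X, y):
    """Return X @ beta_hat, where beta_hat solves the normal equations."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return X @ beta_hat

# Leverage scores are the diagonal entries of the hat matrix.
H = X @ np.linalg.solve(X.T @ X, X.T)
leverage = np.diag(H)

# Change y_i for one index and see how much the prediction at x_i moves.
i = 0
delta = 1.0
y_bumped = y.copy()
y_bumped[i] += delta
change = (fitted_values(X, y_bumped)[i] - fitted_values(X, y)[i]) / delta
# `change` should agree with leverage[i] up to floating point error.
```

Because the fitted values are a linear function of $y$, the size of the bump does not matter here: any `delta` gives the same ratio.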
This can be seen in a leave one out calculation. This calculation tells us what we should expect if we make a leave-one-out model – a model that uses all the data points apart from one. In our animations, this corresponds to the stationary line.
The leave one out calculation says that the predicted value using all the data is always between the true value and the predicted value from the leave-one-out model. In our animations this can be seen by noting that the moving line (the full model) is always between the red point (the true value) and the stationary line (the leave-one-out model).
Furthermore the leverage score tells us exactly how close the predicted value is to the true value. We can see that the moving line is much closer to the red dot in the high leverage example on the right than the low leverage example on the left.
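Both statements can be packaged into a single identity. This is a standard least squares fact that I am stating without proof, and the notation $\hat y_i^{(-i)}$ for the leave-one-out prediction at $x_i$ is introduced just for this sketch: whenever $H_{ii} < 1$,
$$\hat y_i = H_{ii}\, y_i + (1 - H_{ii})\, \hat y_i^{(-i)}.$$
Since $0 \le H_{ii} \le 1$, the full-model prediction is a weighted average of the true value and the leave-one-out prediction, with weight $H_{ii}$ on the true value. Here is a small numerical check on simulated data (the data, seed and variable names are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12
z = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), z])
y = 1.0 + 0.5 * z + rng.normal(0.0, 0.3, size=n)

def ols(X, y):
    """Least squares coefficients from the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Full model: hat matrix and fitted values.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

# Pick the highest-leverage point and refit without it.
i = int(np.argmax(np.diag(H)))
keep = np.arange(n) != i
beta_loo = ols(X[keep], y[keep])
y_loo = X[i] @ beta_loo          # leave-one-out prediction at x_i

# Weighted average of the true value and the leave-one-out prediction.
h = H[i, i]
weighted = h * y[i] + (1.0 - h) * y_loo
# `weighted` should match y_hat[i] up to floating point error.
```

The larger `h` is, the more weight the true value gets, which is why the moving line hugs the red point in the high leverage animation.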
We now know that the two animations are showing the sensitivity of a model to two different data points. We know that a model is more sensitive to points with high leverage than to points with low leverage. We still haven’t spoken about why some points have higher leverage and, in particular, why the point on the right has higher leverage.
It turns out that leverage scores measure how far away a data point is from the “bulk” of the other $x_i$’s. More specifically, in a one-dimensional example like the one in the animations,
$$H_{ii} = \frac{1}{n}\left(1 + \frac{(z_i - \bar{z})^2}{s_z^2}\right),$$
where $n$ is the number of data points, $\bar{z} = \frac{1}{n}\sum_{j=1}^n z_j$ is the sample mean and $s_z^2 = \frac{1}{n}\sum_{j=1}^n (z_j - \bar{z})^2$ is the sample variance. Thus high leverage scores correspond to points that are far away from the centre of our data $z_1, \ldots, z_n$. In higher dimensions a similar result holds if we measure distance using Mahalanobis distance.
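The one-dimensional formula is easy to check numerically. The sketch below uses simulated $z_i$ values (invented for illustration) and assumes the sample variance divides by $n$ rather than $n - 1$, which is what makes the formula match the diagonal of the hat matrix exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
z = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), z])   # simple regression: x_i = (1, z_i)

# Leverage scores from the hat matrix.
H = X @ np.linalg.solve(X.T @ X, X.T)
leverage = np.diag(H)

# Leverage scores from the closed form (1/n) * (1 + (z_i - z_bar)^2 / s^2).
z_bar = z.mean()
s2 = z.var()                           # NumPy's default ddof=0 divides by n
formula = (1.0 / n) * (1.0 + (z - z_bar) ** 2 / s2)

# The point farthest from the mean should have the largest leverage.
farthest = int(np.argmax(np.abs(z - z_bar)))
highest = int(np.argmax(leverage))
```

Note that the leverage scores depend only on the $z_i$ and not on $y$, so they can be computed before any response values are observed.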
The mean of the black data points is approximately 2 and so we can now see why the second point has higher leverage. The two animations were made in Geogebra. You can play around with them here and here.