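A Monte Carlo significance test of the null hypothesis $X_0 \sim \pi$ requires creating independent samples $X_1, \ldots, X_B \sim \pi$. The idea is that if $X_0 \sim \pi$ and, independently, $X_1, \ldots, X_B$ are i.i.d. from $\pi$, then for any test statistic $T$, the rank of $T(X_0)$ among $T(X_0), T(X_1), \ldots, T(X_B)$ is uniformly distributed on $\{1, \ldots, B+1\}$ (assuming there are no ties). This means that if $T(X_0)$ is one of the $k$ largest values of $T(X_0), \ldots, T(X_B)$, then we can reject the hypothesis $X_0 \sim \pi$ at the significance level $k/(B+1)$.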
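The basic Monte Carlo test above can be sketched in a few lines of Python. This is only an illustration under assumed names: `monte_carlo_p_value`, `monte_carlo_test`, `sample_null` and the Uniform(0, 1) example are all hypothetical and not taken from Besag and Clifford. Ties are counted against the observed value, which can only make the test more conservative.

```python
import random

def monte_carlo_p_value(t_obs, t_sim):
    """Return k / (B + 1), where k counts the values >= t_obs, including t_obs itself."""
    k = 1 + sum(1 for t in t_sim if t >= t_obs)
    return k / (len(t_sim) + 1)

def monte_carlo_test(x_obs, sample_null, statistic, B, rng):
    """Draw B independent samples from the null and rank the observed statistic."""
    t_obs = statistic(x_obs)
    t_sim = [statistic(sample_null(rng)) for _ in range(B)]
    return monte_carlo_p_value(t_obs, t_sim)

# Hypothetical example: 20 observations that should be i.i.d. Uniform(0, 1)
# under the null, with the sample mean as the test statistic.
def sample_null(rng):
    return [rng.random() for _ in range(20)]

def mean(xs):
    return sum(xs) / len(xs)

x_obs = [0.9] * 20  # suspiciously large values
p_value = monte_carlo_test(x_obs, sample_null, mean, B=999, rng=random.Random(0))
```

The smallest p-value this procedure can report is $1/(B+1)$, so $B$ has to be chosen with the desired significance level in mind.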
The advantage of Monte Carlo significance tests is that we do not need an analytic expression for the distribution of $T(X_0)$ under $X_0 \sim \pi$. By generating the i.i.d. samples $X_1, \ldots, X_B$, we are making an empirical distribution that approximates the theoretical distribution. However, sometimes sampling $X_1, \ldots, X_B$ is just as intractable as theoretically studying the distribution of $T(X_0)$. Often approximate samples based on Markov chain Monte Carlo (MCMC) are used instead. However, these samples are not independent and may not be drawn from the true distribution. This means that a test using MCMC may not be statistically valid.
In the 1989 paper “Generalized Monte Carlo Significance Tests”, Besag and Clifford propose two methods that solve this exact problem. Their two methods can be used in the same settings where MCMC is used, but they are statistically valid and correctly control the Type I error. In this post, I will describe just one of the methods: the serial test.
Background on Markov chains
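To describe the serial test we will need to introduce some notation. Let $P$ denote a transition matrix for a Markov chain on a discrete state space $\mathcal{X}$. A Markov chain $(X_0, X_1, \ldots)$ with transition matrix $P$ thus satisfies
$$\mathbb{P}(X_n = x_n \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}) = P(x_{n-1}, x_n).$$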
Suppose that the transition matrix $P$ has a stationary distribution $\pi$. This means that if $(X_0, X_1, \ldots)$ is a Markov chain with transition matrix $P$ and $X_0$ is distributed according to $\pi$, then $X_1$ is also distributed according to $\pi$. This implies that all $X_i$ are distributed according to $\pi$.
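On a finite state space, stationarity is the statement that $\sum_x \pi(x) P(x, y) = \pi(y)$ for every $y$, which is easy to check numerically. The sketch below does this for a made-up 3-state chain; the matrix `P`, the vector `pi` and the helper `step_distribution` are assumptions chosen purely for illustration.

```python
def step_distribution(mu, P):
    """Distribution of X_1 when X_0 ~ mu: (mu P)(y) = sum_x mu(x) P(x, y)."""
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]

# Hypothetical 3-state transition matrix (each row sums to 1).
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
pi = [0.25, 0.5, 0.25]  # candidate stationary distribution

one_step = step_distribution(pi, P)
is_stationary = all(abs(a - b) < 1e-12 for a, b in zip(one_step, pi))
```

Applying `step_distribution` repeatedly leaves `pi` unchanged, which is the statement that every $X_i$ has distribution $\pi$.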
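We can construct a new transition matrix $Q$ from $P$ and $\pi$ by
$$Q(y, x) = \frac{\pi(x) P(x, y)}{\pi(y)}.$$
The transition matrix $Q$ is called the reversal of $P$. This is because for all $x$ and $y$ in $\mathcal{X}$,
$$\pi(x) P(x, y) = \pi(y) Q(y, x).$$
That is, the chance of drawing $x$ from $\pi$ and then transitioning to $y$ according to $P$ is equal to the chance of drawing $y$ from $\pi$ and then transitioning to $x$ according to $Q$.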
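The new transition matrix $Q$ also allows us to reverse longer runs of the Markov chain. Fix $n \ge 0$ and let $(X_0, X_1, \ldots, X_n)$ be a Markov chain with transition matrix $P$ and initial distribution $\pi$. Also let $(Y_0, Y_1, \ldots, Y_n)$ be a Markov chain with transition matrix $Q$ and initial distribution $\pi$. Then
$$(X_0, X_1, \ldots, X_{n-1}, X_n) \stackrel{d}{=} (Y_n, Y_{n-1}, \ldots, Y_1, Y_0),$$
where $\stackrel{d}{=}$ means equal in distribution.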
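To make the reversal concrete, here is a small sketch that builds the reversal from a transition matrix and its stationary distribution via $Q(y, x) = \pi(x) P(x, y) / \pi(y)$, and checks the path-reversal identity on one path. The 3-state chain is a hypothetical example (it is deliberately not reversible, so the reversal differs from the original matrix), and the helper names are my own. The code assumes $\pi(y) > 0$ for every state.

```python
def reversal(P, pi):
    """Q(y, x) = pi(x) P(x, y) / pi(y), assuming pi(y) > 0 for every state y."""
    n = len(P)
    return [[pi[x] * P[x][y] / pi[y] for x in range(n)] for y in range(n)]

def path_probability(path, T, pi):
    """Probability that a chain started from pi with transition matrix T follows `path`."""
    prob = pi[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= T[a][b]
    return prob

# Hypothetical non-reversible chain: it prefers to cycle 0 -> 1 -> 2 -> 0.
P = [
    [0.0, 0.75, 0.25],
    [0.25, 0.0, 0.75],
    [0.75, 0.25, 0.0],
]
pi = [1 / 3, 1 / 3, 1 / 3]  # uniform, because the columns of P also sum to 1
Q = reversal(P, pi)

path = [0, 1, 2, 0]
forward = path_probability(path, P, pi)         # P-chain visits 0, 1, 2, 0
backward = path_probability(path[::-1], Q, pi)  # Q-chain visits 0, 2, 1, 0
```

Here `forward` and `backward` agree up to floating-point error, and `Q` is the transpose of `P` because `pi` is uniform.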

The serial test
Suppose we want to test the hypothesis $x \sim \pi$, where $x \in \mathcal{X}$ is our observed data and $\pi$ is some distribution on $\mathcal{X}$. To conduct the serial test, we need to construct a Markov chain with transition matrix $P$ for which $\pi$ is a stationary distribution. We then also need to construct the reversal $Q$ described above. There are many possible ways to construct $P$, such as the Metropolis-Hastings algorithm.
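As one illustration of such a construction, the sketch below builds a Metropolis transition matrix (Metropolis-Hastings with a symmetric proposal) on a small cycle of states. The function name `metropolis_matrix`, the cycle-shaped proposal and the example weights are all assumptions made for this example; they are not the chains used by Besag and Clifford.

```python
def metropolis_matrix(weights):
    """Metropolis transition matrix on states 0..n-1 arranged in a cycle.

    Proposal: move to x - 1 or x + 1 (mod n), each with probability 1/2.
    Acceptance: min(1, weights[y] / weights[x]). Only ratios of weights are
    used, so the target distribution does not need to be normalised.
    """
    n = len(weights)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in ((x - 1) % n, (x + 1) % n):
            P[x][y] += 0.5 * min(1.0, weights[y] / weights[x])
        P[x][x] += 1.0 - sum(P[x])  # rejected proposals stay at x
    return P

weights = [1.0, 2.0, 3.0, 4.0]  # hypothetical unnormalised target
total = sum(weights)
pi = [w / total for w in weights]
P = metropolis_matrix(weights)

# Detailed balance: pi(x) P(x, y) = pi(y) P(y, x) for all x, y.
detailed_balance = all(
    abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12
    for x in range(4) for y in range(4)
)
```

Chains built this way satisfy detailed balance with respect to the target, so the reversal coincides with the original transition matrix and the same kernel can be run in both directions.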
We also need a test statistic $T$. This is a function $T : \mathcal{X} \to \mathbb{R}$ which we will use to detect outliers. This function is the same function we would use in a regular Monte Carlo test. Namely, we want to reject the null hypothesis when $T(x)$ is much larger than we would expect under $x \sim \pi$.
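The serial test then proceeds as follows. First we pick $\xi$ uniformly in $\{0, 1, \ldots, B\}$ and set $X_\xi = x$. We then generate $(X_\xi, X_{\xi+1}, \ldots, X_B)$ as a Markov chain with transition matrix $P$ that starts at $X_\xi = x$. Likewise, we generate $(X_\xi, X_{\xi-1}, \ldots, X_0)$ as a Markov chain with transition matrix $Q$ that starts from $X_\xi = x$.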

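We then apply $T$ to each of $(X_0, X_1, \ldots, X_\xi, \ldots, X_B)$ and count the number of $b \in \{0, 1, \ldots, B\}$ such that $T(X_b) \ge T(X_\xi) = T(x)$. If there are $k$ such $b$, then the reported p-value of our test is $k/(B+1)$.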
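The whole procedure fits in a short function. The following is a minimal sketch for a finite state space, assuming the transition matrix and its reversal are available as lists of rows; the names `run_chain` and `serial_test` and the 3-state example are hypothetical.

```python
import random

def run_chain(start, T, steps, rng):
    """Return the `steps` states visited after `start` by a chain with transition matrix T."""
    states, current = [], start
    for _ in range(steps):
        current = rng.choices(range(len(T)), weights=T[current])[0]
        states.append(current)
    return states

def serial_test(x, P, Q, statistic, B, rng):
    """Serial-test p-value k / (B + 1) for the observed state x."""
    xi = rng.randrange(B + 1)               # position of the data, uniform on {0, ..., B}
    forward = run_chain(x, P, B - xi, rng)  # X_{xi+1}, ..., X_B, generated with P
    backward = run_chain(x, Q, xi, rng)     # X_{xi-1}, ..., X_0, generated with the reversal Q
    chain = backward[::-1] + [x] + forward  # X_0, ..., X_B
    t_obs = statistic(x)
    k = sum(1 for state in chain if statistic(state) >= t_obs)  # includes b = xi
    return k / (B + 1)

# Hypothetical example: a 3-state chain with uniform stationary distribution,
# so the reversal is simply the transpose of P.
P = [
    [0.0, 0.75, 0.25],
    [0.25, 0.0, 0.75],
    [0.75, 0.25, 0.0],
]
Q = [[P[x][y] for x in range(3)] for y in range(3)]
p_value = serial_test(2, P, Q, statistic=lambda s: s, B=99, rng=random.Random(0))
```

Note that the observed state always counts towards $k$, so the reported p-value is never smaller than $1/(B+1)$.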
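We will now show that this test produces a valid p-value. That is, when $x \sim \pi$, the probability that $k/(B+1)$ is less than or equal to $\alpha$ is at most $\alpha$. In symbols,
$$\mathbb{P}\left(\frac{\#\{b : T(X_b) \ge T(X_\xi)\}}{B+1} \le \alpha\right) \le \alpha.$$
Under the null hypothesis $x \sim \pi$, $(X_\xi, X_{\xi-1}, \ldots, X_0)$ is equal in distribution to generating $X_0$ from $\pi$ and using the transition matrix $P$ to go from $X_i$ to $X_{i+1}$ for $i = 0, \ldots, \xi - 1$.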

Thus, under the null hypothesis, the distribution of $(X_0, X_1, \ldots, X_B)$ does not depend on $\xi$. The entire procedure is equivalent to generating a Markov chain $\mathbf{X} = (X_0, X_1, \ldots, X_B)$ with initial distribution $\pi$ and transition matrix $P$, and then choosing $\xi \in \{0, 1, \ldots, B\}$ independently of $\mathbf{X}$. This is enough to show that the serial method produces valid p-values. The idea is that since the distribution of $\mathbf{X}$ does not depend on $\xi$ and $\xi$ is uniformly distributed on $\{0, 1, \ldots, B\}$, the probability that $T(X_\xi)$ is in the top $\alpha$ proportion of $T(X_0), \ldots, T(X_B)$ should be at most $\alpha$. This is proved more formally below.
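For each $i \in \{0, 1, \ldots, B\}$, let $A_i$ be the event that $T(X_i)$ is in the top $\alpha$ proportion of $T(X_0), \ldots, T(X_B)$. That is,
$$A_i = \left\{ \frac{\#\{b : T(X_b) \ge T(X_i)\}}{B+1} \le \alpha \right\}.$$
Let $I_{A_i}$ be the indicator function for the event $A_i$. Since at most $\alpha(B+1)$ values of $i$ can be in the top $\alpha$ fraction of $T(X_0), \ldots, T(X_B)$, we have that
$$\sum_{i=0}^{B} I_{A_i} \le \alpha(B+1).$$
Therefore, by linearity of expectation,
$$\sum_{i=0}^{B} \mathbb{P}(A_i) \le \alpha(B+1).$$
By the law of total probability we have
$$\mathbb{P}\left(\frac{\#\{b : T(X_b) \ge T(X_\xi)\}}{B+1} \le \alpha\right) = \sum_{i=0}^{B} \mathbb{P}(A_i \mid \xi = i)\, \mathbb{P}(\xi = i).$$
Since $\xi$ is uniform on $\{0, 1, \ldots, B\}$, $\mathbb{P}(\xi = i)$ is $\frac{1}{B+1}$ for all $i$. Furthermore, by independence of $\xi$ and the chain $\mathbf{X} = (X_0, \ldots, X_B)$, we have $\mathbb{P}(A_i \mid \xi = i) = \mathbb{P}(A_i)$. Thus, by our previous bound,
$$\mathbb{P}\left(\frac{\#\{b : T(X_b) \ge T(X_\xi)\}}{B+1} \le \alpha\right) = \frac{1}{B+1} \sum_{i=0}^{B} \mathbb{P}(A_i) \le \alpha.$$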
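The only combinatorial step in this argument is the claim that at most $\alpha(B+1)$ indices can be in the top $\alpha$ fraction, and it holds for every list of values, including lists with ties. The sketch below checks it numerically; `top_alpha_indices` is a hypothetical helper and the inputs are arbitrary.

```python
import random

def top_alpha_indices(values, alpha):
    """Indices i with #{b : values[b] >= values[i]} / len(values) <= alpha."""
    n = len(values)
    return [
        i for i in range(n)
        if sum(1 for v in values if v >= values[i]) <= alpha * n
    ]

# Small example: with alpha = 0.25 and 8 values, at most 2 indices can qualify.
values = [3, 1, 4, 1, 5, 9, 2, 6]
top = top_alpha_indices(values, 0.25)

# Random check with many ties: the bound holds for every list that is drawn.
rng = random.Random(0)
bound_holds = True
for _ in range(200):
    n = rng.randint(1, 30)
    draws = [rng.randint(0, 5) for _ in range(n)]
    alpha = rng.random()
    if len(top_alpha_indices(draws, alpha)) > alpha * n:
        bound_holds = False
```

The reason is that the qualifying index with the smallest value already has every other qualifying index counted among the values at least as large as it. Ties can only shrink the set of qualifying indices, which is why the test is conservative rather than exact when ties occur.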
Applications
In the original paper by Besag and Clifford, the authors discuss how this procedure can be used to perform goodness-of-fit tests. They construct Markov chains that can test the Rasch model or the Ising model. More broadly, the method can be used to construct goodness-of-fit tests for any exponential family by using the Markov chains developed by Diaconis and Sturmfels.
A similar method has also been applied more recently to detect gerrymandering (Chikina et al., 2017). In this setting, the null hypothesis is the uniform distribution on all valid redistrictings of a state, and the test statistic measures the political advantage of a given districting.
References
- Besag, Julian, and Peter Clifford. “Generalized Monte Carlo Significance Tests.” Biometrika 76, no. 4 (1989).
- Diaconis, Persi, and Bernd Sturmfels. “Algebraic Algorithms for Sampling from Conditional Distributions.” The Annals of Statistics 26, no. 1 (1998): 363-397.
- Chikina, Maria, et al. “Assessing Significance in a Markov Chain without Mixing.” Proceedings of the National Academy of Sciences of the United States of America 114, no. 11 (2017).