On performing two hypothesis tests with the same null.

When performing multiple hypothesis tests, we should rightly be concerned that by repeated testing, we might reject null hypotheses more frequently than our \(\alpha\)-levels (or equivalently \(p\)-values) would guarantee. In particular, the family-wise error rate, the probability of rejecting at least one true null hypothesis, might be very different from the \(\alpha\) levels used in each individual test. In general, if we were to conduct \(k\) independent hypothesis tests and all the null hypotheses were true, we would reject at least one null hypothesis with probability \(1 - (1 - \alpha)^k\). For \(k = 2\) this is \(2 \alpha - \alpha^2\). As we conduct more tests, we run the risk of believing we reject a true null in any individual test only, say, 5% of the time, while across the whole family of tests we reject a true null far more often. XKCD says it better than I ever could.
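As a quick numerical illustration (a Python sketch; the numbers follow directly from the formula above), the family-wise error rate grows rapidly with the number of independent tests:

```python
# Family-wise error rate (FWER) for k independent tests of true nulls,
# each conducted at level alpha: 1 - (1 - alpha)^k.
alpha = 0.05

for k in (1, 2, 5, 10):
    fwer = 1 - (1 - alpha) ** k
    print(f"k = {k:2d}: FWER = {fwer:.4f}")
```

For \(k = 2\) and \(\alpha = 0.05\) this gives \(2\alpha - \alpha^2 = 0.0975\); by \(k = 10\) the family-wise error rate already exceeds 40%.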

Let’s say you wish to conduct a hypothesis test based on some test statistic \(Z\) that is Normally distributed with mean zero and variance one under the null hypothesis (in the following, \(\Phi^{-1}\) is the quantile function for \(Z\)). In general, you care about alternatives that imply that \(Z\) still has unit variance but is now centered somewhere else. You could either test a two-sided hypothesis, where you would reject the null if \(Z\) is less than \(\Phi^{-1}(\alpha/2)\) or \(Z\) is greater than \(\Phi^{-1}(1 - \alpha/2)\), or a one-sided hypothesis, where you would reject the null if \(Z\) exceeds \(\Phi^{-1}(1 - \alpha)\). Both of these tests would have proper size, rejecting a true null with probability \(\alpha\). Individually, either of these tests would meet the usual basic requirement we place on tests.

But what about multiple testing? Should we be concerned? What is the probability of rejecting one or both tests when \(Z\) is distributed as specified under the null hypothesis (i.e., standard Normal)? For the one-sided test, we would reject when \(Z > \Phi^{-1}(1 - \alpha)\). For the two-sided, we reject when \(Z > \Phi^{-1}(1- \alpha/2)\) or \(Z < \Phi^{-1}(\alpha/2)\). Putting these together (and noting that \(\Phi^{-1}(1 - \alpha) < \Phi^{-1}(1 - \alpha/2)\), so the second event is contained in the first):

\begin{align} & P([Z > \Phi^{-1}(1 - \alpha)] \cup [Z > \Phi^{-1}(1 - \alpha/2)] \cup [Z < \Phi^{-1}(\alpha/2)])\\ &= P([Z > \Phi^{-1}(1 - \alpha)] \cup [Z < \Phi^{-1}(\alpha/2)])\\ &= 1 - P(\Phi^{-1}(\alpha/2) < Z < \Phi^{-1}(1 - \alpha)) \\ &= 1 - (1 - \alpha - \alpha/2) = \frac{3\alpha}{2} \end{align}

So we would reject at least one of the tests more often than \(\alpha\), but the result is not as bad as with independent tests. For example, with two independent tests and \(\alpha = 0.05\), we would reject at least one of the two true nulls 9.75% of the time, but only reject at least one of the two tests of the same null 7.5% of the time. This difference comes from the strong dependence introduced by testing the same null with the same test statistic. Since an upper-tail rejection of the two-sided test implies rejecting the one-sided test, we have fewer opportunities to mislead ourselves with respect to error rates than when considering independent tests. For independent tests, we can do little better than applying Bonferroni corrections across the multiple tests, but with known dependence, we can greatly improve testing procedures to maintain family-wise error rates, for example by using a closed testing procedure.
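A small Monte Carlo check of the \(3\alpha/2\) result (a sketch in Python, with `scipy.stats.norm.ppf` playing the role of \(\Phi^{-1}\)):

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # draws of Z under the null

# One-sided test: reject when Z exceeds Phi^{-1}(1 - alpha)
one_sided = z > norm.ppf(1 - alpha)
# Two-sided test: reject when Z falls in either alpha/2 tail
two_sided = (z < norm.ppf(alpha / 2)) | (z > norm.ppf(1 - alpha / 2))

# Fraction of draws rejecting at least one of the two tests of the same null
print(np.mean(one_sided | two_sided))  # should be near 3 * alpha / 2 = 0.075
```

The simulated rate should land close to 0.075, well below the 0.0975 we would see for two independent tests.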

Finally, it is worth noting that if the researcher first performs a two-tailed test and then unconditionally discards the result in order to perform a one-tailed test, all of the usual guarantees about the testing procedure hold. The probability of rejecting the one-tailed test would be \(P(Z > \Phi^{-1}(1 - \alpha)) = \alpha\), which is exactly what we would want.


New working papers added

I’ve added two new working papers on the use of covariance adjustment in causal and randomization inference, particularly using machine learning techniques (with Jake Bowers, Ben B. Hansen, and Costa Panagopoulos), and on improving test statistic selection for inference in spillover models (with Jake Bowers and Peter M. Aronow). Abstracts and PDFs can be found on my Academics page.


W. S. Gosset Society meeting, September 2015

Update: The meeting is Friday, 9/18 at 4pm at the statistics annex (909 Nevada). Sign up for emails about future meetings.

For the September meeting of the W. S. Gosset Society, we’ll pay honor to our namesake and investigate Gosset’s 1908 paper and its continuing influence:

I’d like to hold the meeting during the week of 9/13 – 9/19. If you are interested in joining the group, please send me your availability that week.

The W. S. Gosset Society is an informal reading group for UIUC statistics students. The group focuses on connecting modern research to seminal works in the field of statistics. Whenever possible, we serve as a venue for students to present their own research and get feedback.



SLAMM 2012

Notes from the 5th St. Louis Area Methods Meeting

"Returning to the Cradle of Democracy: a working paper"

Citizen responses under election and sortition

Polmeth 2011

Brief thoughts on Polmeth 2011 and a notice of a working paper/poster.

test_that -- A brief review

A few thoughts after a month of using the "test_that" package.

Optmatch and RItools -- New homes and techniques

Announcing the public development of optmatch and RItools as well as the publication of the complete blocking, balancing, and analysis article.

Atlantic Causal Inference Conference

Highlights from ACIC 2011 at UMich

More and less theory, please

Theory can play a useful role in designing experiments, but keep it out of the analysis.

Designing and Analyzing Studies with Optmatch and RItools (Part 1)

The first in a series of posts on using Optmatch and RItools for experimental studies.

St. Louis Area Methods Meeting 2011 (Friday)

A brief summary of SLAMM 2011.

Getting involved in field research in development

How do young researchers find partners for field experiments in the developing world?

Polborn and Krasa at the Comparative Politics Seminar

A formal model of party platforms with an intuitive result.

Collaboration for Social Scientists in TPM

My article "Collaboration for Social Scientists" published in The Political Methodologist

Melissa Schwartzberg on Democracy, Judgment, and Juries

Are juries democratic? Schwartzberg tackles this question.

Jose Cheibub on Civil Wars in Africa

Do elections destabilize African nations?

Wendy Tam Cho on Voter Migration and Partisanship

Tracing voters moves within and across states.

Kathy Cramer Walsh on Rural Perspectives of Political Inequality

Participant observation in Wisconsin.

Christopher Dawes on Psychological Traits as Intermediaries for Genes

Can twin studies tell us about mediating factors between genetics and behavior?

Peeking inside R functions

Exploring R function environments

Jon Hurwitz UIUC Seminar

An interesting pair of puzzles concerning opinions on capital punishment.

A quick review of Destination Dissertation

A travel guide to creating a dissertation.

Kevin Clarke visits UIUC

The Political Science department invited Kevin Clarke of Rochester for a talk on his upcoming book, "A Model Discipline."

Working papers published

See the "Academics" section for some new working papers

Speeding up Optmatch while improving match quality

Stratification of large matching problems has many benefits.

Combinadics in R

How to generate combinations one at a time.

Using xBalance with MatchIt

Assessing balance after adjustment using xBalance and the MatchIt package

Using Optmatch and RItools for Observational Studies

A soup-to-nuts propensity score matching analysis.


Why not use more than the ASCII character set when programming?

Scoping Bugs

In which the author considers similar bugs in R and JavaScript.

Drinking the Homebrew

Dropping MacPorts, picking up a tall one