A question on stats.stackexchange.com reminded me of some code I wrote earlier this summer. The code provides a correspondence between the natural numbers 1 to (N choose K) and all the unique K sized combinations one could draw from N items. This relationship is know as the combinadic of an integer (and my code is pased on the reference implementation). Generating combinations is useful for permutation tests, in which one applies a test statistic on all possible allocations of treatment to an experimental pool.
Since the number of possible combinations grows extremely rapidly, realizing all possible combinations at once can be extremely memory intentsive. Using combinadics, one can trade increased execution time for lower memory usage. Since they are indexed by integers, keeping track of which combination is currently used is trivial. But since the number of combinations grows very quickly, we still need to handle extremely large integers, perhaps larger than the default integer type in R accepts. Luckily, the GMP package provides for “big ints,” with which we can write a “N Choose K” algorithm for arbitrarily large numbers:
library(gmp)
bigchoose <- function(n, k) {
if (n < 1 || k < 1 || k > n) {
return(as.bigz(0))
}
if (n == k) {
return(as.bigz(1))
}
if (k > (n/2)) {
k <- n - k
}
numer <- as.bigz(1)
for (i in n:(n - k + 1)) {
numer <- numer * i
}
denom <- as.bigz(1)
for (i in 1:k) {
denom <- denom * i
}
return(numer/denom)
}
Here are two functions to turn an integer into a vector representing a combination (a process I call decoding) and to turn a combination into an integer (encoding). As I expect that these operations will be frequent for a given N and K, these functions produce functions that take integers and combinations, respectively.
combinadic.decoder.factory <- function(n, k) {
# n, k are fixed at the start
# i is the combinadic index (0 < i < n)
# precompute a few sequences we'll need
ks <- k:1 # the bottom of the choose tests
max.combinadic <- bigchoose(n, k)
function(i) {
i <- as.bigz(i)
if (i < 1 || `>.bigz`(i, max.combinadic)) {
stop(paste("Combinatic out of range for", n, "choose", k ))
}
# part of the frequent translation to R's 1... sequences
i <- i - 1
# initialize a vector to hold the values
remaining <- i
previous.candidate <- n
combination <- numeric(k)
for(j in ks) {
value <- remaining + 1
while (`>.bigz`(value, remaining)) {
current.candidate <- previous.candidate - 1
value <- bigchoose(current.candidate, j)
previous.candidate <- current.candidate
}
remaining <- sub.bigz(remaining, value)
combination[j] <- current.candidate
}
return(combination + 1) # translate to 1... counting
}
}
combinadic.encoder.factory <- function(n, k) {
ks <- 1:k
function(encoded) {
stopifnot(length(encoded) == k)
encoded <- encoded - 1 # translate from 1... counting
expanded <- as.bigz(0)
for (i in ks) {
expanded <- `+.bigz`(expanded, bigchoose(encoded[i], i))
}
return(expanded + 1)
}
}
Finally, as an illustration, the classic Lady Tasting Tea problem. There are 8 cups, 4 of which have the milk added first (for concreteness, say these are the first 4 cups). What is the distribution of correctly labeling cups as having milk added first? To answer the question, we need a test statistic to indicate how many cups were correctly labeled.
> test.statistic <- function(cups) {
+ sum(cups %in% c(1, 2, 3, 4))
+ }
Apply this test statistic to each possible allocation of cups, which corresponds to a combination of size 4 taken from 8 possible units.
> maxn <- bigchoose(8, 4)
> decoder <- combinadic.decoder.factory(8, 4)
> counts <- numeric(5)
> names(counts) <- 0:4
> i <- 1
> while (i < (maxn + 1)) {
+ tmp <- test.statistic(decoder(i))
+ counts[tmp + 1] = counts[tmp + 1] + 1
+ i <- i + 1
+ }
> barplot(counts/as.numeric(maxn))
(NB: I’ve encountered strange results when creating lists/vectors of “big integers,” the result of encoding a combination. Use for and while loops instead.)

The plot shows that while a guess by chance of one, two, or three cups are not unlikely, a random guess that results in zero or four correct cups would be extremely rare. While astute readers will notice that this result can be found analytically, it still serves as a simple demonstration of a permutation test. The test statistic employed in this example leads to an analytical result. Other test statistics may not be as simple to solve. Permutation tests using all possible combinations always generates an exact distribution for any test statistic.
This code was originally written for inclusion in RItools. Subsequently, I decided that only rare cases does one need to generate the entire null distribution of the test statistic, and in most cases sampling from possible combinations is sufficient. Future posts will address approximate approaches to permutation tests.
In a previous post, I demonstrated how to create a propensity score matching, test balance, and analyze the outcome variable using the optmatch and RItools packages. The same strategy can be used with other matching algorithms, for example the various methods included in the MatchIt package.
I’ll use the same basic question and data from my previous article. The MatchIt package wraps optmatch to provide its “full” and “optimal” matching methods, so I will the “full” option to maintain consistency with my previous article. The first step is loading the packages and the data:
> library(MatchIt)
> library(optmatch)
> library(RItools)
> data(nuclearplants)
The interface for MatchIt is similar to optmatch for propensity score matches, except that the matchit() function compresses the process into a single step of specifying the propensity formula and producing the match, while fullmatch() allows a user to specify any number of distance matrices. In the end, the interface is fairly similar. As with the previous article, I match on a subset of the covariates.
> example.formula <- formula(pr ~ t1 + t2 + cap)
> match.opt <- fullmatch(
mdist(glm(example.formula,
data = nuclearplants,
family = binomial())))
> all.mit <- matchit(example.formula,
data = nuclearplants,
method = "full")
The all.mit object contains (among other items) a vector indicating each object’s matched set. For compatibility, save it as a factor:
> match.mit <- as.factor(all.mit$subclass)
Unsurprisingly, as MatchIt uses optmatch the two matches are identical.
> lapply(split(nuclearplants, match.opt), rownames)
$m.1
[1] "N" "Z" "a"
$m.10
[1] "I" "G"
$m.2
[1] "A" "B" "D" "V" "F" "b"
$m.5
[1] "U" "c"
$m.6
[1] "H" "K" "L" "M" "C" "P" "R" "Y" "e" "f"
$m.8
[1] "J" "O" "Q" "S" "T" "E" "W" "X" "d"
> lapply(split(nuclearplants, match.mit), rownames)
$`1`
[1] "N" "Z" "a"
$`2`
[1] "I" "G"
$`3`
[1] "A" "B" "D" "V" "F" "b"
$`4`
[1] "U" "c"
$`5`
[1] "H" "K" "L" "M" "C" "P" "R" "Y" "e" "f"
$`6`
[1] "J" "O" "Q" "S" "T" "E" "W" "X" "d"
Now that I have a factor listing the groups, I can run xBalance to assess the balance properties of the match:
> xBalance(pr ~ . - (cost + pr),
data = nuclearplants,
strata = match.mit,
report = "chisquare.test")
---Overall Test---
chisquare df p.value
strat 5.1 9 0.82
---
Signif. codes: 0 ‘***’ 0.001 ‘** ’ 0.01 ‘* ’ 0.05 ‘. ’ 0.1 ‘ ’ 1
With a reported p-value of 0.82, there is little evidence against the null of balance, so we would fail to reject it.
This walk through used the the “full” method for matchit(), but the same techniques will work with other matchit() methods, such as coarsened exact matching or nearest neighbor. If you are reasonably confident that you wish to use optimal matching, you should consider using the optmatch package directly, instead of using it through MatchIt. In future posts I will be demonstrating important techniques to speed up the matching process (which can be a great benefit to large datasets) and how you can create matches that incorporate more subject matter information than can be included in a simple logit model.
I am a contributor to the optmatch and the RItools packages for R. These two packages are separate, but complimentary. Both packages provide tools for adjusting observational data to exhibit “balance” on observed covariates. In a randomized control trial, treatment and control groups should have identical distributions over all covariates, observed and unobserved. Matching provides a method to create smaller groups in an observational study that are similarly balanced. Balance can be quantified so that alternative matches can be compared. When an acceptable match has been found, analysis can then proceed as if nature provided a blocked, randomized study.
Both optmatch and RItools use a canonical dataset consisting of nuclear plants. From help(nuclearplants):
The data relate to the construction of 32 light water reactor (LWR) plants constructed in the U.S.A in the late 1960’s and early 1970’s. The data was collected with the aim of predicting the cost of construction of further LWR plants. 6 of the power plants had partial turnkey guarantees and it is possible that, for these plants, some manufacturers’ subsidies may be hidden in the quoted capital costs.
With these data, we may wish to know if certain variables lead to higher or lower construction costs. One particular variable is pr, an indicator if a previous lightwater reactor at the same location was present. Such an installation might significantly increase or decrease costs. The rest of this document uses matching and balance testing to provide an answer to just that question.
I start by loading the packages and the data:
> library(optmatch)
> library(RItools)
> data(nuclearplants)
Before getting into the matching process, lets take a quick look at the balance of the data on all variables, except cost and pr, the LWR indicator. xBalance, among other tests, provides an omnibus balance test across any number of variables. This test compares the null hypothesis of “the data are balanced” against the alternative hypothesis of a lack of balance, where balance is what we would expect in a randomized trial with the same sample size. The test follows a chi-squared distribution, which xBalance will happily report:
> xBalance(pr ~ . - (cost + pr),
data = nuclearplants,
report = c("chisquare.test"))
---Overall Test---
chisquare df p.value
unstrat 12.4 9 0.192
---
Signif. codes: 0 ‘***’ 0.001 ‘** ’ 0.01 ‘* ’ 0.05 ‘. ’ 0.1 ‘ ’ 1
With a reported p-value of 0.19, the balance of this sample is not terrible (by conventional levels of hypothesis testing), but we might prefer something closer to 1. While there is no a priori p-value we should prefer, experience indicates that p-values in the neighborhood of .5 are achievable and mimic true randomized designs (though optimal balance levels are a subject of ongoing research).
A full discussion of matching procedures is beyond the scope of this document (see Rosenbaum (2010) for a more comprehensive discussion). In brief, matching attempts to group units with similar covariates, as if they had been blocked in a randomized experiment. The optimal match would be two units identical on every variable, observed and unobserved. In most datasets, no two units will be identical on all observed covariates. Instead, we can use a measure that summarizes all covariates and match based on the summary. The propensity score, the probability of receiving treatment given the observed covariates, has been a popular summary measure (for more on the theory, see Rosenbaum and Rubin (1983)). I’ll use a logistic regression to estimate the propensity scores of my observations, using a subset of the available variables:
> model <- glm(pr ~ t1 + t2 + cap,
family = binomial(), data = nuclearplants)
With a propensity model, optmatch provides several functions for computing matched sets of observations. The fullmatch function takes a treatment by control matrix containing distances between observations and returns a factor indicating the set membership, if any, of all observations. Computing the distance matrix is simple using the mdist function. This function takes a linear model, a function, or a formula to produce distances based on propensity models, aribtrary user functions, or Mahalanobis distances between observations. We’ll use the propensity model. See the help page for mdist for the other alternatives.
m1 <- fullmatch(mdist(model))
We can compare the first match with a second, in which a caliper is placed on the date variable. This will constrain the matching algorithm, disallowing matches on observations with widely differing date values, even if the over all propensity scores are similar. Calipers can lead to poorer matches on observed variables but provide a method by which researchers can include subject matter information in the matching process. For example, if the cost of construction decreased over time due to increased efficiency in construction practices.
> m2 <- fullmatch(mdist(model) +
caliper(0.25, pr ~ date, data = nuclearplants))
With two possible matches, do either produce adequate balance? As noted previously, the RItools package provides a method of quantifying balance in a matched set. The method (discussed in detail in Hansen and Bowers (2008)) compares treatment and control units within each block on a difference of means for each variable. Combining the these differences follows a chi-squared distribution. We can compare all the matches at the same time, along with the raw data (see the strata argument).
> (allbalance <- xBalance(pr ~ . - (cost + pr),
data = nuclearplants,
report = c("chisquare.test", "std.diffs"),
strata = data.frame(original = factor("none"), m1, m2)))
strata original m1 m2
stat std.diff std.diff std.diff
vars
date -0.11468 -0.23368 0.06902
t1 0.10630 -0.01666 0.30232
t2 1.03269 * 0.27487 0.65635
cap 0.34012 -0.03631 0.24860
ne -0.16312 -0.47647 -0.13433
ct -0.30797 -0.65565 -0.78858 *
bw 0.04511 0.29570 -0.20169
cum.n -0.09760 -0.00887 -0.16724
pt 0.41382 0.60274 0.00000
---Overall Test---
chisquare df p.value
original 12.39 9 0.192
m1 5.15 9 0.821
m2 10.07 9 0.345
---
Signif. codes: 0 ‘***’ 0.001 ‘** ’ 0.01 ‘* ’ 0.05 ‘. ’ 0.1 ‘ ’ 1
Both matches provide good balance. With a value of 0.821 we might be tempted to prefer the unconstrained match; however, with a p-value of 0.345, the match with a caliper also provides reasonable assurances of balance. As either provides plausible balance, researchers might choose to concentrate on substantively important covariates. When xBalance reports “std.diffs” (as above), we can plot the result to get a visual picture of balance on each covariate.
> plot(allbalance)

Since we now have data that approximates a randomized experiment, we can use the same techniques to analyze this data as any blocked randomized experiment. For example, one-way ANOVA using pr as the treatment factor and m1 as the blocking factor.
> anova(lm(nuclearplants$cost ~ nuclearplants$pr + m1))
Analysis of Variance Table
Response: nuclearplants$cost
Df Sum Sq Mean Sq F value Pr(>F)
nuclearplants$pr 1 9037 9037 0.3394 0.5654
m1 5 222410 44482 1.6704 0.1785
Residuals 25 665726 26629
Under conventional levels, we do not observe either the treatment or the blocking factor reach statistical significance. So we can conclude that existing lightwater reactors do not have an effect on construction costs that we can differentiate from chance.
In the analysis, I chose one of two plausible matches. It so happened that I selected the match with the larger p-value. Does this indicate that we should select the match with the highest p-value, as it most closely approximates a randomly allocated treatment? I would caution against that conclusion. Within the set of matches that are plausibly balanced, it is difficult to argue that one match is truly better than another. While in expectation, randomized treatments are perfectly balanced, in pratice, small deviations should be expected (with fewer deviations in larger experimental populations).
In short, don’t sweat the small stuff. Find a reasonable match and go with it. In fact, you may find that matches with lower p-values provide interesting substantive results. Here is an analysis of the second match, which included a caliper on the date of construction:
> anova(lm(nuclearplants$cost ~ nuclearplants$pr + m2))
Analysis of Variance Table
Response: nuclearplants$cost
Df Sum Sq Mean Sq F value Pr(>F)
nuclearplants$pr 1 4185 4185 0.2035 0.65654
m2 8 396199 49525 2.4080 0.05098 .
Residuals 21 431905 20567
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This matching indicates a significant blocking effect, which suggests that limiting matches by date may have something to do with the resulting costs. If we had blindly pursued higher p-value matches, we might not have observed this interesting result.