Using Optmatch and RItools for Observational Studies

I am a contributor to the optmatch and the RItools packages for R. These two packages are separate, but complimentary. Both packages provide tools for adjusting observational data to exhibit “balance” on observed covariates. In a randomized control trial, treatment and control groups should have identical distributions over all covariates, observed and unobserved. Matching provides a method to create smaller groups in an observational study that are similarly balanced. Balance can be quantified so that alternative matches can be compared. When an acceptable match has been found, analysis can then proceed as if nature provided a blocked, randomized study.

Data

Both optmatch and RItools use a canonical dataset consisting of nuclear plants. From help(nuclearplants):

The data relate to the construction of 32 light water reactor (LWR) plants constructed in the U.S.A in the late 1960’s and early 1970’s. The data was collected with the aim of predicting the cost of construction of further LWR plants. 6 of the power plants had partial turnkey guarantees and it is possible that, for these plants, some manufacturers’ subsidies may be hidden in the quoted capital costs.

With these data, we may wish to know if certain variables lead to higher or lower construction costs. One particular variable is pr, an indicator if a previous lightwater reactor at the same location was present. Such an installation might significantly increase or decrease costs. The rest of this document uses matching and balance testing to provide an answer to just that question.

I start by loading the packages and the data:

 > library(optmatch) > library(RItools) > data(nuclearplants) 

Before getting into the matching process, lets take a quick look at the balance of the data on all variables, except cost and pr, the LWR indicator. xBalance, among other tests, provides an omnibus balance test across any number of variables. This test compares the null hypothesis of “the data are balanced” against the alternative hypothesis of a lack of balance, where balance is what we would expect in a randomized trial with the same sample size. The test follows a chi-squared distribution, which xBalance will happily report:

 > xBalance(pr ~ . - (cost + pr), data = nuclearplants, report = c("chisquare.test")) ---Overall Test--- chisquare df p.value unstrat 12.4 9 0.192 --- Signif. codes: 0 ‘***’ 0.001 ‘** ’ 0.01 ‘* ’ 0.05 ‘. ’ 0.1 ‘ ’ 1 

With a reported p-value of 0.19, the balance of this sample is not terrible (by conventional levels of hypothesis testing), but we might prefer something closer to 1. While there is no a priori p-value we should prefer, experience indicates that p-values in the neighborhood of .5 are achievable and mimic true randomized designs (though optimal balance levels are a subject of ongoing research).

Matching

A full discussion of matching procedures is beyond the scope of this document (see Rosenbaum (2010) for a more comprehensive discussion). In brief, matching attempts to group units with similar covariates, as if they had been blocked in a randomized experiment. The optimal match would be two units identical on every variable, observed and unobserved. In most datasets, no two units will be identical on all observed covariates. Instead, we can use a measure that summarizes all covariates and match based on the summary. The propensity score, the probability of receiving treatment given the observed covariates, has been a popular summary measure (for more on the theory, see Rosenbaum and Rubin (1983)). I’ll use a logistic regression to estimate the propensity scores of my observations, using a subset of the available variables:

 > model <- glm(pr ~ t1 + t2 + cap, family = binomial(), data = nuclearplants) 

With a propensity model, optmatch provides several functions for computing matched sets of observations. The fullmatch function takes a treatment by control matrix containing distances between observations and returns a factor indicating the set membership, if any, of all observations. Computing the distance matrix is simple using the mdist function. This function takes a linear model, a function, or a formula to produce distances based on propensity models, aribtrary user functions, or Mahalanobis distances between observations. We’ll use the propensity model. See the help page for mdist for the other alternatives.
m1 <- fullmatch(mdist(model))


We can compare the first match with a second, in which a caliper is placed on the date variable. This will constrain the matching algorithm, disallowing matches on observations with widely differing date values, even if the over all propensity scores are similar. Calipers can lead to poorer matches on observed variables but provide a method by which researchers can include subject matter information in the matching process. For example, if the cost of construction decreased over time due to increased efficiency in construction practices.

 > m2 <- fullmatch(mdist(model) + caliper(0.25, pr ~ date, data = nuclearplants)) 

Balance Testing

With two possible matches, do either produce adequate balance? As noted previously, the RItools package provides a method of quantifying balance in a matched set. The method (discussed in detail in Hansen and Bowers (2008)) compares treatment and control units within each block on a difference of means for each variable. Combining the these differences follows a chi-squared distribution. We can compare all the matches at the same time, along with the raw data (see the strata argument).

 > (allbalance <- xBalance(pr ~ . - (cost + pr), data = nuclearplants, report = c("chisquare.test", "std.diffs"), strata = data.frame(original = factor("none"), m1, m2))) strata original m1 m2 stat std.diff std.diff std.diff vars date -0.11468 -0.23368 0.06902 t1 0.10630 -0.01666 0.30232 t2 1.03269 * 0.27487 0.65635 cap 0.34012 -0.03631 0.24860 ne -0.16312 -0.47647 -0.13433 ct -0.30797 -0.65565 -0.78858 * bw 0.04511 0.29570 -0.20169 cum.n -0.09760 -0.00887 -0.16724 pt 0.41382 0.60274 0.00000 ---Overall Test--- chisquare df p.value original 12.39 9 0.192 m1 5.15 9 0.821 m2 10.07 9 0.345 --- Signif. codes: 0 ‘***’ 0.001 ‘** ’ 0.01 ‘* ’ 0.05 ‘. ’ 0.1 ‘ ’ 1 

Both matches provide good balance. With a value of 0.821 we might be tempted to prefer the unconstrained match; however, with a p-value of 0.345, the match with a caliper also provides reasonable assurances of balance. As either provides plausible balance, researchers might choose to concentrate on substantively important covariates. When xBalance reports “std.diffs” (as above), we can plot the result to get a visual picture of balance on each covariate.

 > plot(allbalance) 

Analysis

Since we now have data that approximates a randomized experiment, we can use the same techniques to analyze this data as any blocked randomized experiment. For example, one-way ANOVA using pr as the treatment factor and m1 as the blocking factor.

 > anova(lm(nuclearplants$cost ~ nuclearplants$pr + m1)) Analysis of Variance Table Response: nuclearplants$cost Df Sum Sq Mean Sq F value Pr(>F) nuclearplants$pr 1 9037 9037 0.3394 0.5654 m1 5 222410 44482 1.6704 0.1785 Residuals 25 665726 26629 

Under conventional levels, we do not observe either the treatment or the blocking factor reach statistical significance. So we can conclude that existing lightwater reactors do not have an effect on construction costs that we can differentiate from chance.

Conclusion

In the analysis, I chose one of two plausible matches. It so happened that I selected the match with the larger p-value. Does this indicate that we should select the match with the highest p-value, as it most closely approximates a randomly allocated treatment? I would caution against that conclusion. Within the set of matches that are plausibly balanced, it is difficult to argue that one match is truly better than another. While in expectation, randomized treatments are perfectly balanced, in pratice, small deviations should be expected (with fewer deviations in larger experimental populations).

In short, don’t sweat the small stuff. Find a reasonable match and go with it. In fact, you may find that matches with lower p-values provide interesting substantive results. Here is an analysis of the second match, which included a caliper on the date of construction:

 > anova(lm(nuclearplants$cost ~ nuclearplants$pr + m2)) Analysis of Variance Table Response: nuclearplants$cost Df Sum Sq Mean Sq F value Pr(>F) nuclearplants$pr 1 4185 4185 0.2035 0.65654 m2 8 396199 49525 2.4080 0.05098 . Residuals 21 431905 20567 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

This matching indicates a significant blocking effect, which suggests that limiting matches by date may have something to do with the resulting costs. If we had blindly pursued higher p-value matches, we might not have observed this interesting result.