Scoping Bugs

I ran across a strange bug in R recently. Like all the best programming languages, R treats functions as first class objects. That is to say, functions can be passed as arguments to other functions, returned as values from them, and bound to variable names. And, while it is not part of the strict definition of first class functions, they also keep a reference to the environment in which they were created. This last point is known as lexical (or static) scope.
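A small R sketch of those properties, for readers new to them. The names make.adder and add.two are hypothetical, chosen only for illustration: a function is returned as a value and bound to a name, passed as an argument to sapply, and still able to find n in the environment it was created in.

```r
# make.adder returns a new function; the returned function is bound to a name.
make.adder <- function(n) {
  function(x) x + n
}
add.two <- make.adder(2)

# The function is passed as an argument to another function.
sapply(1:3, add.two)
# returns c(3, 4, 5)

# The returned function keeps a reference to its creating environment,
# which is where it finds n.
get("n", envir = environment(add.two))
# returns 2
```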

On lexical scoping

Lexical scoping was a major innovation in making programs simpler to understand. With lexical scoping, variable names are defined “locally”: if a function is working with a variable foo, that variable cannot be overridden by a caller that has its own variable named foo. Here is an example in R that illustrates the property. What does the last line return?

my.variable <- 42
f <- function() {
  return(my.variable)
}

g <- function(my.variable) {
  return(f())  
}

g(100)

If you answered 42, you’d be correct. Under lexical scoping, the two uses of the name my.variable are distinct variables, defined by the different scopes of the f and g functions: f refers to the global my.variable, while g’s my.variable is its parameter. Under dynamic scoping, applying g to 100 would lead f to look up my.variable in its caller’s scope, find g’s binding, and return 100. While this example is contrived, in programs of any size lexical scoping (at least as the default) prevents callers from changing the behavior of the functions they call simply by reusing a name. (See Clojure’s binding form for an example of useful dynamic scope on demand.)
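To make the contrast concrete, the sketch below puts both lookups side by side in R. R itself is lexically scoped, so the “dynamic” version is only an emulation, assumed here for illustration: f.dynamic starts its search in the caller’s frame via parent.frame(). It is an approximation, because get() then falls back to the caller’s enclosing environments rather than walking the full call stack as true dynamic scope would. The names f.lexical and f.dynamic are hypothetical.

```r
my.variable <- 42

# Lexical lookup: finds the global my.variable, because that is where
# f.lexical was defined.
f.lexical <- function() my.variable

# Emulated dynamic lookup (approximation only): begin the search in the
# frame of whoever called f.dynamic.
f.dynamic <- function() get("my.variable", envir = parent.frame())

g <- function(my.variable) {
  c(lexical = f.lexical(), dynamic = f.dynamic())
}

g(100)
# returns c(lexical = 42, dynamic = 100)
```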

So let’s take this a step further and create some functions that save their lexical environment. To show the expected behavior, here is a small Scheme program that creates 5 functions, each of which returns the number it was created with when called (lambda means “create a new function”):

(define fns (map (lambda (x) (lambda () x)) '(1 2 3 4 5)))
(map (lambda (f) (f)) fns)
; returns (1 2 3 4 5)

Now here is the same thing in R:

fns <- lapply(1:5, function(i) { function() { i } })
lapply(fns, function(f) { f() })
# returns a list of (5, 5, 5, 5, 5)

Whoa! What is going on?

R’s scoping bug

Clearly, something is amiss with R’s scoping rules. To be honest, I’m not entirely sure what (though I will unveil a workaround). I had originally written this code as an imperative for loop, and my immediate thought was that R had been bitten by a classic JavaScript bug. In JavaScript, a loop index declared with var is not local to the loop body: every iteration shares, and overwrites, the same variable, so closures created inside the loop all see its final value. A simple workaround is to wrap the loop body in a function and call it immediately:

/* Bad version: var i is function-scoped, so all five closures share one i */
var fns = new Array(5);
for (var i = 0; i < 5; i++) {
  fns[i] = function() { return i; };
}
var vals = new Array(5);
for (var j = 0; j < 5; j++) {
  vals[j] = fns[j]();
}
/* vals = [5,5,5,5,5] */

/* Good version: the immediately invoked function gives each closure its own i */
var fns = new Array(5);
for (var i = 0; i < 5; i++) {
  (function(i) { fns[i] = function() { return i; }; })(i);
}
var vals = new Array(5);
for (var j = 0; j < 5; j++) {
  vals[j] = fns[j]();
}
/* vals = [0,1,2,3,4] */

R does not suffer from exactly this issue, as the JavaScript workaround does not, well, work around the bug:

fns <- vector("list", 5)
for (i in 1:5) {
  fns[[i]] <- (function(i) { return(function() { i })})(i)
}
lapply(fns, function(f) { f() })
# returns list(5, 5, 5, 5, 5)
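One way to probe what this for-loop version is doing, assuming the cause lies in R’s lazy evaluation of arguments (each argument is passed as an unevaluated promise that is only evaluated when first used): rebind the global i after the loop but before calling any of the closures. If each closure had captured a copy of the loop value, the results would be 1, 2, 3; if each closure’s i is still an unevaluated reference to the global i, every closure sees the new value.

```r
fns <- vector("list", 3)
for (i in 1:3) {
  # Same immediately-invoked pattern as above. Assumption being tested:
  # the inner i is a promise for the expression `i` in the global frame,
  # not yet evaluated when the closure is created.
  fns[[i]] <- (function(i) { function() { i } })(i)
}

# Rebind the global loop variable before any closure has been called.
i <- 99

sapply(fns, function(f) { f() })
# returns c(99, 99, 99)
```

That every closure returns 99 is consistent with the promise explanation: nothing evaluated i while the loop was running.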

After poking and prodding, I found a (bizarre) solution in the same vein:

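# A "maker" function that returns a closure over i (j is a dummy argument)
fmaker <- function(i) { function(j) { i } }

fns <- vector("list", 5)
for (i in 1:5) {
  fns[[i]] <- fmaker(i)
  fns[[i]](0)  # call the new closure once, with a dummy argument
}
lapply(fns, function(f) { f() })
# returns list(1, 2, 3, 4, 5)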

The new version is significantly more verbose. The critical aspects are defining a maker function (you can’t just in-line that code) and applying the function to some dummy argument. Apparently, these are the necessary genuflections to R to make the calling environment sticky.
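If lazy evaluation is indeed the culprit, the dummy call likely works because calling the closure forces R to evaluate i while the loop is still on that iteration. Base R’s force() evaluates an argument explicitly, which suggests a sketch of the same fix without the extra call. The name fmaker.forced is hypothetical.

```r
# fmaker.forced is a hypothetical variant of fmaker for illustration.
fmaker.forced <- function(i) {
  force(i)  # evaluate the argument now, while it holds this iteration's value
  function() { i }
}

fns <- vector("list", 5)
for (i in 1:5) {
  fns[[i]] <- fmaker.forced(i)
}
lapply(fns, function(f) { f() })
# returns list(1L, 2L, 3L, 4L, 5L)
```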

Imperative languages

There are several possible reasons why both R and JavaScript could be getting these scoping rules wrong. First, while both support first class functions, such functions are not used as heavily as in some other languages. I may very well be the first user to test R on its ability to properly scope functions created in loops.

A second possibility may be more fundamental: R and JavaScript are imperative, C-style block languages. One writes a program as a series of imperative statements: first do this, next do this, now do this. Languages that treat programs as transformations of data (and here I’m referring to Lisps specifically, as I have the most exposure to this family) do a very good job with scoping rules. In fact, writing your own Lisp is a fairly simple process, covered in Chapter 4 of SICP, and getting environments right does not seem an especially difficult task.
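To give a sense of why the environment part is manageable, here is a minimal R sketch of SICP-style variable lookup. It is an illustration under simplifying assumptions, not SICP’s code: an environment is modeled as a frame (a named list of bindings) plus a pointer to its enclosing environment, only lookup is implemented, and the helpers make.env and lookup are hypothetical names.

```r
# Hypothetical helpers: an environment is a frame of bindings plus a
# pointer to its enclosing environment; NULL marks the end of the chain.
make.env <- function(bindings = list(), parent = NULL) {
  list(frame = bindings, parent = parent)
}

# Look a name up in the innermost frame first, then walk outward.
lookup <- function(name, env) {
  while (!is.null(env)) {
    if (name %in% names(env$frame)) {
      return(env$frame[[name]])
    }
    env <- env$parent
  }
  stop("Unbound variable: ", name)
}

global <- make.env(list(x = 0, y = "global"))

# Each application creates a fresh frame extending the defining
# environment, so five applications give five distinct bindings of x.
frames <- lapply(1:5, function(k) make.env(list(x = k), parent = global))

sapply(frames, function(e) lookup("x", e))
# returns 1:5
sapply(frames, function(e) lookup("y", e))
# returns rep("global", 5)
```

Because the bindings are stored as values in each new frame when it is created, every frame keeps its own x, which is the behavior the Scheme example above relies on.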

The apparent difficulty of getting scope right is all the more reason to do work in and on Incanter. Combining R’s wealth of statistical tools with Clojure’s proper scoping rules would be ideal. Perhaps the best workaround for R bugs is to write the program in Clojure?

The R version used in this post was 2.11.0 (2010-04-22).

Update: I found another, more elegant, workaround.