Why beginners do not use functions

If you have been programming in R for a while, you have probably come across work written by beginning R programmers, and you may well have noticed the lack of functions in what is often one lengthy R script, executed by sourcing it in R's REPL. Here are my recommendations for getting started with functions in R.

Have you ever reviewed R source code written by a beginning programmer or by an analyst? I bet that this code was a single big file with R commands and no self-written functions or something pretty close to that. Okay, I am exaggerating a little here, but you get my point.

Beginning users of R will generally not have a background in programming, so either functions are a foreign concept to them (with a seemingly steep learning curve) or they think that writing functions is too time-consuming. This, of course, lasts until they get fed up with rewriting the same bits of code over and over.

This is all quite understandable as there is a lot to learn about functions: arguments and their defaults, function scope (i.e., environments), return values, etc.
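To make those terms a bit more concrete, here is a minimal sketch. The function add_tax() and its default rate are made up purely for illustration; the point is only to show a default argument, a variable that lives inside the function, and the return value.

```r
# Hypothetical example function, not taken from any package.
add_tax <- function(price, rate = 0.21) {
   # 'total' is created in the function's own environment,
   # so it does not appear in the global environment:
   total <- price * (1 + rate)

   # the value of the last expression is the return value:
   total
}

# uses the default for 'rate':
gross_default <- add_tax(100)

# overrides the default by naming the argument:
gross_low <- add_tax(100, rate = 0.09)
```

Calling add_tax() with only a price falls back on the default rate, while naming the argument overrides it.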

So my recommendation is always to start writing functions as soon as possible. That way, creating and using your own functions will become part of your muscle memory. And you do not have to create your own R package to use functions! You can write your functions at the top of your script or you can collect them in a separate functions.R file and do a source("functions.R") at the top of your script to load them into R's global environment.
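As a rough sketch of the separate-file approach, the snippet below writes a stand-in for functions.R to a temporary directory so that it can be run as is; in a real project the file would simply sit next to your script. The helper rescale01() and the variable names are hypothetical.

```r
# Stand-in for a hand-written functions.R file; a temporary file is
# used here only to keep the sketch self-contained.
functions_file <- file.path(tempdir(), "functions.R")
writeLines(c(
   "rescale01 <- function(x, na.rm = TRUE) {",
   "   rng <- range(x, na.rm = na.rm)",
   "   (x - rng[1]) / (rng[2] - rng[1])",
   "}"
), functions_file)

# At the top of the analysis script: load the functions.
source(functions_file)

# The rest of the script can now call them:
scores <- c(10, 20, 40)
scaled <- rescale01(scores)
```

With an actual functions.R in your working directory, the whole first half collapses to the single line source("functions.R").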

You can even add unit tests right after the definition of your function: just write a few tests that will throw an error (using either stop() or stopifnot() from R's base package, or even the assertError() and assertWarning() functions from the tools package to check for errors and warnings) such that your script stops before it performs the rest of its commands. To avoid these unit tests cluttering your (global) environment, use the local() function to create a temporary environment like so:

fac2num <- function(x) {
   if (!is.factor(x)) return(x)

   as.numeric(levels(x))[x]
}

# unit tests for 'fac2num()':
local({
   x <- c("1", "2", "c")
   # 'x' is not (yet) a factor:
   stopifnot(identical(fac2num(x), x))

   y <- as.factor(x)
   # "c" will generate a warning:
   tools::assertWarning(fac2num(y))

   z <- factor("1")
   stopifnot(fac2num(z) == 1)
})

The somewhat contrived example above defines a function fac2num() which can be used to convert a factor to a numeric vector. The interesting part is the block of unit tests within the local() function right after the function definition. Any variables created within it, like x, y, and z in this example, are gone as soon as local() returns.
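The same pattern works for checking that a function fails when it should. The sketch below assumes a hypothetical helper safe_sqrt() that calls stop() on negative input; tools::assertError() confirms that the error is raised, and the exists() call afterwards checks that the test variable did not leak out of local().

```r
# Hypothetical helper: refuses negative input instead of returning NaN.
safe_sqrt <- function(x) {
   if (any(x < 0)) stop("'x' must be non-negative")

   sqrt(x)
}

# unit tests for 'safe_sqrt()':
local({
   bad_input <- c(4, -1)
   # a negative value must raise an error:
   tools::assertError(safe_sqrt(bad_input))

   stopifnot(identical(safe_sqrt(c(4, 9)), c(2, 3)))
})

# 'bad_input' only existed inside local():
leaked <- exists("bad_input", inherits = FALSE)
```

If either test fails, the script stops right there; if both pass, leaked should be FALSE and the rest of the script runs as usual.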

Happy coding!

— M