Programming

Pipes

library(magrittr) # you don't need the entire tidyverse for pipes

Why is the pipe so useful?

Piping alternatives

Little bunny Foo Foo
Went hopping through the forest
Scooping up the field mice
And bopping them on the head
- Children’s poem

Define an object to represent little bunny Foo Foo:

foo_foo <- little_bunny()

Use a function for each key verb: hop(), scoop(), and bop().

Using this object and these verbs, there are (at least) four ways we could retell the story in code:

  1. Save each intermediate step as a new object.
  2. Overwrite the original object many times.
  3. Compose functions.
  4. Use the pipe.
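
These notes never define little_bunny(), hop(), scoop(), or bop(). To run the snippets that follow, stand-ins like the ones sketched below are one option. The implementations are hypothetical: each verb only appends a line of text to a character vector, and substitute() captures the argument as written, so forest, field_mice, and head do not need to exist as objects.

```r
# Hypothetical stand-ins for the story's helpers (not part of any package).
# Foo Foo is represented as a character vector that records each action.
little_bunny <- function() character(0)

# Each verb appends a description of what happened. substitute() captures
# the argument expression unevaluated, so `forest` need not be defined.
hop <- function(x, through) {
  c(x, paste("hop through", deparse(substitute(through))))
}
scoop <- function(x, up) {
  c(x, paste("scoop up", deparse(substitute(up))))
}
bop <- function(x, on) {
  c(x, paste("bop on", deparse(substitute(on))))
}

hop(little_bunny(), through = forest)
## [1] "hop through forest"
```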

Save each intermediate step

foo_foo_1 <- hop(foo_foo, through = forest)
foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)
  • Simple.
  • Downside – you must name each intermediate element.
  • Many copies of your data may take up a lot of memory (though R manages this well: objects share unmodified data rather than copying it).

Overwrite the original

Instead of creating intermediate objects at each step, we could overwrite the original object:

foo_foo <- hop(foo_foo, through = forest)
foo_foo <- scoop(foo_foo, up = field_mice)
foo_foo <- bop(foo_foo, on = head)

Less typing (and less thinking about object names), but:

  1. Debugging is painful: if you make a mistake you’ll need to re-run the complete pipeline from the beginning.

  2. The repetition of the object being transformed (we’ve written foo_foo six times!) obscures what’s changing on each line.

Function composition

Abandon assignment and just string the function calls together:

bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ), 
  on = head
)

Disadvantages:

  1. You have to read from the inside out and from right to left.
  2. Arguments end up spread far apart.

Use the pipe

foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)

Advantages:

  1. It focusses on verbs, not nouns.
  2. Sequential: Foo Foo hops, then scoops, then bops.

Downside:

  • You need to be familiar with the pipe.

Behind the scenes:

magrittr reassembles the code in the pipe to a form that works by overwriting an intermediate object:

my_pipe <- function(.) {
  . <- hop(., through = forest)
  . <- scoop(., up = field_mice)
  bop(., on = head)
}
my_pipe(foo_foo)
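
As a concrete check of this rewriting, the sketch below swaps the bunny verbs for base functions (sqrt(), sum(), round()), chosen here only so that the code runs. The name by_hand is made up for illustration. The piped form and the hand-written function that overwrites the dot should give the same result.

```r
library(magrittr)

x <- c(1, 4, 9)

# Pipe form: each step receives the result of the previous one.
piped <- x %>%
  sqrt() %>%
  sum() %>%
  round(1)

# The same steps written by hand, overwriting `.` at every step.
by_hand <- function(.) {
  . <- sqrt(.)
  . <- sum(.)
  round(., 1)
}

identical(piped, by_hand(x))
## [1] TRUE
```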

When the pipe doesn’t work

  1. Functions that use the current environment: assign(), get(), load().

    # create a new variable with the given name in the current environment:
    assign("x", 10)
    x
    ## [1] 10
    "x" %>% assign(100)
    x
    ## [1] 10

    The assignment happens in a temporary environment used by %>%, not in the current environment. If you do want to use assign() with the pipe, you must be explicit about the environment:

    env <- environment()
    "x" %>% assign(100, envir = env)
    x
    ## [1] 100
  2. Functions that use lazy evaluation: tryCatch(), try(), suppressMessages(), suppressWarnings().

    tryCatch(stop("!"), error = function(e) "An error")
    ## [1] "An error"
    stop("!") %>% 
      tryCatch(error = function(e) "An error")
    ## Error in eval(lhs, parent, parent): !

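    In R, function arguments are only computed when the function uses them, not prior to calling the function. The pipe computes each element in turn, so you can’t rely on this behaviour. (The output above comes from a magrittr version before 2.0; from 2.0 onwards the pipe evaluates its left-hand side lazily, so a newer version may behave differently here.)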
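
Lazy evaluation itself can be seen without any pipe. In the minimal base R sketch below, the function names ignores_arg and forces_arg are made up for illustration: an argument that is never used is never evaluated, while force() evaluates it immediately.

```r
# `x` is never used in the body, so stop("!") is never run.
ignores_arg <- function(x) "x was never evaluated"
ignores_arg(stop("!"))
## [1] "x was never evaluated"

# force() evaluates the argument straight away, so the error surfaces
# inside the call and tryCatch() can handle it.
forces_arg <- function(x) {
  force(x)
  "x was evaluated"
}
tryCatch(forces_arg(stop("!")), error = function(e) "An error")
## [1] "An error"
```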

When not to use the pipe

  • Your pipes are longer than (say) ten steps. In that case, create intermediate objects with meaningful names.

  • You have multiple inputs or outputs. If there isn’t one primary object being transformed, but two or more objects being combined together, don’t use the pipe.

  • You are starting to think about a directed graph with a complex dependency structure. Pipes are fundamentally linear and expressing complex relationships with them will typically yield confusing code.
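
As an illustration of the multiple-inputs case, the sketch below combines two small data frames. The data and object names are hypothetical. Neither input is the "primary" object, so a plain call with a named result arguably reads more clearly than a pipe would.

```r
# Hypothetical example data: two inputs of equal standing.
orders <- data.frame(id = 1:3, customer = c("a", "b", "a"))
customers <- data.frame(customer = c("a", "b"), city = c("Oslo", "Lima"))

# A plain function call keeps both inputs visible side by side,
# and the result gets a meaningful name.
orders_with_city <- merge(orders, customers, by = "customer")
orders_with_city
```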

Other tools from magrittr

  • %T>% (the “tee” pipe): when you call a function for its side effects and it may not return anything.

    rnorm(100) %>%
      matrix(ncol = 2) %>%
      plot() %>%
      str()

    ##  NULL
    rnorm(100) %>%
      matrix(ncol = 2) %T>%
      plot() %>%
      str()

    ##  num [1:50, 1:2] -0.851 1.15 -0.245 0.72 0.397 ...

    %T>% returns the left-hand side instead of the right-hand side.

  • %$%: when working with functions that don’t have a data frame based API.

    mtcars %$%
      cor(disp, mpg)   # `cor()` requires vector inputs
    ## [1] -0.8475514
  • %<>% for assignment: instead of

    mtcars <- mtcars %>% 
      transform(cyl = cyl * 2)

    you may like

    mtcars %<>% transform(cyl = cyl * 2)
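
For comparison, each of these operators has a close base R counterpart. The sketch below pairs them up. The object names (m, mt, and so on) are made up for illustration, and mt is a copy so that the built-in mtcars is left untouched.

```r
library(magrittr)

# %T>%: the side-effect call runs, then the left-hand side is passed on.
m <- matrix(1:6, ncol = 2)
tee_result <- m %T>% print() %>% dim()
# Base equivalent: print(m); dim(m)

# %$%: exposes the columns of a data frame, much like with().
exposed <- mtcars %$% cor(disp, mpg)
via_with <- with(mtcars, cor(disp, mpg))

# %<>%: pipes the object through the call and assigns the result back.
mt <- mtcars
mt %<>% transform(cyl = cyl * 2)
# Base equivalent: mt <- transform(mt, cyl = cyl * 2)
```

The assignment pipe saves typing, but the plain `<-` form makes the assignment easier to spot when reading code.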