Pipes

library(magrittr) # you don't need the entire tidyverse for pipes

Why is the pipe so useful?

Piping alternatives

Little bunny Foo Foo
Went hopping through the forest
Scooping up the field mice
And bopping them on the head
- Children’s poem

Define an object to represent little bunny Foo Foo:

foo_foo <- little_bunny()

Use a function for each key verb: hop(), scoop(), and bop().
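The snippets in this section do not run as written, because little_bunny(), hop(), scoop(), bop() and the bare names forest, field_mice and head are only illustrative. A minimal sketch of stand-in definitions, all of them hypothetical and invented here, makes the four versions executable. Each verb appends one line to a story kept inside the bunny object.

```r
# Hypothetical stand-ins: none of these functions come from a package.
little_bunny <- function() list(name = "Foo Foo", story = character())

# Helper (also hypothetical): append one line to the story, return the bunny.
add_step <- function(bunny, step) {
  bunny$story <- c(bunny$story, step)
  bunny
}

hop   <- function(bunny, through) add_step(bunny, paste("hopped through the", through))
scoop <- function(bunny, up)      add_step(bunny, paste("scooped up the", up))
bop   <- function(bunny, on)      add_step(bunny, paste("bopped them on the", on))

# Placeholder values for the bare names used as arguments.
forest     <- "forest"
field_mice <- "field mice"
head       <- "head"  # a plain string; calls such as head(x) still find utils::head()

foo_foo <- little_bunny()
after_hop <- hop(foo_foo, through = forest)
after_hop$story
```

Because every verb takes the bunny as its first argument and returns the updated bunny, the same definitions work with intermediate objects, overwriting, nesting and the pipe.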

Using this object and these verbs, there are (at least) four ways we could retell the story in code:

foo_foo_1 <- hop(foo_foo, through = forest)
foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)

Simple, but with two downsides:
You must name each intermediate object.
Many copies of your data may appear to take up a lot of memory (in practice R shares unmodified data between the copies where it can, so this is rarely a real problem).

Instead of creating intermediate objects at each step, we could overwrite the original object:

foo_foo <- hop(foo_foo, through = forest)
foo_foo <- scoop(foo_foo, up = field_mice)
foo_foo <- bop(foo_foo, on = head)

Less typing (and less thinking about object names), but:

Debugging is painful: if you make a mistake you’ll need to re-run the complete pipeline from the beginning.
The repetition of the object being transformed (we’ve written foo_foo six times!) obscures what’s changing on each line.

Abandon assignment and just string the function calls together:

bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ), 
  on = head
)

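Disadvantages: you have to read the code from the inside out, and each function's arguments end up far from its name.

Use the pipe: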

foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)

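Advantages: the code reads top to bottom as a sequence of actions, so the focus is on the verbs rather than on the object being passed along.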

Downside:

Behind the scenes:

magrittr reassembles the code in the pipe to a form that works by overwriting an intermediate object:

my_pipe <- function(.) {
  . <- hop(., through = forest)
  . <- scoop(., up = field_mice)
  bop(., on = head)
}
my_pipe(foo_foo)
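The same rewriting can be tried with real functions. The sketch below compares a short pipe with a hand-written function that overwrites `.` step by step; by_hand is a name made up for this example, and the code illustrates the idea rather than magrittr's actual internals.

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Piped form: each step receives the previous result as its first argument.
piped <- x %>%
  sqrt() %>%
  sum() %>%
  round(digits = 1)

# The same computation as a function that keeps overwriting `.`.
by_hand <- function(.) {
  . <- sqrt(.)
  . <- sum(.)
  round(., digits = 1)
}

identical(piped, by_hand(x))
```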

Reach for another tool when:

Your pipes are longer than (say) ten steps. In that case, create intermediate objects with meaningful names.
You have multiple inputs or outputs. If there isn’t one primary object being transformed, but two or more objects being combined, don’t use the pipe.
You are starting to think about a directed graph with a complex dependency structure. Pipes are fundamentally linear, and expressing complex relationships with them typically yields confusing code.

The “tee” pipe, %T>%: use it when you call a function for its side effects (plotting, printing, saving), since such a function may not return anything useful.

rnorm(100) %>%
  matrix(ncol = 2) %>%
  plot() %>%
  str()

##  NULL

rnorm(100) %>%
  matrix(ncol = 2) %T>%
  plot() %>%
  str()

##  num [1:50, 1:2] -0.851 1.15 -0.245 0.72 0.397 ...

%T>% works like %>%, except that it returns the left-hand side instead of the right-hand side.
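The difference between the two operators can be checked without a graphics device. In this sketch log_dim() is a hypothetical side-effect function, standing in for plot(), that prints a message and returns NULL.

```r
library(magrittr)

# Hypothetical side-effect function: reports the size, returns NULL invisibly.
log_dim <- function(m) {
  message("matrix has ", nrow(m), " rows and ", ncol(m), " columns")
  invisible(NULL)
}

m <- matrix(1:6, ncol = 2)

with_pipe <- m %>% log_dim()   # the pipe passes on log_dim()'s result: NULL
with_tee  <- m %T>% log_dim()  # the tee passes on its left-hand side: m

# After a tee, the pipeline can continue with the original matrix.
totals <- m %T>% log_dim() %>% colSums()
```

Here with_pipe is NULL, with_tee is identical to m, and totals holds the column sums of m.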

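The exposition pipe, %$%: use it with functions that don’t have a data frame based API. It exposes the columns of the data frame on the left so the call on the right can refer to them by name.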

mtcars %$%
  cor(disp, mpg)   # `cor()` requires vector inputs

## [1] -0.8475514
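As a rough check of what %$% does, the sketch below compares it with two base R spellings of the same call. The variable names are made up for the example.

```r
library(magrittr)

via_expose <- mtcars %$% cor(disp, mpg)         # columns exposed by %$%
via_with   <- with(mtcars, cor(disp, mpg))      # base R equivalent using with()
via_dollar <- cor(mtcars$disp, mtcars$mpg)      # explicit column extraction

all.equal(via_expose, via_with)
all.equal(via_expose, via_dollar)
```

All three compute the correlation between the same two columns, so both comparisons should return TRUE.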

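The assignment pipe, %<>%: it pipes an object into a call and assigns the result back to the same name. It is shorthand for code like: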

mtcars <- mtcars %>% 
  transform(cyl = cyl * 2)
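A minimal sketch of the shorter form with %<>%. It works on a copy, here called cars (a name chosen for this example), so that the built-in mtcars stays untouched.

```r
library(magrittr)

cars <- mtcars  # copy, so the built-in dataset is not masked by a modified one

# Pipe `cars` into transform() and assign the result back to `cars`.
cars %<>% transform(cyl = cyl * 2)

# Same result as the explicit form.
explicit <- mtcars %>% transform(cyl = cyl * 2)
identical(cars, explicit)
```

The shorthand saves typing, but it hides the assignment, so the explicit `<-` form is easier to spot when scanning code.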