library(magrittr) # you don't need the entire tidyverse for pipes
Why is the pipe so useful?
Little bunny Foo Foo
Went hopping through the forest
Scooping up the field mice
And bopping them on the head
- Children’s poem
Define an object to represent little bunny Foo Foo:
foo_foo <- little_bunny()
Use a function for each key verb: hop(), scoop(), and bop().
Using this object and these verbs, there are (at least) four ways we could retell the story in code:
foo_foo_1 <- hop(foo_foo, through = forest)
foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)
Instead of creating intermediate objects at each step, we could overwrite the original object:
foo_foo <- hop(foo_foo, through = forest)
foo_foo <- scoop(foo_foo, up = field_mice)
foo_foo <- bop(foo_foo, on = head)
Less typing (and less thinking about object names), but:
Debugging is painful: if you make a mistake you’ll need to re-run the complete pipeline from the beginning.
The repetition of the object being transformed (we’ve written foo_foo six times!) obscures what’s changing on each line.
Abandon assignment and just string the function calls together:
bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ),
  on = head
)
Disadvantages:
Use the pipe:
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)
Advantages:
Downside:
Behind the scenes:
magrittr reassembles the code in the pipe to a form that works by overwriting an intermediate object:
my_pipe <- function(.) {
  . <- hop(., through = forest)
  . <- scoop(., up = field_mice)
  bop(., on = head)
}
my_pipe(foo_foo)
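The foo_foo snippets are not runnable as written, because little_bunny(), hop(), scoop(), bop() and the values forest, field_mice and head are never defined. The sketch below supplies hypothetical stand-ins (they simply record each step as text) so that the piped, nested and my_pipe() forms can be compared; the definitions are assumptions for illustration, not part of magrittr.

```r
library(magrittr)

# Hypothetical stand-ins: the notes never define these functions or values.
little_bunny <- function() list(name = "Foo Foo", story = character())
add_step <- function(bunny, step) {
  bunny$story <- c(bunny$story, step)  # append one line to the story
  bunny
}
hop   <- function(bunny, through) add_step(bunny, paste("hopped through the", through))
scoop <- function(bunny, up)      add_step(bunny, paste("scooped up the", up))
bop   <- function(bunny, on)      add_step(bunny, paste("bopped them on the", on))
forest     <- "forest"
field_mice <- "field mice"
head       <- "head"  # a plain string; calls to head() still find the function

foo_foo <- little_bunny()

# The pipe version.
piped <- foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)

# The hand-written equivalent that overwrites an intermediate object.
my_pipe <- function(.) {
  . <- hop(., through = forest)
  . <- scoop(., up = field_mice)
  bop(., on = head)
}

# The nested function-call version.
nested <- bop(scoop(hop(foo_foo, through = forest), up = field_mice), on = head)

identical(piped, my_pipe(foo_foo))  # TRUE
identical(piped, nested)            # TRUE
```

With these stand-ins all three forms build the same object, so the choice between them is about readability rather than results.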
The pipe does not work with functions that use the current environment: assign(), get(), and load().
# create a new variable with the given name in the current environment:
assign("x", 10)
x
## [1] 10
"x" %>% assign(100)
x
## [1] 10
x is still 10 because the pipe performs the assignment in a temporary environment used by %>%. If you do want to use assign() with the pipe, you must be explicit about the environment:
env <- environment()
"x" %>% assign(100, envir = env)
x
## [1] 100
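The same pattern carries over to function bodies, where the current environment is the frame of the running function. A minimal sketch, assuming a hypothetical helper called set_in_frame() that exists only for this illustration:

```r
library(magrittr)

# Hypothetical helper, for illustration only.
set_in_frame <- function() {
  env <- environment()             # the frame of this function call
  "y" %>% assign(42, envir = env)  # explicit target, so y is created in env
  "y" %>% get(envir = env)         # explicit lookup in the same frame
}

set_in_frame()
```

Passing envir explicitly removes any dependence on where the pipe happens to evaluate the call.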
The pipe can also misbehave with functions that rely on lazy evaluation: tryCatch(), try(), suppressMessages(), and suppressWarnings().
tryCatch(stop("!"), error = function(e) "An error")
## [1] "An error"
stop("!") %>%
tryCatch(error = function(e) "An error")
## Error in eval(lhs, parent, parent): !
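In R, function arguments are only computed when the function uses them, not prior to calling the function. The pipe computes each element in turn, so you can’t rely on this behaviour. This describes magrittr before version 2.0; from 2.0 on, the pipe evaluates its left-hand side lazily, so the piped tryCatch() call above can return "An error" instead.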
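Lazy evaluation can be seen without the pipe at all, and one way to avoid depending on how the pipe evaluates its left-hand side is to wrap the whole pipeline in tryCatch(). A small sketch; ignore_arg() is a hypothetical function written for this example:

```r
library(magrittr)

# x is never used, so the error-raising argument is never evaluated.
ignore_arg <- function(x) "x was never evaluated"
lazy_result <- ignore_arg(stop("never raised"))

# Wrapping the entire pipeline keeps the failing step inside tryCatch(),
# whether the pipe evaluates its left-hand side eagerly or lazily.
safe_result <- tryCatch(
  stop("!") %>% toupper(),
  error = function(e) "An error"
)

lazy_result
safe_result
```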
Reach for another tool when:
Your pipes are longer than (say) ten steps. In that case, create intermediate objects with meaningful names.
You have multiple inputs or outputs. If there isn’t one primary object being transformed, but two or more objects being combined together, don’t use the pipe.
You are starting to think about a directed graph with a complex dependency structure. Pipes are fundamentally linear and expressing complex relationships with them will typically yield confusing code.
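For the multiple-input case, one possible shape is to give each input its own short pipeline and then combine the named results in an ordinary function call. A sketch with made-up data; the data frames orders and customers and all of their columns are hypothetical:

```r
library(magrittr)

# Hypothetical inputs, invented for this example.
orders <- data.frame(
  id       = c(1, 2, 3),
  customer = c("a", "b", "a"),
  amount   = c(10, 25, 5)
)
customers <- data.frame(
  customer = c("a", "b"),
  region   = c("north", "south")
)

# One short pipe per input, each saved under a meaningful name.
large_orders       <- orders %>% subset(amount >= 10)
northern_customers <- customers %>% subset(region == "north")

# Two objects are combined here, so a plain call reads more clearly than a pipe.
large_northern <- merge(large_orders, northern_customers, by = "customer")
large_northern
```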
The “tee” pipe, %T>%: use it when you call a function for its side effects, because such a function may not return anything.
rnorm(100) %>%
  matrix(ncol = 2) %>%
  plot() %>%
  str()
## NULL
rnorm(100) %>%
  matrix(ncol = 2) %T>%
  plot() %>%
  str()
## num [1:50, 1:2] -0.851 1.15 -0.245 0.72 0.397 ...
%T>% works like %>% except that it returns the left-hand side instead of the right-hand side.
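The difference can be checked without a plotting device. In this sketch report_size() is a hypothetical side-effect function that prints a message and returns NULL, standing in for plot():

```r
library(magrittr)

# Hypothetical side-effect function: reports on its input and returns NULL.
report_size <- function(x) {
  message("received ", length(x), " values")
  invisible(NULL)
}

# With the tee pipe, the original vector flows on to sqrt().
with_tee <- c(1, 4, 9) %T>% report_size() %>% sqrt()

# With the ordinary pipe, the NULL returned by report_size() is the result.
without_tee <- c(1, 4, 9) %>% report_size()

with_tee
is.null(without_tee)
```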
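%$%: use it when working with functions that don’t have a data frame based API. It exposes the variables of the data frame on its left-hand side so they can be referred to by name.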
mtcars %$%
  cor(disp, mpg) # `cor()` requires vector inputs
## [1] -0.8475514
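As a rough check of what the exposition pipe does, the sketch below compares it with two base R spellings of the same computation, with() and explicit $ extraction. The variable names via_exposition, via_with and via_dollar are made up for the example:

```r
library(magrittr)

via_exposition <- mtcars %$% cor(disp, mpg)       # columns exposed by %$%
via_with       <- with(mtcars, cor(disp, mpg))    # base R equivalent
via_dollar     <- cor(mtcars$disp, mtcars$mpg)    # fully explicit version

all.equal(via_exposition, via_with)
all.equal(via_exposition, via_dollar)
```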
%<>% for assignment: instead of
mtcars <- mtcars %>%
  transform(cyl = cyl * 2)
you may like
mtcars %<>% transform(cyl = cyl * 2)
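A quick way to confirm that the two spellings agree is to apply each to its own copy of the data. The names cars_explicit and cars_compound are hypothetical and are used only so that the built-in mtcars is left untouched:

```r
library(magrittr)

# Explicit assignment with the ordinary pipe.
cars_explicit <- mtcars
cars_explicit <- cars_explicit %>%
  transform(cyl = cyl * 2)

# Compound assignment pipe: transforms and overwrites in one step.
cars_compound <- mtcars
cars_compound %<>% transform(cyl = cyl * 2)

identical(cars_explicit, cars_compound)
```

The compound form is shorter, at the cost of making the assignment less visible to someone skimming the code.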