Pipes

Pipes are operators that send what comes before the pipe to what comes after. There are many different pipes, and some packages that use their own. However, the vast majority of packages use the same pipe:

%>%

Here, we’ll focus on their use with the dplyr package, and the tidyverse more generally. Pipes are also utilized heavily in visualization.

Example:

mydf %>% 
  select(var1, var2) %>% 
  filter(var1 == 'Yes') %>% 
  summary()
Start with a data.frame %>% 
    select columns from it %>% 
    filter/subset it %>% 
    get a summary

Note that you should never type the pipe by hand. The keyboard shortcut is Ctrl/Cmd + Shft + M.

Using Variables as They are Created

One nice thing about pipelines is that we can use variables as soon as they are created, without having to break out separate objects/steps.

mydf %>%
  mutate(newvar1 = var1 + var2,
         newvar2 = newvar1 / var3) %>%
  summarise(newvar2avg = mean(newvar2))

Pipes for Visualization (more later)

The following provides a means to think about pipes for visualization. It’s just a generic example for now, but we’ll see more later.

basegraph %>% 
  points %>%
  lines %>%
  layout

The Dot

Most functions are not ‘pipe-aware’ by default, or at least, do not have data as their first argument as most functions in the tidyverse do, as well as others using tidystyle. In the following we try to send our data frame to lm for a regression.

mydf %>% 
  lm(y ~ x)  # error

Other pipes could potentially work in this situation, e.g. %$% in magrittr. But generally, when you come upon this, you can use a dot to represent the object before the pipe.

mydf %>% 
  lm(y ~ x, data=.)  # . == mydf

Flexibility

Piping is not just for data.frames. For example, the more general list objects can be used as well, and would be the primary object for the purrr family of functions we’ll discuss later.

As a final example, we’ll create a function in an abstract way with pipes.

  • The following starts with a character vector.
  • Sends it to a recursive function (named ..).
    • .. is created on-the-fly, and has a single argument (.).
  • After the function is created, it’s used on ., which represents the string before the pipe.
  • Result: pipes between the words2.
c('Ceci', "n'est", 'pas', 'une', 'pipe!') %>%
{
  .. <-  . %>%
    if (length(.) == 1)  .
    else paste(.[1], '%>%', ..(.[-1]))
  ..(.)
} 
[1] "Ceci %>% n'est %>% pas %>% une %>% pipe!"

Put that in your pipe and smoke it René Magritte!

Pipes Summary

Pipes are best used interactively, though you can use them within functions as well, and they are extremely useful for data exploration. Nowadays, more and more packages are being made that are ‘pipe-aware’, especially many visualization packages.

See the magrittr package for more types of pipes, and more detail on pipes is provided in these slides.


  1. That was a very complicated way to do this paste(c('Ceci', "n'est", 'pas', 'une', 'pipe!'), collapse=' %>% ').↩︎