Tidyverse Style Guide (Part 1)

29/3/2022 8-minute read

This is part 1 of my notes from the tidyverse style guide.

Analyses

1. Files

  • names are meaningful and use letters, numbers, _, and/or -.

  • avoid special characters in files names.

  • if more than 10 files then left pad with a zero (i.e. 00, 01, 02, 03,…)

  • rename files instead of attempting 02a, 02b, and so on.

  • pay attention to capitalization.

  • use - and = to break up file into readable chunks.

  • load all packages at once in the beginning of the file.

# Load data ---------------------------

# Plot data ---------------------------

2. Syntax

Object names

  • Variables and function names should use lovercase letters, numbers and _ (no camel case).

  • Base R uses dots in function and class names (data.frame).

  • Variable names should be names and nouns.

  • Function names should be verbs.

  • Names should be concise and meaningful.

  • Avoid re-using names of common function and variables.

Spacing

  • Put a space after a common, never before.

  • Do not put spaces inside or outside parentheses for regular functions : mean(x, na.rm = TRUE).

  • Put a space before and after () when using if, for, or while.

# if (debug) {
#   show(x)
# }
  • Place a space after () used for function arguments : function(x) {}.

  • The embracing operator, {{ }}, should always have inner spaces to help emphasize its special behavior.

  • Most infix operators (==, +, -, <-, etc.) should always be surrounded by spaces.

  • Exceptions being:

    • The operators with high precedence: ::, :::, $, @, [, [[, ^, unary -, unary +, and :.

    • Single-sided formulas when the right-hand side is a single identifier: call(!!xyz).

    • When used in tidy evaluation !! (bang-bang) and !!! (bang-bang-bang) (because have precedence equivalent to unary -/+).

    • The helper operator: package?stats.

  • adding extra spaces is ok if it improves alignment of = or <-.

Function Calls

  • When you call a function omit the names of data arguemnts (such as x= because it is used so commonly).

  • If you override the default value of an argument, use the full name.

  • Avoid assignments in function calls.

    • Exception: function that capture side-effects: output <- capture.output(x <- f()).

Control flow

  • For {} brackets:

    • { should be the last character on the line.

    • Contents should be indented by two spaces.

    • } should be the first character on the line.

    • Else should be on the same line as } (if used).

  • {} define the most important hierarchy of R code.

  • Always use && and || inside an if clause and never & and |.

  • If you want to rewrite a simple but lengthy if block:

if (x > 10) {
  message <- "big"
} else {
  message <- "small"
}

Just write it all on one line:

message <- if (x > 10) "big" else "small"
  • it is okay to drop curly braces for very simple statements that fit on one line as long as they dont have side-effects.

  • Function calls that affect control flow (like return(), stop() or continue) should always go in their own {} block.

  • Avoid implicit type coercion (e.g. from numeric to logical) in if statements:

# Good
if (length(x) > 0) {
  # do something
}
## NULL
# Bad
if (length(x)) {
  # do something
}
## NULL
  • Avoid postion-based switch() statements.

  • Each element should go on its own line.

  • Provided a fall-through erro, unless you have previously validate the input.

switch(x, 
  a = 0,
  b = 1, 
  c = 2,
  stop("Unknown `x`", call. = FALSE)
)
## [1] 0

Long lines

  • Limit code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font.
# # Good
# do_something_very_complicated(
#   something = "that",
#   requires = many,
#   arguments = "some of which may be long"
# )
# 
# # Bad
# do_something_very_complicated("that", requires, many, arguments,
#                               "some of which may be long"
#                               )
  • You may also place several arguments on the same line if they are closely related to each other, e.g., strings in calls to paste() or stop().

Semicolons

  • Don’t put ; at the end of a line, and don’t use ; to put multiple commands on one line.

Assignment

  • Use <-, not =, for assignment.

Data

  • Use ", not ', for quoting text. The only exception is when the text already contains double quotes and no single quotes.

  • Prefer TRUE and FALSE over T and F.

Comments

  • Each line of a comment should begin with the comment symbol and a single space.

  • In data analysis code, use comments to record important findings and analysis decisions.

  • If you need comments to explain what your code is doing, consider rewriting your code to be clearer.

  • If you discover that you have more comments than code, consider switching to R Markdown.

3. Functions

Naming

  • Strive to use verbs for function names.

Long lines

  • Prefer function-indent style to double-indent style when it fits.
# Function-indent --------------------------------------
long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

# Double-indent ----------------------------------------
long_function_name <- function(
    a = "a long argument",
    b = "another argument",
    c = "another long argument") {
  # As usual code is indented by two spaces.
}
  • If a function argument can’t fit on a single line, this is a sign you should rework the argument to keep it short and sweet.

return()

  • Only use return() for early returns. Otherwise, rely on R to return the result of the last evaluated expression.
# Good
find_abs <- function(x) {
  if (x > 0) {
    return(x)
  }
  x * -1
}
add_two <- function(x, y) {
  x + y
}

# Bad
add_two <- function(x, y) {
  return(x + y)
}
  • Return statments should always be on their own line because they have important effects on the control flow.

  • If your function is called primarily for its side-effects (like printing, plotting, or saving to disk), it should return the first argument invisibly. This makes it possible to use the function as part of a pipe.

Comments

  • In code, use comments to explain the “why” not the “what” or “how”.

  • Comments should be in sentence case, and only end with a full stop if they contain at least two sentences.

# Good

# Objects like data frames are treated as leaves

# Do not use `is.list()`. Objects like data frames must be treated
# as leaves.

# Bad

# objects like data frames are treated as leaves

# Objects like data frames are treated as leaves.

4. Pipes

Introduction

  • Reserve pipes for a sequence of steps applied to one primary object.

  • Avoid using pipes when there are meaningful intermediate objects that could be given informative names.

Whitespace

  • %>% should always have a space before it, and should usually be followed by a new line. After the first step, each line should be indented by two spaces.

Long lines

  • If the arguments to a function don’t all fit on one line, put each argument on its own line and indent.

Short pipes

  • A one-step pipe can stay on one line, but unless you plan to expand it later on, you should consider rewriting it to a regular function call.

  • Sometimes it’s useful to include a short pipe as an argument to a function in a longer pipe. Carefully consider whether the code is more readable with a short inline pipe (which doesn’t require a lookup elsewhere) or if it’s better to move the code outside the pipe and give it an evocative name.

# Good
# x %>%
#   select(a, b, w) %>%
#   left_join(y %>% select(a, b, v), by = c("a", "b"))
# 
# # Better
# x_join <- x %>% select(a, b, w)
# y_join <- y %>% select(a, b, v)
# left_join(x_join, y_join, by = c("a", "b"))

No arguments

  • magrittr allows you to omit () on functions that don’t have arguments. Avoid this feature.

Assigment

There are three acceptable forms of assignment:

  • Variable name and assignment on separate lines:
iris_long <-
  iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value)
  • Variable name and assignment on the same line:
iris_long <- iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value)
  • Assignment at the end of the pipe with ->:
iris %>%
  gather(measure, value, -Species) %>%
  arrange(-value) ->
  iris_long
  • The magrittr package provides the %<>% operator as a shortcut for modifying an object in place. Avoid this operator.

5. ggplot2

Introduction

  • Styling suggestions for + used to separate ggplot2 layers are very similar to those for %>% in pipelines.

Whitespace

  • + should always have a space before it, and should be followed by a new line.

  • If you are creating a ggplot off of a dplyr pipeline, there should only be one level of indentation.

Long lines

  • If the arguments to a ggplot2 layer don’t all fit on one line, put each argument on its own line and indent.

  • ggplot2 allows you to do data manipulation, such as filtering or slicing, within the data argument. Avoid this, and instead do the data manipulation in a pipeline before starting plotting.