Tidyverse Style Guide (Part 1)
This is part 1 of my notes from the tidyverse style guide.
Analyses
1. Files
names are meaningful and use letters, numbers,
_, and/or-.avoid special characters in files names.
if more than 10 files then left pad with a zero (i.e. 00, 01, 02, 03,…)
rename files instead of attempting 02a, 02b, and so on.
pay attention to capitalization.
use
-and=to break up file into readable chunks.load all packages at once in the beginning of the file.
# Load data ---------------------------
# Plot data ---------------------------
2. Syntax
Object names
Variables and function names should use lovercase letters, numbers and
_(no camel case).Base R uses dots in function and class names (
data.frame).Variable names should be names and nouns.
Function names should be verbs.
Names should be concise and meaningful.
Avoid re-using names of common function and variables.
Spacing
Put a space after a common, never before.
Do not put spaces inside or outside parentheses for regular functions :
mean(x, na.rm = TRUE).Put a space before and after () when using if, for, or while.
# if (debug) {
# show(x)
# }
Place a space after
()used for function arguments :function(x) {}.The embracing operator,
{{ }}, should always have inner spaces to help emphasize its special behavior.Most infix operators (
==,+,-,<-, etc.) should always be surrounded by spaces.Exceptions being:
The operators with high precedence:
::,:::,$,@,[,[[,^,unary -,unary +, and:.Single-sided formulas when the right-hand side is a single identifier:
call(!!xyz).When used in tidy evaluation
!!(bang-bang) and!!!(bang-bang-bang) (because have precedence equivalent to unary-/+).The helper operator:
package?stats.
adding extra spaces is ok if it improves alignment of
=or<-.
Function Calls
When you call a function omit the names of data arguemnts (such as
x=because it is used so commonly).If you override the default value of an argument, use the full name.
Avoid assignments in function calls.
- Exception: function that capture side-effects:
output <- capture.output(x <- f()).
- Exception: function that capture side-effects:
Control flow
For
{}brackets:{should be the last character on the line.Contents should be indented by two spaces.
}should be the first character on the line.Else should be on the same line as
}(if used).
{}define the most important hierarchy of R code.Always use
&&and||inside an if clause and never&and|.If you want to rewrite a simple but lengthy if block:
if (x > 10) {
message <- "big"
} else {
message <- "small"
}
Just write it all on one line:
message <- if (x > 10) "big" else "small"
it is okay to drop curly braces for very simple statements that fit on one line as long as they dont have side-effects.
Function calls that affect control flow (like return(), stop() or continue) should always go in their own {} block.
Avoid implicit type coercion (e.g. from numeric to logical) in if statements:
# Good
if (length(x) > 0) {
# do something
}
## NULL
# Bad
if (length(x)) {
# do something
}
## NULL
Avoid postion-based
switch()statements.Each element should go on its own line.
Provided a fall-through erro, unless you have previously validate the input.
switch(x,
a = 0,
b = 1,
c = 2,
stop("Unknown `x`", call. = FALSE)
)
## [1] 0
Long lines
- Limit code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font.
# # Good
# do_something_very_complicated(
# something = "that",
# requires = many,
# arguments = "some of which may be long"
# )
#
# # Bad
# do_something_very_complicated("that", requires, many, arguments,
# "some of which may be long"
# )
- You may also place several arguments on the same line if they are closely related to each other, e.g., strings in calls to paste() or stop().
Semicolons
- Don’t put ; at the end of a line, and don’t use ; to put multiple commands on one line.
Assignment
- Use <-, not =, for assignment.
Data
Use
", not', for quoting text. The only exception is when the text already contains double quotes and no single quotes.Prefer TRUE and FALSE over T and F.
3. Functions
Naming
- Strive to use verbs for function names.
Long lines
- Prefer function-indent style to double-indent style when it fits.
# Function-indent --------------------------------------
long_function_name <- function(a = "a long argument",
b = "another argument",
c = "another long argument") {
# As usual code is indented by two spaces.
}
# Double-indent ----------------------------------------
long_function_name <- function(
a = "a long argument",
b = "another argument",
c = "another long argument") {
# As usual code is indented by two spaces.
}
- If a function argument can’t fit on a single line, this is a sign you should rework the argument to keep it short and sweet.
return()
- Only use return() for early returns. Otherwise, rely on R to return the result of the last evaluated expression.
# Good
find_abs <- function(x) {
if (x > 0) {
return(x)
}
x * -1
}
add_two <- function(x, y) {
x + y
}
# Bad
add_two <- function(x, y) {
return(x + y)
}
Return statments should always be on their own line because they have important effects on the control flow.
If your function is called primarily for its side-effects (like printing, plotting, or saving to disk), it should return the first argument invisibly. This makes it possible to use the function as part of a pipe.
Comments
In code, use comments to explain the “why” not the “what” or “how”.
Comments should be in sentence case, and only end with a full stop if they contain at least two sentences.
# Good
# Objects like data frames are treated as leaves
# Do not use `is.list()`. Objects like data frames must be treated
# as leaves.
# Bad
# objects like data frames are treated as leaves
# Objects like data frames are treated as leaves.
4. Pipes
Introduction
Reserve pipes for a sequence of steps applied to one primary object.
Avoid using pipes when there are meaningful intermediate objects that could be given informative names.
Whitespace
%>%should always have a space before it, and should usually be followed by a new line. After the first step, each line should be indented by two spaces.
Long lines
- If the arguments to a function don’t all fit on one line, put each argument on its own line and indent.
Short pipes
A one-step pipe can stay on one line, but unless you plan to expand it later on, you should consider rewriting it to a regular function call.
Sometimes it’s useful to include a short pipe as an argument to a function in a longer pipe. Carefully consider whether the code is more readable with a short inline pipe (which doesn’t require a lookup elsewhere) or if it’s better to move the code outside the pipe and give it an evocative name.
# Good
# x %>%
# select(a, b, w) %>%
# left_join(y %>% select(a, b, v), by = c("a", "b"))
#
# # Better
# x_join <- x %>% select(a, b, w)
# y_join <- y %>% select(a, b, v)
# left_join(x_join, y_join, by = c("a", "b"))
No arguments
- magrittr allows you to omit () on functions that don’t have arguments. Avoid this feature.
Assigment
There are three acceptable forms of assignment:
- Variable name and assignment on separate lines:
iris_long <-
iris %>%
gather(measure, value, -Species) %>%
arrange(-value)
- Variable name and assignment on the same line:
iris_long <- iris %>%
gather(measure, value, -Species) %>%
arrange(-value)
- Assignment at the end of the pipe with
->:
iris %>%
gather(measure, value, -Species) %>%
arrange(-value) ->
iris_long
- The magrittr package provides the
%<>%operator as a shortcut for modifying an object in place. Avoid this operator.
5. ggplot2
Introduction
- Styling suggestions for
+used to separate ggplot2 layers are very similar to those for%>%in pipelines.
Whitespace
+should always have a space before it, and should be followed by a new line.If you are creating a ggplot off of a dplyr pipeline, there should only be one level of indentation.
Long lines
If the arguments to a ggplot2 layer don’t all fit on one line, put each argument on its own line and indent.
ggplot2 allows you to do data manipulation, such as filtering or slicing, within the data argument. Avoid this, and instead do the data manipulation in a pipeline before starting plotting.
Comments
Each line of a comment should begin with the comment symbol and a single space.
In data analysis code, use comments to record important findings and analysis decisions.
If you need comments to explain what your code is doing, consider rewriting your code to be clearer.
If you discover that you have more comments than code, consider switching to R Markdown.