Introduction

An R Markdown (.Rmd) file is a record of your research. It contains the code that a scientist needs to reproduce your work along with the narration that a reader needs to understand your work.

You can easily rerun the code in an R Markdown file to reproduce your work and export the results as a nicely formatted report in a variety of formats, including HTML and PDF.

The following is an example of an R Markdown file:

---
title: 'SC18 Lab Session: R Markdown'
author: "Seokjin Han @ Bayesian Statistics Lab"
date: 'September 12, 2018'
output:
  html_document:
    toc: true
    df_print: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  fig.width = 5, fig.height = 3.5, fig.align = 'center',
  cache = TRUE)
```

# Introduction

An R Markdown (`.Rmd`) file is a record of your research. It contains the code
that a scientist needs to reproduce your work along with the narration that a
reader needs to understand your work.

You can easily rerun the code in an R Markdown file to reproduce your work and
export the results as a nicely formatted report in a variety of formats,
including `html` and `pdf`.

Following is an example of R Markdown file:
```{r echo = FALSE, comment = ""}
cat(htmltools::includeText("05-rmd.Rmd"))
```

# `.Rmd` Structure

- **Header** (Optional): Various render options written in YAML
  format. Surrounded by `---` and `---`.
- **Text**: Narration formatted with Markdown, mixed with code chunks.
- **R Code Chunks**: Surrounded by `` ```{r} `` and `` ``` ``.

In RStudio, click File > New File > R Markdown to create a new .Rmd file with some default contents. Click “Knit” or press Ctrl + Shift + K to produce a report in HTML format.
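
The same rendering can also be started from the R console. The following is a minimal sketch, assuming the rmarkdown package and pandoc are installed. The file contents and the name rmd_path are made up for illustration. It writes a tiny .Rmd file with a header, some text, and one code chunk, and renders it only when the required tools are available.

```r
# Write a tiny, hypothetical .Rmd file to a temporary location.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  "Some narration written in Markdown.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_path)

# Render only if rmarkdown and pandoc can be found (both are assumptions
# about the local setup); render() returns the path of the output file.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(out_path)
}
```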

An R Markdown file consists of the following components:

Markdown

Markdown is a lightweight markup language with plain text formatting syntax. It is designed so that it can be converted to HTML and many other formats using a tool by the same name.

Headers

# 1st Level Header

## 2nd Level Header

### 3rd Level Header

Alternative Style for 1st Level Header
======================================

Alternative Style for 2nd Level Header
--------------------------------------

Emphasis

*italic* (or _italic_)
**bold** (or __bold__)
`code`
superscript^2^ and subscript~2~

Lists

*   Bulleted list item 1

*   Item 2

    * Item 2a

    * Item 2b

1.  Numbered list item 1

1.  Item 2. The numbers are incremented automatically in the output.

Tables

First Header  | Second Header
------------- | -------------
Content Cell  | Content Cell
Content Cell  | Content Cell

R Code Chunks

RStudio shortcuts

In RStudio, Ctrl + Alt + I inserts a new code chunk and Ctrl + Shift + Enter runs all code in the chunk.

Chunk names

Chunks can be given an optional name: ```{r by-name}.

Chunk options

You can pass options to a chunk as follows: ```{r, key1=value1, key2=value2}.

List of some chunk options:

  • eval = FALSE prevents code from being evaluated.
  • include = FALSE runs the code, but doesn’t show the code or results in the final document.
  • echo = FALSE prevents the code, but not the results, from appearing in the finished file.
  • message = FALSE or warning = FALSE prevents messages or warnings from appearing in the finished file.
  • results = 'hide' hides printed output; fig.show = 'hide' hides plots.
  • error = TRUE causes the render to continue even if code returns an error.
  • cache = TRUE will save the output of the chunk on disk. On subsequent runs, knitr will check to see if the code has changed, and if it hasn’t, it will reuse the cached results.
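
The following table summarizes which types of output each option suppresses (marked with x):

Option            | Run code | Show code | Output | Plots | Messages | Warnings
----------------- | -------- | --------- | ------ | ----- | -------- | --------
eval = FALSE      | x        |           | x      | x     | x        | x
include = FALSE   |          | x         | x      | x     | x        | x
echo = FALSE      |          | x         |        |       |          |
results = "hide"  |          |           | x      |       |          |
fig.show = "hide" |          |           |        | x     |          |
message = FALSE   |          |           |        |       | x        |
warning = FALSE   |          |           |        |       |          | x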
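
To see the effect of individual options without writing a full document, you can knit a single chunk from a character vector. This is a minimal sketch, assuming the knitr package is installed. The helper knit_chunk() is a hypothetical name introduced here, not part of knitr.

```r
library(knitr)

# Hypothetical helper: knit one chunk given its header line and its code,
# and return the resulting Markdown as a character string.
knit_chunk <- function(header, code = "1 + 1") {
  knit(text = c(header, code, "```"), quiet = TRUE)
}

# echo = FALSE hides the code but keeps the result.
echo_off <- knit_chunk("```{r, echo = FALSE}")

# eval = FALSE shows the code but does not run it.
eval_off <- knit_chunk("```{r, eval = FALSE}")

cat(echo_off)
cat(eval_off)
```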

Global options

You can change the default chunk options via knitr::opts_chunk$set():

knitr::opts_chunk$set(echo = FALSE)
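
The current defaults can also be inspected and reset. The following is a small sketch, assuming knitr is installed and run in a fresh R session. The variable names are only for illustration.

```r
library(knitr)

# Read the current default for one option (echo is TRUE unless changed).
echo_before <- opts_chunk$get("echo")

# Change several defaults at once; every later chunk inherits them
# unless the chunk header overrides them.
opts_chunk$set(echo = FALSE, fig.align = "center")
echo_during <- opts_chunk$get("echo")

# Go back to knitr's built-in defaults.
opts_chunk$restore()
echo_after <- opts_chunk$get("echo")
```

In a real document, the opts_chunk$set() call usually lives in a setup chunk near the top of the file, as in the example file shown in the introduction.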

Notes on cache=TRUE option

The caching system must be used with care, because by default it is based on the code only, not its dependencies. For example, here the processed_data chunk depends on the raw_data chunk:

```{r raw_data}
rawdata <- readr::read_csv("a_very_large_file.csv")
```

```{r processed_data, cache = TRUE}
processed_data <- rawdata %>%
  filter(!is.na(import_var)) %>%
  mutate(new_variable = complicated_transformation(x, y, z))
```

Caching the processed_data chunk means that it will get rerun if the dplyr pipeline is changed, but it won’t get rerun if the read_csv() call changes. You can avoid that problem with the dependson chunk option, which should contain a character vector of every chunk that the cached chunk depends on.

Also, note that the chunks won’t update if a_very_large_file.csv changes, because knitr caching only tracks changes within the .Rmd file. If you want to also track changes to that file you can use the cache.extra option with file.info().

```{r raw_data, cache.extra = file.info("a_very_large_file.csv")}
rawdata <- readr::read_csv("a_very_large_file.csv")
```

```{r processed_data, cache = TRUE, dependson = "raw_data"}
processed_data <- rawdata %>% 
  filter(!is.na(import_var)) %>% 
  mutate(new_variable = complicated_transformation(x, y, z))
```
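
The idea behind cache.extra is that its value becomes part of what knitr compares between runs, so any value that changes when the file changes will invalidate the cache. As an alternative to file.info(), a content checksum such as tools::md5sum() can be used, for example cache.extra = tools::md5sum("a_very_large_file.csv"). The sketch below uses a made-up temporary file to show that the checksum changes when the contents change.

```r
# Hypothetical data file created just for this demonstration.
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), path)
hash_before <- unname(tools::md5sum(path))

# Changing the file contents changes the checksum, which is what would
# make a chunk using cache.extra = tools::md5sum(path) rerun.
writeLines(c("x,y", "1,3"), path)
hash_after <- unname(tools::md5sum(path))

hash_before != hash_after
```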

As your caching strategies get progressively more complicated, it’s a good idea to regularly clear out all your caches with knitr::clean_cache().

Formatting table

mtcars[1:5, ]
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
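
knitr::kable(mtcars[1:5, ], caption = "Formatting table via knitr::kable")

Formatting table via knitr::kable

                  |  mpg | cyl | disp |  hp | drat |    wt |  qsec | vs | am | gear | carb
----------------- | ---- | --- | ---- | --- | ---- | ----- | ----- | -- | -- | ---- | ----
Mazda RX4         | 21.0 |   6 |  160 | 110 | 3.90 | 2.620 | 16.46 |  0 |  1 |    4 |    4
Mazda RX4 Wag     | 21.0 |   6 |  160 | 110 | 3.90 | 2.875 | 17.02 |  0 |  1 |    4 |    4
Datsun 710        | 22.8 |   4 |  108 |  93 | 3.85 | 2.320 | 18.61 |  1 |  1 |    4 |    1
Hornet 4 Drive    | 21.4 |   6 |  258 | 110 | 3.08 | 3.215 | 19.44 |  1 |  0 |    3 |    1
Hornet Sportabout | 18.7 |   8 |  360 | 175 | 3.15 | 3.440 | 17.02 |  0 |  0 |    3 |    2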
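
knitr::kable() takes a few more arguments that are often useful, such as digits, col.names, and caption. The following is a short sketch, assuming knitr is installed. The column labels and the caption text are made up for illustration.

```r
library(knitr)

tab <- kable(
  mtcars[1:3, c("mpg", "wt", "qsec")],
  format    = "pipe",                                   # plain-text pipe table
  digits    = 1,                                        # round numeric columns
  col.names = c("MPG", "Weight", "Quarter-mile time"),  # illustrative labels
  caption   = "First three cars"
)
print(tab)
```

Inside an R Markdown document the format argument can usually be left out, since knitr picks a format that matches the output.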

Inline code chunks

Inline code, written as `r ` followed by an R expression inside the backticks, embeds the result of that expression directly in the text. For instance,

> We have data about `r nrow(cars)` cars.

will render to:

We have data about 50 cars.
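
When numbers are inserted into text, format() helps to control how they are printed. The helper below follows a suggestion from "R for Data Science". The name comma is only a convention.

```r
# Format a number with two significant digits and a thousands separator,
# so that it reads well inside a sentence.
comma <- function(x) format(x, digits = 2, big.mark = ",")

comma(3452345)
# "3,452,345"
comma(.12358124331)
# "0.12"
```

In the text, the helper would be called inside inline code just like nrow(cars) in the example above.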

YAML header

R Markdown uses a YAML header to control many details of the output.

Parameters

R Markdown documents can include one or more parameters whose values can be set when you render the report. To declare one or more parameters, use the params field. Parameters are available within the code chunks as a read-only list named params.

---
output: html_document
params:
  foo: "suv"
---
```{r}
library(ggplot2)
library(dplyr)
class <- mpg %>% filter(class == params$foo)
```
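
Parameter values are supplied when the document is rendered, through the params argument of rmarkdown::render(). The following is a minimal sketch, assuming the rmarkdown package is installed. The file name report.Rmd and the helper report_args() are hypothetical.

```r
# Hypothetical helper: collect the render() arguments for one vehicle class.
report_args <- function(class, input = "report.Rmd") {
  list(
    input       = input,
    output_file = paste0("report-", class, ".html"),
    params      = list(foo = class)   # overrides the default foo: "suv"
  )
}

args <- report_args("compact")

# Render only when rmarkdown is installed and the input file exists.
if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(args$input)) {
  do.call(rmarkdown::render, args)
}
```

Calling the helper for several classes, for example with lapply(), would produce one report per class from the same .Rmd file.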

You can write atomic vectors directly into the YAML header. You can also run arbitrary R expressions by prefacing the parameter value with !r.

params:
  start: !r lubridate::ymd("2015-01-01")
  snapshot: !r lubridate::ymd_hms("2015-01-01 12:30:00")

Bibliographies and citations

First, you need to specify a bibliography file using the bibliography field in your file’s header. The field should contain a path from the directory that contains your R Markdown file to the bibliography file. Common bibliography formats, including BibTeX, are supported.
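
If you have no BibTeX file at hand, one way to get a small one for experimenting is knitr::write_bib(), which writes citation entries for R packages. This is only a sketch, assuming knitr is installed. The file location is arbitrary and the entries are package citations, not the references used in the example that follows.

```r
library(knitr)

# Write BibTeX entries for the knitr package to a temporary refs.bib.
bib_path <- file.path(tempdir(), "refs.bib")
write_bib("knitr", file = bib_path)

# List the entry lines; by default the package key is prefixed with "R-",
# so the package itself can be cited as @R-knitr.
keys <- grep("^@", readLines(bib_path), value = TRUE)
print(keys)
```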

---
bibliography: refs.bib
---
Separate multiple citations with a `;`:
Blah blah [@Ghosh2003a; @Ghosal2000a].

You can add arbitrary comments inside the square brackets:
Blah blah [see @Ghosal2000a, pp. 33-35; also @Ghosh2003a, ch. 1].

Remove the square brackets to create an in-text citation:
@Ghosh2003a says blah, or @Ghosal2000a [p. 33] says blah.

Add a `-` before the citation to suppress the author's name:
Ghosal et al. say blah [-@Ghosal2000a].

### References

To customize the citation and bibliography style, you can specify a CSL (Citation Style Language) file in the csl field.


Acknowledgment

All content in this note is based on the book “R for Data Science”, written by Grolemund & Wickham.