Vectors are the objects that underlie tibbles and data frames.
 
The hierarchy of R’s vector types
Atomic vectors have 6 types: logical, integer, double, character, complex, and raw. Integer and double vectors are collectively known as numeric vectors.
Lists are sometimes called recursive vectors, because lists can contain other lists.
Atomic vectors are homogeneous, while lists can be heterogeneous.
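To make the hierarchy concrete, here is a minimal base-R sketch that builds one value of each atomic type and checks it with typeof(). The variable names (`examples`, `types`, `nested`) are illustrative only.

```r
# One example value for each of the six atomic types
examples <- list(
  logical   = TRUE,
  integer   = 1L,
  double    = 1.5,
  character = "a",
  complex   = 1 + 2i,
  raw       = as.raw(255)
)

# typeof() reports the underlying type of each element
types <- vapply(examples, typeof, character(1))
print(types)

# A list is heterogeneous and can nest other lists,
# whereas c() forces its arguments to one common type
nested <- list(1L, "a", list(TRUE))
typeof(nested)       # "list"
typeof(c(1L, 1.5))   # "double"
```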
NULL
NULL is often used to represent the absence of a vector.
NA is used to represent the absence of a value in a vector.
NULL typically behaves like a vector of length 0.
Every vector has a type and a length.
Type:
```r
typeof(letters)
```
```
## [1] "character"
```
```r
typeof(1:10)
```
```
## [1] "integer"
```
Length:
```r
x <- list("a", "b", 1:10)
length(x)
```
```
## [1] 3
```
Logical vectors take only three possible values: FALSE, TRUE, and NA.
Usually constructed with comparison operators (see Lecture 3):
```r
1:10 %% 3 == 0
```
```
##  [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
```
Manual creation:
```r
c(TRUE, TRUE, FALSE, NA)
```
```
## [1]  TRUE  TRUE FALSE    NA
```
In R, numbers are doubles by default.
```r
typeof(1)
```
```
## [1] "double"
```
To make an integer, place an L after the number:
```r
typeof(1L)
```
```
## [1] "integer"
```
```r
1.5L  # no effect
```
```
## [1] 1.5
```
Doubles are approximations, so arithmetic can leave small rounding errors:
```r
x <- sqrt(2) ^ 2  # recurring example
x
```
```
## [1] 2
```
```r
x - 2
```
```
## [1] 4.440892e-16
```
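A tolerance-based comparison sidesteps this rounding error. The sketch below uses only base R; the tolerance `tol` is an assumption chosen to mirror the documented default of `dplyr::near()` (the square root of the machine epsilon).

```r
x <- sqrt(2) ^ 2

# Exact comparison fails because of the rounding error
x == 2                       # FALSE

# Compare within a tolerance instead
tol <- sqrt(.Machine$double.eps)
abs(x - 2) < tol             # TRUE

# Base R's all.equal() also compares with a tolerance
isTRUE(all.equal(x, 2))      # TRUE
```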
Instead of comparing floating point numbers using `==`, you should use `dplyr::near()`, which allows for some numerical tolerance.
Special values
Integers have one special value, NA; doubles have four: NA, NaN, Inf and -Inf.
```r
c(-1, 0, 1) / 0
```
```
## [1] -Inf  NaN  Inf
```
Again, avoid using == to check for these other special values. Instead use is.finite(), is.infinite(), and is.nan():
| | 0 | Inf | NA | NaN |
|---|---|---|---|---|
| is.finite() | O | | | |
| is.infinite() | | O | | |
| is.na() | | | O | O |
| is.nan() | | | | O |
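The table can be reproduced directly. This is a small base-R sketch; `vals` and `checks` are illustrative names.

```r
vals <- c(0, Inf, NA, NaN)

# One row per predicate, one column per value
checks <- rbind(
  is.finite   = is.finite(vals),
  is.infinite = is.infinite(vals),
  is.na       = is.na(vals),
  is.nan      = is.nan(vals)
)
colnames(checks) <- c("0", "Inf", "NA", "NaN")
checks

# Why == is not a reliable check: comparisons with NaN or NA yield NA
NaN == NaN   # NA
```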
Each element of a character vector is a string, and a string can contain an arbitrary amount of data.
Each unique string is stored in memory only _once_.
Every use of the string points to that representation.
This reduces the amount of memory needed by duplicated strings.
```r
x <- "This is a reasonably long string."
pryr::object_size(x)
```
```
## Registered S3 method overwritten by 'pryr':
##   method      from
##   print.bytes Rcpp
## 152 B
```
```r
y <- rep(x, 1000)
pryr::object_size(y)
```
```
## 8.14 kB
```
y doesn’t take up 1,000x as much memory as x, because every element of y is a pointer to the same string.
There are two ways to convert, or coerce, one type of vector to another:
Explicit coercion: by calling as.logical(), as.integer(), as.double(), or as.character(), etc.
Implicit coercion: happens when you use a vector in a specific context that expects a certain type of vector. Examples:
From a logical vector to a numeric vector: in this case TRUE is converted to 1 and FALSE to 0:
```r
x <- sample(20, 100, replace = TRUE)
y <- x > 10
sum(y)  # how many are greater than 10?
```
```
## [1] 50
```
```r
mean(y) # what proportion are greater than 10?
```
```
## [1] 0.5
```
Implicit coercion from integer to logical:
```r
if (length(x)) {
  # do something
}
```
Be explicit: use length(x) > 0.
Vector containing multiple types: the most complex type always wins.
```r
typeof(c(TRUE, 1L))
```
```
## [1] "integer"
```
```r
typeof(c(1L, 1.5))
```
```
## [1] "double"
```
```r
typeof(c(1.5, "a"))
```
```
## [1] "character"
```
An atomic vector cannot have a mix of different types!
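The two kinds of coercion can be compared side by side in a short, deterministic sketch (base R only; `flags` is an illustrative name).

```r
# Explicit coercion
as.integer("3")         # 3L
as.logical(c(1, 0))     # TRUE FALSE
as.character(1.5)       # "1.5"

# Implicit coercion: logicals count as 1/0 in numeric contexts
flags <- c(TRUE, FALSE, TRUE, TRUE)
sum(flags)              # 3
mean(flags)             # 0.75

# c() coerces to the most complex type present
typeof(c(TRUE, 1L, 1.5, "a"))   # "character"

# Values that cannot be converted become NA (normally with a warning)
suppressWarnings(as.integer("abc"))   # NA
```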
| | lgl | int | dbl | chr | list |
|---|---|---|---|---|---|
| is_logical() | O | | | | |
| is_integer() | | O | | | |
| is_double() | | | O | | |
| is_numeric() | | O | O | | |
| is_character() | | | | O | |
| is_atomic() | O | O | O | O | |
| is_list() | | | | | O |
| is_vector() | O | O | O | O | O |
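As a rough cross-check that needs no packages, base R's is.*() functions give the same pattern for simple, attribute-free values. Note the assumption: the base functions are a stand-in here, not the is_*() functions from the table, and they differ in edge cases (for example, is.vector() returns FALSE for vectors that carry attributes other than names).

```r
values <- list(lgl = TRUE, int = 1L, dbl = 1.5, chr = "a", list = list(1))

predicates <- list(
  is.logical   = is.logical,
  is.integer   = is.integer,
  is.double    = is.double,
  is.numeric   = is.numeric,
  is.character = is.character,
  is.atomic    = is.atomic,
  is.list      = is.list,
  is.vector    = is.vector
)

# Rows are predicates, columns are the example values
result <- sapply(values, function(v) sapply(predicates, function(p) p(v)))
result
```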
Vector recycling: implicit coercion of the length of vectors
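As a rough sketch of what recycling amounts to, the shorter vector is repeated until it matches the length of the longer one. Here that repetition is written out explicitly with rep_len(); the names `long`, `short` and `recycled` are illustrative.

```r
long  <- 1:10
short <- 1:2

# Repeat the shorter vector until it matches the longer one
recycled <- rep_len(short, length(long))
recycled                                   # 1 2 1 2 1 2 1 2 1 2

# Adding the explicitly repeated vector gives the same result
# as letting R recycle implicitly
identical(long + short, long + recycled)   # TRUE
```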
Most intuitive:
```r
sample(10) + 100
```
```
##  [1] 102 107 110 103 108 109 105 106 104 101
```
```r
runif(10) > 0.5
```
```
##  [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE
```
In R, basic mathematical operations work with vectors.
Unintuitive:
```r
1:10 + 1:2
```
```
##  [1]  2  4  4  6  6  8  8 10 10 12
```
Here, R expands the shorter vector to the same length as the longer one. This is called recycling.
When the length of the longer vector is not an integer multiple of the length of the shorter:
```r
1:10 + 1:3
```
```
## Warning in 1:10 + 1:3: longer object length is not a multiple of shorter object
## length
##  [1]  2  4  6  5  7  9  8 10 12 11
```
The tidyverse way: only values of size one are recycled, so if you do want to recycle, you’ll need to do it yourself with rep():
```r
tibble(x = 1:4, y = 1:2)
```
```
## Error: Tibble columns must have compatible sizes.
## * Size 4: Existing data.
## * Size 2: Column `y`.
## ℹ Only values of size one are recycled.
```
```r
tibble(x = 1:4, y = rep(1:2, 2))
tibble(x = 1:4, y = rep(1:2, each = 2))
tibble(x = 1:4, y = 1)  # this is allowed
```
All types of vectors can be named:
```r
c(x = 1, y = 2, z = 4)
```
```
## x y z 
## 1 2 4
```
Or with purrr::set_names():
```r
set_names(1:3, c("a", "b", "c"))
```
```
## a b c 
## 1 2 3
```
[ is the subsetting function for vectors, called as x[a] (cf. dplyr::filter() for tibbles).
Subsetting with a numeric vector containing only integers. The integers must be all positive, all negative, or zero.
```r
x <- c("one", "two", "three", "four", "five")
x[c(3, 2, 5)]
```
```
## [1] "three" "two"   "five"
```
By repeating a position, you can actually make a longer output than input:
```r
x[c(1, 1, 5, 5, 5, 2)]
```
```
## [1] "one"  "one"  "five" "five" "five" "two"
```
Negative values drop the elements at the specified positions:
```r
x[c(-1, -3, -5)]
```
```
## [1] "two"  "four"
```
It’s an error to mix positive and negative values:
```r
x[c(1, -1)]
```
```
## Error in x[c(1, -1)]: only 0's may be mixed with negative subscripts
```
Subsetting with zero:
```r
x[0]
```
```
## character(0)
```
Subsetting with a logical vector:
```r
x <- c(10, 3, NA, 5, 8, 1, NA)
# All non-missing values of x
x[!is.na(x)]
```
```
## [1] 10  3  5  8  1
```
```r
# All even (or missing!) values of x
x[x %% 2 == 0]
```
```
## [1] 10 NA  8 NA
```
Subsetting a named vector:
```r
x <- c(abc = 1, def = 2, xyz = 5)
x[c("xyz", "def")]
```
```
## xyz def 
##   5   2
```
```r
x[c("xyz", "def", "xyz")]
```
```
## xyz def xyz 
##   5   2   5
```
Subsetting nothing: x[] returns the complete x. Useful when subsetting matrices:
```r
x <- c(1, 2, 3)
x[]
```
```
## [1] 1 2 3
```
```r
y <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
y[1, ]
```
```
## [1] 1 3 5
```
```r
y[, -1]
```
```
##      [,1] [,2]
## [1,]    3    5
## [2,]    4    6
```
Lists are created with list():
```r
x <- list(1, 2, 3)
x
```
```
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
```
```r
str(x)  # `str` for structure
```
```
## List of 3
##  $ : num 1
##  $ : num 2
##  $ : num 3
```
```r
x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
```
```
## List of 3
##  $ a: num 1
##  $ b: num 2
##  $ c: num 3
```
Unlike atomic vectors, list() can contain a mix of objects:
```r
y <- list("a", 1L, 1.5, TRUE)
str(y)
```
```
## List of 4
##  $ : chr "a"
##  $ : int 1
##  $ : num 1.5
##  $ : logi TRUE
```
Lists can even contain other lists!
```r
x1 <- list(c(1, 2), c(3, 4))
x2 <- list(list(1, 2), list(3, 4))
x3 <- list(1, list(2, list(3)))
```
Visualisation:
Lists have rounded corners. Atomic vectors have square corners.
Children are drawn inside their parent, and have a slightly darker background to make it easier to see the hierarchy.
The orientation of the children (i.e. rows or columns) isn’t important, so I’ll pick a row or column orientation to either save space or illustrate an important property in the example.
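Where the diagrams are not available, str() gives a text view of the same hierarchy. The sketch below redefines the three example lists so that it runs on its own.

```r
x1 <- list(c(1, 2), c(3, 4))
x2 <- list(list(1, 2), list(3, 4))
x3 <- list(1, list(2, list(3)))

str(x1)  # two numeric vectors of length 2
str(x2)  # two lists, each holding two numbers
str(x3)  # a number, then a list that itself contains a list

# Drill down with [[ to reach the innermost value of x3
x3[[2]][[2]][[1]]   # 3
```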
```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
```
[ extracts a sub-list. The result will always be a list.
```r
str(a[1:2])
```
```
## List of 2
##  $ a: int [1:3] 1 2 3
##  $ b: chr "a string"
```
```r
str(a[4])
```
```
## List of 1
##  $ d:List of 2
##   ..$ : num -1
##   ..$ : num -5
```
Like with vectors, you can subset with a logical, integer, or character vector.
[[ extracts a single component from a list. It removes a level of hierarchy from the list.
```r
str(a[[1]])
```
```
##  int [1:3] 1 2 3
```
```r
str(a[[4]])
```
```
## List of 2
##  $ : num -1
##  $ : num -5
```
$ is a shorthand for extracting named elements of a list. It works similarly to [[ except that you don’t need to use quotes.
```r
a$a
```
```
## [1] 1 2 3
```
```r
a[["a"]]
```
```
## [1] 1 2 3
```
[ vs [[: subsetting a list, visually.
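The difference between the three operators can also be checked in code. This sketch redefines the example list `a` so that it runs on its own.

```r
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

# [ keeps the list wrapper, even for a single element
typeof(a["a"])             # "list"
length(a["a"])             # 1

# [[ and $ return the element itself
typeof(a[["a"]])           # "integer"
identical(a$a, a[["a"]])   # TRUE

# Chain [[ to reach into a nested list
a[["d"]][[1]]              # -1
```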
Any vector can contain arbitrary additional metadata through its attributes. Attributes are like a named list of vectors that can be attached to any object.
```r
x <- 1:10
attr(x, "greeting")  # get an individual attribute
```
```
## NULL
```
```r
attr(x, "greeting") <- "Hi!"  # set an individual attribute
attr(x, "farewell") <- "Bye!" # set an individual attribute
attributes(x)  # get all at once
```
```
## $greeting
## [1] "Hi!"
## 
## $farewell
## [1] "Bye!"
```
Fundamental attributes:
Class controls how generic functions work
```r
as.Date
```
```
## function (x, ...) 
## UseMethod("as.Date")
## <bytecode: 0x7ff94bb22068>
## <environment: namespace:base>
```
The call to “UseMethod” means that this is a generic function, and it will call a specific method, a function, based on the class of the first argument.
All methods are functions; not all functions are methods.
List all the methods for a generic with methods():
```r
methods("as.Date")
```
```
## [1] as.Date.character   as.Date.default     as.Date.factor     
## [4] as.Date.numeric     as.Date.POSIXct     as.Date.POSIXlt    
## [7] as.Date.vctrs_sclr* as.Date.vctrs_vctr*
## see '?methods' for accessing help and source code
```
If x is a character vector, as.Date() will call as.Date.character(); if it’s a factor, it’ll call as.Date.factor().
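The dispatch mechanism can be tried out on a toy generic. Everything in this sketch is hypothetical and for illustration only: `describe()` and its methods are not part of base R or any package.

```r
# A hypothetical generic: UseMethod() dispatches on the class of x
describe <- function(x, ...) {
  UseMethod("describe")
}

# Methods are ordinary functions named <generic>.<class>
describe.character <- function(x, ...) {
  paste("character vector of length", length(x))
}

describe.factor <- function(x, ...) {
  paste("factor with", nlevels(x), "levels")
}

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("object of type", typeof(x))
}

describe(c("a", "b"))             # "character vector of length 2"
describe(factor(c("ab", "cd")))   # "factor with 2 levels"
describe(1:3)                     # "object of type integer"
```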
Specific implementation of a method:
```r
getS3method("as.Date", "default")
```
```
## function (x, ...) 
## {
##     if (inherits(x, "Date")) 
##         x
##     else if (is.null(x)) 
##         .Date(numeric())
##     else if (is.logical(x) && all(is.na(x))) 
##         .Date(as.numeric(x))
##     else stop(gettextf("do not know how to convert '%s' to class %s", 
##         deparse1(substitute(x)), dQuote("Date")), domain = NA)
## }
## <bytecode: 0x7ff94f00f7e8>
## <environment: namespace:base>
```
```r
getS3method("as.Date", "numeric")
```
```
## function (x, origin, ...) 
## {
##     if (missing(origin)) {
##         if (!length(x)) 
##             return(.Date(numeric()))
##         if (!any(is.finite(x))) 
##             return(.Date(x))
##         stop("'origin' must be supplied")
##     }
##     as.Date(origin, ...) + x
## }
## <bytecode: 0x7ff94d04e6e0>
## <environment: namespace:base>
```
The most important S3 generic is print(): it controls how the object is printed when you type its name at the console.
```r
print
```
```
## function (x, ...) 
## UseMethod("print")
## <bytecode: 0x7ff94ca45dc0>
## <environment: namespace:base>
```
```r
methods("print") %>% head(50)
```
```
##  [1] "print.acf"                                   
##  [2] "print.AES"                                   
##  [3] "print.all_vars"                              
##  [4] "print.anova"                                 
##  [5] "print.ansi_string"                           
##  [6] "print.ansi_style"                            
##  [7] "print.any_vars"                              
##  [8] "print.aov"                                   
##  [9] "print.aovlist"                               
## [10] "print.ar"                                    
## [11] "print.Arima"                                 
## [12] "print.arima0"                                
## [13] "print.AsIs"                                  
## [14] "print.aspell"                                
## [15] "print.aspell_inspect_context"                
## [16] "print.bibentry"                              
## [17] "print.Bibtex"                                
## [18] "print.boxx"                                  
## [19] "print.browseVignettes"                       
## [20] "print.by"                                    
## [21] "print.bytes"                                 
## [22] "print.cache_info"                            
## [23] "print.cell_addr"                             
## [24] "print.cell_limits"                           
## [25] "print.changedFiles"                          
## [26] "print.check_code_usage_in_package"           
## [27] "print.check_compiled_code"                   
## [28] "print.check_demo_index"                      
## [29] "print.check_depdef"                          
## [30] "print.check_details"                         
## [31] "print.check_details_changes"                 
## [32] "print.check_doi_db"                          
## [33] "print.check_dotInternal"                     
## [34] "print.check_make_vars"                       
## [35] "print.check_nonAPI_calls"                    
## [36] "print.check_package_code_assign_to_globalenv"
## [37] "print.check_package_code_attach"             
## [38] "print.check_package_code_data_into_globalenv"
## [39] "print.check_package_code_startup_functions"  
## [40] "print.check_package_code_syntax"             
## [41] "print.check_package_code_unload_functions"   
## [42] "print.check_package_compact_datasets"        
## [43] "print.check_package_CRAN_incoming"           
## [44] "print.check_package_datalist"                
## [45] "print.check_package_datasets"                
## [46] "print.check_package_depends"                 
## [47] "print.check_package_description"             
## [48] "print.check_package_description_encoding"    
## [49] "print.check_package_license"                 
## [50] "print.check_packages_in_dir"
```
Vectors with additional attributes:
Factors are built on top of integers, and have a levels attribute:
```r
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
typeof(x)
```
```
## [1] "integer"
```
```r
attributes(x)
```
```
## $levels
## [1] "ab" "cd" "ef"
## 
## $class
## [1] "factor"
```
Dates in R are numeric vectors that represent the number of days since 1 January 1970:
```r
x <- as.Date("1971-01-01")
unclass(x)
```
```
## [1] 365
```
```r
typeof(x)
```
```
## [1] "double"
```
```r
attributes(x)
```
```
## $class
## [1] "Date"
```
Date-times are numeric vectors with class POSIXct that represent the number of seconds since 1 January 1970. (“POSIXct” stands for Portable Operating System Interface, calendar time.)
```r
x <- lubridate::ymd_hm("1970-01-01 01:00")
unclass(x)
```
```
## [1] 3600
## attr(,"tzone")
## [1] "UTC"
```
```r
typeof(x)
```
```
## [1] "double"
```
```r
attributes(x)
```
```
## $class
## [1] "POSIXct" "POSIXt" 
## 
## $tzone
## [1] "UTC"
```
tzone controls how the time is printed:
```r
attr(x, "tzone") <- "Asia/Seoul"
x
```
```
## [1] "1970-01-01 10:00:00 KST"
```
```r
attr(x, "tzone") <- "Asia/Shanghai"
x
```
```
## [1] "1970-01-01 09:00:00 CST"
```
Tibbles are augmented lists: they have class “tbl_df” + “tbl” + “data.frame”, and names (column) and row.names attributes:
```r
tb <- tibble::tibble(x = 1:5, y = 5:1)
typeof(tb)
```
```
## [1] "list"
```
```r
attributes(tb)
```
```
## $names
## [1] "x" "y"
## 
## $row.names
## [1] 1 2 3 4 5
## 
## $class
## [1] "tbl_df"     "tbl"        "data.frame"
```
The difference between a tibble and a list is that all the elements of a data frame must be vectors with the same length. All functions that work with tibbles enforce this constraint.
Traditional data.frames have a very similar structure:
```r
df <- data.frame(x = 1:5, y = 5:1)
typeof(df)
```
```
## [1] "list"
```
```r
attributes(df)
```
```
## $names
## [1] "x" "y"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3 4 5
```
The main difference is the class. The class of a tibble includes “data.frame”, which means tibbles inherit the regular data frame behaviour by default.
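The augmented vectors in this section (factors, dates, data frames) are all a base type plus attributes. As a rough illustration of that idea, and not the way you would normally create them, each can be assembled by hand with structure(). The object names `f`, `d` and `df` are illustrative.

```r
# Factor: integer codes plus "levels" and "class" attributes
f <- structure(c(1L, 2L, 1L), levels = c("ab", "cd", "ef"), class = "factor")
as.character(f)       # "ab" "cd" "ab"

# Date: a double counting days since 1970-01-01, plus a class
d <- structure(365, class = "Date")
format(d)             # "1971-01-01"

# Data frame: a list of equal-length vectors plus names, row.names and class
df <- structure(
  list(x = 1:5, y = 5:1),
  row.names = c(NA, -5L),   # compact form of row names 1:5
  class = "data.frame"
)
is.data.frame(df)     # TRUE
dim(df)               # 5 2

# Removing the class exposes the underlying base type
typeof(unclass(f))    # "integer"
typeof(unclass(df))   # "list"
```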