Numerical data are, by default, represented by double precision floating point values in R.
x <- 3.14
typeof(x)
## [1] "double"
y <- 3
typeof(y)
## [1] "double"
Division of a non-zero number by zero generates a double representing an infinite value:
x <- 1/0
x
## [1] Inf
typeof(x)
## [1] "double"
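Infinite values behave like ordinary doubles in arithmetic and comparisons. The following is a minimal sketch (the variable name neg.inf is only illustrative) showing negative infinity and the base R predicates is.infinite() and is.finite():

```r
# Dividing a negative number by zero gives negative infinity
neg.inf <- -1/0
neg.inf
## [1] -Inf

# is.infinite() and is.finite() test each element of a vector
is.infinite(neg.inf)
## [1] TRUE
is.finite(c(1, Inf, -Inf))
## [1] TRUE FALSE FALSE

# Inf compares as larger than any finite double
Inf > 1e308
## [1] TRUE
```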
If you explicitly want integers, you can use the as.integer function or append L to the number:
x <- as.integer(3)
typeof(x)
## [1] "integer"
y <- 3L
typeof(y)
## [1] "integer"
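Whether a result stays an integer depends on the operation. The sketch below (a and b are illustrative names) checks the type returned by a few common operations on integers:

```r
a <- 7L
b <- 2L

typeof(a + b)    # integer + integer stays integer
## [1] "integer"
typeof(a / b)    # ordinary division always returns a double
## [1] "double"
a %/% b          # integer division keeps the integer type
## [1] 3
typeof(a %/% b)
## [1] "integer"
typeof(a + 1.5)  # mixing integer and double promotes to double
## [1] "double"
is.integer(3)    # a bare 3 is a double, not an integer
## [1] FALSE
```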
R also has support for complex numbers (with real and imaginary parts). The imaginary part of a complex number represents a multiple of the square root of -1.
x <- 1 + 0i
typeof(x)
## [1] "complex"
x <- 0 + 1i
x^2
## [1] -1+0i
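Base R provides functions for taking a complex number apart. A short sketch, using an illustrative value z:

```r
z <- 3 + 4i

Re(z)        # real part
## [1] 3
Im(z)        # imaginary part
## [1] 4
Mod(z)       # modulus (distance from the origin)
## [1] 5
Conj(z)      # complex conjugate
## [1] 3-4i
z * Conj(z)  # a number times its conjugate is the squared modulus
## [1] 25+0i
```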
What is the value of \(1/0\)? What is its type? [1 pt]
What is the value of \(0/0\)? What is its type? [1 pt]
What is the value of \(\sqrt{-1}\) [sqrt(-1)]? What is its type? [1 pt]
What is the value of sqrt(-1 + 0i)? What is its type? Why does this differ from the previous result? [1 pt]
Character strings are created using either single or double quotes.
first.name <- "jasmine"
typeof(first.name)
## [1] "character"
last.name <- 'smith'
typeof(last.name)
## [1] "character"
Character strings have a length, which can be found using the nchar function:
nchar(first.name)
## [1] 7
nchar(last.name)
## [1] 5
There are a number of built-in functions for manipulating character strings. Here are some of the most common ones:
paste(first.name, last.name) # join strings
## [1] "jasmine smith"
substr(first.name, 1, 3) # get substrings
## [1] "jas"
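These base functions can be combined. The sketch below redefines the two example strings so that it runs on its own, and uses paste0(), nchar(), substr() and toupper(), all of which are base R:

```r
first.name <- "jasmine"
last.name <- "smith"

# paste0() joins strings with no separator
paste0(first.name, "_", last.name)
## [1] "jasmine_smith"

# combine substr() and nchar() to get the last three characters
n <- nchar(last.name)
last.three <- substr(last.name, n - 2, n)
last.three
## [1] "ith"

# capitalize only the first letter using toupper() and substr()
capitalized <- paste0(toupper(substr(first.name, 1, 1)),
                      substr(first.name, 2, nchar(first.name)))
capitalized
## [1] "Jasmine"
```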
stringr package
The stringr package provides a variety of useful functions for working with character strings. Install the stringr package via one of the standard mechanisms. All of the functions in the stringr package are prefixed with str_. Here are some examples:
library(stringr)
the.crisis <- "These are the times that try men's souls..."
str_length(the.crisis) # equivalent to nchar
## [1] 43
# how many times does the character "s" appear in the string?
str_count(the.crisis, "s")
## [1] 5
# duplicate a string
str_dup("hello", 3)
## [1] "hellohellohello"
# other interesting functions
str_to_title(first.name)
## [1] "Jasmine"
str_to_upper(last.name)
## [1] "SMITH"
Evaluate the following code. Does it give the output you expect? Why or why not? [1 pt]
x <- "Hello, World!"
length(x)
paste and str_c both join strings, but given the same inputs they produce different outputs. Modify the call to str_c below to make the outputs identical. [1 pt]
x <- "hello"; y <- "world"
paste(x, y)
## [1] "hello world"
str_c(x, y)
## [1] "helloworld"
What does the function stringr::str_to_lower do? Give a code block illustrating the use of this function. [1 pt]
What does the function stringr::str_split do? Give a code block illustrating the use of this function. [1 pt]
Vectors are the core data structure in R. Vectors store an ordered list of items all of the same type. Learning to compute effectively with vectors is one of the keys to efficient R programming. Vectors in R always have a length (accessed with the length() function) and a type (accessed with the typeof() function).
The simplest way to create a vector at the interactive prompt is to use the c() function, which is shorthand for “combine” or “concatenate”.
x <- c(2,4,6,8) # a vector of numbers
length(x)
## [1] 4
typeof(x)
## [1] "double"
y <- c('joe','bob','fred') # vector of characters
length(y)
## [1] 3
typeof(y)
## [1] "character"
z <- c() # empty vector
length(z)
## [1] 0
typeof(z)
## [1] "NULL"
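Because all elements of a vector must share one type, c() silently converts mixed inputs to a common type. A minimal sketch of this coercion (the variable names are illustrative):

```r
mixed.num <- c(1L, 2.5)       # integer and double -> double
typeof(mixed.num)
## [1] "double"

mixed.lgl <- c(TRUE, 2)       # logical and double -> double; TRUE becomes 1
mixed.lgl
## [1] 1 2

mixed.chr <- c(1, "a", TRUE)  # anything mixed with a string -> character
mixed.chr
## [1] "1" "a" "TRUE"
typeof(mixed.chr)
## [1] "character"
```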
You can also use c() to concatenate two or more vectors together.
v <- c(1,3,5,7)
w <- c(-1, -2, -3)
x <- c(2,4,6,8)
vwx <- c(v,w,x)
vwx
## [1] 1 3 5 7 -1 -2 -3 2 4 6 8
There are a variety of functions for creating regular sequences in the form of vectors.
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
10:1
## [1] 10 9 8 7 6 5 4 3 2 1
seq(1, 10)
## [1] 1 2 3 4 5 6 7 8 9 10
seq(1, 10, by = 2)
## [1] 1 3 5 7 9
seq(2, 4, by = 0.25)
## [1] 2.00 2.25 2.50 2.75 3.00 3.25 3.50 3.75 4.00
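seq() can also be told how many values to produce rather than the step size, and base R has two helpers, seq_len() and seq_along(), for generating index sequences. A short sketch:

```r
# five evenly spaced values from 0 to 1
seq(0, 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00

# the integers 1..n
seq_len(5)
## [1] 1 2 3 4 5

# one index per element of an existing vector
y <- c('joe', 'bob', 'fred')
seq_along(y)
## [1] 1 2 3
```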
scan
For vectors of more than 10 or so elements it gets tiresome and error-prone to create vectors using c(). For medium-length vectors the scan() function is very useful.
> test.scores <- scan()
1: 98 92 78 65 52 59 75 77 84 31 83 72 59 69 71 66
17:
Read 16 items
> test.scores
[1] 98 92 78 65 52 59 75 77 84 31 83 72 59 69 71 66
When you invoke scan() without any arguments the function will read in a list of values separated by white space (usually spaces or tabs). Values are read until scan encounters a blank line.
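In a script, where there is no interactive prompt, scan() can instead read from a string through its text argument. A minimal sketch using the same scores; quiet = TRUE only suppresses the "Read 16 items" message:

```r
# read whitespace-separated numbers from a string instead of the console
test.scores <- scan(text = "98 92 78 65 52 59 75 77 84 31 83 72 59 69 71 66",
                    quiet = TRUE)
length(test.scores)
## [1] 16
typeof(test.scores)
## [1] "double"
```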
What does the rep function do? Illustrate the use of the rep function with a code block. [1 pt]
In the example above we used scan to create a numerical vector. scan can also be used to create vectors of other data types, but the default call to scan, as illustrated below, will not work if you type “a b c” as the input at the command line (try it!):
> char.vector <- scan()
1: a b c
Error in scan() : scan() expected 'a real', got 'a'
How would you change the function call to scan in order to read a vector of character values so that the input shown above didn’t generate an error? [2 pts]
The basic R arithmetic operations work on vectors as well as on single numbers (in fact, single numbers are themselves vectors of length one).
x <- c(2, 4, 6, 8, 10)
y <- c(0, 1, 3, 5, 9)
x * 2
## [1] 4 8 12 16 20
x * pi
## [1] 6.283185 12.566371 18.849556 25.132741 31.415927
x + y
## [1] 2 5 9 13 19
x * y
## [1] 0 4 18 40 90
x/y
## [1] Inf 4.000000 2.000000 1.600000 1.111111
All of the mathematical functions that we introduced earlier work with numerical vectors:
x <- c(0, pi/2, pi, 3*pi/2, 2*pi)
cos(x)
## [1] 1.000000e+00 6.123234e-17 -1.000000e+00 -1.836970e-16 1.000000e+00
y <- c(2, 4, 6, 8)
y^2
## [1] 4 16 36 64
w <- c(-1, 2, -3, 3)
abs(w)
## [1] 1 2 3 3
Here are some additional numerical functions that are useful for operating on vectors.
sum(test.scores) # test.scores defined using scan() as above
min(test.scores)
max(test.scores)
range(test.scores) # min, max returned as a vector of length 2
sorted.scores <- sort(test.scores)
sorted.scores
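The calls above depend on test.scores having been entered interactively. As a self-contained sketch, the same vector can be built with c(), which also lets us show the values these functions return (mean() is an extra base R function added here for illustration):

```r
test.scores <- c(98, 92, 78, 65, 52, 59, 75, 77, 84, 31, 83, 72, 59, 69, 71, 66)

sum(test.scores)
## [1] 1131
min(test.scores)
## [1] 31
max(test.scores)
## [1] 98
range(test.scores)
## [1] 31 98
mean(test.scores)
## [1] 70.6875
sort(test.scores)
## [1] 31 52 59 59 65 66 69 71 72 75 77 78 83 84 92 98
```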
When vectors are not of the same length R “recycles” the elements of the shorter vector to make the lengths conform.
x <- c(2, 4, 6, 8, 10)
length(x)
## [1] 5
z <- c(1, 4, 7, 11)
length(z)
## [1] 4
x + z
## Warning in x + z: longer object length is not a multiple of shorter object
## length
## [1] 3 8 13 19 11
In the example above z was treated as if it were the vector (1, 4, 7, 11, 1) due to vector recycling.
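When the longer length is an exact multiple of the shorter one, recycling happens without any warning. A small sketch with illustrative vectors a and b:

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)

# b is recycled as (10, 20, 10, 20, 10, 20); no warning is issued
a + b
## [1] 11 22 13 24 15 26

# multiplying by a single number is the same mechanism:
# the length-one vector 2 is recycled six times
a * 2
## [1] 2 4 6 8 10 12
```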
The comparison operators also work on vectors as shown below. Comparisons involving vectors return vectors of logical values.
x <- c(2, 4, 6, 8, 10)
x > 5
## [1] FALSE FALSE TRUE TRUE TRUE
x != 4
## [1] TRUE FALSE TRUE TRUE TRUE
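Logical vectors produced by comparisons can be summarized with base R functions such as sum(), mean(), any(), all() and which(). A minimal sketch (big is an illustrative name):

```r
x <- c(2, 4, 6, 8, 10)
big <- x > 5

sum(big)     # TRUE counts as 1, so this is the number of elements > 5
## [1] 3
mean(big)    # proportion of elements > 5
## [1] 0.6
any(x > 9)   # is at least one element > 9?
## [1] TRUE
all(x > 5)   # are all elements > 5?
## [1] FALSE
which(big)   # positions of the TRUE values
## [1] 3 4 5
```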
If you try to apply arithmetic operations to non-numeric vectors, R will stop with an error rather than return a result:
w <- c('foo', 'bar', 'baz', 'qux')
w**2
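To see the error message without halting a script, the failing expression can be wrapped in tryCatch(). This is only a sketch; the message text shown assumes an English locale, and msg is an illustrative name:

```r
w <- c('foo', 'bar', 'baz', 'qux')

# capture the error message instead of stopping
msg <- tryCatch(w^2, error = function(e) conditionMessage(e))
msg
## [1] "non-numeric argument to binary operator"

# is.numeric() lets you check before doing arithmetic
is.numeric(w)
## [1] FALSE
```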
Note, however, that the comparison operators can work with non-numeric vectors. The results you get will depend on the type of the elements in the vector.
w <- c('foo', 'bar', 'baz', 'qux')
w == 'bar'
## [1] FALSE TRUE FALSE FALSE
w < 'cat'
## [1] FALSE TRUE TRUE FALSE
For a vector of length \(n\), we can access the elements by the indices \(1 \ldots n\). We say that R vectors (and other data structures like lists) are “one-indexed”. Many other programming languages, such as Python, C, and Java, use zero-indexing, where the elements of a data structure are accessed by the indices \(0 \ldots n-1\). Indexing errors are a common source of bugs.
Indexing a vector is done by specifying the index in square brackets as shown below:
x <- c(2, 4, 6, 8, 10)
length(x)
## [1] 5
x[1]
## [1] 2
x[4]
## [1] 8
Negative indices are used to exclude particular elements. x[-1] returns all elements of x except the first.
x[-1]
## [1] 4 6 8 10
You can get multiple elements of a vector by indexing by another vector. In the example below, x[c(3,5)] returns the third and fifth elements of x.
x[c(3,5)]
## [1] 6 10
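Index vectors can be built with any of the sequence tools, can be negative, and may repeat positions. A short sketch:

```r
x <- c(2, 4, 6, 8, 10)

x[2:4]          # a contiguous range of elements
## [1] 4 6 8
x[-c(1, 5)]     # drop the first and fifth elements
## [1] 4 6 8
x[length(x)]    # the last element
## [1] 10
x[c(5, 1, 1)]   # indices can be reordered and repeated
## [1] 10 2 2
```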
A very powerful feature of R is the ability to combine the comparison operators with indexing. This facilitates data filtering and subsetting. Some examples:
x <- c(2, 4, 6, 8, 10)
x[x > 5]
## [1] 6 8 10
x[x < 4 | x > 6]
## [1] 2 8 10
In the first example we retrieved all the elements of x that are larger than 5 (read as “x where x is greater than 5”). In the second example we retrieved those elements of x that were smaller than four or greater than six. Combining indexing and comparison is a powerful concept which we’ll use repeatedly in this course.
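Conditions can also be combined with & (and), negated with !, or expressed as set membership with %in%. A minimal sketch using the same vector:

```r
x <- c(2, 4, 6, 8, 10)

x[x > 2 & x < 10]        # both conditions must hold
## [1] 4 6 8
x[!(x > 5)]              # negate a condition
## [1] 2 4
x[x %in% c(4, 10, 12)]   # keep elements that appear in a given set
## [1] 4 10
```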
You can combine indexing with assignment to change the elements of a vector:
x <- c(2, 4, 6, 8, 10)
x[2] <- -4
x
## [1] 2 -4 6 8 10
You can also use indexing vectors to change multiple values at once:
x <- c(2, 4, 6, 8, 10)
x[c(1, 3, 5)] <- 6
x
## [1] 6 4 6 8 6
Using logical vectors to manipulate the elements of a vector also works:
x <- c(2, 4, 6, 8, 10)
x[x > 5] <- 5 # cap all values at a maximum of 5
x
## [1] 2 4 5 5 5
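The same capping can be done without modifying x in place. As a sketch of two alternatives, the base R functions pmin() (element-wise minimum) and ifelse() both return a new vector; capped and capped2 are illustrative names:

```r
x <- c(2, 4, 6, 8, 10)

# element-wise minimum of x and 5 (5 is recycled)
capped <- pmin(x, 5)
capped
## [1] 2 4 5 5 5

# ifelse(test, yes, no) chooses element by element
capped2 <- ifelse(x > 5, 5, x)
capped2
## [1] 2 4 5 5 5

# the original vector is unchanged
x
## [1] 2 4 6 8 10
```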
What happens when you try and index past the end of a vector? Write a code block illustrating this. [1 pt]
letters is a pre-defined vector of characters giving the lowercase letters of the English alphabet.
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
Write a code block that combines indexing of letters with the seq function to retrieve every 3rd letter of the alphabet. Your output should look like this [2 pts]:
[1] "a" "d" "g" "j" "m" "p" "s" "v" "y"
%%, will be useful for this [1 pt]