Numerical data types

Numerical data are, by default, represented by double precision floating point values in R.

x <- 3.14
typeof(x)
## [1] "double"

y <- 3
typeof(y)
## [1] "double"

Division of a non-zero number by zero generates a double representing an infinite value:

x <- 1/0

x
## [1] Inf

typeof(x)
## [1] "double"

If you explicitly want integers, you can use the as.integer function or append L to the number:

x <- as.integer(3)
typeof(x)
## [1] "integer"

y <- 3L
typeof(y)
## [1] "integer"
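One detail worth noting: as.integer truncates toward zero rather than rounding, and a whole number written without the L suffix is still a double. A brief sketch (the variable names a and b are illustrative only):

```r
# as.integer() drops the fractional part (truncation toward zero)
a <- as.integer(3.9)
a
## [1] 3

b <- as.integer(-3.9)
b
## [1] -3

# a whole number without the L suffix is still stored as a double
is.integer(3)
## [1] FALSE

is.integer(3L)
## [1] TRUE
```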

R also has support for complex numbers (with real and imaginary parts). The imaginary part of a complex number represents a multiple of the square root of -1.

x <- 1 + 0i
typeof(x)
## [1] "complex"

x <- 0 + 1i
x^2
## [1] -1+0i
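The parts of a complex number can be pulled out with the base R functions Re, Im, Mod, and Conj. A small sketch, using an illustrative value z that does not appear elsewhere in these notes:

```r
z <- 3 + 4i

Re(z)    # real part
## [1] 3

Im(z)    # imaginary part
## [1] 4

Mod(z)   # modulus: distance from the origin in the complex plane
## [1] 5

Conj(z)  # complex conjugate: same real part, negated imaginary part
## [1] 3-4i
```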

In-class Assignment #1

  1. What is the value of \(1/0\)? What is its type? [1 pt]

  2. What is the value of \(0/0\)? What is its type? [1 pt]

  3. What is the value of \(\sqrt{-1}\) [sqrt(-1)]? What is its type? [1 pt]

  4. What is the value of sqrt(-1 + 0i)? What is its type? Why does this differ from the previous result? [1 pt]

Character strings

Character strings are created using either single or double quotes.

first.name <- "jasmine"
typeof(first.name)
## [1] "character"

last.name <- 'smith'
typeof(last.name)
## [1] "character"

Character strings have a length, which can be found using the nchar function:

nchar(first.name)
## [1] 7

nchar(last.name)
## [1] 5

There are a number of built-in functions for manipulating character strings. Here are some of the most common ones:

paste(first.name, last.name)  # join strings
## [1] "jasmine smith"

substr(first.name, 1, 3)      # get substrings
## [1] "jas"
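nchar and substr combine naturally, for example to take characters from the end of a string. The sketch below redefines first.name so that it runs on its own; last.three and n are illustrative names.

```r
first.name <- "jasmine"

# last three characters: start at nchar() - 2 and stop at nchar()
n <- nchar(first.name)
last.three <- substr(first.name, n - 2, n)
last.three
## [1] "ine"

# base R case conversion
toupper(first.name)
## [1] "JASMINE"
```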

The stringr package

The stringr package provides a variety of useful functions for working with character strings. Install the stringr package via one of the standard mechanisms. All of the functions in the stringr package are prefixed with str_. Here are some examples:

library(stringr)

the.crisis <- "These are the times that try men's souls..."
str_length(the.crisis)  # equivalent to nchar
## [1] 43

# how many times does the character "s" appear in the string?
str_count(the.crisis, "s")
## [1] 5

# duplicate a string
str_dup("hello", 3)
## [1] "hellohellohello"

# other interesting functions
str_to_title(first.name)
## [1] "Jasmine"

str_to_upper(last.name)
## [1] "SMITH"

In-class Assignment #2

  1. Evaluate the following code. Does it give the output you expect? Why or why not? [1 pt]

    x <- "Hello, World!"
    length(x)

  2. paste and str_c both join strings, but given the same inputs they produce different outputs. Modify the call to str_c below to make the outputs identical. [1 pt]

    x <- "hello"; y <- "world"
    paste(x, y)
    ## [1] "hello world"
    str_c(x, y)
    ## [1] "helloworld"

  3. What does the function stringr::str_to_lower do? Give a code block illustrating the use of this function. [1 pt]

  4. What does the function stringr::str_split do? Give a code block illustrating the use of this function. [1 pt]

Working with Vectors in R

Vectors are the core data structure in R. Vectors store an ordered list of items all of the same type. Learning to compute effectively with vectors is one of the keys to efficient R programming. Vectors in R always have a length (accessed with the length() function) and a type (accessed with the typeof() function).

Creating vectors

The simplest way to create a vector at the interactive prompt is to use the c() function, which is shorthand for “combine” or “concatenate”.

x <- c(2,4,6,8)  # a vector of numbers
length(x)
## [1] 4

typeof(x)
## [1] "double"

y <- c('joe','bob','fred') # vector of characters
length(y)
## [1] 3

typeof(y)
## [1] "character"

z <- c() # c() with no arguments returns NULL, which has length 0
length(z)
## [1] 0
typeof(z)
## [1] "NULL"

You can also use c() to concatenate two or more vectors together.

v <- c(1,3,5,7)
w <- c(-1, -2, -3)
x <- c(2,4,6,8)
vwx <- c(v,w,x)
vwx
##  [1]  1  3  5  7 -1 -2 -3  2  4  6  8
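Because a vector holds items of a single type, c() converts mixed inputs to a common type instead of raising an error. A minimal sketch of this coercion (the name mixed is illustrative):

```r
typeof(c(1L, 2.5))   # integer and double -> double
## [1] "double"

typeof(c(TRUE, 2L))  # logical and integer -> integer
## [1] "integer"

typeof(c(1, "a"))    # number and string -> character
## [1] "character"

mixed <- c(1, "a", TRUE)
mixed                # every element has become a character string
## [1] "1"    "a"    "TRUE"
```

Silent coercion of this kind is a common reason a vector that was expected to be numeric turns out to be a character vector, so checking typeof() after combining inputs is a useful habit.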

Regular sequences

There are a variety of functions for creating regular sequences in the form of vectors.

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10

10:1
##  [1] 10  9  8  7  6  5  4  3  2  1

seq(1, 10)
##  [1]  1  2  3  4  5  6  7  8  9 10

seq(1, 10, by = 2)
## [1] 1 3 5 7 9

seq(2, 4, by = 0.25)
## [1] 2.00 2.25 2.50 2.75 3.00 3.25 3.50 3.75 4.00
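seq can also be told how many values to produce rather than the step size, and it can count downward with a negative step. A short sketch using the length.out argument of seq and the related base function seq_len:

```r
seq(0, 1, length.out = 5)  # five evenly spaced values from 0 to 1
## [1] 0.00 0.25 0.50 0.75 1.00

seq(10, 1, by = -3)        # descending sequence with a negative step
## [1] 10  7  4  1

seq_len(5)                 # the integers 1 through 5
## [1] 1 2 3 4 5
```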

Creating longer vectors using scan

For vectors of more than 10 or so elements, it gets tiresome and error-prone to create vectors using c(). For medium-length vectors, the scan() function is very useful.

> test.scores <- scan()
1: 98 92 78 65 52 59 75 77 84 31 83 72 59 69 71 66
17:
Read 16 items
> test.scores
 [1] 98 92 78 65 52 59 75 77 84 31 83 72 59 69 71 66

When you invoke scan() without any arguments the function will read in a list of values separated by white space (usually spaces or tabs). Values are read until scan encounters a blank line.
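Interactive input cannot be reproduced in a script, so when the values are already available as text, one option is the text argument of scan. The sketch below assumes the same scores as above; quiet = TRUE suppresses the “Read 16 items” message.

```r
# read whitespace-separated numbers from a string instead of the keyboard
test.scores <- scan(text = "98 92 78 65 52 59 75 77 84 31 83 72 59 69 71 66",
                    quiet = TRUE)

length(test.scores)
## [1] 16

typeof(test.scores)
## [1] "double"
```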

In-class Assignment #3

  1. What does the rep function do? Illustrate the use of the rep function with a code block. [1 pt]

  2. In the example above we used scan to create a numerical vector. scan can also be used to create vectors with other data types, but the default call to scan as illustrated below will not work if you type “a b c” as the inputs at the command line (try it!):

    > char.vector <- scan()
    1: a b c
    Error in scan() : scan() expected 'a real', got 'a'

How would you change the function call to scan in order to read a vector of character values so that the input shown above didn’t generate an error? [2 pts]

Vector Arithmetic

The basic R arithmetic operations work on vectors as well as on single numbers (in fact single numbers are vectors behind the scenes).

x <- c(2, 4, 6, 8, 10)
y <- c(0, 1, 3, 5, 9)

x * 2
## [1]  4  8 12 16 20

x * pi
## [1]  6.283185 12.566371 18.849556 25.132741 31.415927

x + y
## [1]  2  5  9 13 19

x * y
## [1]  0  4 18 40 90

x/y
## [1]      Inf 4.000000 2.000000 1.600000 1.111111
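The remark that single numbers are vectors behind the scenes can be checked directly. In this sketch s is an illustrative name, and x and y are redefined so the block runs on its own:

```r
s <- 3
is.vector(s)  # a single number is a vector...
## [1] TRUE

length(s)     # ...of length 1
## [1] 1

s[1]          # and it can be indexed like any other vector
## [1] 3

# subtraction is element-wise, like the other arithmetic operators
x <- c(2, 4, 6, 8, 10)
y <- c(0, 1, 3, 5, 9)
x - y
## [1] 2 3 3 3 1
```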

Useful numerical functions that operate on vectors

All of the mathematical functions that we introduced earlier work with numerical vectors:

x <- c(0, pi/2, pi, 3*pi/2, 2*pi)
cos(x)
## [1]  1.000000e+00  6.123234e-17 -1.000000e+00 -1.836970e-16  1.000000e+00

y <- c(2, 4, 6, 8)
y^2
## [1]  4 16 36 64

w <- c(-1, 2, -3, 3)
abs(w)
## [1] 1 2 3 3

Here are some additional numerical functions that are useful for operating on vectors.

sum(test.scores)  # test.scores defined using scan() as above
min(test.scores)
max(test.scores)
range(test.scores) # min, max returned as a vector of length 2
sorted.scores <- sort(test.scores)
sorted.scores
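For reference, here is a self-contained version of these calls with their results. It assumes the same 16 scores, entered with c() rather than scan() so the block can be run as a script. The mean and median calls are additions that do not appear in the listing above.

```r
test.scores <- c(98, 92, 78, 65, 52, 59, 75, 77,
                 84, 31, 83, 72, 59, 69, 71, 66)

sum(test.scores)
## [1] 1131

range(test.scores)   # min and max as a vector of length 2
## [1] 31 98

sort(test.scores)    # ascending order by default
##  [1] 31 52 59 59 65 66 69 71 72 75 77 78 83 84 92 98

mean(test.scores)    # arithmetic mean: sum / length
## [1] 70.6875

median(test.scores)  # average of the two middle sorted values (71 and 72)
## [1] 71.5
```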

Vector “recycling”

When vectors are not of the same length R “recycles” the elements of the shorter vector to make the lengths conform.

x <- c(2, 4, 6, 8, 10)
length(x)
## [1] 5

z <- c(1, 4, 7, 11)
length(z)
## [1] 4

x + z
## Warning in x + z: longer object length is not a multiple of shorter object
## length
## [1]  3  8 13 19 11

In the example above, z was treated as if it were the vector (1, 4, 7, 11, 1) due to vector recycling.
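When the longer length is an exact multiple of the shorter one, recycling happens silently, with no warning. A short sketch with illustrative vectors a and b:

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)

# b is recycled three times: (10, 20, 10, 20, 10, 20)
a + b
## [1] 11 22 13 24 15 26

# multiplying by a single number is the same mechanism:
# the length-1 vector 2 is recycled six times
a * 2
## [1]  2  4  6  8 10 12
```

Because no warning is issued in this case, an unintended length mismatch can go unnoticed, so it is worth checking lengths when the two vectors are meant to line up element by element.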

Vector comparison

The comparison operators also work on vectors as shown below. Comparisons involving vectors return vectors of logical values.

x <- c(2, 4, 6, 8, 10)
x > 5
## [1] FALSE FALSE  TRUE  TRUE  TRUE

x != 4
## [1]  TRUE FALSE  TRUE  TRUE  TRUE

If you try to apply arithmetic operations to non-numeric vectors, R stops with an error:

w <- c('foo', 'bar', 'baz', 'qux')
w^2
## Error in w^2: non-numeric argument to binary operator

Note, however that the comparison operators can work with non-numeric vectors. The results you get will depend on the type of the elements in the vector.

w <- c('foo', 'bar', 'baz', 'qux')
w == 'bar'
## [1] FALSE  TRUE FALSE FALSE

w < 'cat'
## [1] FALSE  TRUE  TRUE FALSE
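The logical vectors produced by comparisons can be summarized with the base functions any, all, sum, and which. A minimal sketch, redefining x so the block runs on its own:

```r
x <- c(2, 4, 6, 8, 10)

any(x > 5)    # is at least one element greater than 5?
## [1] TRUE

all(x > 5)    # are all elements greater than 5?
## [1] FALSE

sum(x > 5)    # TRUE counts as 1 and FALSE as 0, so this counts matches
## [1] 3

which(x > 5)  # positions of the TRUE values
## [1] 3 4 5
```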

Indexing Vectors

For a vector of length \(n\), we can access the elements by the indices \(1 \ldots n\). We say that R vectors (and other data structures like lists) are “one-indexed”. Many other programming languages, such as Python, C, and Java, use zero-indexing, where the elements of a data structure are accessed by the indices \(0 \ldots n-1\). Indexing errors are a common source of bugs.

Indexing a vector is done by specifying the index in square brackets as shown below:

x <- c(2, 4, 6, 8, 10)
length(x)
## [1] 5

x[1]
## [1] 2

x[4]
## [1] 8

Negative indices are used to exclude particular elements. x[-1] returns all elements of x except the first.

x[-1]
## [1]  4  6  8 10

You can get multiple elements of a vector by indexing by another vector. In the example below, x[c(3,5)] returns the third and fifth elements of x.

x[c(3,5)]
## [1]  6 10
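Since the index is itself a vector, anything that produces a vector of positions can be used inside the brackets. A few more sketches, with x redefined for completeness:

```r
x <- c(2, 4, 6, 8, 10)

x[2:4]         # a contiguous range of positions
## [1] 4 6 8

x[length(x)]   # the last element
## [1] 10

x[-c(1, 2)]    # exclude the first two elements
## [1]  6  8 10

x[c(5, 1, 1)]  # positions may be reordered and repeated
## [1] 10  2  2
```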

Combining Indexing and Comparison

A very powerful feature of R is the ability to combine the comparison operators with indexing. This facilitates data filtering and subsetting. Some examples:

x <- c(2, 4, 6, 8, 10)
x[x > 5]
## [1]  6  8 10
x[x < 4 | x > 6]
## [1]  2  8 10

In the first example we retrieved all the elements of x that are larger than 5 (read as “x where x is greater than 5”). In the second example we retrieved those elements of x that were smaller than four or greater than six. Combining indexing and comparison is a powerful concept which we’ll use repeatedly in this course.
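The “or” operator | has companions: & (“and”), ! (“not”), and %in% (set membership). A brief sketch of each used as a filter, with x redefined so the block is self-contained:

```r
x <- c(2, 4, 6, 8, 10)

x[x > 2 & x < 10]   # both conditions must hold
## [1] 4 6 8

x[!(x > 5)]         # negate a condition
## [1] 2 4

x[x %in% c(4, 10)]  # keep elements that appear in a given set
## [1]  4 10
```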

Vector manipulation

You can combine indexing with assignment to change the elements of a vector:

x <- c(2, 4, 6, 8, 10)
x[2] <- -4 
x
## [1]  2 -4  6  8 10

You can also use indexing vectors to change multiple values at once:

x <- c(2, 4, 6, 8, 10)
x[c(1, 3, 5)]  <- 6
x
## [1] 6 4 6 8 6

Using logical vectors to manipulate the elements of a vector also works:

x <- c(2, 4, 6, 8, 10)
x[x > 5] <- 5    # cap all values at a maximum of 5
x
## [1] 2 4 5 5 5
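Capping values in this way modifies x in place. If the original vector should be kept, one alternative is the base function pmin, which returns a new vector of element-wise minimums. This is a sketch of that alternative, not a replacement for the indexing approach above; capped is an illustrative name.

```r
x <- c(2, 4, 6, 8, 10)

# element-wise minimum of x and 5 (the 5 is recycled)
capped <- pmin(x, 5)
capped
## [1] 2 4 5 5 5

x  # the original vector is unchanged
## [1]  2  4  6  8 10
```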

In-class Assignment #4

  1. What happens when you try to index past the end of a vector? Write a code block illustrating this. [1 pt]

  2. letters is a pre-defined vector of characters giving the lowercase letters of the English alphabet.

    letters
    ##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
    ## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"

    Write a code block that combines indexing of letters with the seq function to retrieve every 3rd letter of the alphabet. Your output should look like this [2 pts]:

    [1] "a" "d" "g" "j" "m" "p" "s" "v" "y"

  3. This is a multi-part problem:
    • Create a vector, \(x\), with all the even numbers between 2 and 100, inclusive. Do NOT generate this by hand! [1 pt]
    • Show how to use vector indexing to extract all the numbers from \(x\) that are evenly divisible by 3 (HINT: the modulo operator, %%, will be useful for this). [1 pt]
    • Show how to replace all the elements of \(x\) that are greater than 20 and less than 80 with the value -99. [1 pt]