Libraries

library(tidyverse)
library(stringr)

Data

We’ll use the NC births data set we’ve explored in a previous class session for some of our examples.

# load the births data set
births <- read_tsv("https://raw.githubusercontent.com/Bio204-class/bio204-datasets/master/births.txt")

Introduction

Control flow statements control the order of execution of different pieces of code. They can be used to do things like make sure code is only run when certain conditions are met, to iterate through data structures, to repeat something until a specified event happens, etc. Control flow statements are frequently used when writing functions or carrying out complex data transformation.

if and if-else statements

if and if-else blocks allow you to structure the flow of execution so that certain expressions are executed only if particular conditions are met.

The general form of an if expression is:

if (Boolean expression) {
  Code to execute if 
  Boolean expression is true
}

Here’s a simple if expression in which we check whether a number is less than 0.5, and if so assign a value to a variable.

x <- runif(1)  # runif generates a random number between 0 and 1
face <- NULL  # set face to a NULL value

if (x < 0.5) {
  face <- "heads"
}
face

The else clause specifies what to do when the condition of the if statement is not true. The general form of an if-else expression is:

if (Boolean expression) {
  Code to execute if 
  Boolean expression is true
} else {
  Code to execute if 
  Boolean expression is false
}

Our previous example makes more sense if we include an else clause.

x <- runif(1)

if (x < 0.5) {
  face <- "heads"
} else {
  face <- "tails"
}

face

With the addition of the else statement, this simple code block can be thought of as simulating the toss of a coin.

if-else in a function

Let’s take our “if-else” example above and turn it into a function we’ll call coin.flip. A literal re-interpretation of our previous code in the context of a function is something like this:

# coin.flip.literal takes no arguments
coin.flip.literal <- function() {
  x <- runif(1)
  if (x < 0.5) {
    face <- "heads"
  } else {
    face <- "tails"
  }
  face
}

coin.flip.literal is pretty long for what it does — we created a temporary variable x that is only used once, and we created the variable face to hold the results of our if-else statement, but then immediately returned the result. This is inefficient and decreases readability of our function. A much more compact implementation of this function is as follows:

coin.flip <- function() {
  if (runif(1) < 0.5) {
    return("heads")
  } else {
    return("tails")
  }
}

Note that in our new version of coin.flip we don’t bother to create the temporary variables x and face, and we immediately return the results within the if-else statement.
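
Because if-else in R is itself an expression that evaluates to a value, the explicit return calls are optional. Here’s a minimal sketch of that idea; the name coin.flip.expr is made up for this illustration and is not used elsewhere in these notes.

```r
# coin.flip.expr is a hypothetical name used only for this illustration
coin.flip.expr <- function() {
  # the value of the if-else expression becomes the value of the function
  if (runif(1) < 0.5) "heads" else "tails"
}

coin.flip.expr()
```

Whether to write return explicitly is largely a matter of style; both versions behave the same way.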

Multiple if-else statements

When there are more than two possible outcomes of interest, multiple if-else statements can be chained together. Here is an example with three outcomes:

x <- sample(-5:5, 1)  # sample a random integer between -5 and 5

if (x < 0) {
  sign.x <- "Negative"
} else if (x > 0) {
  sign.x <- "Positive"
} else {
  sign.x <- "Zero"
}

sign.x
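
Since the outcome above depends on a random draw, it can help to wrap the same chained if-else logic in a function and call it with fixed inputs. The helper name sign.label is an assumption made for this sketch.

```r
# sign.label is a hypothetical helper, not part of the original example
sign.label <- function(x) {
  if (x < 0) {
    "Negative"
  } else if (x > 0) {
    "Positive"
  } else {
    "Zero"
  }
}

sign.label(-3)  # "Negative"
sign.label(0)   # "Zero"
sign.label(4)   # "Positive"
```

Only the first branch whose condition is true is executed, so the final else acts as a catch-all for anything the earlier tests did not match.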

In-class Assignment #1

  1. Write a function called even.or.odd that takes as its input a numeric value, x, and returns the character string “odd” if x is odd-valued and “even” if x is even-valued. For example, even.or.odd(99) should return "odd", while even.or.odd(42) should return "even". Include two examples of usage to test your function [2 pts]
even.or.odd <- function(x) {
# body of your function here
}

# Write some appropriate tests of your function here

for loops

A for statement iterates over the elements of a sequence (such as vectors or lists). A common use of for statements is to carry out a calculation on each element of a sequence (but see the discussion of map below) or to make a calculation that involves all the elements of a sequence.

The general form of a for loop is:

for (elem in sequence) {
  Do some calculations or
  Evaluate one or more expressions
}

As an example, say we wanted to call our coin.flip function multiple times. We could use a for loop to do so as follows:

flips <- c() # empty vector to hold outcomes of coin flips
for (i in 1:20) {
  flips <- c(flips, coin.flip())  # flip coin and add to our vector
}
flips

Let’s use a for loop to create a multi.coin.flip function that accepts an optional argument n that specifies the number of coin flips to carry out:

multi.coin.flip <- function(n = 1) {
  flips <- c()  # create an empty vector
  for (i in seq_len(n)) {  # seq_len(0) is empty, so n = 0 gives no flips
    flips <- c(flips, coin.flip())
  }
  flips
}

With this new definition, calling multi.coin.flip with no arguments returns a single outcome:

multi.coin.flip()

And calling multi.coin.flip with a numeric argument returns multiple coin flips:

multi.coin.flip(n=10)
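
Growing a vector with c() inside a loop copies the vector on every iteration, which can become slow when n is large. One common alternative, sketched here under the assumption that the number of iterations is known in advance, is to allocate the result vector first and fill it in by index. The name multi.coin.flip.prealloc is made up for this sketch.

```r
coin.flip <- function() {
  if (runif(1) < 0.5) {
    return("heads")
  } else {
    return("tails")
  }
}

# multi.coin.flip.prealloc is a hypothetical name for this variant
multi.coin.flip.prealloc <- function(n = 1) {
  flips <- character(n)      # allocate the full result vector up front
  for (i in seq_len(n)) {    # seq_len(0) is empty, so n = 0 runs zero iterations
    flips[i] <- coin.flip()  # fill position i instead of appending
  }
  flips
}

length(multi.coin.flip.prealloc(10))  # 10
length(multi.coin.flip.prealloc(0))   # 0
```

For the small values of n used in these notes, the difference in speed is negligible; the pattern matters more for loops that run many thousands of times.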

In-class Assignment #2

  1. Rewrite your previous even.or.odd function using a for loop, so that it accepts a vector of numeric inputs and returns a character vector specifying “even” or “odd” for each corresponding element in the input vector. Include two examples of usage to test your function [2 pts]
even.or.odd.multi <- function(x) {
# body of your function here
}

# Write some appropriate tests of your function here

break statement

A break statement allows you to exit a loop before it has run through all of its iterations. This is useful for ending a loop as soon as some criterion has been satisfied. break statements are usually nested in if statements.

In the following example we use a break statement inside a for loop. We pick random real numbers between 0 and 1, accumulating them in a vector (random.numbers). The for loop ensures that we never pick more than 20 random numbers before the loop ends. However, the break statement allows the loop to end early if the number picked is greater than 0.95.

random.numbers <- c()

for (i in 1:20) {
  x <- runif(1)
  random.numbers <- c(random.numbers, x)
  if (x > 0.95) {
    break
  }
}

random.numbers
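
To see exactly where break stops the loop, here is the same pattern applied to a fixed vector rather than random draws. The numbers in values are made up for illustration.

```r
values <- c(0.20, 0.71, 0.97, 0.35, 0.99)  # made-up numbers for illustration
seen <- c()

for (v in values) {
  seen <- c(seen, v)
  if (v > 0.95) {
    break  # stop as soon as a value exceeds the threshold
  }
}

seen  # 0.20 0.71 0.97
```

The loop stops at the third element, so 0.35 and 0.99 are never examined.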

repeat loops

A repeat loop runs indefinitely until we explicitly exit it with a break statement. For example, here’s how we can use repeat and break to simulate flipping a coin until we get heads:

ct <- 0
repeat {
  flip <- coin.flip()
  ct <- ct + 1
  if (flip == "heads"){
    break
  }
}

ct

next statement

A next statement allows you to halt the processing of the current iteration of a loop and immediately move to the next item of the loop. This is useful when you want to skip calculations for certain elements of a sequence:

sum.not.div3 <- 0

for (i in 1:20) {
  if (i %% 3 == 0) { # skip summing values that are evenly divisible by three
    next
  }
  sum.not.div3 <- sum.not.div3 + i
}
sum.not.div3
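
For a simple filter like this one, the same total can also be computed without a loop by using logical indexing. Here’s a quick cross-check; sum.vectorized is a name made up for this sketch.

```r
v <- 1:20

# keep only the values that are not evenly divisible by three, then sum them
sum.vectorized <- sum(v[v %% 3 != 0])
sum.vectorized  # 147
```

The loop with next is still the more general tool when the work done for each element is more involved than a single sum.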

while statements

A while statement iterates as long as the condition it contains is true. In the following example, the while loop calls coin.flip until “heads” is the result, and keeps track of the number of flips. Note that this represents the same logic as the repeat-break example we saw earlier, but in a more compact form.

first.head <- 1

while(coin.flip() == "tails"){
  first.head <- first.head + 1
}

first.head
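
Here is a deterministic while loop that may make the order of events easier to follow: the condition is tested before each iteration, and the loop ends the first time the test is false. The variable names value and halvings are made up for this sketch.

```r
value <- 100
halvings <- 0

while (value >= 1) {  # the condition is checked before every iteration
  value <- value / 2
  halvings <- halvings + 1
}

halvings  # 7
value     # 0.78125
```

If the condition is false at the outset, the body of a while loop is never executed at all.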

ifelse

The ifelse function is equivalent to a for-loop with a nested if-else statement. ifelse applies the specified test to each element of a vector, and returns different values depending on whether the test is true or false for that element.

Here’s an example of using ifelse to replace NA elements in a vector with zeros.

x <- c(3, 1, 4, 5, 9, NA, 2, 6, 5, 4)
newx <- ifelse(is.na(x), 0, x)
newx

The equivalent for-loop could be written as:

x <- c(3, 1, 4, 5, 9, NA, 2, 6, 5, 4)
newx <- c()  # create an empty vector
for (elem in x) {
  if (is.na(elem)) {
    newx <- c(newx, 0)  # append zero to newx
  } else {
    newx <- c(newx, elem)  # append elem to newx
  }
}
newx

The ifelse function is clearly a more compact and readable way to accomplish this.
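
ifelse calls can also be nested to handle more than two outcomes, much like chained if-else statements. The sketch below labels each element of a made-up vector by its sign; note that an NA in the test produces an NA in the result.

```r
x <- c(-2, 0, 3.5, NA)  # made-up values for illustration

sign.labels <- ifelse(x < 0, "Negative",
                      ifelse(x > 0, "Positive", "Zero"))
sign.labels  # "Negative" "Zero" "Positive" NA
```

Deeply nested ifelse calls quickly become hard to read, so this style is best kept to two or three outcomes.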

In-class Assignment #3

  1. Write a function called is.even that, given a vector of numeric input, x, returns a vector of logical values, where each element is TRUE for each element of x that is even and FALSE otherwise. Include two examples of usage to test your function [1 pt]

    is.even <- function(x) {
     # body of your function here 
    }
    
    # Write some appropriate tests of your function here
  2. Write a code block illustrating how to use your is.even function together with ifelse to test each of the integers from 1 to 10 for their parity, returning the character string “even” or “odd” as appropriate for each element of the vector one.to.ten [2 pts]

    one.to.ten <- 1:10
    
    # uncomment and complete the partially filled in ifelse expression below
    # ifelse(..., ..., "odd") 
    
    # the output should be:
    # [1] "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even"

map and friends

Another common case we find ourselves faced with is applying a function to every element of a list or vector. Again, we could use a for loop, but the map function which we introduced several lectures ago is often a better alternative.

NOTE: map is a relative newcomer to R and is provided by the purrr package (purrr is loaded when we load tidyverse). There is a menagerie of base “apply” functions (apply, lapply, sapply, vapply, mapply) that behave similarly to map, but the semantics and specific use cases of the “apply” functions are often tricky. map is an attempt to simplify this functionality into a consistent and common interface. We won’t use the “apply” functions in this class, but if you are reading someone else’s R code it’s important to be aware that these functions are similar to map.

basic map

map takes two basic arguments – a sequence (a vector, list, or data frame) and a function. It then applies the function to each element of the sequence, returning the results in the form of a list.

To illustrate map, let’s consider an example in which we have a list of 2-vectors, where each vector gives the min and max values of some variable of interest for individuals in a sample (e.g. resting heart rate and maximum heart rate during exercise). We could use the map function to quickly generate the difference between the resting and maximum heart rates like so:

heart.rates <- list(bob = c(60, 120), fred = c(79, 150), jim = c(66, 110))
diff.fxn <- function(x) {x[2] - x[1]}
map(heart.rates, diff.fxn)

As a second example, here’s how we could use map to get the class of each object in a list:

x <- list(c(1,2,3), "a", "b", "c", list(lead = "Michael", keyboard = "Jermaine"))
map(x, class)
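
For short transformations like the heart rate difference, we don’t have to define a named function first. The sketch below repeats that example with an inline anonymous function and with purrr’s formula shorthand, in which .x stands for the current element. The names diffs.anon and diffs.formula are made up for this illustration.

```r
library(purrr)

heart.rates <- list(bob = c(60, 120), fred = c(79, 150), jim = c(66, 110))

# anonymous function written inline
diffs.anon <- map(heart.rates, function(x) x[2] - x[1])

# purrr's formula shorthand, where .x stands for the current element
diffs.formula <- map(heart.rates, ~ .x[2] - .x[1])

identical(diffs.anon, diffs.formula)  # TRUE
```

Both calls return a list that keeps the names of the input (bob, fred, jim).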

In-class Assignment #4

  1. Write a code block illustrating how to use the map function and the str_to_title function (defined in the package stringr) to properly capitalize a vector of names [1 pt]
last.and.first <- c("john smith", "mary hernandez", "fred kidogo")

# your map code below

# Output should be:
# [[1]]
# [1] "John Smith"
# 
# [[2]]
# [1] "Mary Hernandez"
# 
# [[3]]
# [1] "Fred Kidogo"

map_if and map_at

map_if is a variant of map that takes a predicate function (a function that evaluates to TRUE or FALSE) to determine which elements of the input sequence are transformed by the mapped function. All elements of the sequence that do not meet the predicate are left un-transformed. Like map, map_if always returns a list.

Here’s an example of using map_if to round numbers that are less than one (numbers greater than or equal to one are not transformed):

x <- c(1.5, 2, 0.5, 0.1, 4, 0)
map_if(x, function(x){x < 1}, round)

Here’s another example where we use map_if to apply the stringr::str_to_upper function to those columns of a data frame that are character vectors:

births.upper <- map_if(births, is.character, str_to_upper)

# compare the same variable in original and transformed data frame to see the difference
births$premature
births.upper$premature

map_at is similar to map_if, but applies the transformation function only at specified elements (specified by index or name). For example, if we wanted to apply the str_to_upper function to just a couple of specific columns we could do the following:

# map str_to_upper only onto sexBaby and smoke variables
births.upper2 <- map_at(births, c("sexBaby", "smoke"), str_to_upper)
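
The difference between map_if and map_at is easier to see on a small list. In this sketch the list record is made up, and base R’s toupper is used in place of str_to_upper so that the example only needs purrr.

```r
library(purrr)

# a small made-up record standing in for one row of a data set
record <- list(name = "fred", city = "durham", age = 42)

# map_if: transform every element that satisfies the predicate
by.type <- map_if(record, is.character, toupper)

# map_at: transform only the elements picked out by name
by.name <- map_at(record, "name", toupper)

by.type$city  # "DURHAM"
by.name$city  # "durham"
```

In both cases the elements that are not selected (here age, and for map_at also city) are passed through unchanged.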

mapping in parallel using map2

The map2 function applies a transformation function to two sequences in parallel. The following example illustrates this:

first.names <- c("John", "Mary", "Fred")
last.names <- c("Smith", "Hernandez", "Kidogo")
map2(first.names, last.names, str_c, sep=" ")

Note how we can specify arguments to the transformation function as additional arguments to map2 (i.e. the sep argument gets passed to str_c).

map variants that return vectors

map, map_if, and map_at always return lists. The purrr library also has a series of map variants that return vectors. These are map_lgl (for logical vectors), map_chr (for character vectors), map_int (for integer vectors), and map_dbl (for double vectors).

# compare the outputs of map and map_chr
map_chr(letters, str_to_upper)
map(letters, str_to_upper)
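
Here is a short sketch of the other typed variants applied to a made-up character vector. Each variant expects the mapped function to return a single value of the matching type for every element.

```r
library(purrr)

words <- c("apple", "fig", "banana")  # made-up input

word.lengths <- map_int(words, nchar)           # integer vector
is.long <- map_lgl(words, ~ nchar(.x) > 3)      # logical vector
plurals <- map2_chr(words, "s", paste0)         # character vector, two inputs in parallel

word.lengths  # 5 3 6
is.long       # TRUE FALSE TRUE
plurals       # "apples" "figs" "bananas"
```

The last line uses map2_chr, which combines the parallel behavior of map2 with a character vector result; the length-one second input "s" is recycled across all three words.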

Here’s an example using map_dbl where we pick ten random numbers between 1 and 100 and round each of them to the nearest whole multiple of ten.

round.tens <- function(x) {
  # round a number to the nearest tens place
  if (x %% 10 < 5) {
    x - (x %% 10)
  } else {
    x + 10 - (x %% 10)
  }
}

map_dbl(sample(1:100, 10), round.tens)

map_dfc and map_dfr

map_dfc and map_dfr apply the map function to the elements of a sequence, collecting the results into a data frame. The results can be collected together either as columns (map_dfc) or rows (map_dfr) of the data frame.

To illustrate map_dfc and map_dfr, let’s write a function that returns the mean and median of its input in the form of a list:

mean.and.median <- function(x){
  list(mean = mean(x, na.rm = TRUE), 
       median = median(x, na.rm = TRUE))
}

map_dfr

Let’s first use map_dfr to accumulate the means and medians of the “gained” and “weight” columns into a table, with the info for each variable in rows:

births %>%
  select(gained, weight) %>%
  map_dfr(mean.and.median)

The first row in the output corresponds to the gained variable, the second row corresponds to the weight variable. This isn’t a great table, because it doesn’t include the variable names. Let’s see how we can improve on this:

better.table <- 
  births %>%
  select(gained, weight) %>%
  map_dfr(mean.and.median) %>%
  mutate(variable.name = c("gained", "weight")) %>%  # add a column for variable names
  select(variable.name, everything())  # reorder so variable names column is first
  
better.table
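
Another way to carry the variable names into the table is the .id argument of map_dfr, which adds a column holding the name of each input element. The sketch below uses a tiny made-up data frame (toy) in place of births so that it runs without a download, and a variant of our summary function (mean.and.median.df, a hypothetical name) that returns a one-row data frame. It assumes the dplyr package is installed, as it is with tidyverse; recent purrr releases mark map_dfr as superseded, though it remains available.

```r
library(purrr)

# a tiny made-up data frame standing in for two columns of births
toy <- data.frame(gained = c(30, 25, NA, 40),
                  weight = c(7.5, 6.9, 8.1, 7.0))

# same idea as mean.and.median, but returning a one-row data frame
mean.and.median.df <- function(x) {
  data.frame(mean = mean(x, na.rm = TRUE),
             median = median(x, na.rm = TRUE))
}

# .id adds a column named "variable" holding the name of each input column
toy.table <- map_dfr(toy, mean.and.median.df, .id = "variable")
toy.table
```

The resulting table has one row per input column, with the variable name in the first column followed by the mean and median.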

map_dfc

Here is a similar example but using map_dfc, where the transformed data is accumulated as columns:

births %>%
  select(gained, weight) %>%
  map_dfc(mean.and.median)

Again, we might want to improve on our example by giving the columns meaningful names:

better.table.columns <-
  births %>%
  select(gained, weight) %>%
  map_dfc(mean.and.median) 

part1 <- c("mean", "median", "mean", "median")
part2 <- c("gained", "gained", "weight", "weight")
names(better.table.columns) <- map2_chr(part1, part2, str_c, sep=".")  # map2_chr returns a character vector of names

better.table.columns

In-class Assignment #5

  1. Write a function that takes a numeric vector as input and returns a list with the min, max, and range (= max - min) of the vector. Make sure your function deals with NA values. Test your function with two examples. Make sure the list you return has named elements (e.g. list(min = ..., max = ..., range = ...)) [2 pts]

    min.max.range <- function(x) {
    # body of your function here
    }
    
    # examples to test your function here
  2. Write a code block illustrating how to use the map_dfr function and your min.max.range function to create a table giving the min, max, and range of the weeks and visits variables from the NC births data set [2 pts]

    # Code block here using `map_dfr`