Chapter 12 Programming

For this chapter, you will not be working with a particular data set. Instead, we will be focusing on basic programming practices using vectors and functions that we create ourselves.

12.1 Functions

Functions are an extremely useful tool. As you will see in your homework, they can provide the benefit of nearly zero marginal cost with some upfront fixed costs. Our world is a repetitive place, and minimizing the amount of time you need to repeatedly do a task should be minimized. This is where functions will help us.

To motivate this, let’s consider finding the area of a circle. The formula for area of a circle is shown in equation (1).

\[\begin{equation} Area = \pi \times r^2 \end{equation}\]

Suppose you wanted to find the area of two circles with radii of 3 and 5 respectively. Although a simple task, it would be painfully slow to type the numbers over and over into R and return an answer. Instead, let’s create a function that will compute this area for us.

Functions in R come with the following syntax:

## function_name is the function's name that we create
## function() tells R that we are creating a function
## x is an input for the function
function_name <- function(x){
  ## body to add. 
  return()
}

There are a few things to note here:

  • function_name is the name we have given our function.
  • function tells R that we are creating a function.
  • x is an input to the function
  • return specifies what value should return (i.e. output).

Recall that functions are like a machine: they require an input, and using that input, create an output. In this example, our input is x, but we have not specified an output.

To provide a concrete example, we will now create a function called area_of_circle that will do the following:

  1. Take the radius of the circle as an input.
  2. Calculate the area of the circle.
  3. Return the area of the circle as an output.

Observe:

## This function returns the area of a circle
## This function takes 1 input: the radius of the circle
area_of_circle <- function(radius){
  area <- radius^2 * pi
  return(area)
}

Notice how the function was constructed. First, we write comments that explain what the function does, and what its inputs are. This is a best practice so you can understand what your function does in the future. Next, we write the actual code of the function.

Now that we have created the function, we can input any value for the radius we like and the function will return the area of the circle.

## Calculating area of circle with radius 3
area_of_circle(radius = 3)
## [1] 28.27433
## Calculating area of circle with radius 5
area_of_circle(radius = 5)
## [1] 78.53982

We could also modify the return statement so that it gives us a little more information:

## This function takes 1 input: the radius of the circle
## It returns the area of the circle
area_of_circle <- function(radius){
  area <- radius^2 * pi
  return(print(paste("The area of the circle with radius", radius, "is", area)))
}
## Calculating area of circle with radius 3
area_of_circle(radius = 3)
## [1] "The area of the circle with radius 3 is 28.2743338823081"
## Calculating area of circle with radius 5
area_of_circle(radius = 5)
## [1] "The area of the circle with radius 5 is 78.5398163397448"

12.1.1 Exercise

  • Create a function called miles2kilo_feet2meter which takes two inputs, miles and feet. The function then converts the miles input to kilometers, and the feet input to meters. The function then prints out the results, specifying each.

12.2 For-loops

For-loops are a common type of loop that, like functions, take care of repetitive tasks. For-loops work by iterating through a specified task for a certain amount of times. Once that specified amount of times has been met, the for-loop stops.

The general syntax for for-loops is as follows:

## for-loop syntax
for (i in {some specified range}){
  ##do something
}

One thing to note about for loops is the indexing variable i. This variable i will be set equal to your indexed value at each stage in the loop. We will illustrate this by example:

## a for-loop that loops through values 1 to 5 and prints the index at each stage
for (i in 1:5){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

The for-loop here is telling R to print the value of i at each iteration. The for-loop lasts for 5 iterations as we specified. Notice that at each iteration, the i variable is updated to match the index it is looping through. For instance, i is assigned to 1 the first time through the loop, 2 the second time through the loop, 3 the third time etc. However, numbers are not the only thing we can loop through. We can actually loop through all of the elements in a vector.

## a vector for demonstration
favorite_econ = c("Econ 140A", "Econ 241", "Econ 290", "Econ 145")

## looping through each element of the favorite_econ vector
for (i in favorite_econ){
  print(i)
}
## [1] "Econ 140A"
## [1] "Econ 241"
## [1] "Econ 290"
## [1] "Econ 145"

Here, i is still our index, but instead, we are telling the loop to go through each element in the vector one-by-one. Hence, on the first time through the loop, the index i is assigned to “Econ 140A”, while i is assigned to “Econ 241” the second time through etc.

Now let’s try a loop that is most frequently used: looping through a entire vector’s indices.

## looping through a vectors indices 
for (i in 1:length(favorite_econ)){
  print(favorite_econ[i])
}
## [1] "Econ 140A"
## [1] "Econ 241"
## [1] "Econ 290"
## [1] "Econ 145"

There is quite a lot to unpack here:

  • The for-loop is assigning the index i variable to the number 1 the first time through, 2 the second time through, and continuing on until it reaches the the number 4 which is equal to the length of the favorite_econ vector.
  • At each iteration of the loop, we print the value of favorite_econ at each index. For instance, i is equal to 1 the first time through the loop, so the first thing that will be printed is favorite_econ[1] which is equivalent to “Econ 140A” (check yourself!).

Another thing to note is that you do not need to have i as the name of the index. In fact, when you combine multiple for-loops together, it can be helpful to change your index. Observe:

## econ_class is the index in this case
for (econ_class in favorite_econ){
  print(econ_class)
}
## [1] "Econ 140A"
## [1] "Econ 241"
## [1] "Econ 290"
## [1] "Econ 145"

As a final example, we will modify our area_of_circle function. The function will take a range of numbers as an input, and collect the area of the circle using each of these numbers as a radii. In particular, our function will take two inputs: minradius and maxradius. The minradius argument will be the smallest radius we would like to calculate the area of the circle with, while the maxradius argument will be the largest. The function will calculate the area of the circle for every integer between minradius and maxradius and return a vector with all of the areas.

## modifying the function to collect the area of circles with radii of integers 1 through 10
area_of_circle <- function(minradius, maxradius){
  area_vector <- rep(NA, 10) #creating an empty vector of 10 NAs
  for (i in minradius:maxradius){
    area_vector[i] = pi * i^2  #calculating the area 
  }
  return(area_vector) #returning the vector of areas
}
area_of_circle(minradius = 1, maxradius = 10)
##  [1]   3.141593  12.566371  28.274334  50.265482  78.539816 113.097336 153.938040
##  [8] 201.061930 254.469005 314.159265

As an aside, it is important to note that we can also loop through columns in a data frame. This is a task you will likely need to do at some point in your data wrangling career, and there isn’t much great help for this online.

library(tidyverse)
## loading in the mtcars data set from R
cars <- mtcars

## selecting only three columns for demonstration
cars <- cars %>% select(mpg, hp, cyl)

## looping through the columns and printing them out
for (i in names(cars)){
  print(cars[i])
}
##                      mpg
## Mazda RX4           21.0
## Mazda RX4 Wag       21.0
## Datsun 710          22.8
## Hornet 4 Drive      21.4
## Hornet Sportabout   18.7
## Valiant             18.1
## Duster 360          14.3
## Merc 240D           24.4
## Merc 230            22.8
## Merc 280            19.2
## Merc 280C           17.8
## Merc 450SE          16.4
## Merc 450SL          17.3
## Merc 450SLC         15.2
## Cadillac Fleetwood  10.4
## Lincoln Continental 10.4
## Chrysler Imperial   14.7
## Fiat 128            32.4
## Honda Civic         30.4
## Toyota Corolla      33.9
## Toyota Corona       21.5
## Dodge Challenger    15.5
## AMC Javelin         15.2
## Camaro Z28          13.3
## Pontiac Firebird    19.2
## Fiat X1-9           27.3
## Porsche 914-2       26.0
## Lotus Europa        30.4
## Ford Pantera L      15.8
## Ferrari Dino        19.7
## Maserati Bora       15.0
## Volvo 142E          21.4
##                      hp
## Mazda RX4           110
## Mazda RX4 Wag       110
## Datsun 710           93
## Hornet 4 Drive      110
## Hornet Sportabout   175
## Valiant             105
## Duster 360          245
## Merc 240D            62
## Merc 230             95
## Merc 280            123
## Merc 280C           123
## Merc 450SE          180
## Merc 450SL          180
## Merc 450SLC         180
## Cadillac Fleetwood  205
## Lincoln Continental 215
## Chrysler Imperial   230
## Fiat 128             66
## Honda Civic          52
## Toyota Corolla       65
## Toyota Corona        97
## Dodge Challenger    150
## AMC Javelin         150
## Camaro Z28          245
## Pontiac Firebird    175
## Fiat X1-9            66
## Porsche 914-2        91
## Lotus Europa        113
## Ford Pantera L      264
## Ferrari Dino        175
## Maserati Bora       335
## Volvo 142E          109
##                     cyl
## Mazda RX4             6
## Mazda RX4 Wag         6
## Datsun 710            4
## Hornet 4 Drive        6
## Hornet Sportabout     8
## Valiant               6
## Duster 360            8
## Merc 240D             4
## Merc 230              4
## Merc 280              6
## Merc 280C             6
## Merc 450SE            8
## Merc 450SL            8
## Merc 450SLC           8
## Cadillac Fleetwood    8
## Lincoln Continental   8
## Chrysler Imperial     8
## Fiat 128              4
## Honda Civic           4
## Toyota Corolla        4
## Toyota Corona         4
## Dodge Challenger      8
## AMC Javelin           8
## Camaro Z28            8
## Pontiac Firebird      8
## Fiat X1-9             4
## Porsche 914-2         4
## Lotus Europa          4
## Ford Pantera L        8
## Ferrari Dino          6
## Maserati Bora         8
## Volvo 142E            4

12.2.1 Exercise

  • Write a function called sum_func that calculates the sum of all numbers within a specified range. The function will take two inputs: minval which is the smallest number and maxval which is the largest number.

12.3 If-else

Conditional statements are the heart of programming. The if-else statement evaluates whether a condition is TRUE or FALSE and then perform a computation based on the truth of the statement. The syntax of an if-else statement is as follows:

if (logical statement){
  ##perform an action if the logical statement is TRUE 
}
else{
  ## perform a different action if NOT TRUE
}

For example, suppose we wanted to loop a vector of random numbers in the range of 1 to 100 and find out how many of them are greater than 50. We could do this in the following way:

## setting the seed for replication
set.seed(1992)

## creating a random sample of 100 integers in the interval 1 to 100
numbers <- sample(1:100, 100)

## counting how many numbers come out bigger than 50
count <- 0 ## setting our counter to 0
for (i in numbers){
  ## evaluating if the number is greater than 50
  if (i > 50) {
    count <- count + 1 ## adding a 1 to our counter
  }
  else{
    count <- count ## redundant, but here for example
  }
}

Note the set.seed function. This is a function that allows replication of results when using random sampling techniques. Essentially it ensures that your random sample is the same random sample every time you run the program. While the seed was set to 1992, you can set the seed to any number you like. Of course, each seed number has its own unique sample (e.g. set.seed(1) will give different results than set.seed(2)).

12.3.1 Exercise

  • Loop through your numbers vector and assign grade values to each of the numbers, and place the grades in a separate vector. For the grade values, if the number is less than 60, assign an “F”, if the number is between 60 and 69 assign a “D”, if the number is between 70 and 79 assign a “C”, if the number is between 80 and 89 assign a “B”, and if the number is between 90 and 100 assign an “A”. For fun, graph a histogram of the distribution.

12.4 Selected Solutions

## a solution to Exercise 1.1.1

## this function converts miles to kilometers and feet to meters
## it takes two input paramters: miles and feet
## it returns two values: the kilometers and the meters
miles2kilo_feet2meter <- function(miles, feet){
  kilometers <- miles * 1.6
  meters <- feet * 0.3
  return(print(paste("Kilometers: ", kilometers, "Meters: ", meters)))
}
## a solution to Exercise 1.2.1

## This function calculates the sum of all numbers within a specified range
## two parameters: minval and maxvalue
## minval: the minimum value of the range
## maxval: the maximum value of the range
sum_func <- function(minval, maxval) {
  total = 0
  for (i in minval:maxval) {
    total = total + i
  }
  return(print(paste("The sum of all numbers is ", total)))
}