Chapter 12 Programming
For this chapter, you will not be working with a particular data set. Instead, we will be focusing on basic programming practices using vectors and functions that we create ourselves.
12.1 Functions
Functions are an extremely useful tool. As you will see in your homework, they offer nearly zero marginal cost in exchange for some upfront fixed cost: you write the code once and reuse it as often as you like. Our world is a repetitive place, and the time you spend repeating a task should be minimized. This is where functions will help us.
To motivate this, let’s consider finding the area of a circle. The formula for area of a circle is shown in equation (1).
\begin{equation} \text{Area} = \pi \times r^2 \end{equation}
Suppose you wanted to find the area of two circles with radii of 3 and 5 respectively. Although a simple task, it would be painfully slow to type the numbers over and over into R and return an answer. Instead, let’s create a function that will compute this area for us.
Functions in R come with the following syntax:
## function_name is the function's name that we create
## function() tells R that we are creating a function
## x is an input for the function
function_name <- function(x){
  ## body to add.
  return()
}
There are a few things to note here:
- function_name is the name we have given our function.
- function tells R that we are creating a function.
- x is an input to the function.
- return specifies what value the function should return (i.e. its output).
Recall that functions are like a machine: they require an input, and using that input, create an output. In this example, our input is x, but we have not specified an output.
To provide a concrete example, we will now create a function called area_of_circle that will do the following:
- Take the radius of the circle as an input.
- Calculate the area of the circle.
- Return the area of the circle as an output.
Observe:
## This function returns the area of a circle
## This function takes 1 input: the radius of the circle
area_of_circle <- function(radius){
  area <- radius^2 * pi
  return(area)
}
Notice how the function was constructed. First, we write comments that explain what the function does, and what its inputs are. This is a best practice so you can understand what your function does in the future. Next, we write the actual code of the function.
Now that we have created the function, we can input any value for the radius we like and the function will return the area of the circle.
## Calculating area of circle with radius 3
area_of_circle(radius = 3)
## [1] 28.27433
## Calculating area of circle with radius 5
area_of_circle(radius = 5)
## [1] 78.53982
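Because ^ and * work element by element on vectors in R, the same function can also take several radii in a single call. A minimal sketch, repeating the definition so the block runs on its own; the name radii is chosen here only for illustration:

```r
## repeating the definition from above so this block is self-contained
area_of_circle <- function(radius){
  area <- radius^2 * pi
  return(area)
}

## hypothetical vector holding both radii from the example
radii <- c(3, 5)

## one call returns one area per radius
areas <- area_of_circle(radius = radii)
print(areas)
```

The result is a numeric vector with one area for each radius, in the same order as the input.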
We could also modify the return statement so that it gives us a little more information:
## This function takes 1 input: the radius of the circle
## It returns the area of the circle
area_of_circle <- function(radius){
  area <- radius^2 * pi
  return(print(paste("The area of the circle with radius", radius, "is", area)))
}
## Calculating area of circle with radius 3
area_of_circle(radius = 3)
## [1] "The area of the circle with radius 3 is 28.2743338823081"
## Calculating area of circle with radius 5
area_of_circle(radius = 5)
## [1] "The area of the circle with radius 5 is 78.5398163397448"
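One trade-off of returning a sentence is that the result is text, so it can no longer be used in arithmetic. A possible alternative, sketched here under the assumption that you want both, keeps the numeric function and builds the message in a separate helper. The name describe_circle and the rounding to two decimals are illustrative choices, not part of the chapter:

```r
## numeric version: returns a number that can be reused in later calculations
area_of_circle <- function(radius){
  area <- radius^2 * pi
  return(area)
}

## hypothetical helper that builds the message from the numeric result
describe_circle <- function(radius){
  area <- area_of_circle(radius)
  return(paste("The area of the circle with radius", radius, "is", round(area, 2)))
}

message_3 <- describe_circle(radius = 3)
print(message_3)

## the numeric version still supports arithmetic,
## e.g. the combined area of the two circles
total_area <- area_of_circle(3) + area_of_circle(5)
print(total_area)
```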
12.2 For-loops
For-loops are a common type of loop that, like functions, take care of repetitive tasks. A for-loop repeats a specified task a set number of times. Once that number of iterations has been reached, the for-loop stops.
The general syntax for for-loops is as follows:
## for-loop syntax
for (i in {some specified range}){
##do something
}
One thing to note about for-loops is the indexing variable i. This variable i will be set equal to your indexed value at each stage in the loop. We will illustrate this by example:
## a for-loop that loops through values 1 to 5 and prints the index at each stage
for (i in 1:5){
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
The for-loop here is telling R to print the value of i at each iteration. The for-loop lasts for 5 iterations as we specified. Notice that at each iteration, the i variable is updated to match the index it is looping through. For instance, i is assigned to 1 the first time through the loop, 2 the second time through the loop, 3 the third time, etc. However, numbers are not the only thing we can loop through. We can actually loop through all of the elements in a vector.
## a vector for demonstration
favorite_econ <- c("Econ 140A", "Econ 241", "Econ 290", "Econ 145")
## looping through each element of the favorite_econ vector
for (i in favorite_econ){
print(i)
}
## [1] "Econ 140A"
## [1] "Econ 241"
## [1] "Econ 290"
## [1] "Econ 145"
Here, i is still our index, but instead of counting through numbers, we are telling the loop to go through each element in the vector one-by-one. Hence, on the first time through the loop, the index i is assigned to “Econ 140A”, while i is assigned to “Econ 241” the second time through, etc.
Now let’s try the kind of loop that is used most frequently: looping through an entire vector’s indices.
## looping through a vector's indices
for (i in 1:length(favorite_econ)){
print(favorite_econ[i])
}
## [1] "Econ 140A"
## [1] "Econ 241"
## [1] "Econ 290"
## [1] "Econ 145"
There is quite a lot to unpack here:
- The for-loop assigns the index variable i to the number 1 the first time through, 2 the second time through, and so on until it reaches the number 4, which is equal to the length of the favorite_econ vector.
- At each iteration of the loop, we print the value of favorite_econ at the current index. For instance, i is equal to 1 the first time through the loop, so the first thing that will be printed is favorite_econ[1], which is equivalent to “Econ 140A” (check for yourself!).
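One caveat with the 1:length() pattern: if the vector happens to be empty, 1:0 counts down and produces the two values 1 and 0, so the loop body runs twice instead of not at all. Base R's seq_along() avoids this. A small sketch, where empty_vector and the two counters are made up for the demonstration:

```r
## hypothetical empty vector for demonstration
empty_vector <- character(0)

## 1:length() counts down from 1 to 0 when the vector is empty
risky_index <- 1:length(empty_vector)
print(risky_index)

## seq_along() gives an empty index instead
safe_index <- seq_along(empty_vector)
print(length(safe_index))

## counting how many times each loop body runs
risky_runs <- 0
for (i in risky_index){
  risky_runs <- risky_runs + 1
}
safe_runs <- 0
for (i in safe_index){
  safe_runs <- safe_runs + 1
}
print(c(risky_runs, safe_runs))
```

For a non-empty vector such as favorite_econ, both forms give the same indices, so the difference only matters when a vector might have length zero.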
Another thing to note is that you do not need to have i as the name of the index. In fact, when you combine multiple for-loops together, it can be helpful to change your index. Observe:
## econ_class is the index in this case
for (econ_class in favorite_econ){
print(econ_class)
}
## [1] "Econ 140A"
## [1] "Econ 241"
## [1] "Econ 290"
## [1] "Econ 145"
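Descriptive index names pay off most when one loop sits inside another. The following sketch pairs each class with each quarter; the quarter_names vector and the shortened class list are assumptions made only for this illustration:

```r
## two small vectors for demonstration; the quarter names are made up
favorite_econ <- c("Econ 140A", "Econ 241")
quarter_names <- c("Fall", "Winter")

## collecting every class/quarter combination
combinations <- c()
for (econ_class in favorite_econ){
  for (quarter in quarter_names){
    ## the inner loop finishes all quarters before the outer loop moves on
    combinations <- c(combinations, paste(econ_class, quarter))
  }
}
print(combinations)
```

With two indices named i and j it would be easy to mix them up; econ_class and quarter make clear which loop each value comes from.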
As a final example, we will modify our area_of_circle function. The function will take a range of numbers as an input and collect the area of the circle using each of these numbers as a radius. In particular, our function will take two inputs: minradius and maxradius. The minradius argument will be the smallest radius we would like to calculate the area of the circle with, while the maxradius argument will be the largest. The function will calculate the area of the circle for every integer between minradius and maxradius and return a vector with all of the areas.
## modifying the function to collect the areas of circles with integer radii from minradius to maxradius
area_of_circle <- function(minradius, maxradius){
  radii <- minradius:maxradius
  area_vector <- rep(NA, length(radii)) #creating an empty vector with one NA per radius
  for (i in seq_along(radii)){
    area_vector[i] <- pi * radii[i]^2 #calculating the area
  }
  return(area_vector) #returning the vector of areas
}
area_of_circle(minradius = 1, maxradius = 10)
## [1] 3.141593 12.566371 28.274334 50.265482 78.539816 113.097336 153.938040
## [8] 201.061930 254.469005 314.159265
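For this particular task the loop is not strictly necessary, because arithmetic in R is applied to every element of a vector at once. A sketch of an equivalent loop-free version, under the hypothetical name area_of_circles_vectorized:

```r
## loop-free version: squares and scales the whole vector of radii at once
area_of_circles_vectorized <- function(minradius, maxradius){
  radii <- minradius:maxradius
  return(pi * radii^2)
}

areas <- area_of_circles_vectorized(minradius = 1, maxradius = 10)
print(areas)

## the range does not have to start at 1
shifted <- area_of_circles_vectorized(minradius = 3, maxradius = 5)
print(shifted)
```

The loop version remains useful as practice and for tasks where each step depends on the previous one, which vectorized arithmetic cannot express directly.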
As an aside, it is important to note that we can also loop through columns in a data frame. This is a task you will likely need to do at some point in your data wrangling career, and clear guidance on it can be hard to find online.
library(tidyverse)
## loading in the mtcars data set from R
cars <- mtcars
## selecting only three columns for demonstration
cars <- cars %>% select(mpg, hp, cyl)
## looping through the columns and printing them out
for (i in names(cars)){
  print(cars[i])
}
## mpg
## Mazda RX4 21.0
## Mazda RX4 Wag 21.0
## Datsun 710 22.8
## Hornet 4 Drive 21.4
## Hornet Sportabout 18.7
## Valiant 18.1
## Duster 360 14.3
## Merc 240D 24.4
## Merc 230 22.8
## Merc 280 19.2
## Merc 280C 17.8
## Merc 450SE 16.4
## Merc 450SL 17.3
## Merc 450SLC 15.2
## Cadillac Fleetwood 10.4
## Lincoln Continental 10.4
## Chrysler Imperial 14.7
## Fiat 128 32.4
## Honda Civic 30.4
## Toyota Corolla 33.9
## Toyota Corona 21.5
## Dodge Challenger 15.5
## AMC Javelin 15.2
## Camaro Z28 13.3
## Pontiac Firebird 19.2
## Fiat X1-9 27.3
## Porsche 914-2 26.0
## Lotus Europa 30.4
## Ford Pantera L 15.8
## Ferrari Dino 19.7
## Maserati Bora 15.0
## Volvo 142E 21.4
## hp
## Mazda RX4 110
## Mazda RX4 Wag 110
## Datsun 710 93
## Hornet 4 Drive 110
## Hornet Sportabout 175
## Valiant 105
## Duster 360 245
## Merc 240D 62
## Merc 230 95
## Merc 280 123
## Merc 280C 123
## Merc 450SE 180
## Merc 450SL 180
## Merc 450SLC 180
## Cadillac Fleetwood 205
## Lincoln Continental 215
## Chrysler Imperial 230
## Fiat 128 66
## Honda Civic 52
## Toyota Corolla 65
## Toyota Corona 97
## Dodge Challenger 150
## AMC Javelin 150
## Camaro Z28 245
## Pontiac Firebird 175
## Fiat X1-9 66
## Porsche 914-2 91
## Lotus Europa 113
## Ford Pantera L 264
## Ferrari Dino 175
## Maserati Bora 335
## Volvo 142E 109
## cyl
## Mazda RX4 6
## Mazda RX4 Wag 6
## Datsun 710 4
## Hornet 4 Drive 6
## Hornet Sportabout 8
## Valiant 6
## Duster 360 8
## Merc 240D 4
## Merc 230 4
## Merc 280 6
## Merc 280C 6
## Merc 450SE 8
## Merc 450SL 8
## Merc 450SLC 8
## Cadillac Fleetwood 8
## Lincoln Continental 8
## Chrysler Imperial 8
## Fiat 128 4
## Honda Civic 4
## Toyota Corolla 4
## Toyota Corona 4
## Dodge Challenger 8
## AMC Javelin 8
## Camaro Z28 8
## Pontiac Firebird 8
## Fiat X1-9 4
## Porsche 914-2 4
## Lotus Europa 4
## Ford Pantera L 8
## Ferrari Dino 6
## Maserati Bora 8
## Volvo 142E 4
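Printing whole columns is rarely the end goal. A more typical use is to compute something for each column and store it. The sketch below records the mean of each column; it uses base R only, and car_data and column_means are names chosen for this illustration:

```r
## selecting three columns of the built-in mtcars data with base R
car_data <- mtcars[, c("mpg", "hp", "cyl")]

## empty numeric vector with one named slot per column
column_means <- rep(NA_real_, ncol(car_data))
names(column_means) <- names(car_data)

## car_data[[i]] extracts the column as a plain vector,
## whereas car_data[i] would keep a one-column data frame
for (i in names(car_data)){
  column_means[i] <- mean(car_data[[i]])
}
print(column_means)
```

The double brackets matter here: mean() expects a vector of numbers, not a data frame.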
12.3 If-else
Conditional statements are the heart of programming. The if-else statement evaluates whether a condition is TRUE or FALSE and then performs a computation based on the truth of the statement. The syntax of an if-else statement is as follows:
if (logical statement){
##perform an action if the logical statement is TRUE
}else{
## perform a different action if NOT TRUE
}
For example, suppose we wanted to loop through a vector of random numbers in the range of 1 to 100 and find out how many of them are greater than 50. We could do this in the following way:
## setting the seed for replication
set.seed(1992)
## creating a random sample of 100 integers in the interval 1 to 100
## (sample() draws without replacement by default, so this is a random ordering of 1 to 100)
numbers <- sample(1:100, 100)
## counting how many numbers come out bigger than 50
count <- 0 ## setting our counter to 0
for (i in numbers){
  ## evaluating if the number is greater than 50
  if (i > 50) {
    count <- count + 1 ## adding a 1 to our counter
  }else{
    count <- count ## redundant, but here for example
  }
}
Note the set.seed function. This is a function that allows replication of results when using random sampling techniques. Essentially, it ensures that your random sample is the same random sample every time you run the program. While the seed was set to 1992, you can set the seed to any number you like. Of course, each seed number has its own unique sample (e.g. set.seed(1) will give different results than set.seed(2)).
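The effect of the seed can be checked directly by drawing the sample twice. The sketch below also cross-checks the loop's counter with a vectorized comparison; the names first_draw, second_draw, other_draw, and count_check are illustrative:

```r
## drawing the same sample twice with the same seed
set.seed(1992)
first_draw <- sample(1:100, 100)
set.seed(1992)
second_draw <- sample(1:100, 100)
print(identical(first_draw, second_draw))

## a different seed gives a different ordering
set.seed(1)
other_draw <- sample(1:100, 100)
print(identical(first_draw, other_draw))

## first_draw > 50 gives one TRUE/FALSE per element, and sum() counts the TRUEs
count_check <- sum(first_draw > 50)
print(count_check)
```

Because sample() draws without replacement unless told otherwise, each integer from 1 to 100 appears exactly once here, so exactly 50 values exceed 50 whatever the seed. Passing replace = TRUE to sample() would make the count vary from seed to seed.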
12.3.1 Exercise
- Loop through your numbers vector and assign a grade to each of the numbers, placing the grades in a separate vector. For the grade values, if the number is less than 60, assign an “F”; if the number is between 60 and 69, assign a “D”; if the number is between 70 and 79, assign a “C”; if the number is between 80 and 89, assign a “B”; and if the number is between 90 and 100, assign an “A”. For fun, graph a histogram of the distribution.
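The exercise needs more than two branches, which R handles by chaining conditions with else if. The sketch below shows the pattern on an unrelated toy task (labelling numbers by sign) so as not to give the solution away; sign_label, values, and value_labels are hypothetical names:

```r
## hypothetical helper that labels a number by its sign
sign_label <- function(x){
  if (x < 0){
    label <- "negative"
  }else if (x == 0){
    label <- "zero"
  }else{
    label <- "positive"
  }
  return(label)
}

## applying it to each element of a small vector and storing the results
values <- c(-2, 0, 7)
value_labels <- rep(NA, length(values))
for (i in seq_along(values)){
  value_labels[i] <- sign_label(values[i])
}
print(value_labels)
```

Conditions are checked from top to bottom and only the first one that is TRUE runs, so later branches can assume the earlier conditions failed.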
12.4 Selected Solutions
## a solution to Exercise 1.1.1
## this function converts miles to kilometers and feet to meters
## it takes two input parameters: miles and feet
## it prints and returns a single string reporting the kilometers and the meters
miles2kilo_feet2meter <- function(miles, feet){
  kilometers <- miles * 1.6
  meters <- feet * 0.3
  return(print(paste("Kilometers: ", kilometers, "Meters: ", meters)))
}
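If the two converted values are needed in later calculations, one option is to return them as a named numeric vector instead of a sentence. A sketch under the hypothetical name miles_feet_to_metric, using the same approximate conversion factors as the solution above:

```r
## hypothetical variant that returns both values as a named numeric vector
miles_feet_to_metric <- function(miles, feet){
  converted <- c(kilometers = miles * 1.6, meters = feet * 0.3)
  return(converted)
}

converted <- miles_feet_to_metric(miles = 10, feet = 100)
print(converted)

## individual values can be pulled out by name
print(converted["kilometers"])
```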
## a solution to Exercise 1.2.1
## This function calculates the sum of all numbers within a specified range
## two parameters: minval and maxval
## minval: the minimum value of the range
## maxval: the maximum value of the range
sum_func <- function(minval, maxval) {
  total = 0
  for (i in minval:maxval) {
    total = total + i
  }
  return(print(paste("The sum of all numbers is ", total)))
}
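A quick way to gain confidence in a loop like this is to compare it with a built-in that does the same job. The sketch below uses a variant that returns the number itself rather than a sentence (the name sum_range is hypothetical) and checks it against base R's sum():

```r
## loop-based total, without the printing, so the result stays numeric
sum_range <- function(minval, maxval){
  total <- 0
  for (i in minval:maxval){
    total <- total + i
  }
  return(total)
}

loop_total <- sum_range(1, 100)
builtin_total <- sum(1:100)

## both approaches should agree
print(c(loop_total, builtin_total))
print(loop_total == builtin_total)
```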