Welcome to ID 543

Introduction to R

About this class

  • Quick! Intense!
    • Daily homeworks & final project
    • Use office hours! Your classmates! The internet!
    • It will require practice afterward, and time to sink in
    • The goal is to set you up for success and give you resources to learn more

Tip

Experiment! You are not going to break anything!

About this class

  • Everything you need is at http://id543.louisahsmith.com
    • Canvas will link you there, but good to bookmark as well
    • Everything admin/grade-related on Canvas
  • General format:
    • Some overview slides
    • An example together
    • Practice on your own/with your classmates
    • Repeat

Tip

Try to solve a problem yourself first, ask a classmate second, and ask the teaching team third

Homeworks

Did you make a good-faith effort to answer the question using the tools we’ve covered in class?

  • Read the error message carefully. Check for missing/extra commas and parentheses. Restart R and reload the data.
  • Go back to the slides. What were the day’s goals? What were the functions we covered?
  • Check out the reading – it can be good to get another perspective.
  • Google using key words from the class. There are lots of ways to do things, but try to find strategies using tools we’ve covered in class (e.g., if you search with “tidyverse” you’ll find a lot of what we cover).
  • Ask a classmate how they approached it. Don’t copy and paste – even if you end up writing exactly what they did, type it out yourself for practice.
  • If you can’t solve a problem, include code you tried and describe the strategies you used to try to solve it.

About this class

  • Day 1: dataframes and variables
  • Day 2: data manipulation and management
  • Day 3: models and tables
  • Day 4: figures and more

About Louisa

  • Assistant professor at Northeastern University
    • Department of Public Health & Health Sciences and the Roux Institute (Portland)
  • Started using R during my master’s (so almost 10 years of experience)
    • Learned mostly by doing!
    • Twitter, blogs, RStudio::conf, meetups
  • First iteration of this class when I was a PhD student here
  • Basically everything I do is in R!

About Xiyue

Education

  • Bachelor’s Degree in Nutrition, University of Washington, Seattle
  • Current MS Student in Epidemiology, Harvard University

Previous Experience

  • Research Assistant/Student Researcher at Duke Kunshan University, Tsinghua University, and Peking University
  • Used Stata and R for research

Research Interests

  • Diet and NAFLD, Liver Cancer in older adults

Today’s goals

  • Familiarize yourselves with RStudio
  • Introduce you to the tidyverse and the concept of packages
  • Explore data stored in dataframes
  • Create new variables
  • Learn about factor variables and how to manipulate them

RStudio

Start fresh

  • If you have used R previously, an old workspace may still be active when you open RStudio
  • You always want to start with a fresh session
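  • Go to Tools -> Global Options, and under General, change these settings: uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never”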

Tip

Now, you can just quit and restart RStudio if something goes wrong! You can also go to Session -> Restart R to clear your session.

Rainbow parentheses

Always confirm you are closing your parentheses!

Tools -> Global Options -> Code -> Display -> Rainbow Parentheses

Print output to console

You can run…

  • code that you type directly in the console
    • code you won’t need to run again
  • code in an .R script
  • code in a .qmd (Quarto) or .Rmd (R Markdown) file
    • code you want to render to an html, word, or pdf file

I like to have all code print to the console for consistency:

Packages

  • Some functions are built into R
    • mean(), lm(), table(), etc.
  • They actually come from built-in packages
    • base, stats, graphics, etc.
  • Anyone (yes, anyone) can build their own package to add to the functionality of R
    • {ggplot2}, {dplyr}, {data.table}, {survival}, etc.

Packages

  • You have to install a package once
install.packages("survival")
  • You then have to load the package every time you want to use it
library(survival)
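
One pattern you may see in other people's scripts (a sketch, not something this class requires) is to install a package only if it is missing, and then load it. requireNamespace() is a base R function that checks whether a package is installed without loading it:

```r
# Install {survival} only if it is not already installed.
# requireNamespace() returns TRUE or FALSE and does not attach the package.
if (!requireNamespace("survival", quietly = TRUE)) {
  install.packages("survival")  # needs an internet connection
}

# Load the package for this session (do this every time you restart R)
library(survival)
```

In your own scripts, a plain library(survival) at the top is usually enough once the package is installed.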

Packages

“You only have to buy the book once, but you have to go get it out of the bookshelf every time you want to read it.”

install.packages("survival")
library(survival)
survfit(...)

Several days later…

library(survival)
coxph(...)

Package details

  • When you use install.packages(), packages are downloaded from CRAN (The Comprehensive R Archive Network)
    • This is also where you downloaded R
  • Packages can be hosted in lots of other places, such as Bioconductor (for bioinformatics) and GitHub (for personal projects or packages still in development)
  • The folks at CRAN check to make things “work” in some sense, but don’t check on the statistical methods…
    • But because R is open-source, you can always read the code yourself
  • Two functions from different packages can have the same name… if you load them both, you may have some trouble
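
When two functions share a name, you can say exactly which one you mean with package::function(). Here is a small sketch that fakes a conflict by defining our own median() on top of the one from the built-in stats package (the replacement function is made up purely for illustration):

```r
x <- c(2, 4, 6, 8)

# pkg::fun() calls the function from a specific package
stats::median(x)
# [1] 5

# Simulate a name conflict: our own median() now hides the stats version
median <- function(x) "not the real median"
median(x)
# [1] "not the real median"

# The original is still reachable with the package prefix
stats::median(x)
# [1] 5

# Remove our version so that median() means stats::median() again
rm(median)
median(x)
# [1] 5
```

The same prefix works for packages you have installed yourself, e.g. dplyr::filter() versus stats::filter().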

Demo

Script vs. console, installing packages, and changing settings

The biggest difference between R and Stata is that R can have many different objects in its environment

  • datasets, numbers, figures, etc.
  • you have to be explicit about storing and retrieving objects
    • e.g., what dataset a variable belongs to

R uses <- to store objects in the environment

I call this the “assignment arrow”

# create values
vals <- c(1, 645, 329)

Now vals holds those values

Warning

No assignment arrow means that the object will be printed to the console (and lost forever!)

Objects

We can retrieve those values by running just the name of the object

vals
[1]   1 645 329

We can also perform operations on them using functions like mean()

mean(vals)
[1] 325

If we want to keep the result of that operation, we need to use <- again

mean_val <- mean(vals)
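
To see what the assignment arrow changes, here is a short sketch using the same vals vector (the name doubled is just an example):

```r
vals <- c(1, 645, 329)

# Without the arrow, the result is printed but not stored anywhere
vals * 2
# [1]    2 1290  658

# vals itself has not changed
vals
# [1]   1 645 329

# With the arrow, the result is stored (and nothing is printed)
doubled <- vals * 2

# Run the name of the object to see what it holds
doubled
# [1]    2 1290  658
```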

Types of data (classes)

We could also create a character vector:

chars <- c("dog", "cat", "rhino")
chars
[1] "dog"   "cat"   "rhino"

Or a logical vector:

logs <- c(TRUE, FALSE, FALSE)
logs
[1]  TRUE FALSE FALSE

Note

We’ll see more options as we go along!
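
You can ask R what type of data a vector holds with class(). This sketch rebuilds the three vectors from above, then shows what happens if you mix types in one vector (the name mixed is just an example):

```r
vals  <- c(1, 645, 329)
chars <- c("dog", "cat", "rhino")
logs  <- c(TRUE, FALSE, FALSE)

class(vals)
# [1] "numeric"
class(chars)
# [1] "character"
class(logs)
# [1] "logical"

# A vector can only hold one type of data.
# If you mix types, R converts everything to the most flexible type (here, character).
mixed <- c(1, "dog", TRUE)
mixed
# [1] "1"    "dog"  "TRUE"
class(mixed)
# [1] "character"
```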

Types of objects

We created vectors with the c() function (c stands for concatenate)

We could also create a matrix of values with the matrix() function:

# turn the vector of numbers into a 2-row matrix
mat <- matrix(c(234, 7456, 12, 654, 183, 753), nrow = 2)
mat
     [,1] [,2] [,3]
[1,]  234   12  183
[2,] 7456  654  753
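
Notice that matrix() fills in the values column by column, which is why 234 and 7456 both land in the first column. If you want to fill row by row instead, matrix() has a byrow argument. A quick sketch (the names nums and mat_by_row are just examples):

```r
nums <- c(234, 7456, 12, 654, 183, 753)

# byrow = TRUE fills across each row instead of down each column
mat_by_row <- matrix(nums, nrow = 2, byrow = TRUE)
mat_by_row
#      [,1] [,2] [,3]
# [1,]  234 7456   12
# [2,]  654  183  753

# dim() gives the number of rows, then the number of columns
dim(mat_by_row)
# [1] 2 3
```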

Indices

The numbers in square brackets are indices, which we can use to pull out values:

# extract second animal
chars[2]
[1] "cat"

We can pull out rows or columns from matrices:

# extract second row
mat[2, ]
[1] 7456  654  753
# extract first column
mat[, 1]
[1]  234 7456
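
A few more things you can do with square brackets, sketched with the same chars and mat objects: pull out several values at once, drop a value with a negative index, or pull out a single cell of a matrix.

```r
chars <- c("dog", "cat", "rhino")
mat <- matrix(c(234, 7456, 12, 654, 183, 753), nrow = 2)

# extract the first and third animals
chars[c(1, 3)]
# [1] "dog"   "rhino"

# a negative index drops that element instead
chars[-2]
# [1] "dog"   "rhino"

# extract a single cell: row 2, column 3
mat[2, 3]
# [1] 753
```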

Exercise

Pre-class challenges

Dataframes

  • We usually do analysis in R with dataframes (or some variant)
  • Dataframes basically work like spreadsheets: here, columns are variables, and rows are observations
  • Here’s some data from the National Longitudinal Survey of Youth:
nlsy
# A tibble: 1,205 × 15
      id glasses eyesight sleep_wkdy sleep_wknd nsibs race_eth   sex region
   <dbl>   <dbl>    <dbl>      <dbl>      <dbl> <dbl>    <dbl> <dbl>  <dbl>
 1     3       0        1          5          7     3        3     2      1
 2     6       1        2          6          7     1        3     1      1
 3     8       0        2          7          9     7        3     2      1
 4    16       1        3          6          7     3        3     2      1
 5    18       0        3         10         10     2        3     1      3
 6    20       1        2          7          8     2        3     2      1
 7    27       0        1          8          8     1        3     2      1
 8    49       1        1          8          8     6        3     2      1
 9    57       1        2          7          8     1        3     2      1
10    67       0        1          8          8     1        3     1      1
# ℹ 1,195 more rows
# ℹ 6 more variables: income <dbl>, age_bir <dbl>, eyesight_cat <fct>,
#   glasses_cat <fct>, race_eth_cat <fct>, sex_cat <fct>

New function: glimpse()

We can get a quick overview of the data with the glimpse() function:

glimpse(nlsy)
Rows: 1,205
Columns: 15
$ id           <dbl> 3, 6, 8, 16, 18, 20, 27, 49, 57, 67, 86, 96, 97, 98, 117,…
$ glasses      <dbl> 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, …
$ eyesight     <dbl> 1, 2, 2, 3, 3, 2, 1, 1, 2, 1, 3, 5, 1, 1, 1, 1, 3, 2, 3, …
$ sleep_wkdy   <dbl> 5, 6, 7, 6, 10, 7, 8, 8, 7, 8, 8, 7, 7, 7, 8, 7, 7, 8, 8,…
$ sleep_wknd   <dbl> 7, 7, 9, 7, 10, 8, 8, 8, 8, 8, 8, 7, 8, 7, 8, 7, 4, 8, 8,…
$ nsibs        <dbl> 3, 1, 7, 3, 2, 2, 1, 6, 1, 1, 7, 2, 7, 2, 2, 4, 9, 2, 2, …
$ race_eth     <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, …
$ sex          <dbl> 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, …
$ region       <dbl> 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ income       <dbl> 22390, 35000, 7227, 48000, 4510, 50000, 20000, 23900, 232…
$ age_bir      <dbl> 19, 30, 17, 31, 19, 30, 27, 24, 21, 36, 17, 19, 29, 30, 2…
$ eyesight_cat <fct> Excellent, Very Good, Very Good, Good, Good, Very Good, E…
$ glasses_cat  <fct> Doesn't wear glasses, Wears glasses/contacts, Doesn't wea…
$ race_eth_cat <fct> "Non-Black, Non-Hispanic", "Non-Black, Non-Hispanic", "No…
$ sex_cat      <fct> Female, Male, Female, Female, Male, Female, Female, Femal…

Note

Notice that I write a function name followed by parentheses to signal it is a function, and can take arguments within the parentheses
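
Here is a small sketch of what "arguments within the parentheses" looks like, using two built-in functions. digits is an argument of round() and na.rm is an argument of mean(); the vector name is just an example.

```r
# The first argument is the thing to operate on;
# named arguments after it change how the function behaves
round(3.14159, digits = 2)
# [1] 3.14

vals_with_missing <- c(1, 645, NA)

# By default, a missing value makes the mean missing too
mean(vals_with_missing)
# [1] NA

# na.rm = TRUE tells mean() to drop missing values first
mean(vals_with_missing, na.rm = TRUE)
# [1] 323
```

You can see which arguments a function accepts by running ?mean (or any other function name after the question mark) in the console.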

New function: summary()

We can also get a summary of the data with the summary() function:

summary(nlsy)
       id           glasses          eyesight      sleep_wkdy    
 Min.   :    3   Min.   :0.0000   Min.   :1.00   Min.   : 0.000  
 1st Qu.: 2317   1st Qu.:0.0000   1st Qu.:1.00   1st Qu.: 6.000  
 Median : 4744   Median :1.0000   Median :2.00   Median : 7.000  
 Mean   : 5229   Mean   :0.5178   Mean   :1.99   Mean   : 6.643  
 3rd Qu.: 7937   3rd Qu.:1.0000   3rd Qu.:3.00   3rd Qu.: 8.000  
 Max.   :12667   Max.   :1.0000   Max.   :5.00   Max.   :13.000  
   sleep_wknd         nsibs           race_eth          sex       
 Min.   : 0.000   Min.   : 0.000   Min.   :1.000   Min.   :1.000  
 1st Qu.: 6.000   1st Qu.: 2.000   1st Qu.:2.000   1st Qu.:1.000  
 Median : 7.000   Median : 3.000   Median :3.000   Median :2.000  
 Mean   : 7.267   Mean   : 3.937   Mean   :2.395   Mean   :1.584  
 3rd Qu.: 8.000   3rd Qu.: 5.000   3rd Qu.:3.000   3rd Qu.:2.000  
 Max.   :14.000   Max.   :16.000   Max.   :3.000   Max.   :2.000  
     region          income         age_bir         eyesight_cat
 Min.   :1.000   Min.   :    0   Min.   :13.00   Excellent:474  
 1st Qu.:2.000   1st Qu.: 6000   1st Qu.:19.00   Very Good:385  
 Median :3.000   Median :11155   Median :22.00   Good     :249  
 Mean   :2.593   Mean   :15289   Mean   :23.45   Fair     : 78  
 3rd Qu.:3.000   3rd Qu.:20000   3rd Qu.:27.00   Poor     : 19  
 Max.   :4.000   Max.   :75001   Max.   :52.00                  
                 glasses_cat                   race_eth_cat   sex_cat   
 Doesn't wear glasses  :581   Hispanic               :211   Male  :501  
 Wears glasses/contacts:624   Black                  :307   Female:704  
                              Non-Black, Non-Hispanic:687               
                                                                        
                                                                        
                                                                        

Indices in dataframes

We can pull out data from dataframes using the “square bracket notation” we already saw:

nlsy[3, ]
# A tibble: 1 × 15
     id glasses eyesight sleep_wkdy sleep_wknd nsibs race_eth   sex region
  <dbl>   <dbl>    <dbl>      <dbl>      <dbl> <dbl>    <dbl> <dbl>  <dbl>
1     8       0        2          7          9     7        3     2      1
# ℹ 6 more variables: income <dbl>, age_bir <dbl>, eyesight_cat <fct>,
#   glasses_cat <fct>, race_eth_cat <fct>, sex_cat <fct>
nlsy[, 3]
# A tibble: 1,205 × 1
   eyesight
      <dbl>
 1        1
 2        2
 3        2
 4        3
 5        3
 6        2
 7        1
 8        1
 9        2
10        1
# ℹ 1,195 more rows
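
If you want to try the bracket notation without the class data loaded, here is a sketch with a tiny made-up dataframe called toy, built from the first three rows of three nlsy columns. It uses the built-in data.frame() function, so it prints a little differently from the tibble output above.

```r
# A small dataframe: each argument becomes a column (variable)
toy <- data.frame(
  id       = c(3, 6, 8),
  glasses  = c(0, 1, 0),
  eyesight = c(1, 2, 2)
)

# third row, all columns
toy[3, ]

# all rows, third column
# (a plain data.frame gives back a vector here; a tibble gives back a one-column tibble)
toy[, 3]
# [1] 1 2 2

# one cell: second row of the column named "eyesight"
toy[2, "eyesight"]
# [1] 2
```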

Dollar sign notation

It’s much more useful to be able to pull out a variable by its name, though:

nlsy$sex_cat
  [1] Female Male   Female Female Male   Female Female Female Female Male  
 [11] Female Female Female Female Male   Female Female Female Female Female
 [21] Female Male   Female Female Female Male   Female Male   Female Female
 [31] Male   Male   Male   Female Female Male   Female Female Male   Female
 [41] Female Male   Female Male   Female Male   Male   Female Male   Male  
 [51] Male   Female Female Female Male   Female Male   Female Male   Male  
 [61] Male   Female Male   Female Female Male   Male   Female Female Male  
 [71] Female Male   Male   Male   Female Male   Male   Male   Male   Female
 [81] Female Female Female Female Female Female Male   Female Male   Female
 [91] Male   Female Female Female Female Female Female Female Male   Female
[101] Female Female Female Female Male   Male   Female Male   Male   Female
[111] Female Male   Male   Male   Male   Female Male   Male   Male   Male  
[121] Female Female Male   Female Female Female Female Female Male   Male  
[131] Female Female Male   Female Female Female Female Female Male   Female
[141] Male   Male   Female Female Male   Male   Female Female Female Female
[151] Female Female Female Female Male   Female Female Male   Male   Male  
[161] Female Male   Female Female Male   Female Female Female Female Male  
[171] Male   Female Female Male   Female Female Female Female Female Female
[181] Female Male   Male   Female Male   Female Male   Female Female Female
[191] Male   Female Female Female Female Male   Female Male   Male   Male  
Levels: Male Female

Summarize a single variable

We can also get a summary of a single variable:

summary(nlsy$sex_cat)
  Male Female 
   501    704 
summary(nlsy$income)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0    6000   11155   15289   20000   75001 
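
The dollar sign works with any function that takes a vector, not just summary(). A sketch with a tiny made-up dataframe called toy (three rows borrowed from the nlsy values shown earlier):

```r
toy <- data.frame(
  sex_cat = factor(c("Female", "Male", "Female"), levels = c("Male", "Female")),
  income  = c(22390, 35000, 7227)
)

# pull out one variable by name: the result is an ordinary vector
toy$income
# [1] 22390 35000  7227

# so we can hand it to functions like mean()
mean(toy$income)
# [1] 21539

# summary() counts the levels of a factor variable
summary(toy$sex_cat)
#   Male Female 
#      1      2 
```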

Variables

  • Variables can be different types, including numeric, character, logical, and factor.
  • You can check what type of variable you’re dealing with: class(nlsy$sex_cat) (factor!)
  • A special type of dataframe called a “tibble” will show you at the top:
nlsy
# A tibble: 1,205 × 15
      id glasses eyesight sleep_wkdy sleep_wknd nsibs race_eth   sex region
   <dbl>   <dbl>    <dbl>      <dbl>      <dbl> <dbl>    <dbl> <dbl>  <dbl>
 1     3       0        1          5          7     3        3     2      1
 2     6       1        2          6          7     1        3     1      1
 3     8       0        2          7          9     7        3     2      1
 4    16       1        3          6          7     3        3     2      1
 5    18       0        3         10         10     2        3     1      3
 6    20       1        2          7          8     2        3     2      1
 7    27       0        1          8          8     1        3     2      1
 8    49       1        1          8          8     6        3     2      1
 9    57       1        2          7          8     1        3     2      1
10    67       0        1          8          8     1        3     1      1
# ℹ 1,195 more rows
# ℹ 6 more variables: income <dbl>, age_bir <dbl>, eyesight_cat <fct>,
#   glasses_cat <fct>, race_eth_cat <fct>, sex_cat <fct>

tibbles are basically just pretty dataframes

as_tibble(nlsy)[, 1:4]
# A tibble: 1,205 × 4
      id glasses eyesight sleep_wkdy
   <dbl>   <dbl>    <dbl>      <dbl>
 1     3       0        1          5
 2     6       1        2          6
 3     8       0        2          7
 4    16       1        3          6
 5    18       0        3         10
 6    20       1        2          7
 7    27       0        1          8
 8    49       1        1          8
 9    57       1        2          7
10    67       0        1          8
# ℹ 1,195 more rows
as.data.frame(nlsy)[, 1:4]
        id glasses eyesight sleep_wkdy
1        3       0        1          5
2        6       1        2          6
3        8       0        2          7
4       16       1        3          6
5       18       0        3         10
6       20       1        2          7
7       27       0        1          8
8       49       1        1          8
9       57       1        2          7
10      67       0        1          8
11      86       0        3          8
12      96       1        5          7
13      97       1        1          7
14      98       0        1          7
15     117       0        1          8
16     137       0        1          7
17     172       0        3          7
18     179       1        2          8
19     186       1        3          8
20     200       1        3          8
21     205       0        4          7
22     218       1        2          6
23     227       0        2          8
24     237       0        5          7
25     242       0        1          8
26     243       0        1          7
27     244       1        2          7
28     247       0        4          7
29     250       0        1          6
30     256       1        2          6
31     259       0        2          6
32     274       1        3          6
33     281       1        2          7
34     290       0        2          7
35     297       1        3          8
36     317       1        2          7
37     333       0        2          6
38     335       0        4          6
39     337       0        3          6
40     343       0        3          3
41     354       1        2          6
42     357       0        1          7
43     369       1        1          7
44     377       0        1          5
45     382       1        4          6
46     392       0        1          6
47     398       0        1          5
48     400       1        1          7
49     409       1        3          7
50     410       0        1          6
51     422       1        2          7
52     423       1        1          6
53     437       0        1          5
54     442       0        3          7
55     443       0        2          7
56     444       1        1          8
57     457       0        1          7
58     458       1        1          7
59     466       0        2          7
60     481       1        1          8
61     482       1        1          7
62     503       0        2          7
63     513       1        3          5
64     550       1        2          6
65     552       1        2          7
66     553       0        1          6
67     555       0        1          4
68     557       1        1          7
69     582       1        1          8
70     583       1        2          5
71     619       1        1          6
72     620       0        1          7
73     625       1        1          7
74     631       0        3          6
75     632       1        2          6
76     644       0        1          5
77     647       0        1          7
78     649       1        3          6
79     653       0        2          6
80     664       1        2          6
81     692       0        3          4
82     704       1        1          7
83     706       1        1          7
84     708       1        1          8
85     712       1        4          6
86     720       0        2          6
87     731       0        3          7
88     739       1        1          8
89     742       0        1          7
90     752       1        1          8
91     753       1        2          6
92     755       1        1          5
93     761       1        1          5
94     775       1        2          5
95     792       1        1          7
96     801       1        1          4
97     812       1        1          7
98     820       0        1          7
99     825       1        2          7
100    836       1        4          5
101    848       1        3          5
102    855       1        2          7
103    856       1        1          6
104    862       1        1          8
105    881       0        2          6
106    888       0        1          7
107    889       0        4          9
108    890       1        2          7
109    891       0        2          8
110    896       1        2          8
111    914       1        1          7
112    916       1        1          7
113    919       0        4          8
114    924       1        1          7
115    931       1        3          6
116    932       1        2          5
117    947       0        1          7
118    960       0        1          7
119    984       1        2          7
120    985       0        2          7
121    992       0        1          7
122    995       1        2          7
123   1000       0        1          6
124   1009       0        3          7
125   1034       0        2          5
126   1039       1        2          8
127   1050       0        3          8
128   1056       0        3          7
129   1059       1        3          6
130   1060       0        1          7
131   1063       0        3          8
132   1065       0        3          8
133   1068       0        3          6
134   1070       0        4          6
135   1077       0        4          8
136   1079       1        3          6
137   1088       1        1          7
138   1101       0        1          7
139   1102       0        1          6
140   1111       1        3          8
141   1122       1        3          6
142   1142       1        2          7
143   1163       1        4          5
144   1166       1        3          5
145   1169       0        1          9
146   1185       0        1          7
147   1189       0        1          5
148   1199       0        2          6
149   1212       1        3          7
150   1225       1        3          6
151   1226       0        3          6
152   1227       0        3          8
153   1233       1        3          4
154   1238       1        2          6
155   1249       1        1          4
156   1250       0        2          8
157   1261       1        3          8
158   1272       0        3          6
159   1288       1        4          7
160   1291       0        3          8
161   1293       1        1          6
162   1311       0        1          7
163   1312       1        3          7
164   1327       1        1          6
165   1342       0        3          3
166   1347       1        1          6
167   1352       1        1          6
168   1359       0        1          6
169   1360       1        1          7
170   1368       1        1          7
171   1393       1        2          7
172   1394       0        1          6
173   1411       1        3          7
174   1413       0        1          6
175   1415       1        1          5
176   1451       1        1          8
177   1455       1        1          3
178   1465       1        1          5
179   1469       1        4          7
180   1470       1        2          7
181   1474       1        3          9
182   1479       1        3          6
183   1481       0        1          7
184   1484       0        2          7
185   1520       1        2          5
186   1524       1        2          4
187   1527       0        1          6
188   1539       0        2          9
189   1541       0        1          8
190   1546       0        2          6
191   1547       0        3          8
192   1548       1        3          7
193   1551       1        1          7
194   1552       1        2          8
195   1553       0        1          9
196   1554       1        2          8
197   1566       1        1          6
198   1569       1        1          7
199   1575       1        1          5
200   1577       1        2          8
201   1587       1        3          6
202   1593       1        2          6
203   1596       1        3          5
204   1599       1        4          8
205   1600       1        3          7
206   1603       0        2          6
207   1605       1        3          7
208   1609       1        1          7
209   1610       0        2          7
210   1616       1        1          7
211   1617       1        1          6
212   1622       1        2          8
213   1623       1        1          7
214   1638       1        1          8
215   1640       1        1          6
216   1641       1        1          6
217   1656       1        3          6
218   1674       0        4          6
219   1685       0        1          7
220   1714       1        3          7
221   1729       0        3          7
222   1730       1        2          6
223   1735       0        2          7
224   1740       1        3          7
225   1746       1        3          7
226   1747       0        2          8
227   1766       0        2          6
228   1770       1        3          6
229   1778       0        2          7
230   1785       0        2          6
231   1795       1        2          6
232   1804       1        2          7
233   1808       1        1          7
234   1824       0        2          6
235   1833       0        1          7
236   1847       1        1          8
237   1855       1        2          8
238   1862       0        1          5
239   1865       1        4          8
240   1872       0        2          6
241   1875       1        2          6
242   1878       1        3          7
243   1880       1        2          5
244   1885       1        3          6
245   1901       0        2          7
246   1906       0        2          8
247   1910       0        2          8
248   1912       1        1          7
249   1914       1        1          6
250   1918       0        3          7
251   1930       1        1          7
252   1947       1        1          7
253   1954       1        1          8
254   1961       1        2          6
255   1962       1        1          7
256   1965       0        2          8
257   1966       1        2          6
258   1980       0        2          8
259   1984       0        2          5
260   1990       1        1          7
261   1994       0        3          7
262   2002       0        3          6
263   2003       1        3          8
264   2025       1        2          4
265   2027       1        2          7
266   2030       0        2          8
267   2055       1        1          6
268   2056       0        1          7
269   2064       1        2          8
270   2070       0        2          7
271   2073       1        2          6
272   2075       1        2          6
273   2076       0        2          6
274   2083       0        1          6
275   2093       1        1          6
276   2094       1        2          5
277   2097       0        1          8
278   2100       1        1          6
279   2101       1        3          7
280   2102       1        2          7
281   2104       1        1          5
282   2119       1        2          5
283   2126       1        2          6
284   2131       1        3          5
285   2158       0        2          7
286   2159       1        3          7
287   2163       0        2          7
288   2168       1        1          4
289   2190       0        1          7
290   2191       1        1          7
291   2196       1        1          5
292   2203       0        3          7
293   2221       1        3          7
294   2222       0        3          5
295   2227       1        2          7
296   2231       0        3          5
297   2237       0        1          5
298   2247       0        1          8
299   2281       0        3          6
300   2313       0        2          8
301   2314       1        1          7
302   2317       1        3          5
303   2318       0        2         10
304   2329       1        3          5
305   2336       1        2          7
306   2340       0        5          7
307   2359       0        4          9
308   2375       0        2          7
309   2380       1        5          7
310   2394       0        3          5
311   2395       0        1         10
312   2398       0        3          7
313   2399       0        2          3
314   2404       0        2          9
315   2407       1        3          5
316   2415       0        1          4
317   2416       0        1          8
318   2418       1        1          8
319   2437       1        2          7
320   2442       1        2          7
321   2447       0        1          5
322   2449       0        2          6
323   2450       1        2          7
324   2459       1        2          5
325   2474       1        5         10
326   2478       1        1          7
327   2481       1        3          6
328   2483       0        2          5
329   2494       1        1          8
330   2523       0        1          7
331   2525       0        3          8
332   2535       0        2          7
333   2536       0        2          6
334   2539       0        1          8
335   2541       1        1          6
336   2544       1        3          8
337   2545       0        1          8
338   2549       1        3          6
339   2550       1        1          8
340   2551       1        1          5
341   2555       0        1          6
342   2565       1        1          6
343   2566       0        2          7
344   2568       1        2          6
345   2569       1        2          7
346   2573       0        1          7
347   2594       1        2          8
348   2599       1        2          4
349   2614       1        1          6
350   2616       1        1          6
351   2629       1        3          7
352   2634       0        1         10
353   2637       1        1          5
354   2640       0        3          6
355   2646       1        1          7
356   2663       0        4          6
357   2672       1        3          8
358   2674       0        3          8
359   2676       0        1          3
360   2679       1        1          7
361   2693       0        2          6
362   2698       1        1          6
363   2702       0        1          6
364   2703       1        2          8
365   2705       0        3          8
366   2724       0        1          6
367   2728       1        1          6
368   2729       1        4          5
369   2741       0        3         10
370   2742       0        2          7
371   2745       1        3          3
372   2746       1        1          8
373   2748       1        2          8
374   2770       0        2          6
375   2771       0        2          6
376   2779       0        4          7
377   2781       0        4          7
378   2795       0        2          5
379   2803       0        3          5
380   2809       0        1          8
381   2813       0        3          8
382   2817       0        4          8
383   2866       0        1          8
384   2877       1        3          7
385   2885       1        3          6
386   2896       1        2          7
387   2902       1        1          6
388   2908       1        1          7
389   2936       1        2          9
390   2941       0        1          7
391   2948       1        1          7
392   2949       0        1          8
393   2962       0        2          7
394   2965       0        2          8
395   2980       1        2          8
396   2984       1        2          7
397   2996       0        1          6
398   3036       1        2          5
399   3037       1        1          8
400   3041       1        2          8
401   3046       0        3          5
402   3051       1        4          8
403   3057       1        1          5
404   3069       1        2          8
405   3075       0        3          6
406   3079       0        2          7
407   3084       0        2          6
408   3095       1        2          6
409   3100       1        1          7
410   3102       1        1          8
411   3125       1        3          6
412   3138       0        2          7
413   3145       1        2          8
414   3152       1        3          5
415   3156       1        2          5
416   3159       0        2          6
417   3163       0        2          8
418   3168       1        3          6
419   3170       1        2          6
420   3173       0        2          7
421   3179       0        4          8
422   3194       0        1          7
423   3198       1        2          6
424   3224       1        2          6
425   3250       1        2          8
426   3255       0        1          7
427   3303       0        1          8
428   3309       1        1          6
429   3316       1        2          5
430   3324       1        2          6
431   3325       1        4          8
432   3338       1        1          6
433   3351       0        5          6
434   3355       1        1          7
435   3356       1        3          5
436   3369       1        4          7
437   3390       1        1          7
438   3391       1        1          6
439   3392       0        1          6
440   3413       1        1          7
441   3416       1        1          7
442   3423       0        2          7
443   3429       0        3          7
444   3433       1        2          7
445   3444       0        4          6
446   3447       1        2          6
447   3461       0        2          6
448   3475       1        2          8
449   3481       1        2          6
450   3494       1        3          6
451   3495       1        1          7
452   3502       1        3          6
453   3509       0        4          8
454   3515       1        1          7
455   3530       1        2          6
456   3533       0        1          7
457   3538       0        3          5
458   3547       1        2          6
459   3564       1        2          6
460   3567       1        1          6
461   3575       1        1          7
462   3576       0        3          9
463   3581       1        1          8
464   3584       1        3          3
465   3587       1        2          7
466   3589       0        2          7
467   3594       1        2          6
468   3597       0        2          7
469   3610       1        2          7
470   3617       0        1          6
471   3619       1        1          7
472   3628       1        1          7
473   3648       0        2          7
474   3651       1        1          7
475   3655       1        1          6
476   3659       0        3          8
477   3694       0        4          7
478   3700       0        3          6
479   3703       1        4          8
480   3704       0        4         11
481   3707       1        1          7
482   3713       1        2          6
483   3719       0        2          6
484   3721       0        2          7
485   3722       0        2          8
486   3723       0        4          8
487   3733       0        1          6
488   3736       1        2          6
489   3768       1        3          4
490   3769       1        1          8
491   3781       1        2          6
492   3787       0        2          8
493   3791       0        1          4
494   3793       0        1          8
495   3799       0        1          8
496   3804       1        1          7
497   3807       1        1          4
498   3817       1        1          7
499   3829       0        2          6
500   3839       1        1          5
501   3860       1        2          7
502   3866       0        3          6
503   3877       1        3          7
504   3889       0        2          7
505   3895       1        3          6
506   3901       0        2          8
507   3908       1        2          6
508   3954       0        1          6
509   3967       1        1          7
510   3970       1        2          6
511   3972       1        1          7
512   3985       1        1          5
513   3987       0        1          7
514   3988       0        3          5
515   3992       0        2          7
516   3995       1        2          5
517   4025       0        4          5
518   4041       0        2          7
519   4042       1        2          7
520   4046       1        5          6
521   4048       0        1          6
522   4049       1        1          4
523   4060       1        2          7
524   4061       0        1          7
525   4066       1        1          9
526   4070       1        4          6
527   4076       1        1          6
528   4091       0        2          8
529   4092       0        1          9
530   4121       1        2          4
531   4122       1        1          6
532   4168       1        1          9
533   4177       0        4          6
534   4208       0        2          7
535   4219       0        3          7
536   4220       0        1          4
537   4224       1        1          7
538   4226       1        3          5
539   4228       1        2          8
540   4232       0        3          7
541   4236       0        2          6
542   4238       1        1          7
543   4246       1        3          7
544   4262       0        4          6
545   4264       0        1          9
546   4270       0        5          8
547   4274       1        2          8
548   4278       1        3          7
549   4280       1        1          8
550   4290       1        2          8
551   4294       0        3          8
552   4304       0        2          7
553   4306       1        1          5
554   4313       0        1          7
555   4328       0        2          7
556   4335       1        2          8
557   4341       1        3          6
558   4345       0        3          6
559   4350       1        1          7
560   4368       1        4          7
561   4372       1        1          7
562   4379       1        2         10
563   4386       1        5          8
564   4398       1        1          6
565   4428       1        1          7
566   4450       1        4          8
567   4468       0        1          8
568   4469       0        3          7
569   4475       0        2          4
570   4476       0        2          6
571   4483       1        1          5
572   4490       0        1          7
573   4494       1        2          6
574   4496       1        1          6
575   4511       0        3          7
576   4513       0        2          7
577   4536       1        2          7
578   4538       0        3          8
579   4559       1        1          6
580   4569       0        2          5
581   4571       1        2          8
582   4578       0        3          7
583   4589       0        2          8
584   4590       0        3          8
585   4594       1        1          7
586   4607       1        2          6
587   4609       1        1          7
588   4625       1        1          7
589   4636       1        1          7
590   4650       0        2          7
591   4664       0        2          7
592   4668       1        2          6
593   4687       1        1          8
594   4691       1        1          6
595   4695       1        2          7
596   4720       0        1          6
597   4723       0        2          5
598   4724       0        4          5
599   4731       1        2          6
600   4735       0        3          7
601   4737       0        1          8
602   4738       0        1          5
603   4744       1        2          6
604   4750       0        1          6
605   4753       1        3          8
606   4754       1        2          8
607   4769       0        1          8
608   4772       1        3          8
609   4774       0        3          5
610   4780       1        1          6
611   4800       0        2          6
612   4806       1        5          6
613   4810       1        1          7
614   4815       1        2          8
615   4817       1        1          8
616   4828       1        3          7
617   4837       1        1          6
618   4846       1        2          5
619   4858       1        1          6
620   4873       1        2          8
621   4876       1        1          7
622   4879       1        1          8
623   4881       1        2          7
624   4883       0        1          7
625   4896       1        2          7
626   4901       0        3          8
627   4907       1        1          8
628   4914       0        2          6
629   4919       0        1          7
630   4921       0        1          8
631   4926       1        3          6
632   4934       1        1          6
633   4943       1        1          6
634   4945       0        2          8
635   4950       1        3          5
636   4973       0        2          5
637   4978       1        3          8
638   5007       0        2          6
639   5017       0        1          6
640   5026       0        1          7
641   5028       1        1          7
642   5029       1        2          6
643   5035       1        3          8
644   5048       1        1          6
645   5061       0        1          8
646   5088       0        3          0
647   5102       1        2          8
648   5108       0        2          6
649   5115       1        3          6
650   5125       1        3          7
651   5133       0        1          8
652   5141       0        1          8
653   5142       1        3          6
654   5145       0        2          6
655   5147       0        4          7
656   5149       0        4          6
657   5162       1        1          7
658   5170       1        1          7
659   5178       0        2          6
660   5186       0        2          5
661   5187       1        2          7
662   5198       1        1          7
663   5199       0        1          7
664   5212       0        1          7
665   5218       1        2          7
666   5226       1        1          6
667   5244       0        2          8
668   5251       0        4          9
669   5255       1        1          8
670   5268       1        2          8
671   5269       0        1          4
672   5271       1        2          6
673   5272       0        3          6
674   5289       0        1          8
675   5294       0        1          8
676   5295       1        2          7
677   5299       0        1          7
678   5315       0        1          7
679   5325       0        1          4
680   5326       0        2          7
681   5327       1        2          7
682   5328       1        1          5
683   5332       1        1          7
684   5337       0        1          7
685   5339       1        2          7
686   5352       1        2          7
687   5372       1        1          8
688   5373       0        2          7
689   5382       0        2          8
690   5389       0        2          7
691   5390       0        3          8
692   5397       0        2          7
693   5402       0        2          8
694   5405       0        1          7
695   5406       1        1          7
696   5416       0        2          5
697   5419       0        2          6
698   5421       1        3          6
699   5424       1        3          7
700   5428       1        3          6
701   5455       0        4          9
702   5487       1        2          6
703   5488       1        4          5
704   5489       0        1          6
705   5495       0        3          8
706   5496       0        1          5
707   5497       0        1          5
708   5508       0        1          6
709   5526       0        2          6
710   5528       1        2          7
711   5531       1        3          7
712   5555       0        4          6
713   5557       1        2          7
714   5573       1        2          7
715   5594       1        1          6
716   5599       0        1          7
717   5615       1        2          8
718   5618       0        1          7
719   5638       0        2          7
720   5649       1        1          7
721   5651       0        1          6
722   5666       0        1          8
723   5669       1        2          8
724   5675       0        3          8
725   5681       1        2          6
726   5684       0        4          5
727   5690       0        2          9
728   5691       0        1          6
729   5695       1        1          8
730   5699       1        1          7
731   5702       1        2          7
732   5705       1        1          7
733   5707       0        3          7
734   5708       1        2          8
735   5713       0        1          6
736   5723       0        1          8
737   5731       1        4          6
738   5735       1        1          6
739   5750       0        2          7
740   5754       0        2          9
741   5756       1        3          9
742   5776       0        1          8
743   5784       1        3          7
744   5790       1        1          8
745   5791       1        3          8
746   5793       0        1          5
747   5803       0        2          8
748   5813       0        3          7
749   5821       0        1          8
750   5827       1        3          6
751   5829       0        2          6
752   5839       1        2          7
753   5840       1        1          8
754   5849       1        3          5
755   5855       0        2          6
756   5864       1        3          8
757   5886       0        2          8
758   5887       0        1          7
759   5905       1        1          6
760   5913       0        1         12
761   5915       0        5          8
762   5927       0        3          8
763   5928       1        4          4
764   5935       1        2          7
765   5941       1        3          7
766   5953       1        2          7
767   5999       0        4          4
768   6019       1        1          5
769   6043       1        3          5
770   6053       1        3          6
771   6069       0        3          8
772   6072       0        3          5
773   6078       1        5          7
774   6089       1        1         10
775   6094       0        2          6
776   6112       0        1          8
777   6118       1        3          7
778   6121       1        2         10
779   6128       0        2          4
780   6156       0        3          8
781   6160       0        1          8
782   6171       1        3          8
783   6185       1        1          8
784   6201       0        1          6
785   6209       0        1          8
786   6220       0        2          6
787   6231       1        2          8
788   6234       0        1         10
789   6235       1        1          5
790   6236       0        1          6
791   6239       1        1          7
792   6252       0        1          5
793   6272       1        1          5
794   6278       1        1          7
795   6280       1        5          6
796   6291       0        2          6
797   6308       0        4          7
798   6317       1        3          8
799   6321       1        1          6
800   6338       1        2          5
801   6339       1        3          6
802   6345       0        3          4
803   6357       0        2         12
804   6377       1        2          6
805   6381       0        2          6
806   6387       0        1          6
807   6399       0        3          6
808   6415       0        2          8
809   6426       0        1          9
810   6449       1        2          5
811   6458       0        2          5
812   6485       0        2          6
813   6491       1        2          7
814   6574       0        2          8
815   6647       0        2          7
816   6693       1        1          7
817   6740       0        1          5
818   6769       0        1          7
819   6773       1        1          8
820   6776       0        1          6
821   6787       0        1          6
822   6789       1        1          8
823   6800       1        2          6
824   6803       0        1          7
825   6843       1        1          5
826   6852       0        1          6
827   6858       1        4          8
828   6862       1        1          4
829   6863       0        3          6
830   6867       0        3          8
831   6878       0        3          6
832   6913       0        4          4
833   6967       0        3          6
834   6970       0        1          2
835   6971       1        3          6
836   7049       1        1          6
837   7064       0        1          6
838   7066       1        3          4
839   7097       0        3          8
840   7099       0        3          6
841   7101       1        2          7
842   7102       1        2          6
843   7103       1        1          8
844   7113       0        1          3
845   7116       0        1          6
846   7124       1        3          6
847   7125       0        2          7
848   7137       1        3          7
849   7138       1        3          7
850   7140       0        1          3
851   7143       1        4          6
852   7148       1        3          7
853   7155       0        2          5
854   7179       1        2          8
855   7192       0        5          2
856   7249       0        3          8
857   7302       0        1          5
858   7313       0        1          6
859   7316       1        2          8
860   7340       1        3          6
861   7368       1        2          5
862   7411       0        1          3
863   7430       0        1          6
864   7437       0        1          8
865   7442       0        5          6
866   7470       1        1          8
867   7474       0        2          8
868   7493       0        2          2
869   7498       1        2          7
870   7501       0        4          2
871   7508       0        3          8
872   7509       0        3          5
873   7510       1        4          3
874   7552       0        1          5
875   7568       0        2          6
876   7569       0        1          4
877   7583       0        3          5
878   7598       0        2          6
879   7620       1        2          5
880   7622       0        1          6
881   7629       0        1          5
882   7636       1        2          6
883   7646       1        4          6
884   7685       1        2          9
885   7689       1        2          6
886   7691       0        1          8
887   7712       0        2          8
888   7748       0        2          7
889   7750       1        1          8
890   7754       1        1          8
891   7789       0        3          8
892   7794       0        2          7
893   7795       0        3          6
894   7799       0        4         10
895   7804       0        1          5
896   7805       0        3          8
897   7830       0        3          8
898   7841       1        4          6
899   7852       1        2          5
900   7867       1        1          6
901   7903       1        1          7
902   7909       0        1          7
903   7923       0        1          6
904   7937       0        1          8
905   7938       0        2          6
906   7943       1        2          8
907   7967       1        3          7
908   7968       0        3          7
909   7981       0        3          5
910   8000       0        2          7
911   8002       0        1          7
912   8009       1        2          7
913   8028       1        3          5
914   8041       0        4          4
915   8046       0        1          8
916   8047       1        1          4
917   8063       1        2          7
918   8103       1        1          7
919   8107       1        1          8
920   8120       1        2          8
921   8121       0        2          8
922   8123       1        1          6
923   8127       1        3          7
924   8139       0        2          5
925   8140       0        1          8
926   8141       1        2          6
927   8143       1        1          7
928   8151       0        2          6
929   8173       1        2          7
930   8177       0        3          5
931   8193       0        1          7
932   8197       1        4          8
933   8209       0        4          6
934   8216       1        1          5
935   8235       1        3          8
936   8237       1        1          7
937   8243       0        2          6
938   8257       1        1          8
939   8267       1        3          6
940   8271       1        5          6
941   8272       0        2          8
942   8273       1        2          8
943   8274       0        2          6
944   8276       1        1          4
945   8286       1        3          8
946   8287       1        1          6
947   8304       0        1          7
948   8305       0        4          5
949   8309       0        4          6
950   8327       0        1          8
951   8364       1        3          6
952   8377       0        1          4
953   8388       0        1          7
954   8391       1        3          8
955   8392       0        1          6
956   8395       1        1          6
957   8406       0        1          8
958   8407       0        2          9
959   8434       1        3          7
960   8479       1        3          7
961   8490       1        2          6
962   8493       0        4          6
963   8494       0        1          6
964   8495       0        2          7
965   8513       1        2          5
966   8549       0        2          5
967   8623       0        2          4
968   8633       0        2          6
969   8634       1        2          7
970   8640       1        1          5
971   8683       0        1          8
972   8685       1        1          6
973   8693       1        1          5
974   8697       0        1          6
975   8741       1        2          9
976   8869       1        1          9
977   8870       1        1          8
978   8880       1        1          6
979   8881       0        3          4
980   8889       1        2          8
981   8911       0        3          7
982   8916       0        2          8
983   8939       1        1          4
984   8948       0        2          8
985   8971       1        4          6
986   9005       1        1          8
987   9013       0        2          4
988   9066       1        1          6
989   9080       0        3          6
990   9107       1        2          6
991   9117       0        4          6
992   9134       0        3          8
993   9174       1        2          6
994   9191       0        1          6
995   9195       0        2          7
996   9197       0        1          8
997   9204       0        2          6
998   9214       0        3          6
999   9220       0        2          8
1000  9256       1        3          4
1001  9273       1        1          9
1002  9274       0        3          6
1003  9284       1        3          7
1004  9287       0        1          6
1005  9347       1        1          7
1006  9355       1        4          6
1007  9374       0        3          6
1008  9381       1        1          6
1009  9390       0        1          8
1010  9391       1        2          5
1011  9392       1        1          6
1012  9401       1        2          7
1013  9416       1        1          8
1014  9419       1        2          7
1015  9442       1        2          6
1016  9446       1        2          6
1017  9517       1        1          7
1018  9521       1        1          8
1019  9529       0        3          4
1020  9556       0        2          7
1021  9574       1        2          8
1022  9619       0        4          7
1023  9620       1        5          9
1024  9624       0        2          7
1025  9672       0        3          6
1026  9674       0        2          9
1027  9687       0        4          7
1028  9697       1        1          6
1029  9707       1        1          4
1030  9754       0        4          4
1031  9756       1        4          7
1032  9760       0        2          5
1033  9768       1        1          4
1034  9809       0        2          7
1035  9811       0        3          8
1036  9823       0        2          8
1037  9838       1        3          8
1038  9871       1        1          8
1039  9876       1        1          5
1040  9922       0        1          8
1041  9940       0        5          9
1042  9941       0        2          6
1043  9969       0        1          9
1044  9981       0        1          6
1045 10008       1        2          8
1046 10009       1        2          8
1047 10026       1        4          9
1048 10028       0        2          7
1049 10029       1        2          7
1050 10036       1        4          8
1051 10038       1        1          6
1052 10062       1        1          8
1053 10082       1        3          5
1054 10113       0        3          3
1055 10135       1        1          5
1056 10152       0        3          5
1057 10166       1        3          7
1058 10182       0        2          6
1059 10185       1        1          6
1060 10211       1        1         10
1061 10226       0        2          7
1062 10232       1        1          7
1063 10234       0        1         10
1064 10236       0        4          4
1065 10259       0        1         13
1066 10315       0        2          8
1067 10331       1        2          6
1068 10334       0        2          8
1069 10362       1        1          5
1070 10377       0        3          6
1071 10379       1        1          8
1072 10389       0        2          6
1073 10425       0        1          8
1074 10436       0        1          8
1075 10452       0        1          4
1076 10453       0        1          8
1077 10470       0        1          5
1078 10481       1        3          7
1079 10493       0        1          5
1080 10500       0        3          8
1081 10582       1        2          7
1082 10634       1        3          8
1083 10664       0        2         10
1084 10666       0        2          7
1085 10706       1        3          5
1086 10707       0        3          8
1087 10718       0        2          4
1088 10728       0        1          6
1089 10742       0        1          5
1090 10745       1        2          7
1091 10754       0        2          8
1092 10757       0        3          4
1093 10765       1        2          4
1094 10767       1        1          5
1095 10775       1        1          8
1096 10781       1        1          3
1097 10790       1        1          5
1098 10804       0        3          7
1099 10811       1        1          7
1100 10819       1        1          7
1101 10844       1        3          4
1102 10854       0        1          5
1103 10882       1        3          8
1104 10891       1        1          8
1105 10893       0        3          6
1106 10894       0        1          8
1107 10945       1        2          9
1108 11001       0        1          7
1109 11002       0        3          5
1110 11019       0        1          6
1111 11071       1        2          8
1112 11072       0        1          6
1113 11142       0        1          7
1114 11221       0        1          6
1115 11319       0        2          8
1116 11325       0        2          4
1117 11406       0        1          7
1118 11418       1        1          8
1119 11436       1        2          6
1120 11467       0        1          7
1121 11480       1        1          9
1122 11510       0        1          6
1123 11516       1        2          7
1124 11523       0        3          7
1125 11525       1        1          6
1126 11534       1        1          7
1127 11535       0        3          3
1128 11568       0        1          7
1129 11581       1        2          7
1130 11598       0        1          8
1131 11629       1        1          6
1132 11651       1        1          8
1133 11658       0        1          8
1134 11664       0        2          8
1135 11686       0        1          8
1136 11696       1        2          4
1137 11703       1        1          6
1138 11745       0        1          7
1139 11751       0        1          6
1140 11769       0        1          7
1141 11776       0        3          7
1142 11779       1        1          6
1143 11787       0        1          6
1144 11792       0        3          5
1145 11794       1        2          8
1146 11800       1        1          5
1147 11801       0        3          8
1148 11813       1        2          7
1149 11816       1        1          7
1150 11819       0        4          8
1151 11823       1        1          7
1152 11836       0        1          6
1153 11850       0        3          8
1154 11857       0        1          5
1155 11858       0        4          5
1156 11859       1        1          5
1157 11873       1        1          7
1158 11884       0        3          7
1159 11886       0        2          8
1160 11890       0        1          7
1161 11905       1        3          7
1162 11918       0        3          8
1163 11934       1        1          7
1164 11937       0        3          6
1165 11955       1        1          6
1166 11970       0        2          9
1167 11978       0        1          6
1168 11980       1        1          6
1169 11990       0        1          7
1170 12012       0        1          7
1171 12015       0        2          8
1172 12016       1        1          8
1173 12018       1        3          6
1174 12022       1        2          5
1175 12036       1        4          6
1176 12046       0        1          7
1177 12061       0        2          8
1178 12066       1        2          8
1179 12076       0        2          8
1180 12080       1        3          6
1181 12082       0        2          8
1182 12087       1        2          7
1183 12097       0        1          7
1184 12101       1        1          7
1185 12113       0        1          6
1186 12119       1        2          6
1187 12122       0        3          6
1188 12123       1        3          7
1189 12135       0        2          6
1190 12149       1        3          6
1191 12150       1        3          8
1192 12181       1        1          7
1193 12193       0        1          6
1194 12194       1        5          5
1195 12196       0        2          6
1196 12222       1        4          5
1197 12237       1        1          8
1198 12278       0        3          8
1199 12284       1        3          7
1200 12300       0        3          8
1201 12566       0        3          5
1202 12589       0        2          7
1203 12648       0        2          5
1204 12659       1        3          5
1205 12667       0        3          5

Different ways to do the same thing

There are usually multiple ways to achieve a task in R. Ideally we’d like solutions that are:

  • readable: If you share your code with someone, can they figure out what you’re doing?
  • reliable: Is this way always going to work, even if the data is slightly different?
  • safe: Is this way going to introduce errors into your code without you noticing?
  • fast: Is this an efficient way to do things, given all of the above?

We’ll focus on the tidyverse because I think it’s the optimal mix of those characteristics

tidyverse

The same people who make RStudio are also responsible for a set of packages called the tidyverse

tidyverse

  • install.packages("tidyverse") actually downloads more than a dozen packages
  • library(tidyverse) loads:

ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, lubridate

This is by no means the only way to manage your data, but I find that a lot of the time, it’s the easiest and simplest way to get things done.

Exercise

Intro to dataframes

Creating variables

Two (of several) ways to take the (natural) log of income and store it in the dataframe:

nlsy$log_income <- log(nlsy$income)

OR

nlsy <- mutate(nlsy, 
               log_income = log(income))

Note

The second way may look longer now, but we’ll see later why it’s useful when we make lots of variables at once!

New function: Creating a new variable with mutate()

General format:

dataframe <- mutate(dataframe,
                    new_variable = function(old_variable))

We can do whatever we want to a variable to make a new one:

nlsy <- mutate(nlsy,
               new_id = id + 1)

Tip

mutate() is a function that acts on a dataframe and returns a new dataframe; it doesn’t change the original. We use the assignment arrow to store the dataframe with the new variable back in the same place.
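Here is a small sketch of why the assignment arrow matters. It uses a made-up three-row dataframe called toy (an assumption for illustration, not the nlsy data):

```r
library(dplyr)

# Hypothetical toy data standing in for nlsy
toy <- tibble(id = 1:3, income = c(10000, 25000, 40000))

# Without the assignment arrow, the result is computed but not stored
mutate(toy, log_income = log(income))
names(toy)  # still only id and income

# With the assignment arrow, the new variable is stored in toy
toy <- mutate(toy, log_income = log(income))
names(toy)  # now includes log_income
```

If a variable you just made seems to have disappeared, a missing assignment arrow is a good first thing to check.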

Making variables in “Base R”

nlsy$region_cat <- factor(nlsy$region)
nlsy$income <- round(nlsy$income)
nlsy$age_bir_cent <- nlsy$age_bir - mean(nlsy$age_bir)
nlsy$index <- 1:nrow(nlsy)
nlsy$slp_wkdy_cat <- ifelse(nlsy$sleep_wkdy < 5, "little",
                            ifelse(nlsy$sleep_wkdy < 7, "some",
                                   ifelse(nlsy$sleep_wkdy < 9, "ideal",
                                          ifelse(nlsy$sleep_wkdy < 12, "lots", NA)
                                   )
                            )
)

Very quickly your code can get overrun with dollar signs (and parentheses, and arrows)

Cleaner way to make lots of new variables

nlsy <- mutate(nlsy, # dataset
    # new variables
    region_cat = factor(region, labels = c("Northeast", "North Central", "South", "West")), 
    income = round(income),
    age_bir_cent = age_bir - mean(age_bir),
    index = row_number() # a special function that gives the row number
    # could make as many as we want....
)

Tip

We can refer to variables within the same dataset (region, income, age_bir) without the $ notation

mutate() tips and tricks

You still need to store your dataset somewhere, so make sure to include the assignment arrow

  • Good practice to make new copies with different names as you go along
nlsy_w_cats <- mutate(nlsy, # dataset
               region_cat = factor(region),
               sex_cat = factor(sex),
               race_eth_cat = factor(race_eth))

nlsy_clean <- mutate(nlsy_w_cats, # dataset
                     region_cat = fct_recode(region_cat,
                                             "Northeast" = "1",
                                             "North Central" = "2",
                                             "South" = "3",
                                             "West" = "4"),
                     sex_cat = fct_relevel(sex_cat,
                                           "Female", "Male"))

mutate() tips and tricks

  • You can refer immediately to variables you just made:
nlsy_new <- mutate(nlsy,
                   age_bir_cent = age_bir - mean(age_bir),
                   age_bir_stand = age_bir_cent / sd(age_bir_cent)
)

Tip

“Chunk” your work on the same/similar variables so you can keep track of how a variable is derived.
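To see a just-created variable being used right away, here is a sketch on made-up ages (the toy dataframe is an assumption for illustration). It also checks that the standardized variable behaves as expected:

```r
library(dplyr)

# Hypothetical ages, made up for illustration
toy <- tibble(age_bir = c(18, 22, 25, 30, 35))

toy <- mutate(toy,
              age_bir_cent = age_bir - mean(age_bir),
              # age_bir_cent was created one line up and can be used right away
              age_bir_stand = age_bir_cent / sd(age_bir_cent))

# Sanity check: a standardized variable should have mean 0 and sd 1
mean(toy$age_bir_stand)
sd(toy$age_bir_stand)
```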

Exercise

Making variables

Factor variables

When I downloaded the data originally, it was all numeric (“double”)

I already converted some variables into categorical (“factor”) variables (using the codebook)

  • factors have levels
  • the first level is the reference level when you include the factor in a regression
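A minimal sketch of what "reference level" means in practice, using made-up data (y and grp are hypothetical names) and fct_relevel() from forcats to change which level comes first:

```r
library(forcats)

# Hypothetical toy data: outcome y and a two-level factor
y <- c(1, 3, 5, 7)
grp <- factor(c("a", "a", "b", "b"))
levels(grp)  # "a" comes first, so it is the reference level

# The model reports a coefficient for "b" compared with "a"
names(coef(lm(y ~ grp)))

# Move "b" to the front and the comparison flips
grp_b_first <- fct_relevel(grp, "b")
names(coef(lm(y ~ grp_b_first)))
```

The data are identical in both models; only the level that everything else is compared against changes.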

New function: count()

We can explore factor variables (and other types!) using count():

count(nlsy, glasses_cat)
# A tibble: 2 × 2
  glasses_cat                n
  <fct>                  <int>
1 Doesn't wear glasses     581
2 Wears glasses/contacts   624

Tip

Like mutate(), this function takes a dataframe as its first argument. The second argument is the variable you want to count.

Cross-tabulations

Actually, count() can take a whole series of variable names:

count(nlsy, glasses_cat, sex_cat)
# A tibble: 4 × 3
  glasses_cat            sex_cat     n
  <fct>                  <fct>   <int>
1 Doesn't wear glasses   Male      280
2 Doesn't wear glasses   Female    301
3 Wears glasses/contacts Male      221
4 Wears glasses/contacts Female    403

Note

If this isn’t in the format you want your cross-tab in, don’t worry – we’ll see other functions that make better tables later. This output is handy though, because it’s a dataframe! (Actually, a tibble!)
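Because the output of count() is a dataframe, you can keep working with it using the same tools. A sketch with a made-up variable (toy and glasses are hypothetical), adding a column of proportions:

```r
library(dplyr)

# Hypothetical toy data, made up for illustration
toy <- tibble(glasses = c("yes", "no", "yes", "yes"))

counts <- count(toy, glasses)

# Because counts is a tibble, mutate() works on it like any other dataframe
counts <- mutate(counts, prop = n / sum(n))
counts
```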

New function: converting a variable with factor()

Again, two ways of doing the same thing:

nlsy$region_cat <- factor(nlsy$region)

OR

nlsy <- mutate(nlsy, 
               region_cat = factor(region))

The factor() function does nothing to the names of the values

nlsy <- mutate(nlsy, 
               region_cat = factor(region))
class(nlsy$region_cat)
[1] "factor"
levels(nlsy$region_cat)
[1] "1" "2" "3" "4"

Warning

The levels will be in numeric order, or alphabetical order if a character variable. This means that factor(c(1, 2, ..., 10)) will have a different ordering than factor(c("1", "2", ..., "10")).
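A quick way to see the difference for yourself, using a shortened version of those vectors:

```r
# Numbers are sorted numerically
num_levels <- levels(factor(c(1, 2, 10)))
num_levels

# Characters are sorted alphabetically, so "10" lands before "2"
chr_levels <- levels(factor(c("1", "2", "10")))
chr_levels
```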

We can assign names to the values

nlsy <- mutate(nlsy, 
               region_cat = factor(region, 
                                   levels = c(1, 2, 3, 4),
                                   labels = c("Northeast", 
                                   "North Central", "South", 
                                   "West")))

Warning

Make sure the order of the levels = and labels = arguments always match!
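Here is a sketch of what goes wrong when they don't match, using four made-up region codes. Note that R gives no error or warning for the mismatched version:

```r
# Hypothetical region codes, made up for illustration
region <- c(1, 2, 3, 4)
region_names <- c("Northeast", "North Central", "South", "West")

# levels and labels in the same order: 1 becomes "Northeast"
matched <- factor(region, levels = c(1, 2, 3, 4), labels = region_names)

# levels reversed but labels not: 1 silently becomes "West"
mismatched <- factor(region, levels = c(4, 3, 2, 1), labels = region_names)

data.frame(region, matched, mismatched)
```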

It’s always good practice to confirm everything looks right

count(nlsy, region_cat, region)
# A tibble: 4 × 3
  region_cat    region     n
  <fct>          <dbl> <int>
1 Northeast          1   206
2 North Central      2   333
3 South              3   411
4 West               4   255

Exercise

Intro to factors

My favorite R function: case_when()

I used to write endless strings of ifelse() statements

  • If A is TRUE, then B; if not, then if C is true, then D; if not, then if E is true, then F; if not, …
nlsy <- mutate(nlsy,
               slp_cat_wkdy =
                 ifelse(sleep_wkdy < 5, "little",
                        ifelse(sleep_wkdy < 7, "some",
                               ifelse(sleep_wkdy < 9, "ideal",
                                      ifelse(sleep_wkdy < 12, "lots", NA)))))

This can be extremely hard to follow!

case_when() syntax

  • Ask a question (i.e., something that will give TRUE or FALSE) on the left-hand side of a ~
  • sleep_wkdy < 5 ~
  • If TRUE, variable will take on value of whatever is on the right-hand side of the ~
  • ~ "little"
  • Proceeds in order … if TRUE, takes that value and stops
  • If you want some default value, you can end with .default = {something}, which every observation will get if everything else is FALSE
  • .default = NA is the default default
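Because case_when() stops at the first TRUE, the order of the conditions matters. A small sketch on made-up hours of sleep (hours and the "enough" label are hypothetical; the .default argument assumes dplyr 1.1.0 or later):

```r
library(dplyr)

# Hypothetical hours of sleep, made up for illustration
hours <- c(3, 6, 8, 13)

# Narrow condition first: each value stops at the first TRUE
narrow_first <- case_when(hours < 5 ~ "little",
                          hours < 9 ~ "enough",
                          .default = "lots")

# Broad condition first: hours < 5 is never reached
broad_first <- case_when(hours < 9 ~ "enough",
                         hours < 5 ~ "little",
                         .default = "lots")

narrow_first
broad_first
```

In the second version nobody is ever labeled "little", because anyone with fewer than 5 hours has already matched hours < 9.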

Logicals: answers to TRUE/FALSE questions

When we want to know if something is

  • equal: ==
  • not equal: !=
  • greater than: >
  • less than: <
  • greater than or equal to: >=
  • less than or equal to: <=

We also can ask about multiple conditions with & (and) and | (or).
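To try these out, here is a sketch on two short made-up vectors (the values are hypothetical). Each line returns one TRUE or FALSE per element:

```r
# Hypothetical values, made up for illustration
sleep_wkdy <- c(4, 7, 9)
sleep_wknd <- c(8, 7, 10)

sleep_wkdy == 7                    # equal to 7?
sleep_wkdy != 7                    # not equal to 7?
sleep_wkdy >= 7                    # at least 7?
sleep_wkdy >= 7 & sleep_wknd >= 7  # both conditions TRUE?
sleep_wkdy < 5 | sleep_wknd > 9    # at least one condition TRUE?
```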

case_when() combines a lot of “if-else” statements

nlsy <- mutate(nlsy, slp_cat_wkdy = 
                 case_when(sleep_wkdy < 5 ~ "little",
                           sleep_wkdy < 7 ~ "some",
                           sleep_wkdy < 9 ~ "ideal",
                           sleep_wkdy < 12 ~ "lots",
                           .default = NA
                 )
)

count(nlsy, sleep_wkdy, slp_cat_wkdy)
# A tibble: 13 × 3
   sleep_wkdy slp_cat_wkdy     n
        <dbl> <chr>        <int>
 1          0 little           1
 2          2 little           4
 3          3 little          14
 4          4 little          48
 5          5 some           136
 6          6 some           326
 7          7 ideal          357
 8          8 ideal          269
 9          9 lots            32
10         10 lots            14
11         11 lots             1
12         12 <NA>             2
13         13 <NA>             1

case_when() example

nlsy <- mutate(nlsy, total_sleep = 
                 case_when(
                   sleep_wknd > 8 & sleep_wkdy > 8 ~ 1,
                   sleep_wknd + sleep_wkdy > 15 ~ 2,
                   sleep_wknd - sleep_wkdy > 3 ~ 3
                 )
)
  • Which value would someone with sleep_wknd = 8 and sleep_wkdy = 4 get?
  • What about someone with sleep_wknd = 11 and sleep_wkdy = 4?
  • What about someone with sleep_wknd = 7 and sleep_wkdy = 7?
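Try to answer these yourself first! Then one way to check is to put the three people from the questions into a small dataframe (people is a made-up name) and run the same case_when() on it:

```r
library(dplyr)

# The three hypothetical people from the questions above
people <- tibble(sleep_wknd = c(8, 11, 7),
                 sleep_wkdy = c(4, 4, 7))

people <- mutate(people, total_sleep =
                   case_when(
                     sleep_wknd > 8 & sleep_wkdy > 8 ~ 1,
                     sleep_wknd + sleep_wkdy > 15 ~ 2,
                     sleep_wknd - sleep_wkdy > 3 ~ 3
                   )
)
people
```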

Creating a factor variable from a character variable after using case_when()

nlsy <- mutate(nlsy, slp_chr_wkdy = 
                 case_when(
                   sleep_wkdy < 5 ~ "little",
                   sleep_wkdy < 7 ~ "some",
                   sleep_wkdy < 9 ~ "ideal",
                   sleep_wkdy < 12 ~ "lots"
                 ),
               slp_cat_wkdy = factor(slp_chr_wkdy)
)

What order will these levels be in?

Side note: another way to look at factors

In the next few slides, I’ll use the summary() function (rather than count()) to look at factors

  • It’s easier to fit the output on slides
  • However, it doesn’t show anything interesting for character variables so I usually prefer count(), which does
summary(nlsy$slp_chr_wkdy)
   Length     Class      Mode 
     1205 character character 
summary(nlsy$slp_cat_wkdy)
 ideal little   lots   some   NA's 
   626     67     47    462      3 

forcats package

  • Tries to make working with factors safe and convenient
  • Functions to make new levels, reorder levels, combine levels, etc.
  • All the functions start with fct_ so they’re easy to find using tab-complete!
  • Automatically loads with library(tidyverse)

Reorder factors

The fct_relevel() function lets us simply write out the names of the categories in the order we want them (safely).

nlsy <- mutate(nlsy, 
               slp_cat_wkdy_ord = fct_relevel(slp_cat_wkdy, 
                                              "little", 
                                              "some", 
                                              "ideal", 
                                              "lots"
               )
)

summary(nlsy$slp_cat_wkdy_ord)
little   some  ideal   lots   NA's 
    67    462    626     47      3 

What if you misspell something?

nlsy <- mutate(nlsy, 
               slp_cat_wkdy_ord2 = fct_relevel(slp_cat_wkdy, 
                                               "little", 
                                               "soome", 
                                               "ideal", 
                                               "lots"))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `slp_cat_wkdy_ord2 = fct_relevel(slp_cat_wkdy, "little",
  "soome", "ideal", "lots")`.
Caused by warning:
! 1 unknown level in `f`: soome
summary(nlsy$slp_cat_wkdy_ord2)
little  ideal   lots   some   NA's 
    67    626     47    462      3 

You get a warning, and levels you didn’t mention are pushed to the end.

Recode a factor

nlsy <- mutate(nlsy, 
               region_cat2 = fct_recode(region_cat,
                                        "NE" = "Northeast",
                                        "NC" = "North Central",
                                        "S" = "South",
                                        "W" = "West"))
summary(nlsy$region_cat2)
 NE  NC   S   W 
206 333 411 255 

Other orders

How about from most people to least?

nlsy <- mutate(nlsy, region_cat = fct_infreq(region_cat))
summary(nlsy$region_cat)
        South North Central          West     Northeast 
          411           333           255           206 

Or the reverse of that?

nlsy <- mutate(nlsy, region_cat = fct_rev(region_cat))
summary(nlsy$region_cat)
    Northeast          West North Central         South 
          206           255           333           411 

Tip

This will be handy when running regressions and creating graphs.

Add levels

We have some missing values – let’s say we want to include them as a group in a table, figure, or regression.

nlsy <- mutate(nlsy, slp_cat_wkdy_out = 
                 fct_na_value_to_level(slp_cat_wkdy, level = "outlier"))
summary(nlsy$slp_cat_wkdy_out)
  ideal  little    lots    some outlier 
    626      67      47     462       3 

Remove levels

Or maybe we want to combine some levels that don’t have a lot of observations in them:

nlsy <- mutate(nlsy, slp_cat_wkdy_comb = 
                 fct_collapse(slp_cat_wkdy, 
                              "less" = c("little", "some"),
                              "more" = c("ideal", "lots")
)
)
summary(nlsy$slp_cat_wkdy_comb)
more less NA's 
 673  529    3 

Add and remove

Or we can have R choose which ones to combine based on how few observations they have:

nlsy <- mutate(nlsy, slp_cat_wkdy_lump = 
                 fct_lump(slp_cat_wkdy, n = 2))
summary(nlsy$slp_cat_wkdy_lump)
ideal  some Other  NA's 
  626   462   114     3 
  • Probably not a good idea for factors with an inherent order
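As far as I know, the forcats documentation now steers new code toward more specific versions of fct_lump(), such as fct_lump_n(). A sketch on made-up categories (slp is a hypothetical vector, not the nlsy variable):

```r
library(forcats)

# Hypothetical sleep categories, made up for illustration
slp <- factor(c(rep("ideal", 5), rep("some", 4), "little", "lots"))

# Keep the 2 most common levels and lump the rest into "Other"
slp_lump <- fct_lump_n(slp, n = 2)
table(slp_lump)
```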

There are 25 fct_ functions in the package. The sky’s the limit when it comes to manipulating your categorical variables in R!

I never remember all of them, and you don’t need to either: the goal is for you to be able to find what you need!

Exercise

Factor functions

Today’s summary

  • We learned about the tidyverse and how to install and load packages
  • We learned about the tibble and how to create new variables in a dataframe
  • We learned about factor variables and how to manipulate them

Today’s functions

  • install.packages("package"): install a package (once)
  • library(package): load a package (every time you want to use it)
  • c(value, value, value): concatenate values into a vector
  • mean(vector); sd(vector): calculate the mean and standard deviation of a vector
  • glimpse(dataframe): get a quick overview of a dataframe
  • summary(dataframe); summary(dataframe$variable): get a summary of a dataframe or single variable
  • mutate(dataframe, new_variable = function(old_variable)): create a new variable
  • factor(variable, labels = , levels = ): convert a variable to a factor
  • case_when(variable < value ~ "label", variable == value ~ "label"): create a new variable based on a series of conditions
  • fct_relevel(), fct_recode(), fct_infreq(), fct_rev(), fct_na_value_to_level(), fct_collapse(), fct_lump(), etc.: functions to manipulate factors (don’t worry about memorizing, look up when you need to!)