Introduction to R
Tip
Experiment! You are not going to break anything!
Tip
Try to solve a problem yourself first, ask a classmate second, and ask the teaching team third
Did you make a good-faith effort to answer the question using the tools we’ve covered in class?
tidyverse and the concept of packages
Tip
Now, you can just quit and restart RStudio if something goes wrong! You can also go to Session -> Restart R to clear your session.
Always confirm you are closing your parentheses!
Tools -> Global Options -> Code -> Display -> Rainbow Parentheses
You can run…
.R script
.qmd (Quarto) or .Rmd (R Markdown) file
I like to have all code print to the console for consistency:
mean(), lm(), table(), etc.
base, stats, graphics, etc.
{ggplot2}, {dplyr}, {data.table}, {survival}, etc.
“You only have to buy the book once, but you have to go get it out of the bookshelf every time you want to read it.”
Several days later…
When you run install.packages(), packages are downloaded from CRAN (The Comprehensive R Archive Network)
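The install-once, load-every-session pattern might look like the sketch below. The package names are only illustrative, and the install line is commented out so that rerunning the script does not download anything.

```r
# Install once per computer; packages are downloaded from CRAN.
# (Commented out so rerunning this script does not reinstall it.)
# install.packages("survival")

# Load in every new R session where you want to use the package.
library(stats)  # stats ships with R, so it never needs installing

# pkg::function() calls one function without loading the whole package.
x <- c(2, 4, 6)
x_sd <- stats::sd(x)
x_sd
```

In the book analogy, install.packages() is buying the book and library() is taking it off the shelf.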
Script vs. console, installing packages, and changing settings
Use <- to store objects in the environment
I call this the “assignment arrow”
Now vals holds those values
Warning
No assignment arrow means that the object will be printed to the console (and lost forever!)
We can retrieve those values by running just the name of the object
We can also perform operations on them using functions like mean()
If we want to keep the result of that operation, we need to use <- again
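A minimal sketch of these steps. The numbers stored in vals are made up for illustration, and mean_val is a hypothetical object name.

```r
# Store three (made-up) values in an object called vals
vals <- c(1, 645, 329)

# Running the name alone prints the stored values
vals

# Without an assignment arrow, the result is printed but not saved
mean(vals)

# With the assignment arrow, the result is kept in the environment
mean_val <- mean(vals)
mean_val
```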
We could also create a character vector:
Or a logical vector:
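As a rough illustration of both vector types, with invented values and hypothetical object names (chars, logs):

```r
# A character vector: every value is quoted text
chars <- c("dog", "cat", "rhino")
class(chars)

# A logical vector: TRUE and FALSE are written without quotes
logs <- c(TRUE, FALSE, FALSE)
class(logs)

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
sum(logs)
```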
Note
We’ll see more options as we go along!
We created vectors with the c() function (c stands for concatenate)
We could also create a matrix of values with the matrix() function:
The numbers in square brackets are indices, which we can use to pull out values:
We can pull out rows or columns from matrices:
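A small sketch of building a matrix and indexing into it. The values and the object names (mat, nums) are invented for illustration.

```r
# Fill a 2 x 3 matrix with the values 1 to 6, column by column
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
mat

# Vectors take a single index
nums <- c(10, 20, 30)
nums[2]      # second value

# Matrices take [row, column]; leave one side blank to get everything
mat[1, 2]    # row 1, column 2
mat[1, ]     # all of row 1
mat[, 3]     # all of column 3
```

The same [row, column] pattern carries over to dataframes.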
Pre-class challenges
# A tibble: 1,205 × 15
id glasses eyesight sleep_wkdy sleep_wknd nsibs race_eth sex region
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 0 1 5 7 3 3 2 1
2 6 1 2 6 7 1 3 1 1
3 8 0 2 7 9 7 3 2 1
4 16 1 3 6 7 3 3 2 1
5 18 0 3 10 10 2 3 1 3
6 20 1 2 7 8 2 3 2 1
7 27 0 1 8 8 1 3 2 1
8 49 1 1 8 8 6 3 2 1
9 57 1 2 7 8 1 3 2 1
10 67 0 1 8 8 1 3 1 1
# ℹ 1,195 more rows
# ℹ 6 more variables: income <dbl>, age_bir <dbl>, eyesight_cat <fct>,
# glasses_cat <fct>, race_eth_cat <fct>, sex_cat <fct>
glimpse()
We can get a quick overview of the data with the glimpse() function:
Rows: 1,205
Columns: 15
$ id <dbl> 3, 6, 8, 16, 18, 20, 27, 49, 57, 67, 86, 96, 97, 98, 117,…
$ glasses <dbl> 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, …
$ eyesight <dbl> 1, 2, 2, 3, 3, 2, 1, 1, 2, 1, 3, 5, 1, 1, 1, 1, 3, 2, 3, …
$ sleep_wkdy <dbl> 5, 6, 7, 6, 10, 7, 8, 8, 7, 8, 8, 7, 7, 7, 8, 7, 7, 8, 8,…
$ sleep_wknd <dbl> 7, 7, 9, 7, 10, 8, 8, 8, 8, 8, 8, 7, 8, 7, 8, 7, 4, 8, 8,…
$ nsibs <dbl> 3, 1, 7, 3, 2, 2, 1, 6, 1, 1, 7, 2, 7, 2, 2, 4, 9, 2, 2, …
$ race_eth <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, …
$ sex <dbl> 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, …
$ region <dbl> 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ income <dbl> 22390, 35000, 7227, 48000, 4510, 50000, 20000, 23900, 232…
$ age_bir <dbl> 19, 30, 17, 31, 19, 30, 27, 24, 21, 36, 17, 19, 29, 30, 2…
$ eyesight_cat <fct> Excellent, Very Good, Very Good, Good, Good, Very Good, E…
$ glasses_cat <fct> Doesn't wear glasses, Wears glasses/contacts, Doesn't wea…
$ race_eth_cat <fct> "Non-Black, Non-Hispanic", "Non-Black, Non-Hispanic", "No…
$ sex_cat <fct> Female, Male, Female, Female, Male, Female, Female, Femal…
Note
Notice that I write a function name followed by parentheses to signal that it is a function and can take arguments within the parentheses
summary()
We can also get a summary of the data with the summary() function:
id glasses eyesight sleep_wkdy
Min. : 3 Min. :0.0000 Min. :1.00 Min. : 0.000
1st Qu.: 2317 1st Qu.:0.0000 1st Qu.:1.00 1st Qu.: 6.000
Median : 4744 Median :1.0000 Median :2.00 Median : 7.000
Mean : 5229 Mean :0.5178 Mean :1.99 Mean : 6.643
3rd Qu.: 7937 3rd Qu.:1.0000 3rd Qu.:3.00 3rd Qu.: 8.000
Max. :12667 Max. :1.0000 Max. :5.00 Max. :13.000
sleep_wknd nsibs race_eth sex
Min. : 0.000 Min. : 0.000 Min. :1.000 Min. :1.000
1st Qu.: 6.000 1st Qu.: 2.000 1st Qu.:2.000 1st Qu.:1.000
Median : 7.000 Median : 3.000 Median :3.000 Median :2.000
Mean : 7.267 Mean : 3.937 Mean :2.395 Mean :1.584
3rd Qu.: 8.000 3rd Qu.: 5.000 3rd Qu.:3.000 3rd Qu.:2.000
Max. :14.000 Max. :16.000 Max. :3.000 Max. :2.000
region income age_bir eyesight_cat
Min. :1.000 Min. : 0 Min. :13.00 Excellent:474
1st Qu.:2.000 1st Qu.: 6000 1st Qu.:19.00 Very Good:385
Median :3.000 Median :11155 Median :22.00 Good :249
Mean :2.593 Mean :15289 Mean :23.45 Fair : 78
3rd Qu.:3.000 3rd Qu.:20000 3rd Qu.:27.00 Poor : 19
Max. :4.000 Max. :75001 Max. :52.00
glasses_cat race_eth_cat sex_cat
Doesn't wear glasses :581 Hispanic :211 Male :501
Wears glasses/contacts:624 Black :307 Female:704
Non-Black, Non-Hispanic:687
We can pull out data from dataframes using the “square bracket notation” we already saw:
# A tibble: 1 × 15
id glasses eyesight sleep_wkdy sleep_wknd nsibs race_eth sex region
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 8 0 2 7 9 7 3 2 1
# ℹ 6 more variables: income <dbl>, age_bir <dbl>, eyesight_cat <fct>,
# glasses_cat <fct>, race_eth_cat <fct>, sex_cat <fct>
It’s much more useful to be able to pull out a variable by its name, though:
[1] Female Male Female Female Male Female Female Female Female Male
[11] Female Female Female Female Male Female Female Female Female Female
[21] Female Male Female Female Female Male Female Male Female Female
[31] Male Male Male Female Female Male Female Female Male Female
[41] Female Male Female Male Female Male Male Female Male Male
[51] Male Female Female Female Male Female Male Female Male Male
[61] Male Female Male Female Female Male Male Female Female Male
[71] Female Male Male Male Female Male Male Male Male Female
[81] Female Female Female Female Female Female Male Female Male Female
[91] Male Female Female Female Female Female Female Female Male Female
[101] Female Female Female Female Male Male Female Male Male Female
[111] Female Male Male Male Male Female Male Male Male Male
[121] Female Female Male Female Female Female Female Female Male Male
[131] Female Female Male Female Female Female Female Female Male Female
[141] Male Male Female Female Male Male Female Female Female Female
[151] Female Female Female Female Male Female Female Male Male Male
[161] Female Male Female Female Male Female Female Female Female Male
[171] Male Female Female Male Female Female Female Female Female Female
[181] Female Male Male Female Male Female Male Female Female Female
[191] Male Female Female Female Female Male Female Male Male Male
Levels: Male Female
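The two ways of pulling data out can be sketched on a tiny stand-in dataframe. Here toy is a hypothetical object whose values are copied from the first three rows of the printed data; the real nlsy data is not loaded.

```r
# A toy dataframe with three rows
toy <- data.frame(
  id = c(3, 6, 8),
  sleep_wkdy = c(5, 6, 7),
  sex_cat = factor(c("Female", "Male", "Female"), levels = c("Male", "Female"))
)

# Square brackets: [row, column]
toy[3, ]              # third row, all columns
toy[3, "sleep_wkdy"]  # third row of one column

# Dollar sign: pull out a whole variable by name
toy$sex_cat

# Summarise a single variable (for a factor, this counts each level)
summary(toy$sex_cat)
```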
We can also get a summary of a single variable:
class(nlsy$sex_cat) (factor!)
# A tibble: 1,205 × 15
id glasses eyesight sleep_wkdy sleep_wknd nsibs race_eth sex region
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 0 1 5 7 3 3 2 1
2 6 1 2 6 7 1 3 1 1
3 8 0 2 7 9 7 3 2 1
4 16 1 3 6 7 3 3 2 1
5 18 0 3 10 10 2 3 1 3
6 20 1 2 7 8 2 3 2 1
7 27 0 1 8 8 1 3 2 1
8 49 1 1 8 8 6 3 2 1
9 57 1 2 7 8 1 3 2 1
10 67 0 1 8 8 1 3 1 1
# ℹ 1,195 more rows
# ℹ 6 more variables: income <dbl>, age_bir <dbl>, eyesight_cat <fct>,
# glasses_cat <fct>, race_eth_cat <fct>, sex_cat <fct>
id glasses eyesight sleep_wkdy
1 3 0 1 5
2 6 1 2 6
3 8 0 2 7
4 16 1 3 6
5 18 0 3 10
6 20 1 2 7
7 27 0 1 8
8 49 1 1 8
9 57 1 2 7
10 67 0 1 8
[... rows 11 to 1205 of the printed output omitted ...]
There are usually multiple ways to achieve a task in R. Ideally we’d like solutions that are:
We’ll focus on the tidyverse because I think it’s the optimal mix of those characteristics
The same people who make RStudio are also responsible for a set of packages called the tidyverse
install.packages("tidyverse") actually downloads more than a dozen packages
library(tidyverse) loads: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, lubridate
This is by no means the only way to manage your data, but I find that a lot of the time, it’s the easiest and simplest way to get things done.
Intro to dataframes
Two (of several) ways to take the (natural) log of income and store it in the dataframe:
OR
Note
The second way may look longer now, but we’ll see later why it’s useful when we make lots of variables at once!
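A sketch of what the two approaches might look like, assuming the first uses $ and the second uses mutate(). The toy object and the column name log_income2 are hypothetical; the incomes are copied from the glimpse() output.

```r
library(dplyr)

# Toy stand-in for nlsy
toy <- tibble(income = c(22390, 35000, 7227))

# Way 1: base R, using $ on both sides of the assignment arrow
toy$log_income <- log(toy$income)

# Way 2: mutate(), which returns the whole dataframe with the new column added
toy <- mutate(toy, log_income2 = log(income))

toy
```

Note that log(0) is -Inf in R, and the summary() output shows a minimum income of 0, so the logged variable in the real data would need a check for infinite values.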
mutate()
General format:
We can do whatever we want to a variable to make a new one:
Tip
mutate() is a function that acts on a dataframe, so when we use the assignment arrow, it’s to store the dataframe with the new variable back in the same place
nlsy$region_cat <- factor(nlsy$region)
nlsy$income <- round(nlsy$income)
nlsy$age_bir_cent <- nlsy$age_bir - mean(nlsy$age_bir)
nlsy$index <- 1:nrow(nlsy)
nlsy$slp_wkdy_cat <- ifelse(nlsy$sleep_wkdy < 5, "little",
  ifelse(nlsy$sleep_wkdy < 7, "some",
    ifelse(nlsy$sleep_wkdy < 9, "ideal",
      ifelse(nlsy$sleep_wkdy < 12, "lots", NA)
    )
  )
)
nlsy <- mutate(nlsy, # dataset
  # new variables
  region_cat = factor(region, labels = c("Northeast", "North Central", "South", "West")),
  income = round(income),
  age_bir_cent = age_bir - mean(age_bir),
  index = row_number() # a special function that gives the row number
  # could make as many as we want....
)
Tip
We can refer to variables within the same dataset (region, income, age_bir) without the $ notation
mutate() tips and tricks
You still need to store your dataset somewhere, so make sure to include the assignment arrow
nlsy_w_cats <- mutate(nlsy, # dataset
  region_cat = factor(region),
  sex_cat = factor(sex),
  race_eth_cat = factor(race_eth))
nlsy_clean <- mutate(nlsy_w_cats, # dataset
  region_cat = fct_recode(region_cat,
    "Northeast" = "1",
    "North Central" = "2",
    "South" = "3",
    "West" = "4"),
  # sex_cat has levels "1" and "2" at this point, so name them before reordering
  sex_cat = fct_recode(sex_cat, "Male" = "1", "Female" = "2"),
  sex_cat = fct_relevel(sex_cat,
    "Female", "Male"))
mutate() tips and tricks
Tip
“Chunk” your work on the same/similar variables so you can keep track of how a variable is derived.
Making variables
When I downloaded the data originally, it was all numeric (“double”)
I already converted some variables into categorical (“factor”) variables (using the codebook)
count()
We can explore factor variables (and other types!) using count():
# A tibble: 2 × 2
glasses_cat n
<fct> <int>
1 Doesn't wear glasses 581
2 Wears glasses/contacts 624
Tip
Like mutate(), this function takes a dataframe as its first argument. The second argument is the variable you want to count.
Actually, count() can take a whole series of variable names:
# A tibble: 4 × 3
glasses_cat sex_cat n
<fct> <fct> <int>
1 Doesn't wear glasses Male 280
2 Doesn't wear glasses Female 301
3 Wears glasses/contacts Male 221
4 Wears glasses/contacts Female 403
Note
If this isn’t in the format you want your cross-tab in, don’t worry – we’ll see other functions that make better tables later. This output is handy though, because it’s a dataframe! (Actually, a tibble!)
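A small sketch of both uses of count(), run on invented data that borrows the variable names from the slides. The objects toy and tab are hypothetical.

```r
library(dplyr)

# Toy data (invented values)
toy <- tibble(
  glasses_cat = c("Wears glasses/contacts", "Doesn't wear glasses",
                  "Wears glasses/contacts", "Wears glasses/contacts"),
  sex_cat = c("Female", "Male", "Male", "Female")
)

# One variable: one row per category, with the count in column n
count(toy, glasses_cat)

# Several variables: one row per combination that appears in the data
tab <- count(toy, glasses_cat, sex_cat)
tab

# The result is itself a tibble, so its columns can be pulled out as usual
tab$n
```

Combinations that never occur in the data simply have no row in the result.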
factor()
Again, two ways of doing the same thing:
OR
The factor() function does nothing to the names of the values
[1] "factor"
[1] "1" "2" "3" "4"
Warning
The levels will be in numeric order, or alphabetical order if a character variable. This means that factor(c(1, 2, ..., 10)) will have a different ordering than factor(c("1", "2", ..., "10")).
Warning
Make sure the order of the levels = and labels = arguments always match!
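Both warnings can be seen in a short sketch. The region codes below are toy values; the labels are the ones used for region elsewhere in these slides.

```r
# Region codes as they might appear in the raw data
region <- c(3, 1, 4, 2, 3)

# Without labels, the levels are just the codes converted to text
levels(factor(region))

# levels = and labels = are matched by position, so keep them in the same order
region_cat <- factor(region,
                     levels = c(1, 2, 3, 4),
                     labels = c("Northeast", "North Central", "South", "West"))
region_cat

# Numbers sort numerically, but text sorts alphabetically
levels(factor(c(1, 2, 10)))
levels(factor(c("1", "2", "10")))
```

If the labels were listed in a different order from the levels, R would not complain; the categories would just be silently mislabelled.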
Intro to factors
case_when()
I used to write endless strings of ifelse() statements
This can be extremely hard to follow!
case_when() syntax
Put a condition (something that is TRUE or FALSE) on the left-hand side of a ~
sleep_wkdy < 5 ~
If the condition is TRUE, the variable will take on the value of whatever is on the right-hand side of the ~
~ "little"
You can end with .default = {something}, which every observation will get if everything else is FALSE
.default = NA is the default default
When we want to know if something is
equal to: ==
not equal to: !=
greater than or equal to: >=
less than or equal to: <=
We also can ask about multiple conditions with & (and) and | (or).
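A quick sketch of these operators applied to whole vectors. The hours of sleep are invented, and both and either are hypothetical object names.

```r
# Toy hours of sleep
sleep_wkdy <- c(4, 7, 9)
sleep_wknd <- c(8, 7, 11)

sleep_wkdy == 7   # equal to
sleep_wkdy != 7   # not equal to
sleep_wkdy >= 7   # greater than or equal to
sleep_wkdy <= 7   # less than or equal to

# & needs both conditions to be TRUE; | needs at least one
both <- sleep_wkdy < 5 & sleep_wknd > 7
either <- sleep_wkdy < 5 | sleep_wknd > 10
both
either
```

Each comparison returns one TRUE or FALSE per value, which is what case_when() needs on the left-hand side of the ~.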
case_when() combines a lot of “if-else” statements
nlsy <- mutate(nlsy, slp_cat_wkdy =
  case_when(sleep_wkdy < 5 ~ "little",
            sleep_wkdy < 7 ~ "some",
            sleep_wkdy < 9 ~ "ideal",
            sleep_wkdy < 12 ~ "lots",
            .default = NA
  )
)
count(nlsy, sleep_wkdy, slp_cat_wkdy)
# A tibble: 13 × 3
sleep_wkdy slp_cat_wkdy n
<dbl> <chr> <int>
1 0 little 1
2 2 little 4
3 3 little 14
4 4 little 48
5 5 some 136
6 6 some 326
7 7 ideal 357
8 8 ideal 269
9 9 lots 32
10 10 lots 14
11 11 lots 1
12 12 <NA> 2
13 13 <NA> 1
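The bands above work because case_when() checks its conditions in order and uses the first one that is TRUE. The sketch below, on invented values with hypothetical object names (good, bad), shows what happens when the same conditions are listed widest first.

```r
library(dplyr)

sleep_wkdy <- c(4, 6, 8, 13)

# Conditions in increasing order: each value lands in the first band it fits
good <- case_when(
  sleep_wkdy < 5 ~ "little",
  sleep_wkdy < 7 ~ "some",
  sleep_wkdy < 9 ~ "ideal",
  .default = NA
)

# Same conditions, widest first: everything under 9 is caught by the first line
bad <- case_when(
  sleep_wkdy < 9 ~ "ideal",
  sleep_wkdy < 7 ~ "some",
  sleep_wkdy < 5 ~ "little",
  .default = NA
)

good
bad
```

Cross-checking with count(), as in the output above, is a simple way to catch this kind of ordering mistake.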
case_when() example
Where would sleep_wknd = 8 and sleep_wkdy = 4 go?
sleep_wknd = 11 and sleep_wkdy = 4?
sleep_wknd = 7 and sleep_wkdy = 7?
case_when()
What order will these levels be in?
In the next few slides, I’ll use the summary() function (rather than count()) to look at factors
… count(), which does
forcats package
Its functions start with fct_ so they’re easy to find using tab-complete!
library(tidyverse)
The fct_relevel() function allows us just to rewrite the names of the categories out in the order we want them (safely).
nlsy <- mutate(nlsy,
  slp_cat_wkdy_ord2 = fct_relevel(slp_cat_wkdy,
    "little",
    "soome",
    "ideal",
    "lots"))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `slp_cat_wkdy_ord2 = fct_relevel(slp_cat_wkdy, "little",
"soome", "ideal", "lots")`.
Caused by warning:
! 1 unknown level in `f`: soome
little ideal lots some NA's
67 626 47 462 3
How about from most people to least?
South North Central West Northeast
411 333 255 206
Or the reverse of that?
Northeast West North Central South
206 255 333 411
Tip
This will be handy when running regressions and creating graphs.
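One way to get orderings like the two above is with fct_infreq() and fct_rev() from forcats. That these are the functions behind the output shown is an assumption; the region values and object names below are invented.

```r
library(forcats)

# Toy region variable
region_cat <- factor(c("South", "West", "South", "Northeast", "South", "West"))
levels(region_cat)   # alphabetical by default

# Most common level first
by_freq <- fct_infreq(region_cat)
summary(by_freq)

# Reverse whatever order the levels are currently in
by_freq_rev <- fct_rev(by_freq)
summary(by_freq_rev)
```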
We have some missing values – let’s say we want to include them as a group in a table, figure, or regression.
Or maybe we want to combine some levels that don’t have a lot of observations in them:
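A sketch of both ideas, assuming the forcats functions fct_na_value_to_level() and fct_collapse() are used. The data, the level name "missing", and the combined level names "less" and "more" are all invented for illustration.

```r
library(forcats)

# Toy sleep categories with one missing value
slp <- factor(c("little", "some", "ideal", "lots", NA, "ideal"))

# Turn NA into a real level so it shows up as its own group
slp_na <- fct_na_value_to_level(slp, level = "missing")
levels(slp_na)

# Combine levels by hand: new name = c(old names)
slp_coll <- fct_collapse(slp,
                         "less" = c("little", "some"),
                         "more" = c("ideal", "lots"))
summary(slp_coll)
```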
Or we can have R choose which ones to combine based on how few observations they have:
nlsy <- mutate(nlsy,
  slp_cat_wkdy_lump = fct_lump(slp_cat_wkdy, n = 2))
summary(nlsy$slp_cat_wkdy_lump)
ideal some Other NA's
626 462 114 3
There are more fct_ functions in the package. The sky’s the limit when it comes to manipulating your categorical variables in R!
I never remember all of them – the goal is not for you to either, but for you to be able to find what you need!
Factor functions
tidyverse and how to install and load packages
tibble and how to create new variables in a dataframe
install.packages("package"): install a package (once)
library(package): load a package (every time you want to use it)
c(value, value, value): concatenate values into a vector
mean(vector); sd(vector): calculate the mean and standard deviation of a vector
glimpse(dataframe): get a quick overview of a dataframe
summary(dataframe); summary(dataframe$variable): get a summary of a dataframe or single variable
mutate(dataframe, new_variable = function(old_variable)): create a new variable
factor(variable, labels = , levels = ): convert a variable to a factor
case_when(variable < value ~ "label", variable == value ~ "label"): create a new variable based on a series of conditions
fct_relevel(), fct_recode(), fct_infreq(), fct_rev(), fct_na_value_to_level(), fct_collapse(), fct_lump(), etc.: functions to manipulate factors (don’t worry about memorizing, look up when you need to!)