Day 1 of the RI workshop, Summer 2023
FSU
This part of the class will be a little different from the rest of the class. Here’s what we’ll cover:
Go to class (unless you actually can’t)
Go to department functions
Fulfilling your GA and RA responsibilities
Take a moment and perform the following calculations:
#you can use the # symbol to leave comments in your code
## assigning 2 to the letter b
b <- 2
## adding b and 2
b + 2
[1] 4
Typing 2+2 directly would be easier than assigning that value to b; this is just to demonstrate how objects work.
# understanding this is useful for understanding how data is structured
## strings cannot interact with numeric vectors
'Hello World' + 2
Error in "Hello World" + 2: non-numeric argument to binary operator
'3' + 2
Error in "3" + 2: non-numeric argument to binary operator
# the benefit of parse_number() is that it will pull the number out of a string
parse_number('himothy316') + 2
[1] 318
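A small sketch of the difference between a number stored as a string and a true numeric value. It assumes the readr package (loaded with the tidyverse) is installed; the object names are made up for illustration.

```r
library(readr)

# a number wrapped in quotes is a character string, not a numeric value
three_chr <- "3"
class(three_chr)

# as.numeric() converts a string that contains only a number
three_num <- as.numeric(three_chr)
three_num + 2

# parse_number() drops the surrounding non-numeric characters first
parsed <- parse_number("himothy316")
parsed + 2
```

as.numeric() is enough when the string holds nothing but the number; parse_number() is the safer choice when other characters are mixed in.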
There are a few different types of vectors:
numeric: contains only numeric values, such as 1.1, 2, 100, etc.
character: contains only strings like those we went over in the previous sections, such as "Republican", "Democratic", "Independent"
factor: categorical values with a defined set of levels that can be ordered; think of a scale that goes from "Very Liberal" to "Very Conservative" or from "Complete Autocracy" to "Democracy"
logical: contains only TRUE or FALSE values
[1] "Republican" "Democrat" "Republican" "Independent"
Use \ to apply escapes; for instance, \n is a new line and \t is a tab
[1] "character"
p1
Democrat Other Republican
1 1 2
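To make the four vector types concrete, here is a minimal sketch in base R. The object names and values are illustrative choices, not the ones used on the original slides.

```r
# numeric vector
nums <- c(1.1, 2, 100)
class(nums)

# character vector
party <- c("Republican", "Democrat", "Republican", "Independent")
class(party)

# factor with the order of the levels set explicitly
ideology <- factor(c("Very Liberal", "Very Conservative", "Very Liberal"),
                   levels = c("Very Liberal", "Very Conservative"))
levels(ideology)

# logical vector, created here by a comparison
is_republican <- party == "Republican"
is_republican

# table() counts how often each value appears
party_counts <- table(party)
party_counts
```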
Numeric:
Create my_vector with the values 2, 6, 4, 3, 5, 17.
Sum the elements of the vector (sum() function)
Take the square root (sqrt()) of the elements of the vector
Character:
Create sex with the following elements: Male, Female, Male, Male, and Female
Create ideology with the following observations: Liberal, Moderate, Moderate, Conservative, and Liberal
Factor:
Convert sex to a factor and add levels. Assign the levels such that the order is Male then Female.
NA

| Operator | Syntax |
|---|---|
| “less than” | < |
| “less than or equal to” | <= |
| “exactly equal to” | == |
| “greater than or equal to” | >= |
| “greater than” | > |
| “not equal to” | != |
| “or” | \| |
| “and” | & |
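Each operator in the table returns a logical vector, one TRUE or FALSE per element. A short sketch, using a made-up ideo vector:

```r
# a hypothetical ideology score for three respondents
ideo <- c(1, 4, 7)

less_than_4 <- ideo < 4          # TRUE FALSE FALSE
at_most_4   <- ideo <= 4         # TRUE TRUE FALSE
equal_4     <- ideo == 4         # FALSE TRUE FALSE
not_4       <- ideo != 4         # TRUE FALSE TRUE

# | is TRUE when either side is TRUE; & needs both sides to be TRUE
extreme <- ideo < 2 | ideo > 6   # TRUE FALSE TRUE
middle  <- ideo > 1 & ideo < 7   # FALSE TRUE FALSE
```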
Use the $ operator, which will give you that column as a vector
Use the mutate function to accomplish this
#calling the data from above "his name"
his_name %>%
  mutate(ideo_cat = c('Very Conservative',
                      'Moderate',
                      'Very Liberal'))
# A tibble: 3 × 4
name ideo sex ideo_cat
<chr> <dbl> <chr> <chr>
1 John 1 Male Very Conservative
2 Jacob 4 Male Moderate
3 Jingleheimer Schmidt 7 Male Very Liberal
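A minimal sketch of the $ operator. The slides do not show how his_name was created, so the tibble() call below is an assumption rebuilt from the printed output.

```r
library(tibble)

# rebuilt from the printed output above (an assumption, not the original code)
his_name <- tibble(name = c("John", "Jacob", "Jingleheimer Schmidt"),
                   ideo = c(1, 4, 7),
                   sex  = c("Male", "Male", "Male"))

# $ pulls one column out of the data as a plain vector
ideo_vector <- his_name$ideo
ideo_vector

# once it is a vector, ordinary vector functions work on it
ideo_mean <- mean(ideo_vector)
ideo_mean
```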
%>% is a pipe; you can insert it with Ctrl (or Command) + Shift + M
Use case_when() with mutate() (think of it as a glorified if-then)
his_name %>%
  mutate(ideo_cat = case_when(ideo == 1 ~ 'Very Conservative',
                              ideo == 4 ~ 'Moderate',
                              ideo == 7 ~ 'Very Liberal')) -> his_name_2
his_name_2
# A tibble: 3 × 4
name ideo sex ideo_cat
<chr> <dbl> <chr> <chr>
1 John 1 Male Very Conservative
2 Jacob 4 Male Moderate
3 Jingleheimer Schmidt 7 Male Very Liberal
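Two things are worth seeing side by side here: what the pipe actually does, and how case_when() handles ranges and leftovers. This is a sketch, with his_name rebuilt from the printed output and cut points chosen only for illustration.

```r
library(dplyr)
library(tibble)

# rebuilt from the printed output (an assumption)
his_name <- tibble(name = c("John", "Jacob", "Jingleheimer Schmidt"),
                   ideo = c(1, 4, 7),
                   sex  = c("Male", "Male", "Male"))

# the pipe passes the left-hand side in as the first argument,
# so these two lines do the same thing
rows_a <- nrow(his_name)
rows_b <- his_name %>% nrow()

# case_when() checks the conditions in order;
# TRUE ~ ... catches every row that matched nothing above it
his_name_3 <- his_name %>%
  mutate(ideo_cat = case_when(ideo <= 2 ~ "Conservative",
                              ideo >= 6 ~ "Liberal",
                              TRUE      ~ "Moderate"))
his_name_3
```

Rows that match no condition and have no TRUE fallback get NA, which is an easy way to end up with missing values by accident.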
# A tibble: 1 × 4
name ideo sex ideo_cat
<chr> <dbl> <chr> <chr>
1 Jingleheimer Schmidt 7 Male Very Liberal
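The one-row output above looks like the result of a filter() call. The call that produced it is not shown, so the condition below is a guess that happens to give the same row.

```r
library(dplyr)
library(tibble)

# rebuilt from the printed output (an assumption)
his_name_2 <- tibble(name = c("John", "Jacob", "Jingleheimer Schmidt"),
                     ideo = c(1, 4, 7),
                     sex  = c("Male", "Male", "Male"),
                     ideo_cat = c("Very Conservative", "Moderate", "Very Liberal"))

# keep only the rows where the condition is TRUE
very_liberal <- his_name_2 %>%
  filter(ideo == 7)
very_liberal
```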
Use the clean_names() function from the janitor package to fix this
Use :: to call one function from a package
# A tibble: 4 × 3
id_number ideology_numeric most_important_issue
<dbl> <dbl> <chr>
1 123 1 abortion
2 124 2 health care
3 125 4 guns
4 126 7 police
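A sketch of how the cleaned names above could come about. The original messy column names are not shown, so the ones below are hypothetical; the janitor package is assumed to be installed.

```r
library(tibble)

# hypothetical messy column names (the originals are not shown in the slides)
messy <- tibble(`ID Number` = c(123, 124, 125, 126),
                `Ideology Numeric` = c(1, 2, 4, 7),
                `Most Important Issue` = c("abortion", "health care", "guns", "police"))

# package::function() calls one function without loading the whole package
clean <- janitor::clean_names(messy)
names(clean)
```

Using :: is handy when you only need a single function from a package and do not want a full library() call.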
Use the select() function with either the variable’s name or position
Use the slice() function
tibble(country = c('country', 'USA', 'China', 'Germany'),
       wars = c('wars', 2, 4, 5),
       pres = c('pres', 1, 0, 1),
       par = c('par', 'Congress', 'None', 'Parliament')) -> country
country
# A tibble: 4 × 4
country wars pres par
<chr> <chr> <chr> <chr>
1 country wars pres par
2 USA 2 1 Congress
3 China 4 0 None
4 Germany 5 1 Parliament
# A tibble: 3 × 4
country wars pres par
<chr> <chr> <chr> <chr>
1 USA 2 1 Congress
2 China 4 0 None
3 Germany 5 1 Parliament
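The three-row output above is what you get after dropping the stray header row. The exact call is not shown, so this is a sketch of one way to do it, along with select() by name and by position.

```r
library(dplyr)
library(tibble)

country <- tibble(country = c("country", "USA", "China", "Germany"),
                  wars = c("wars", 2, 4, 5),
                  pres = c("pres", 1, 0, 1),
                  par = c("par", "Congress", "None", "Parliament"))

# slice(-1) drops the first row, which only repeats the column names
no_header <- country %>% slice(-1)

# select() accepts variable names...
by_name <- no_header %>% select(country, wars)

# ...or column positions
by_position <- no_header %>% select(1, 2)
```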
wars and pres are character variables when they need to be numeric
Use slice(), parse_number() and mutate() to clean the data
Note that running this code back to back will result in only the second version of `anes` remaining in your global environment.
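For the country data, the type cleaning step could look like the sketch below. It assumes parse_number() from readr is the intended conversion, since that is the function used elsewhere in these slides.

```r
library(dplyr)
library(readr)
library(tibble)

country <- tibble(country = c("country", "USA", "China", "Germany"),
                  wars = c("wars", 2, 4, 5),
                  pres = c("pres", 1, 0, 1),
                  par = c("par", "Congress", "None", "Parliament"))

country_clean <- country %>%
  slice(-1) %>%                      # drop the stray header row first
  mutate(wars = parse_number(wars),  # then convert the strings to numbers
         pres = parse_number(pres))
country_clean
```

Dropping the header row before converting matters: parse_number() would otherwise hit the strings "wars" and "pres" and return NA with a warning.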
The setwd() function allows you to manually set your working directory
Note the forward slashes and that the path is read in as a string; this example is on Windows
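A minimal sketch of setting and checking the working directory. It uses tempdir() only because that folder exists on every machine; the Windows path in the comment is hypothetical.

```r
# remember where we started so we can go back afterwards
old_wd <- getwd()

# on Windows your own path would look something like
# "C:/Users/yourname/Documents/workshop" (a hypothetical path)
setwd(tempdir())

# getwd() confirms where R is now looking for files
new_wd <- getwd()
new_wd

# return to the original working directory
setwd(old_wd)
```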
On the course website, under Day 1, download the Olympics data and do the following:
Read in the data using both the tidyverse (read_csv()) and base R (read.csv()) versions of the function; note the differences
Use select() so the data only has the country, winter, and summer variables.
Use rowwise() to sum in each row; I will put that on the board when everyone is ready
Use filter() to show the results for only one country
Use filter() to remove one country
library(tidyverse)
# 1. Reading in the data
# base R version; this is the one I save because I like how read.csv()
# handles the variable names better
read.csv('olympics.csv') -> olympics
# tidyverse version; printed for comparison but not saved
read_csv('olympics.csv')
# 2. Keeping only the variables that we want
olympics %>%
  select(X0, X1, X6) -> olympics
# 3. Renaming the variables to something that makes sense
olympics %>%
  rename('country' = X0,
         'summer' = X1,
         'winter' = X6) -> olympics
# 4. Making a new variable for the total
# slice(-1) drops the first row, which holds labels rather than data
olympics %>%
  slice(-1) %>%
  rowwise() %>%
  mutate(total = sum(parse_number(summer), parse_number(winter))) -> olympics
# 5. Filtering for only Germany
olympics %>%
  filter(country == 'Germany')
# 6. Filtering to remove Chile
olympics %>%
  filter(country != 'Chile')
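As a side note on step 4, adding two columns does not strictly need rowwise(), because + already works element by element. The sketch below compares the two approaches on made-up numbers (not the real Olympics data).

```r
library(dplyr)
library(readr)
library(tibble)

# made-up counts stored as strings, purely for illustration
toy <- tibble(country = c("A", "B"),
              summer = c("10", "3"),
              winter = c("2", "0"))

# rowwise() makes sum() work one row at a time; ungroup() removes the row grouping
rowwise_total <- toy %>%
  rowwise() %>%
  mutate(total = sum(parse_number(summer), parse_number(winter))) %>%
  ungroup()

# + is already vectorized, so it gives the same totals without rowwise()
vector_total <- toy %>%
  mutate(total = parse_number(summer) + parse_number(winter))
```

Without rowwise(), sum() would add up every value in both columns and repeat that single grand total on each row, which is why the solution above needs it.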