Take this data and clean it in the following ways (note that these steps do not need to be done in order, as long as they are all done):
Make sure the variable names are correct and that the variables are formatted properly (i.e., numeric variables are read in as numeric and character variables are read in as characters) [hint: look at the documentation for the function you’re using to read in the data by going to the console and entering ?read_csv]
Using the select() function, make a new dataframe that has only the following variables: Age, Gender, Income, Approval Rating, Political_Party, Opinion Text, Gun Control, and Abortion.
At least one variable has negative numbers that should instead be read as NAs. Find which variable it is and replace those negative values with NA.
Hint: this involves using case_when() and mutate(), which we discussed in class. Here is example code to help get you started:
# Make sure to include the TRUE ~ x line;
# it ensures that in all other cases the
# variable keeps its original value
data %>%
  mutate(x = case_when(x < 0 ~ NA_real_,
                       TRUE ~ x))
Leave only Republicans in the data (hint: remember the filter() function).
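To see how the case_when() hint and filter() fit together, here is a minimal sketch on a made-up tibble. The column names `party` and `score` and all of the values are invented for illustration; they are not the homework data, so you will still need to work out which variable and which values apply there.

```r
library(dplyr)  # part of the tidyverse

# Made-up data for illustration only; not the homework dataset
toy <- tibble(
  party = c("Republican", "Democrat", "Republican", "Independent"),
  score = c(12, -99, -1, 40)
)

cleaned <- toy %>%
  # Negative scores become NA; TRUE ~ score keeps every other value unchanged
  mutate(score = case_when(score < 0 ~ NA_real_,
                           TRUE ~ score)) %>%
  # Keep only the rows where party is "Republican"
  filter(party == "Republican")

cleaned
```

Note the trailing underscore in `NA_real_`: it is the missing-value constant for numeric (double) columns, so it matches the type of the values returned by the `TRUE ~ score` branch.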
Upload your cleaned data here along with your code (hint: use the write_csv() function). Name the file "lastname_hw1_data.csv".
library(tidyverse)
# Template only: replace the text inside each function with your own code
data %>%
  mutate(doing some data cleaning here) %>%
  select(pull the variables I want) %>%
  filter(pull the observations I want) -> data
write_csv(data, 'path/where/you/want/it/save/lastname_hw1_data.csv')
Download one of the datasets from the Data page and load it into R. It can be any dataset you want; I tried to include enough that all of your interests would be represented. If you have a different set of data you’d like to use, shoot me an email.
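When loading a CSV into R, it can help to tell read_csv() what type each column should be instead of relying on its guesses. The sketch below is only an illustration under assumptions: the file is a tiny made-up CSV written to a temporary location, and the values are invented. Your own dataset will have different columns and may need different types.

```r
library(readr)  # part of the tidyverse

# Write a tiny made-up CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("Age,Gender,Income",
             "34,F,52000",
             "51,M,61000"), tmp)

# Declare the type of each column explicitly rather than letting readr guess
toy <- read_csv(tmp,
                col_types = cols(
                  Age = col_double(),
                  Gender = col_character(),
                  Income = col_double()
                ))

# Check how each column was read in
spec(toy)
```

Entering ?read_csv in the console lists the other arguments, such as `col_names` and `na`, that control how the file is parsed.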
Make a tidy dataframe, i.e., take the data you loaded into R and do the following:
Take some time and practice using group_by() and summarize()
n() function from class)
# here's roughly what your code should look like for each one
data %>%
  group_by(categorical_variable) %>%
  summarize(average = mean(continuous_variable))
Upload your HW here
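For practice with group_by(), summarize(), and n(), here is a minimal sketch on a made-up tibble. The column names `group` and `value` and the numbers are invented for illustration; swap in a categorical and a continuous variable from whichever dataset you chose.

```r
library(dplyr)  # part of the tidyverse

# Made-up data for illustration only
toy <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

result <- toy %>%
  group_by(group) %>%
  summarize(
    count = n(),            # number of rows in each group
    average = mean(value)   # mean of value within each group
  )

result
```

Here group "a" has 2 rows with an average of 2, and group "b" has 3 rows with an average of 20. If your continuous variable contains NAs, mean() returns NA unless you add `na.rm = TRUE`.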
Based on what we’ve learned in class today, try to make a graph! Anything will do.[1] The best one will get printed out and posted on my office door.
Submit your graphs here!
[1] Anything except for a pie chart.