Homework

Day 1

Take this data and clean it in the following ways (these steps do not need to be done in order, as long as they are all done):

Make sure the variable names are correct and that the variables are formatted properly (i.e., numeric variables are read in as numeric and character variables are read in as characters) [hint: look at the documentation for the function you’re using to read in the data by going to the console and entering ?read_csv]
Using the select() function, make a new dataframe that has only the following variables: Age, Gender, Income, Approval Rating, Political_Party, Opinion Text, Gun Control, and Abortion.
At least one variable has negative numbers that should instead be read as NAs. Find which variable it is and replace those negative values with NA.
- Hint: this involves using case_when() and mutate(), which we discussed in class. Here is example code to help get you started:
  # Make sure to include TRUE ~ x: it keeps the variable unchanged in all other cases
  data %>% 
    mutate(x = case_when(x < 0 ~ NA_real_,
                         TRUE ~ x))
Leave only Republicans in the data (hint: remember the filter() function)
Upload your cleaned data here along with your code (hint: use the write_csv() function). Name the file "lastname_hw1_data.csv".
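For the first step (names and types), a minimal sketch of how read_csv() can be told the column types instead of guessing them. The file, its columns, and the lowercase `age` name are hypothetical stand-ins; the real homework file will differ.

```r
library(tidyverse)

# A tiny hypothetical CSV written to a temporary file so this runs anywhere
path <- tempfile(fileext = ".csv")
writeLines(c("age,Gender,Income",
             "34,Female,52000",
             "51,Male,-1"), path)

# col_types spells out how each column should be parsed
# instead of letting read_csv() guess from the first rows
survey <- read_csv(
  path,
  col_types = cols(
    age = col_double(),
    Gender = col_character(),
    Income = col_double()
  )
)

# rename(new_name = old_name) fixes a variable name
survey <- rename(survey, Age = age)

glimpse(survey)  # shows each column's type, e.g. <dbl> or <chr>
```

Writing the types out explicitly means a column that is read in as the wrong type fails loudly instead of silently becoming text.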

Hint:

library(tidyverse)

data %>% 
  mutate(...) %>%      # do your data cleaning here
  select(...) %>%      # keep only the variables you want
  filter(...) -> data  # keep only the observations you want

write_csv(data, 'path/where/you/want/to/save/it/lastname_hw1_data.csv')
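Here is the template above filled in on a small hypothetical tibble, as a sketch only. It mimics just a few of the homework's columns, and Income is arbitrarily chosen as the column with a negative code; in the homework you need to find the real one yourself.

```r
library(tidyverse)

# Hypothetical toy data, not the homework file
data <- tibble(
  Age = c(34, 51, 29, 45),
  Gender = c("Female", "Male", "Female", "Male"),
  Income = c(52000, 48000, -1, 75000),
  `Approval Rating` = c(40, 55, 62, 70),
  Political_Party = c("Republican", "Democrat", "Republican", "Republican"),
  Extra_Column = c("a", "b", "c", "d")
)

data %>%
  # Recode negative values to NA; TRUE ~ Income leaves every other value as-is
  mutate(Income = case_when(Income < 0 ~ NA_real_,
                            TRUE ~ Income)) %>%
  # Column names that contain spaces need backticks
  select(Age, Gender, Income, `Approval Rating`, Political_Party) %>%
  # Keep only the rows for Republicans
  filter(Political_Party == "Republican") -> data

# Saved to a temporary folder here; use your own path for the homework
write_csv(data, file.path(tempdir(), "lastname_hw1_data.csv"))
```

The order of the three steps does not matter much here, but filtering last means you can still check the recoded NAs across all parties before dropping rows.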

Day 2

Download one of the datasets from the Data page and load it into R. It can be any dataset you want; I tried to include enough that all of your interests would be represented. If you have a different set of data you’d like to use, shoot me an email.
Make a tidy dataframe, i.e., take the data you loaded into R and do the following:
1. Pull out only the variables that you are interested in. This doesn’t necessarily mean only the variables you use in your analysis; it can also include variables that don’t directly make their way into a figure or analysis but are used to identify observations.
2. Use the codebook to find which values stand for missing data (NAs), then remove those NAs.
Take some time to practice using group_by() and summarize():
1. Identify 2 categorical variables of interest
2. Calculate the group size (remember the n() function from class)
3. Calculate the group mean of 2 continuous variables in the data

Hint:

#here's roughly what your code should look like for each one
data %>% 
  group_by(categorical_variable) %>% 
  summarize(average = mean(continuous_variable))

Get the number of rows and columns in the cleaned version of your data.
Report summary statistics for continuous variables in your data.
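A minimal sketch of these steps on hypothetical data. The columns `region`, `sex`, `income`, and `hours` are made up, and it assumes a codebook that says -9 means "missing"; check your own codebook for the real codes. This version groups by both categorical variables at once; you can also group by each one separately.

```r
library(tidyverse)

# Hypothetical data; suppose the codebook says -9 means "missing"
df <- tibble(
  region = c("North", "North", "North", "South", "South", "South"),
  sex    = c("F", "M", "F", "F", "M", "M"),
  income = c(50, 60, 70, -9, 80, 90),
  hours  = c(40, 35, -9, 45, 50, 30)
)

tidy_df <- df %>%
  # Turn the missing-value code into NA in both continuous columns
  mutate(across(c(income, hours), ~ na_if(.x, -9))) %>%
  # Drop every row that still has an NA
  drop_na()

group_summary <- tidy_df %>%
  group_by(region, sex) %>%
  summarize(
    n = n(),                    # group size
    mean_income = mean(income), # group mean of the first continuous variable
    mean_hours = mean(hours),   # group mean of the second continuous variable
    .groups = "drop"
  )

dims <- dim(tidy_df)        # number of rows, then number of columns
stats <- summary(tidy_df)   # min, quartiles, mean, and max of numeric columns
print(group_summary)
print(dims)
print(stats)
```

Because the NAs are removed first, mean() needs no na.rm = TRUE here; if you keep NAs in the data, add it inside each mean().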

Upload your HW here

Day 3

Based on what we’ve learned in class today, try to make a graph! Anything will do.¹ The best one will get printed out and posted on my office door.
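If you need a starting point, here is one minimal sketch, assuming the class used ggplot2 (which library(tidyverse) loads). It uses the mpg example dataset that ships with ggplot2 and draws a scatter plot, so it stays clear of the pie-chart rule. The axis labels and the commented-out file name are just examples.

```r
library(tidyverse)

# mpg is an example dataset included with ggplot2
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  labs(
    x = "Engine displacement (litres)",
    y = "Highway fuel economy (miles per gallon)",
    colour = "Vehicle class"
  ) +
  theme_minimal()

print(p)  # draws the plot

# Optional: save it to a file to submit
# ggsave("my_graph.png", p, width = 6, height = 4)
```

Swapping in your own data frame and columns inside ggplot(aes(...)) is usually all it takes to adapt this.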

Submit your graphs here!

Footnotes

¹ Anything except for a pie chart.