Scraping Data for 2014 Gubernatorial Elections

Scraping 2018 Election Data
November 7, 2018
Show all
Scraping Data for 2014 Gubernatorial Elections

Scraping Data for 2014 Gubernatorial Elections

People are excited about the elections, me too! I have decided to play with election data and I plan to create some plots meaningful to me to make comparisons with the previous years. I have found this webpage for the MIT Election Data and Science Lab that did a tremendous job of merging election data from 2012 and 2016 presidential elections and 2016 state elections (Senate, House, and Governor). I was thrilled when I found it because they saved a lot of time for me. However, I realized it was missing data from 2014 gubernatorial elections. Based on a quick search on the web, I found that 12 states with a gubernatorial election in 2016 included in their datasets while 36 states with a gubernatorial election in 2014 missing. For the sake of comparison between the 2014 and 2018 gubernatorial elections, I think it is important to have 2014 data.

There may be official sources for the 2014 gubernatorial election, but I don't have time to dig deeper. So, the fastest source I could access was the New York Times webpage for the 2014 elections. They have nice interactive plots at the county and state level, but the raw data is hidden somewhere. After some detective work in their source page, I found the page was pulling data from the JSON file at this link.

The rest was a little bit of R programming. I don't have much time to explain what it does, but I put the code below. For those interested, they can dig into the code deeper. For those who are not interested in the coding piece, here is the CSV file you can download for 2014 gubernatorial elections at the state and county level. I structured data exactly same as in the MIT Election Data and Science Lab page, so both can be merged into one file.


require(jsonlite)
require(Hmisc)
require(rmarkdown)
require(knitr)

# Read the JSON file from the New York Times link

a = fromJSON("https://int.nyt.com/applications/elections/2014/data/2014-11-04/supermap/governor.json")

# For all states except Alaska, restructure the data in a 5 column dataframe

state.vote <- vector("list",35)

for(i in 3:37) {

    info = a[[i]][[3]]
    rep.name = info[which(info$party_id=="REP"),]$slug
    dem.name = info[which(info$party_id=="DEM"),]$slug

  if(nrow(info) > 3) { 

        others   = info$slug[! info$slug %in% c(rep.name,dem.name)]
        state = a[[i]][[2]][,c("name",rep.name,dem.name,others)]
        state$othergov14 <- rowSums(state[,4:ncol(state)])
        state <- state[,c(1,2,3,ncol(state))]
        state <- cbind(capitalize(a[[i]]$state$state_slug),state)
        colnames(state) <- c("state","county","repgov14","demgov14","othergov14")

    } 

  if(nrow(info) == 3 ) { 

        others   = info$slug[! info$slug %in% c(rep.name,dem.name)]
        state = a[[i]][[2]][,c("name",rep.name,dem.name,others)]
        state <- cbind(capitalize(a[[i]]$state$state_slug),state)
        colnames(state) <- c("state","county","repgov14","demgov14","othergov14")

    } 

    if(nrow(info) == 2) {

        state = a[[i]][[2]][,c("name",rep.name,dem.name)]
        state$othergov14 <- 0
        state <- state[,c(1,2,3,ncol(state))]
        state <- cbind(capitalize(a[[i]]$state$state_slug),state)
        colnames(state) <- c("state","county","repgov14","demgov14","othergov14")


    }

      state.vote[[i-2]] = state
}

# Combine all in one data frame

electiongov2014 <- state.vote[[1]]

for(i in 2:35) {
 electiongov2014 <- rbind(electiongov2014,state.vote[[i]])
}


#######################

dim(electiongov2014)

[1] 2149    5

str(electiongov2014)

'data.frame':   2149 obs. of  5 variables:
 $ state     : Factor w/ 35 levels "Alabama","Arkansas",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ county    : chr  "Autauga" "Baldwin" "Barbour" "Bibb" ...
 $ repgov14  : int  9427 37650 3111 3525 12074 747 3148 17688 3635 5007 ...
 $ demgov14  : int  3638 8416 3651 1368 2178 2440 2741 9082 4587 1868 ...
 $ othergov14: num  0 0 0 0 0 0 0 0 0 0 ...