Creating a function

Believe it or not, with the last exercise behind us, we know everything we need to accomplish the automation task we set out to do. We already have the bulk of the code that the function relies on, so it's mostly a matter of pasting it into the body of the function and making some minor changes. To write good functions, we often begin by writing code that works, then we identify the need for automation (to reuse code, or to automatically clean up intermediate results), and finally we wrap the code in a function, then modify and test it to make sure it still works.

Of course, writing good functions can be more involved than what we described here. This is especially so when we write functions that we intend to use across multiple projects or share with others. In such cases, we often spend more time anticipating the ways the function could break given different inputs, and we try to account for those cases.
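
As a rough illustration of that kind of defensive checking, the sketch below validates a pair of coordinate vectors before any spatial work is done. The helper name `check_coords` and the specific checks are assumptions made for this example; they are not part of the exercise.

```r
# Hypothetical helper (not part of the original exercise): checks that two
# coordinate vectors are usable before they are passed to a spatial function.
check_coords <- function(long_var, lat_var) {
    if (!is.numeric(long_var) || !is.numeric(lat_var))
        stop("longitude and latitude must be numeric vectors")
    if (length(long_var) != length(lat_var))
        stop("longitude and latitude must have the same length")
    # range check; missing values are ignored here and handled separately
    if (any(abs(long_var) > 180, na.rm = TRUE) || any(abs(lat_var) > 90, na.rm = TRUE))
        stop("coordinates fall outside the valid longitude/latitude range")
    invisible(TRUE)
}

check_coords(c(-73.98, NA), c(40.75, 40.76))   # passes silently
try(check_coords(c(-73.98, -73.99), 40.75))    # error: lengths differ
try(check_coords("-73.98", 40.75))             # error: not numeric
```

A function meant for sharing could call a check like this on its first line, so that bad inputs fail early with a readable message.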

With the last exercise as a backdrop, let's now delete pickup_nhood and dropoff_nhood from the data and recreate those columns, this time by writing a function.

nyc_taxi$pickup_nhood <- NULL # we drop this column so we can re-create it here
nyc_taxi$dropoff_nhood <- NULL # we drop this column so we can re-create it here

We call the function add.neighborhoods. Its inputs are the longitude and latitude columns (as vectors) and the shapefile. The output we return is a single column containing the neighborhood names.

add.neighborhoods <- function(long_var, lat_var, shapefile) {
    require(rgeos)
    require(maptools)
    data_coords <- data.frame(long = ifelse(is.na(long_var), 0, long_var), 
                              lat = ifelse(is.na(lat_var), 0, lat_var)) # create `data.frame` with only those two columns
    coordinates(data_coords) <- c('long', 'lat') # designate the columns as geographical coordinates
    nhoods <- over(data_coords, shapefile) # find the neighborhoods the coordinates fall into
    nhoods$NAME <- factor(nhoods$NAME, levels = as.character(shapefile@data$NAME)) # reset factor levels to Manhattan only
    return(nhoods$NAME) # return only the column with the neighborhoods
}
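
Two of the steps inside the function can be tried out in base R, without any spatial packages: replacing missing coordinates with 0, and resetting the factor levels. A point at (0, 0) lies outside every Manhattan neighborhood, so such rows would be expected to come back as NA from the lookup. The toy vectors below, including the `manhattan_names` stand-in for `shapefile@data$NAME`, are made up for illustration.

```r
# Made-up longitudes with one missing value (illustrative data only)
long_var <- c(-73.98, NA, -74.00)

# Step from the function: missing coordinates are replaced by 0
long_clean <- ifelse(is.na(long_var), 0, long_var)
long_clean

# Made-up lookup result: a factor carrying a level that has no observations
nhoods <- factor(c("Soho", NA, "Midtown"),
                 levels = c("Bronx Park", "Midtown", "Soho"))
manhattan_names <- c("Soho", "Midtown") # stand-in for shapefile@data$NAME

# Step from the function: keep only the shapefile's levels, in its order
nhoods <- factor(nhoods, levels = manhattan_names)
levels(nhoods)
table(nhoods, useNA = "ifany")
```

Resetting the levels this way means that tables and plots list only the neighborhoods present in the shapefile, in the shapefile's order.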

We can now use our function twice. Once to find the pick-up neighborhood:

nyc_taxi$pickup_nhood <- add.neighborhoods(nyc_taxi$pickup_longitude, nyc_taxi$pickup_latitude, nyc_shapefile)
table(nyc_taxi$pickup_nhood, useNA = "ifany")

       West Village        East Village        Battery Park       Carnegie Hill 
              94222              135597               33783               43896 
           Gramercy                Soho         Murray Hill        Little Italy 
             302670               78188              127397               33254 
       Central Park   Greenwich Village             Midtown Morningside Heights 
              51726              174398              619229               19887 
...

And a second time to find the drop-off neighborhood:


nyc_taxi$dropoff_nhood <- add.neighborhoods(nyc_taxi$dropoff_longitude, nyc_taxi$dropoff_latitude, nyc_shapefile)
table(nyc_taxi$dropoff_nhood, useNA = "ifany")

       West Village        East Village        Battery Park       Carnegie Hill 
              84123              117727               34784               47099 
           Gramercy                Soho         Murray Hill        Little Italy 
             273631               74566              125972               23588 
       Central Park   Greenwich Village             Midtown Morningside Heights 
              46788              142799              590646               29594 
...
