Creating a function
With the last exercise as an introduction, we now know everything we need to accomplish the automation task we set out to do. We already have the bulk of the code that the function relies on, so writing the function is largely a matter of pasting that code into the function body and making some minor changes. To write good functions, we often begin by writing code that works, then identify the need for automation (to reuse code, or to automatically clean up intermediate results), and finally wrap the code in a function, then modify and test it to make sure it still works.
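As a small illustration of that workflow on toy data, here is a minimal sketch. The vector `fares`, the cut points, and the function name `bin_fares` are made up for this example and are not part of the taxi dataset.

```r
# Step 1: code that works, written inline for one vector
fares <- c(4.5, 12, 33, 7.25)
fare_bins <- cut(fares, breaks = c(0, 10, 20, Inf),
                 labels = c("low", "medium", "high"))

# Step 2: the same code wrapped in a function so it can be reused
# (bin_fares is a hypothetical name chosen for this sketch)
bin_fares <- function(x, breaks = c(0, 10, 20, Inf),
                      labels = c("low", "medium", "high")) {
  cut(x, breaks = breaks, labels = labels)
}

# Step 3: test that the function reproduces the inline result
identical(bin_fares(fares), fare_bins)
```

Comparing the function's output against the result of the original inline code is a cheap way to confirm that nothing changed during the wrapping step.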
Of course, writing good functions can be more involved than what we described here. This is especially so when we write functions that we intend to use across multiple projects or share with others. In such cases, we often spend more time anticipating the ways that the function could break given different inputs and trying to account for such cases.
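One common way to account for unexpected inputs is to check them at the top of the function and fail with a clear message. The sketch below assumes a hypothetical helper called `check_coords`; the specific checks are illustrative, not an exhaustive list.

```r
# Hypothetical helper: validate coordinate inputs before doing any real work
check_coords <- function(long_var, lat_var) {
  if (!is.numeric(long_var) || !is.numeric(lat_var)) {
    stop("longitude and latitude must be numeric vectors")
  }
  if (length(long_var) != length(lat_var)) {
    stop("longitude and latitude must have the same length")
  }
  invisible(TRUE)
}

check_coords(c(-73.99, -73.98), c(40.75, 40.76))  # passes silently
try(check_coords(c(-73.99, -73.98), 40.75))       # reports the length mismatch
```

Failing early like this tends to produce error messages that point at the real problem, rather than at some later line deep inside the function.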
With the last exercise as a backdrop, let's now delete pickup_nhood and dropoff_nhood from the data and recreate those columns, this time by writing a function.
nyc_taxi$pickup_nhood <- NULL # we drop this column so we can re-create it here
nyc_taxi$dropoff_nhood <- NULL # we drop this column so we can re-create it here
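Assigning NULL to a column is how the two lines above remove it from the data. A toy illustration follows; the data frame `toy` is invented for this sketch.

```r
# A small made-up data frame with three columns
toy <- data.frame(id = 1:3, a = c("x", "y", "z"), b = c(TRUE, FALSE, TRUE))

toy$a <- NULL        # removes column `a`
names(toy)           # "id" "b"

toy$missing <- NULL  # the column does not exist, so nothing changes
ncol(toy)            # 2
```

Because dropping a column that is not there has no effect, the two lines above can be rerun safely even after the columns are already gone.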
We call the function add.neighborhoods. Its inputs are the longitude and latitude columns (as numeric vectors) and the shapefile. The output we return is a single column containing the neighborhood names.
add.neighborhoods <- function(long_var, lat_var, shapefile) {
  require(rgeos)
  require(maptools)
  # create a `data.frame` with only those two columns, replacing NAs with 0
  data_coords <- data.frame(long = ifelse(is.na(long_var), 0, long_var),
                            lat = ifelse(is.na(lat_var), 0, lat_var))
  coordinates(data_coords) <- c('long', 'lat') # designate the columns as geographical coordinates
  nhoods <- over(data_coords, shapefile) # find the neighborhoods the coordinates fall into
  nhoods$NAME <- factor(nhoods$NAME, levels = as.character(shapefile@data$NAME)) # reset factor levels to Manhattan only
  return(nhoods$NAME) # return only the column with the neighborhoods
}
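Two steps inside the function are easy to miss: missing coordinates are replaced with the placeholder 0, and the result is rebuilt as a factor with a fixed set of levels. The base R sketch below shows both steps on made-up values. The neighborhood list `all_levels` is hypothetical, and the reading that a point at (0, 0) falls outside every neighborhood and so comes back as NA is our interpretation of the placeholder, not something the code states.

```r
# Replace missing coordinates with the placeholder 0, as the function does
long <- c(-73.99, NA, -73.95)
long_filled <- ifelse(is.na(long), 0, long)
long_filled

# Rebuild a factor against a fixed list of levels (hypothetical names here)
all_levels <- c("Chelsea", "Midtown", "Soho")
observed <- c("Midtown", NA, "Soho")
nhood <- factor(observed, levels = all_levels)

levels(nhood)  # all three levels are kept, including the unobserved "Chelsea"
table(nhood)   # "Chelsea" appears with a count of 0
```

Fixing the levels this way means the pick-up and drop-off columns share the same set of neighborhoods in the same order, which keeps their tables directly comparable.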
We can now use our function twice, first to find the pick-up neighborhood:
nyc_taxi$pickup_nhood <- add.neighborhoods(nyc_taxi$pickup_longitude, nyc_taxi$pickup_latitude, nyc_shapefile)
table(nyc_taxi$pickup_nhood, useNA = "ifany")
       West Village        East Village        Battery Park       Carnegie Hill
              94222              135597               33783               43896
           Gramercy                Soho         Murray Hill        Little Italy
             302670               78188              127397               33254
       Central Park   Greenwich Village             Midtown Morningside Heights
              51726              174398              619229               19887
...
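The useNA = "ifany" argument in the call above controls whether missing values get their own entry in the counts. A short sketch on an invented factor `x` shows how the options differ:

```r
# A made-up factor with one missing value
x <- factor(c("Soho", "Midtown", NA, "Midtown"))

table(x)                    # NAs are dropped by default
table(x, useNA = "ifany")   # adds an <NA> entry only when NAs are present
table(x, useNA = "always")  # always adds an <NA> entry, even if its count is 0
```

With "ifany", trips whose coordinates fall outside every neighborhood in the shapefile stay visible in the counts instead of being dropped silently.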
And a second time to find the drop-off neighborhood:
nyc_taxi$dropoff_nhood <- add.neighborhoods(nyc_taxi$dropoff_longitude, nyc_taxi$dropoff_latitude, nyc_shapefile)
table(nyc_taxi$dropoff_nhood, useNA = "ifany")
       West Village        East Village        Battery Park       Carnegie Hill
              84123              117727               34784               47099
           Gramercy                Soho         Murray Hill        Little Italy
             273631               74566              125972               23588
       Central Park   Greenwich Village             Midtown Morningside Heights
              46788              142799              590646               29594
...