Dealing with coordinates

It's time to clean the longitude and latitude columns. We will do so by simply replacing values that fall outside an acceptable range (longitudes between -75 and -73, latitudes between 38 and 41, a generous box around New York City) with NA, which is how R represents missing values. In doing so we assume that out-of-range values were recorded in error and carry no more information than a missing value would. In some cases, this may not be a safe assumption.
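One reason NA is a convenient marker is that R's summary functions propagate it by default, so a missing value can't silently distort a result. The short sketch below uses a made-up vector (not taxi data) to show that mean returns NA unless asked to skip missing values with na.rm = TRUE, and that is.na can be used to count them.

```r
# Made-up longitudes; the second value has already been replaced with NA
lon <- c(-73.98, NA, -74.01)

mean(lon)                # NA: missing values propagate by default
mean(lon, na.rm = TRUE)  # mean of the two non-missing values only
sum(is.na(lon))          # number of missing values: 1
```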

To perform this transformation we use the ifelse function:

ifelse(condition, what_to_do_if_TRUE, what_to_do_if_FALSE)
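Because ifelse is vectorized, the condition is evaluated element by element and each position gets either the TRUE value or the FALSE value. A minimal sketch on a small made-up vector, using the same longitude bounds as the code that follows:

```r
# Made-up pickup longitudes; -120.5 and 0 are outside [-75, -73]
lon <- c(-73.98, -120.5, 0, -74.01)

# Out-of-range elements become NA, the rest are kept unchanged
cleaned <- ifelse(lon < -75 | lon > -73, NA, lon)
print(cleaned)  # -73.98, NA, NA, -74.01
```

A value that is already NA stays NA, because the condition itself evaluates to NA for that element and ifelse returns NA in that case.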

nyc_taxi$pickup_longitude <- ifelse(nyc_taxi$pickup_longitude < -75 | nyc_taxi$pickup_longitude > -73, 
                                    NA, # return NA when the condition is met
                                    nyc_taxi$pickup_longitude) # keep it as-is otherwise

We will do the other three transformations with the transform function instead, because its syntax is cleaner (columns can be referred to by name, without the nyc_taxi$ prefix) and it can perform several transformations at once.

nyc_taxi <- transform(nyc_taxi, 
                      dropoff_longitude = ifelse(dropoff_longitude < -75 | dropoff_longitude > -73, NA, dropoff_longitude),
                      pickup_latitude = ifelse(pickup_latitude < 38 | pickup_latitude > 41, NA, pickup_latitude),
                      dropoff_latitude = ifelse(dropoff_latitude < 38 | dropoff_latitude > 41, NA, dropoff_latitude)
)
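The four conditions above differ only in the column and the bounds, so one option is to wrap the ifelse call in a small helper. The sketch below is an illustration, not the approach used in this chapter: na_outside is a hypothetical helper (not part of base R), and trips is a made-up data frame standing in for nyc_taxi.

```r
# Hypothetical helper (not part of base R): replace values outside [lo, hi] with NA
na_outside <- function(x, lo, hi) {
  ifelse(x < lo | x > hi, NA, x)
}

# Made-up data frame standing in for nyc_taxi
trips <- data.frame(
  pickup_longitude  = c(-73.99, 0, -74.00),
  pickup_latitude   = c(40.75, 40.71, 90),
  dropoff_longitude = c(-73.95, -73.97, -200),
  dropoff_latitude  = c(40.76, 0, 40.70)
)

# Same bounds as in the text: longitude in [-75, -73], latitude in [38, 41]
trips <- transform(trips,
  pickup_longitude  = na_outside(pickup_longitude,  -75, -73),
  dropoff_longitude = na_outside(dropoff_longitude, -75, -73),
  pickup_latitude   = na_outside(pickup_latitude,    38,  41),
  dropoff_latitude  = na_outside(dropoff_latitude,   38,  41)
)
```

Keeping the bounds as arguments in one place makes it less likely that a threshold is mistyped in just one of the four conditions.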

If we rerun summary, the NA counts now appear as part of its output:

summary(nyc_taxi[ , grep('long|lat', names(nyc_taxi), value = TRUE)])
 pickup_longitude pickup_latitude dropoff_longitude dropoff_latitude
 Min.   :-75      Min.   :38      Min.   :-75       Min.   :39      
 1st Qu.:-74      1st Qu.:41      1st Qu.:-74       1st Qu.:41      
 Median :-74      Median :41      Median :-74       Median :41      
 Mean   :-74      Mean   :41      Mean   :-74       Mean   :41      
 3rd Qu.:-74      3rd Qu.:41      3rd Qu.:-74       3rd Qu.:41      
 Max.   :-73      Max.   :41      Max.   :-73       Max.   :41      
 NA's   :66465    NA's   :66616   NA's   :64397     NA's   :64909
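If only the NA counts are needed, they can be computed directly rather than read off the summary: is.na applied to a data frame returns a logical matrix, and colSums counts the TRUE values per column. A minimal sketch on a made-up data frame; on the real data the analogous call would be colSums(is.na(nyc_taxi[ , grep('long|lat', names(nyc_taxi), value = TRUE)])).

```r
# Made-up data frame with some NAs already introduced
coords <- data.frame(
  pickup_longitude = c(-73.99, NA, -74.00, NA),
  pickup_latitude  = c(40.75, 40.71, NA, 40.73)
)

# Count of missing values per column
na_counts <- colSums(is.na(coords))
print(na_counts)  # pickup_longitude 2, pickup_latitude 1

# Share of missing values per column
print(colMeans(is.na(coords)))  # pickup_longitude 0.5, pickup_latitude 0.25
```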
