The NYC Taxi dataset has a data dictionary, which we can use to properly label the columns VendorID, RateCodeID, store_and_fwd_flag and payment_type.

First, check the column type of payment_type. Then, based on the information in the data dictionary, run a transformation that creates a column called card_vs_cash (of type factor) based on payment_type. It will have two levels, card and cash (lumping anything that isn't card or cash into a single category).

We will be running the above transformation on nyc_jan_xdf, which is created here:

input_csv <- 'yellow_tripsample_2016-01.csv'
input_xdf <- 'yellow_tripsample_2016-01.xdf'
rxImport(input_csv, input_xdf, overwrite = TRUE)

nyc_jan_xdf <- RxXdfData(input_xdf)
## your code goes here
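One possible approach is sketched below; it is not necessarily the intended solution. It assumes the data dictionary's coding of payment_type, where 1 is credit card and 2 is cash, and it treats every other code as missing (NA) so that the factor keeps exactly two levels. That handling of the remaining codes is an assumption about what the exercise wants. The first part runs in base R on a small made-up vector; the second part applies the same expression to the XDF file and only runs if RevoScaleR is installed and nyc_jan_xdf exists.

```r
# Hypothetical sample of payment_type codes (not taken from the real data).
# Assumed coding from the data dictionary: 1 = credit card, 2 = cash;
# the remaining codes cover cases such as no charge or dispute.
payment_type <- c(1L, 2L, 2L, 3L, 1L, 4L)

# Check the column type before transforming it.
print(class(payment_type))

# Codes not listed in `levels` become NA, so only card and cash remain.
card_vs_cash <- factor(payment_type, levels = 1:2, labels = c("card", "cash"))
print(table(card_vs_cash, useNA = "ifany"))

# The same expression applied to the XDF file. This branch is skipped unless
# RevoScaleR is available and nyc_jan_xdf has been created as shown above.
if (requireNamespace("RevoScaleR", quietly = TRUE) && exists("nyc_jan_xdf")) {
  # Inspect the stored type of payment_type.
  print(RevoScaleR::rxGetInfo(nyc_jan_xdf, getVarInfo = TRUE,
                              varsToKeep = "payment_type"))

  # Write the new column back into the same XDF file.
  RevoScaleR::rxDataStep(
    nyc_jan_xdf, nyc_jan_xdf, overwrite = TRUE,
    transforms = list(
      card_vs_cash = factor(payment_type, levels = 1:2,
                            labels = c("card", "cash"))
    )
  )
}
```

Passing both levels and labels to factor() keeps the mapping from numeric codes to names explicit, which matters when data is processed chunk by chunk and a given chunk may not contain every code.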

