Solutions
Here are some of the ways we can clean the data:
- tpep_pickup_datetime and tpep_dropoff_datetime should be datetime columns, not character
- rate_code_id and payment_type should be factor columns, not character
- the geographical coordinates for pick-up and drop-off occasionally fall outside a reasonable bound (probably due to error)
- fare_amount is sometimes negative (could be refunds, could be errors, could be something else)
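
As a minimal sketch of the type conversions listed above, the following base R code works on a small made-up data frame. The data frame name `nyc_taxi`, the sample values, the timestamp format, and the `payment_type` labels are all assumptions for illustration; check them against the real data before reusing any of this.

```r
# Toy stand-in for the taxi data (hypothetical values, not real records).
nyc_taxi <- data.frame(
  tpep_pickup_datetime  = c("2016-01-01 00:10:00", "2016-01-01 08:30:00", "2016-01-02 17:45:00"),
  tpep_dropoff_datetime = c("2016-01-01 00:25:00", "2016-01-01 08:52:00", "2016-01-02 18:05:00"),
  rate_code_id          = c("1", "2", "1"),
  payment_type          = c("1", "2", "1"),
  fare_amount           = c(12.5, -3.0, 20.0),
  stringsAsFactors      = FALSE
)

# Assumed timestamp layout; adjust if the raw file uses a different one.
fmt <- "%Y-%m-%d %H:%M:%S"

# character -> datetime
nyc_taxi$tpep_pickup_datetime  <- as.POSIXct(nyc_taxi$tpep_pickup_datetime,  format = fmt, tz = "UTC")
nyc_taxi$tpep_dropoff_datetime <- as.POSIXct(nyc_taxi$tpep_dropoff_datetime, format = fmt, tz = "UTC")

# character -> factor; the labels for payment_type are an assumed coding.
nyc_taxi$rate_code_id <- factor(nyc_taxi$rate_code_id)
nyc_taxi$payment_type <- factor(nyc_taxi$payment_type,
                                levels = c("1", "2"),
                                labels = c("card", "cash"))

# Count negative fares before deciding what to do with them.
n_negative_fares <- sum(nyc_taxi$fare_amount < 0)

# With a factor, summary() reports counts per level instead of just "character".
summary(nyc_taxi$payment_type)
```

Counting the negative fares first, rather than dropping them right away, leaves room to check whether they look like refunds or like errors.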
Some data-cleaning jobs depend on the analysis. For example, turning payment_type into a factor is unnecessary if we don't intend to use it as a categorical variable in the model. Even so, we might still benefit from turning it into a factor so that we can see counts for it when we run summary on the data, or have it show the proper labels when we use it in a plot. Other data-cleaning jobs, on the other hand, relate to data quality issues. For example, pick-up or drop-off coordinates that fall outside reasonable bounds are probably the result of errors. In such cases, we must decide whether we should clean the data by
- removing rows that have incorrect information for some columns, even though other columns might still be correct
- replacing the incorrect information with NAs and deciding whether we should impute the missing values somehow
- leaving the data as is, but thinking about how doing so could skew some results from our analysis
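
The three options above can be sketched in base R as follows. The column names `pickup_longitude` and `pickup_latitude`, the sample values, and the bounding box are assumptions made for illustration; a real analysis would pick bounds that match the area of interest.

```r
# Toy trips (hypothetical values); the second row has implausible (0, 0) coordinates.
trips <- data.frame(
  pickup_longitude = c(-73.98, 0, -73.95, -74.01),
  pickup_latitude  = c(40.75, 0, 40.78, 40.71),
  fare_amount      = c(10, 7.5, 12, 30)
)

# Assumed "reasonable" bounding box; tune these limits for the real data.
lon_range <- c(-75, -73)
lat_range <- c(40, 41.5)

in_bounds <- trips$pickup_longitude >= lon_range[1] &
  trips$pickup_longitude <= lon_range[2] &
  trips$pickup_latitude >= lat_range[1] &
  trips$pickup_latitude <= lat_range[2]

# Option 1: remove the offending rows (this also discards their valid fare_amount).
trips_dropped <- trips[in_bounds, ]

# Option 2: keep the rows but blank out only the bad coordinates.
trips_na <- trips
trips_na[!in_bounds, c("pickup_longitude", "pickup_latitude")] <- NA

# Option 3: leave the data as is, but measure how much of it is affected.
share_out_of_bounds <- mean(!in_bounds)
```

Option 2 keeps the other columns of a flagged row available for analyses that do not need coordinates, at the cost of having to handle NAs (or impute them) later.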