Solutions

(1) Because qsap1 is a matrix, we can query it the same way we query any n-dimensional array:

qsap1[c('5%', '95%'), c('trip_distance', 'trip_duration')]
    trip_distance trip_duration
5%            0.5           178
95%          10.2          2038

Since qlap1 is a list with one element per column of the data, we use double brackets to extract the percentiles for each column separately. Moreover, because the percentiles themselves are stored in a named vector, we can pass the names of the percentiles we want in a single bracket to get the desired result.

qlap1[['trip_distance']][c('5%', '95%')]
  5%  95% 
 0.5 10.2
qlap1[['trip_duration']][c('5%', '95%')]
  5%  95% 
 178 2038
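The difference between the two indexing styles can be reproduced on a small scale. The sketch below uses a made-up data frame called toy (not the taxi data) and assumes the two objects were built by passing quantile to sapply and lapply; the names toy, q_mat and q_lst are hypothetical.

```r
# Toy data standing in for the real dataset (values are invented for illustration).
toy <- data.frame(
  trip_distance = c(0.5, 1.2, 3.4, 7.8, 10.2),
  trip_duration = c(178, 420, 900, 1500, 2038)
)
probs <- c(0.05, 0.95)

# sapply simplifies equal-length results into a matrix: rows are percentiles, columns are variables.
q_mat <- sapply(toy, quantile, probs = probs)
# lapply always returns a list: one named vector of percentiles per column.
q_lst <- lapply(toy, quantile, probs = probs)

# Matrix: row names and column names go in a single pair of brackets.
from_matrix <- q_mat[c('5%', '95%'), 'trip_distance']
# List: double brackets pick the column, single brackets pick the percentiles.
from_list <- q_lst[['trip_distance']][c('5%', '95%')]

print(from_matrix)
print(from_list)
```

Both expressions return the same named vector; only the path taken to reach it differs.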

(2) In this case, sapply and lapply both return a list, because the columns have different numbers of unique values and so sapply cannot simplify the results into a vector or matrix. We can use single brackets to return the results for passenger_count and tip_percent as a sublist.

qsap2[c('passenger_count', 'tip_percent')]
$passenger_count
 [1] 5 1 2 6 3 4 0 9 7 8

$tip_percent
  [1] 23  0 17  2 12  6 21 18 20 16 13 19  1  7 14 10 22 11 25  8 15  5  9  3 26 24
 [27]  4 NA 30 36 35 28 33 54 58 27 34 31 29 32 66 70 47 99 40 37 82 57 45 46 44 50
 [53] 55 43 65 38 60 42 76 90 41 53 64 61 51 73 49 83 71 81 62 80 86 94 72 87 56 63
 [79] 88 52 93 48 39 84 92 91 79 74 75 78 68 89 96 67 69 97 85 59 95 98 77
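To see why sapply falls back to a list here, the following sketch applies unique to a small invented data frame whose columns have different numbers of distinct values. The names toy, u_sap and u_lap are hypothetical and the data is made up.

```r
# Toy data: passenger_count has 3 distinct values, tip_percent has 5 (NA counts as one).
toy <- data.frame(
  passenger_count = c(1, 2, 1, 5, 2, 1),
  tip_percent     = c(20, 0, 15, NA, 20, 18)
)

u_sap <- sapply(toy, unique)  # results have unequal lengths, so no simplification happens
u_lap <- lapply(toy, unique)  # always a list

print(is.list(u_sap))
print(identical(u_sap, u_lap))

# Single brackets on a list return a sublist, keeping the element names.
sub <- u_sap[c('passenger_count', 'tip_percent')]
print(sub)
```

When every result has the same length, sapply would instead return a vector or matrix, which is why its return type depends on the data.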

(3) Since we have the unique values for each column stored in qlap2, we can just run the length function to count how many unique values each column has. For example, for passenger_count we have

length(qlap2[['passenger_count']]) # don't forget the double bracket here!
[1] 10

But we want to do this automatically for all the columns at once. The solution is to use sapply. So far we've been using sapply and lapply with the dataset as input. But we can just as well feed them any list, such as qlap2, and apply a function to each element of that list (as long as doing so doesn't result in an error for any of the list's elements).

sapply(qlap2, length)
passenger_count   trip_distance     fare_amount      tip_amount   trip_duration 
             10            3632            1162            2957            8965 
    tip_percent 
            101
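As a minimal illustration of applying a function to an arbitrary list, and of the caveat about errors, consider the sketch below. The lists some_list and mixed are invented for the example.

```r
# Any list works as input, not just a data frame.
some_list <- list(a = 1:3, b = letters, c = NULL)
counts <- sapply(some_list, length)  # one length per element, simplified to a named vector
print(counts)

# If the function fails on any single element, the whole call fails.
mixed <- list(x = 10, y = "ten")
result <- tryCatch(
  sapply(mixed, log),                        # log() cannot handle the character element
  error = function(e) "sapply stopped with an error"
)
print(result)
```

Here length works on every element, so sapply succeeds, while log fails on the character element and takes the whole call down with it.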

The above exercise offers a glimpse of how powerful R can be at quickly and succinctly processing the basic data types, as long as we write good functions and use the apply family of functions to iterate through the data types. A good goal to set for yourself as an R programmer is to increase your reliance on the apply family of functions to run your code.
