Solutions
(1) Because qsap1 is a matrix, we can query it the same way we query any n-dimensional array:
qsap1[c('5%', '95%'), c('trip_distance', 'trip_duration')]
    trip_distance trip_duration
5%            0.5           178
95%          10.2          2038
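The indexing above works because sapply simplifies equal-length results into a matrix: the row names come from the names quantile gives each percentile, and the column names come from the data. Here is a minimal sketch, assuming a small hypothetical data frame called toy in place of the taxi data. The names toy and toy_q are illustrative only and are not how qsap1 was built.

```r
# Hypothetical toy data standing in for the taxi dataset
toy <- data.frame(
  trip_distance = c(0.5, 1.2, 3.4, 10.2, 2.0),
  trip_duration = c(178, 300, 900, 2038, 450)
)

# quantile() returns a named vector of the same length for every column,
# so sapply() can simplify the results into a matrix
toy_q <- sapply(toy, quantile, probs = c(0.05, 0.5, 0.95))

is.matrix(toy_q)  # TRUE
dimnames(toy_q)   # rows: "5%" "50%" "95%"; columns: the column names

# Index by row names (percentiles) and column names (variables)
toy_q[c("5%", "95%"), c("trip_distance", "trip_duration")]
```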
Since qlap1 is a list with one element per column of the data, we use double brackets to extract the percentiles for each column separately. Moreover, because the percentiles themselves are stored in a named vector, we can pass the names of the percentiles we want inside a single bracket to get the desired result.
qlap1[['trip_distance']][c('5%', '95%')]
5% 95%
0.5 10.2
qlap1[['trip_duration']][c('5%', '95%')]
5% 95%
178 2038
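The difference between single and double brackets on a list is easy to check directly. This sketch uses the same kind of hypothetical toy data frame (toy and toy_ql are illustrative names, not the objects from the exercise) and shows that a single bracket returns a sublist, while a double bracket returns the named numeric vector that can then be indexed by percentile names.

```r
# Hypothetical toy data standing in for the taxi dataset
toy <- data.frame(
  trip_distance = c(0.5, 1.2, 3.4, 10.2, 2.0),
  trip_duration = c(178, 300, 900, 2038, 450)
)

# lapply() always returns a list: one named element per column
toy_ql <- lapply(toy, quantile, probs = c(0.05, 0.5, 0.95))

# Single bracket: a sublist that still wraps the vector
class(toy_ql["trip_distance"])    # "list"

# Double bracket: the named numeric vector itself
class(toy_ql[["trip_distance"]])  # "numeric"

# Then index that vector by the names of the percentiles
toy_ql[["trip_distance"]][c("5%", "95%")]
```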
(2) In this case, sapply and lapply both return a list. The columns have different numbers of unique values, so there is no way for sapply to simplify the results into a vector or a matrix. We can extract the results for passenger_count and tip_percent as a sublist by passing their names inside a single bracket.
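To see why sapply falls back to a list here, consider a minimal sketch with a hypothetical toy data frame whose columns have 3, 4, and 1 unique values respectively (toy and toy_u are illustrative names). Note that unique keeps NA as one of the values, just as it does for tip_percent in the output above.

```r
# Hypothetical toy data; the columns have different numbers of unique values
toy <- data.frame(
  passenger_count = c(1, 1, 2, 5, 1),
  tip_percent     = c(0, 15, 15, 20, NA),
  fare_amount     = c(5, 5, 5, 5, 5)
)

# The unique() results have lengths 3, 4 and 1, so sapply() cannot build
# a vector or matrix and returns a list instead
toy_u <- sapply(toy, unique)
is.list(toy_u)  # TRUE

# A single bracket with several names returns a smaller list (a sublist)
toy_u[c("passenger_count", "tip_percent")]
```

One caveat: if every column happened to have the same number of unique values, sapply would quietly return a matrix instead, so lapply is the safer choice when later code expects a list.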
qsap2[c('passenger_count', 'tip_percent')]
$passenger_count
[1] 5 1 2 6 3 4 0 9 7 8
$tip_percent
[1] 23 0 17 2 12 6 21 18 20 16 13 19 1 7 14 10 22 11 25 8 15 5 9 3 26 24
[27] 4 NA 30 36 35 28 33 54 58 27 34 31 29 32 66 70 47 99 40 37 82 57 45 46 44 50
[53] 55 43 65 38 60 42 76 90 41 53 64 61 51 73 49 83 71 81 62 80 86 94 72 87 56 63
[79] 88 52 93 48 39 84 92 91 79 74 75 78 68 89 96 67 69 97 85 59 95 98 77
(3) Since we have the unique values for each column stored in qlap2, we can just run the length function to count how many unique values each column has. For example, for passenger_count we have
length(qlap2[['passenger_count']]) # don't forget the double bracket here!
[1] 10
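The comment about the double bracket matters because a single bracket returns a list holding one element, so length would always report 1. A minimal sketch, assuming a small hypothetical list toy_u shaped like qlap2:

```r
# Hypothetical list of unique values, shaped like qlap2
toy_u <- list(
  passenger_count = c(1, 2, 5),
  tip_percent     = c(0, 15, 20, NA)
)

# Single bracket returns a list holding one element, so its length is 1
length(toy_u["passenger_count"])    # 1

# Double bracket extracts the vector itself, so length() counts its values
length(toy_u[["passenger_count"]])  # 3
```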
But we want to do this automatically for all the columns at once. The solution is to use sapply. So far we've been using sapply and lapply with the dataset as input, but we can just as well feed them any list, such as qlap2, and apply a function to each element of that list (as long as doing so doesn't result in an error for any of the list's elements).
sapply(qlap2, length)
passenger_count trip_distance fare_amount tip_amount trip_duration
10 3632 1162 2957 8965
tip_percent
101
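The same pattern can be sketched on a small hypothetical list (toy_u is an illustrative stand-in for qlap2). Alongside sapply, the sketch shows two base R alternatives: vapply, which checks that every result is a single integer, and lengths, which computes exactly this per-element count.

```r
# Hypothetical list of unique values, shaped like qlap2
toy_u <- list(
  passenger_count = c(1, 2, 5),
  tip_percent     = c(0, 15, 20, NA),
  fare_amount     = 5
)

# sapply() calls length() on each element and simplifies to a named vector
n_unique <- sapply(toy_u, length)
n_unique  # passenger_count 3, tip_percent 4, fare_amount 1

# vapply() does the same but checks that every result is a single integer
n_unique_strict <- vapply(toy_u, length, integer(1))

# Base R also has lengths(), which does this particular job directly
n_unique_base <- lengths(toy_u)
```

vapply is a reasonable default when the shape of each result is known in advance, because it fails loudly instead of silently returning a list.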
The above exercise offers a glimpse of how powerful R can be at processing the basic data types quickly and succinctly, as long as we write good functions and use the apply family of functions to iterate through the data types. A good goal to set for yourself as an R programmer is to increase your reliance on the apply family of functions to run your code.