Clustering example
We often do not need the whole dataset to extract insights from the data. In other words, we only truly have a big data problem when analyzing the whole dataset, rather than a much smaller sample of it, makes a substantial difference to the insights we get. Even when we do have a big data problem, sampling can be an effective way to gain preliminary insights into the problem or to speed up an algorithm.
Learning objectives
In this chapter, we learn how to
- develop an intuition for when we truly have a big data problem
- build clusters using the `rxKmeans` algorithm in RevoScaleR
- speed up the clustering algorithm by making an initial pass through the sampled data using the `kmeans` function
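
As a preview of the last objective, here is a minimal sketch of the sample-then-refine idea using only base R's `kmeans` on simulated data. The data, the 1% sampling rate, and the object names (`full_data`, `fit_sample`, `fit_full`) are assumptions made for illustration and are not taken from the chapter's dataset. With RevoScaleR, the second step would presumably be done with `rxKmeans`, assuming it is given the sample centroids as its starting centers.

```r
# Simulated data: three well-separated 2-D groups (illustrative only).
set.seed(42)
n_per_group <- 20000
true_centers <- matrix(c(0, 0,
                         5, 5,
                         0, 5), ncol = 2, byrow = TRUE)
full_data <- do.call(rbind, lapply(1:3, function(k) {
  cbind(x = rnorm(n_per_group, mean = true_centers[k, 1], sd = 0.5),
        y = rnorm(n_per_group, mean = true_centers[k, 2], sd = 0.5))
}))

# Step 1: cluster a 1% random sample, trying several random starts.
# This is cheap because the sample is small.
sample_rows <- sample(nrow(full_data), size = 0.01 * nrow(full_data))
fit_sample <- kmeans(full_data[sample_rows, ], centers = 3, nstart = 10)

# Step 2: use the sample centroids as the starting centers for the
# full data, so the algorithm starts close to a good solution.
fit_full <- kmeans(full_data, centers = fit_sample$centers, iter.max = 30)

# Number of iterations needed on the full data and the final centroids.
fit_full$iter
fit_full$centers
```

The multiple random starts (`nstart = 10`) are spent on the small sample, where they cost little, and the full data is clustered only once from centers that should already be close to the final ones.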