Clustering example

We often don't need the whole dataset to extract insights from the data. In other words, we only have a big data problem when using the whole dataset, rather than a much smaller sample of it, makes a substantial difference in the insights we gain. Even when we do have a big data problem, sampling can be an effective way to gain some preliminary insights into the problem or to speed up the algorithm.

Learning objectives

In this chapter, we learn how to:

develop an intuition for when we truly have a big data problem
build clusters using the rxKmeans algorithm in RevoScaleR
speed up the clustering algorithm by making an initial pass through the sampled data using the kmeans function
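
The last objective can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic (three well-separated groups in two hypothetical columns, x and y), and base R's kmeans stands in for both passes so that the example runs without RevoScaleR. In the chapter itself, the second pass would presumably be done with rxKmeans on the full dataset.

```r
# Synthetic stand-in for a large dataset (assumption: not the chapter's data)
set.seed(42)
n <- 20000
true_centers <- matrix(c(0, 0,
                         5, 5,
                         0, 5), ncol = 2, byrow = TRUE)
group <- sample(1:3, n, replace = TRUE)
xy <- true_centers[group, ] + matrix(rnorm(2 * n, sd = 0.5), ncol = 2)
colnames(xy) <- c("x", "y")

# Pass 1: cluster a 1% random sample, trying several random starts
sample_rows <- sample(n, size = n / 100)
fit_sample <- kmeans(xy[sample_rows, ], centers = 3, nstart = 10)

# Pass 2: cluster the full data, starting from the sample's centers
fit_full <- kmeans(xy, centers = fit_sample$centers, iter.max = 30)

# Compare the centers found on the sample with those from the full data
print(fit_sample$centers)
print(fit_full$centers)
```

If the two sets of centers are nearly identical, the sample alone was enough and the full pass added little. If they differ, the sample's centers still give the full pass a good starting point, which usually means fewer iterations than starting from random centers.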
