Examining the data

In addition to asking whether the data makes logical sense, it's often a good idea to also check whether the data makes business sense or practical sense. Doing so can help us catch certain errors in the data such as data being mislabeled or attributed to the wrong set of features. If unaccounted, such soft errors can have a profound impact on the analysis.

Learning objectives

After reading this chapter, we will understand how to

  • run basic tabulations and summaries on the data
  • take the return objects from RevoScaleR summary functions for further processing by R functions for plotting and more
  • visualize the distribution of a column with rxHistogram
  • take a random sample of the large data and use it as a way to examine outliers

results matching ""

    No results matching ""