Inspecting the data

With the data loaded in the R session, we are ready to inspect it and write some basic queries against it. The goal of this chapter is to get a feel for the data. An exploratory analysis usually consists of the following steps:

  1. load all the data (and combine them if necessary)
  2. inspect the data in preparation for cleaning it
  3. clean the data in preparation for analysis
  4. add any interesting features or columns as far as they pertain to the analysis
  5. find ways to analyze or summarize the data and report your findings

We are now at step 2, where we introduce some helpful R functions for inspecting the data and write some of our own.
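As a rough preview of what inspection can look like, here is a minimal sketch. It uses R's built-in `airquality` data frame as a stand-in, since any data frame will do for illustration; substitute the data loaded earlier. The helper `count_na` is a hypothetical name chosen for this example, not a function from base R or from this course.

```r
# Stand-in data (an assumption): the built-in airquality data frame.
# Replace it with the data frame loaded in the previous chapter.
df <- airquality

head(df)     # first six rows
dim(df)      # number of rows and columns
str(df)      # type of each column and a preview of its values
summary(df)  # per-column summaries, including NA counts for numeric columns

# A small function of our own (hypothetical name):
# count the missing values in each column of a data frame.
count_na <- function(data) {
  sapply(data, function(col) sum(is.na(col)))
}

count_na(df)
```

The built-in functions give a quick overview, while a helper like `count_na` answers one specific question (here, how much data is missing) and can be reused on any data frame.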

Most of the time, the above steps are not clearly delineated from each other. For example, one could inspect certain columns of the data, clean them, build new features out of them, and then move on to other columns, thereby iterating on steps 2 through 4 until all the columns are dealt with. This approach is completely valid, but for the sake of teaching the course we prefer to show each step as distinct. Moreover, going over results and findings can often guide how data should be collected and processed in the future, so it would be more accurate to present the workflow as circular. Once again, for simplicity, we assume a linear workflow.
