Setting up the compute context
RevoScaleR achieves WODA by setting and changing the compute context. The compute context is the environment in which a computation runs. By default it is the machine hosting the local R session (called the client), but it can be changed to a remote machine (such as a SQL Server instance or a Hadoop/Spark cluster). At a low level, the same computation runs differently in different compute contexts but produces the same results. Whenever we need to perform a computation remotely, we simply change the compute context to the remote environment. RevoScaleR functions are aware of the compute context at runtime, and when it is set to a remote environment they perform their computation there. This is how we can take the computation to the data instead of bringing the data to the computation. Other R functions are not compute-context-aware; however, as we will see, we can use the rxExec function to send an arbitrary R function to execute remotely.
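The switch between local and remote execution can be sketched as follows. This is only an illustration, assuming RevoScaleR is installed and a SQL Server instance with R support is reachable. The connection string, the table name mytable, and the column some_column are hypothetical placeholders, not values from this course.

```r
library(RevoScaleR)

# Hypothetical connection string: replace the server, database and
# authentication details with those of your own environment.
sql_conn_string <- "Driver=SQL Server;Server=myserver;Database=mydb;Trusted_Connection=True"

# Describe the remote environment: here, a SQL Server compute context.
sql_cc <- RxInSqlServer(connectionString = sql_conn_string)

# Inspect the current compute context (local by default).
rxGetComputeContext()

# Change the compute context to the remote environment.
rxSetComputeContext(sql_cc)

# RevoScaleR functions are compute-context-aware: this summary is now
# computed where the data lives rather than on the client.
# 'mytable' and 'some_column' are hypothetical names.
sql_data <- RxSqlServerData(table = "mytable", connectionString = sql_conn_string)
rxSummary(~ some_column, data = sql_data)

# Ordinary R functions are not compute-context-aware, but rxExec can
# send one to run in the current compute context. This one reports the
# name of the machine it executes on.
rxExec(function() Sys.info()[["nodename"]])

# Switch back to the local compute context when finished.
rxSetComputeContext("local")
```

The analysis code itself (the call to rxSummary) is the same whichever compute context is active. Only the call to rxSetComputeContext changes where the work is done.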