Course Introduction
Introduction to Microsoft R Server is designed to help R users learn to process, query, transform, and summarize large datasets, build models on them, and deploy those models, all using Microsoft R Server's RevoScaleR package. This course takes a use-case-based approach, walking through a knowledge discovery and data mining example using Microsoft R Server (MRS).
Pre-requisites
Ideally, this course is for intermediate or advanced R users who have a solid grounding in using R for data and statistical analysis. For example, familiarity with the following topics is assumed:
- basic data types, such as knowing why a data.frame is a kind of list, how to drill down a nested list or use lapply to loop through a list,
- writing functions, especially vectorized transformation functions,
- knowing how to work with factor columns (such as adding, removing or reordering factor levels and what we gain by doing so),
- summarizing and visualizing data using dplyr and ggplot2,
- basics of modeling and machine learning, such as why we divide data into training and testing sets and how we evaluate models, or why certain algorithms such as k-means require us to standardize the data, and so on.
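As a quick self-check on the prerequisites above, here is a minimal sketch in base R only (no RevoScaleR). The data frame, the nested list, and the helper fare_per_mile are made up for illustration and are not part of the course material.

```r
# Hypothetical toy data, invented for this sketch.
df <- data.frame(
  city = c("Austin", "Boston", "Austin", "Denver"),
  fare = c(12.5, 30.0, 8.0, 21.5),
  distance = c(2.1, 6.4, 1.3, 4.8),
  stringsAsFactors = FALSE
)

# A data.frame is a list whose elements are equal-length columns,
# so list tools such as lapply work on it column by column.
df_is_list <- is.list(df)
col_classes <- lapply(df, class)

# Drilling down a nested list with [[ ]].
trip <- list(id = 1, payment = list(type = "card", tip = 2.5))
tip <- trip[["payment"]][["tip"]]

# Converting a column to a factor and setting the level order explicitly.
df$city <- factor(df$city, levels = c("Denver", "Boston", "Austin"))

# A vectorized transformation function: it operates on whole columns at once.
fare_per_mile <- function(fare, distance) fare / distance
df$fare_per_mile <- fare_per_mile(df$fare, df$distance)

# Standardizing numeric columns (mean 0, standard deviation 1), as is
# commonly done before distance-based algorithms such as k-means.
scaled <- scale(df[, c("fare", "distance")])
```

If each step here reads as familiar, the R background assumed by this course is likely in place.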
This course was written for users with a business analyst background, such as R, SAS, or SPSS users, who are familiar with computer science and programming concepts but are not necessarily experts in computer programming or distributed computing. It is aimed at those who want to learn how to use R to run analyses on big datasets and, in the future, to deploy their analytics workflow in a production environment such as Hadoop, Spark or SQL Server.
Learning objectives
After completing this course, participants will be able to use R and Microsoft R Server's RevoScaleR library to:
- Read and process flat files (CSV) efficiently
- Clean and prepare data for analysis
- Write complex transformations to add new features to the data
- Visualize, explore, and summarize data
- Build analytical models on large datasets and compare them
- Learn about the pros and cons of a few machine learning algorithms
- Score new data with a model
Throughout this course, we provide enough code examples using RevoScaleR that an intermediate to advanced R user can learn how to integrate RevoScaleR into their R workflow and use it to build scalable solutions for problems involving large datasets and/or distributed systems.
Please let us know how we can improve our content.
Created by a Microsoft Employee.