Data Science with R

Author

Joschka Schwarz

This section centers on the use of the R programming language within the tidy data framework and, as such, employs the most recent advances in data analysis coding. The chapters provide a sophisticated first introduction to the field of data science, with a balanced mix of practical skills and generalizable principles.

Chapter

1 Programming Basics

1.1 Introduction to R

Master the basics of data analysis by manipulating common data structures such as vectors, matrices, and data frames.

1.2 Intermediate R

Discover conditional statements, loops, and functions to power your own R scripts, and learn to make your R code more efficient using the apply functions.

1.3 Introduction to the tidyverse

Get started on the path to exploring and visualizing your own data with the tidyverse, a powerful and popular collection of data science tools within R. Discover the fundamentals of the tidyverse, learn all about renaming and reordering variables, and become familiar with the binomial distribution.

2 Importing Data

2.1 Introduction to Importing Data

Learn to read .xls, .csv, and text files in R using readxl and gdata, then learn how to use the readr and data.table packages to import flat-file data.

2.2 Intermediate Importing Data

Parse data in any format, whether it comes from flat files, statistical software, databases, or directly from the web.

2.3 Working with Web Data

Learn how to efficiently import data from the web into R. Discover how to work with APIs, build your own API client, and access data from Wikipedia and other sources by using R to scrape information from web pages.

3 Data Wrangling

3.1 Data Manipulation with dplyr

Delve further into the tidyverse by learning to transform and aggregate data with dplyr, and to add, remove, or change variables. You’ll then apply your skills to a real-world case study.

3.2 Joining Data with dplyr

Learn to combine data across multiple tables to answer more complex questions with dplyr. You’ll learn six different joins, including inner, full, and anti joins.

3.3 Exploratory Data Analysis

Learn how to use graphical and numerical techniques for exploratory data analysis while generating insightful and beautiful graphics in R.

3.4 Case Study: EDA

Use data manipulation and visualization skills to explore the voting history of the United Nations General Assembly.

3.5 Cleaning Data

Develop the skills you need to go from raw data to awesome insights as quickly and accurately as possible.

3.6 Data Manipulation with data.table

Master core data manipulation concepts, such as filtering, selecting, and calculating groupwise statistics, using data.table.

3.7 Joining Data with data.table

This course will show you how to combine and merge datasets with data.table.

4 Data Visualization

4.1 Introduction to Data Visualization with ggplot2

Learn to produce meaningful and beautiful data visualizations with ggplot2 by understanding the grammar of graphics.

4.2 Intermediate Data Visualization with ggplot2

Learn to use facets, coordinate systems and statistics in ggplot2 to create meaningful explanatory plots.

5 Statistics

5.1 Introduction to Statistics

Grow your statistical skills and learn how to collect, analyze, and draw accurate conclusions from data. Learn how to work with variables, plots, and standard deviation in R. The course also covers histograms, distributions, and more.

5.2 Foundations of Probability

Learn about random variables, distributions, and conditioning, while gaining intuition for how to solve probability problems through random simulation.

5.3 Introduction to Regression

Learn how to predict housing prices and ad click-through rates by implementing, analyzing, and interpreting linear and logistic regressions in R.

5.4 Intermediate Regression

Learn to perform linear and logistic regression with multiple explanatory variables in R. Discover how to include several explanatory variables in a model, how interactions affect predictions, and how both types of regression work.

5.5 Modeling with Data in the Tidyverse

Explore linear regression in a tidy framework. Discover different types of data modeling, including modeling for prediction, and learn how to conduct linear regression and compute model assessment measures in the tidyverse.

5.6 Experimental Design

Learn about basic experimental design, a crucial part of any data analysis, including block and factorial designs and commonly used statistical tests, such as t-tests and ANOVAs, in R.

5.7 A/B Testing

Learn A/B testing, including hypothesis testing, experimental design, and confounding variables.

5.8 Fundamentals of Bayesian Data Analysis

Learn what Bayesian data analysis is, how it works, and why it is a useful tool to have in your data science toolbox.

5.9 Factor Analysis

Explore latent variables, such as personality, using exploratory and confirmatory factor analysis in R.

7 Machine Learning

7.1 Supervised Learning: Classification

This beginner-level introduction to machine learning for classification covers four of the most common classification algorithms. You will come away with a basic understanding of how each algorithm approaches a learning task, as well as the R functions needed to apply these tools to your own work.

7.2 Supervised Learning: Regression

In this course you will learn how to predict future events using linear regression, generalized additive models, random forests, and xgboost.

7.3 Unsupervised Learning

This course provides an introduction to clustering and dimensionality reduction in R from a machine learning perspective.

7.4 Machine Learning in the tidyverse

Leverage the tools in the tidyverse to generate, explore and evaluate machine learning models.

7.5 Cluster Analysis

Develop a strong intuition for how hierarchical and k-means clustering work and learn how to apply them to extract insights from your data.

7.6 Cluster Analysis

This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for performance, and how to preprocess data.

7.7 Modeling with tidymodels

Learn to streamline your machine learning workflows with tidymodels.

7.8 Machine Learning with tree-based Models

Learn how to use tree-based models and ensembles to make classification and regression predictions with tidymodels.

7.9 Support Vector Machines

This course will introduce the support vector machine (SVM) using an intuitive, visual approach.

7.10 Topic Modeling

Learn how to fit topic models using the Latent Dirichlet Allocation algorithm.

7.11 Hyperparameter Tuning

Use the caret, mlr and h2o packages to find optimal hyperparameters using grid search, random search, adaptive resampling and automatic machine learning.

7.12 Bayesian Regression Modeling

Learn how to leverage Bayesian estimation methods to make better inferences about linear regression models.

7.13 Introduction to Spark

Learn how to analyze huge datasets with Apache Spark from R, using the sparklyr package.