R Data Pre-Processing & Data Management - Shape your Data!

Learn how to prepare your data for great analytics in R.

Rating: 4.75 (656 reviews)
Platform: Udemy
Language: English
Category: Data Science
Students: 4,857
Content: 6.5 hours
Last update: Nov 2018
Regular price: $59.99

What you will learn

import data into R in several ways while also being able to identify a suitable import tool

select and implement a proper object class (data.frame, data.table, data_frame)

convert your data into (and understand) a tidy data format

filter and query your data based on a wide range of parameters

join two data tables with dplyr's two-table verb syntax

use SQL code within R

translate basic R into SQL

work with dates and time

work with strings using regular expressions

detect outliers in datasets

Why take this course?

Let’s get your data in shape!

Data Pre-Processing is the very first step in data analytics. You cannot escape it because it is too important. Unfortunately, this topic is widely overlooked and information is hard to find.

With this course I will change this!

Data Pre-Processing as taught in this course has the following steps:

1. Data import: this might sound trivial, but if you consider all the different data formats out there, you can imagine that this can be confusing. In the course we will take a look at a standard way of importing csv files, we will learn about the very fast fread function, and I will show you what you can do if you have more exotic file formats to handle.

2. Selecting the object class: a standard data.frame might be fine for easy standard tasks, but there are more advanced classes out there, like the data.table. With today's huge datasets in particular, a data.frame may no longer be sufficient. Alternatives will be demonstrated in this course.

3. Getting your data into a tidy form: a tidy dataset has one row for each observation and one column for each variable. This might sound trivial, but in your daily work you will find instances where this simple rule is not followed. Often you will not even notice that the dataset is not tidy in its layout. We will learn how tidyr can help you get your data into a clean and tidy format.

4. Querying and filtering: when you have a huge dataset, you need to filter for the desired parameters. We will learn how to combine parameters and implement advanced filtering methods. data.table in particular has proven effective for this sort of querying on huge datasets, so we will focus on this package in the querying section.

5. Data joins: when your data is spread over two different tables and you want to combine them based on given criteria, you need joins. There are several ways to join data in R, but here we will take a look at dplyr and its two-table verbs, which are a great tool for working with two tables at the same time.

6. Integrating and interacting with SQL: R is great at interacting with SQL, the leading database language, which you will have to learn sooner or later as a data scientist. I will show you how to use SQL code within R, and there is even an R-to-SQL translator for standard R code. We will also set up a SQLite database from within R.

7. Outlier detection: datasets often contain values outside a plausible range. Faulty data generation or entry happens regularly. Statistical methods of outlier detection help to identify these values. We will take a look at the implementation of these.

8. Strings, dates and times: character strings as well as dates and times have their own rules when it comes to pre-processing. In this course we will also take a look at these types of data and how to handle them effectively in R.
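
As a rough illustration of the steps listed above, here is a minimal R sketch on a tiny made-up dataset. It is not taken from the course material: the data, the column names (id, day, temp_am, temp_pm, site) and the table name temps are hypothetical, and pivot_longer() is used for reshaping although the course may rely on other tidyr functions. It assumes the data.table, tidyr (1.0.0 or later), dplyr, DBI and RSQLite packages are installed.

```r
# Illustrative sketch only: all data and names below are made up.
library(data.table)  # fread() and the data.table class
library(tidyr)       # pivot_longer() for reshaping
library(dplyr)       # two-table verbs such as left_join()
library(DBI)         # database interface (used here with RSQLite)

# 1. Import: fread() accepts the raw lines of a csv via its `text` argument
csv_lines <- c("id,day,temp_am,temp_pm",
               "1,01.11.2018,4.5,9.0",
               "2,02.11.2018,5.0,55.0",
               "3,03.11.2018,3.5,8.5")
wide <- fread(text = csv_lines)

# 2. Object class: fread() returns a data.table (which is also a data.frame)
class(wide)

# 3. Tidy form: one row per observation (id, day, time of day)
long <- as.data.table(
  pivot_longer(wide, cols = c(temp_am, temp_pm),
               names_to = "time_of_day", names_prefix = "temp_",
               values_to = "temp")
)

# 4. Querying and filtering with data.table's i-expression
pm_warm <- long[time_of_day == "pm" & temp > 8]

# 5. Join: attach site names; ids without a match get NA
stations <- data.frame(id = 1:2, site = c("north", "south"))
joined <- as.data.frame(left_join(long, stations, by = "id"))

# 6. SQL: write the table to an in-memory SQLite database and query it
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "temps", joined)
per_id <- dbGetQuery(
  con, "SELECT id, MAX(temp) AS max_temp FROM temps GROUP BY id ORDER BY id"
)
dbDisconnect(con)

# 7. Outliers: values more than 1.5 * IQR beyond the quartiles
q <- quantile(joined$temp, c(0.25, 0.75))
fence <- 1.5 * IQR(joined$temp)
outliers <- joined$temp[joined$temp < q[1] - fence | joined$temp > q[2] + fence]

# 8. Dates and strings: parse day.month.year text, capitalise the site name
joined$day <- as.Date(joined$day, format = "%d.%m.%Y")
joined$site_label <- sub("^([a-z])", "\\U\\1", joined$site, perl = TRUE)
```

The 1.5 * IQR rule in step 7 is only one common convention for flagging outliers; other statistical methods may suit a given dataset better.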

How do you best prepare yourself for this course?

You only need a basic knowledge of R to fully benefit from this course. Once you know the basics of RStudio and R, you are ready to follow along with the course material. Of course you will also get the R scripts, which make it even easier.

The screencasts are made in RStudio, so you should get this program on top of R. The add-on packages required are listed in the course.

Again, if you want to make sure that you have proper data with a tidy format, take a look at this course. It will make your analytics with R much easier!

Reviews

Edi
January 17, 2022
...gives an overview of different aspects of data preprocessing but too redundant and sometimes in details with no importance.
Brian
August 20, 2021
Some syntax is outdated. Some lessons are simply him reading through commands without much explanation. Some exercise questions are extremely vague, and one doesn't even have all parts listed, so when you look at the solutions there's an additional problem that you didn't know to do.
Sharon
June 12, 2021
More real applied cases in data processing, including what should to do after gathered a huge bunch of data, what do I need to think about data filtering first and what should I do to filter then.
Suyog
March 10, 2021
The first section was rushed a little bit. It was difficult to follow. The pacing of the lessons improved later on.
Brandon
February 26, 2021
first 3 lessons of Section 1 are good. I would prefer to have all packages and sample data sets made available right up front to download and install.
Keerati
February 19, 2021
It's the great course which contains some ideas and technique to improve your code more efficient. Thanks for the developing team to launch this course.
Shadi
February 18, 2021
The course was detailed and compact at the same time. Questions were answered pretty fast when asked. Thank you for the extra exercises and also the website.
Glenn
February 5, 2021
Perfect. Enjoy your teaching style. I have 30 years of SPSS and SAS experience. Learning the processes and syntax in R is very helpful.
Malgorzata
February 4, 2018
I think it is very good training for the beginners like me:) Very well explained examples and very good choices of topics.
Miguel
December 6, 2017
This course is very straight forward. It is not linear in the sense that it acts like a repository of tools rather than a roadmap for Data Management. Complete and intuitive.
Erdélyi
November 6, 2017
Useful, easy to follow, real-life examples. Sometimes hard to understand (ie names of R functions) I recommend to switch on the caption (English). An excellent course.
Tim
November 2, 2017
There are a few packages introduced in the course which are really worth to take a look at. This is data preparation at a slighly advanced level. Liked it a lot.
Jimmy
October 31, 2017
Interesting how SQL and R work in combination. Did not know that before and I am using R for year now.
Ivan
October 14, 2017
This course has good examples, the explanations are understandable. In general the dicactica is very well.
Angela
September 17, 2017
It gives detailed examples to apply what has been taught, which is transferrable to actual practice. A good elementary course for beginners for data cleaning and management, the key to good data analysis!

Charts

[Price chart]

[Ratings chart]

[Enrollment distribution chart]
Udemy ID: 779488
Course created date: 3/2/2016
Course indexed date: 11/19/2019
Course submitted by: Bot