Platform: Udemy
Language: English
Category: Other

Data pre-processing for Machine Learning in Python

How to transform a dataset for a machine learning model

Rating: 5.00 (1 review)


Content: 5.5 hours
Last update: Apr 2021

What you will learn

How to fill the missing values in numerical and categorical variables

How to encode the categorical variables

How to transform the numerical variables

How to scale the numerical variables

Principal Component Analysis and how to use it

How to apply oversampling using SMOTE

How to use several useful objects in the scikit-learn library
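
As an illustration of the first point above, filling the missing values, here is a minimal sketch that uses scikit-learn's SimpleImputer together with ColumnTransformer and make_column_selector (all of which appear in the course outline). The toy DataFrame and its column names are hypothetical, and the median / most-frequent strategies are one reasonable choice, not necessarily the ones the course recommends.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer

# Hypothetical toy dataset with gaps in a numerical and a categorical column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "city": ["Rome", "Milan", np.nan, "Rome"],
})

imputer = ColumnTransformer([
    # Numerical columns: replace NaN with the column median (35.0 here)
    ("num", SimpleImputer(strategy="median"), make_column_selector(dtype_include="number")),
    # Non-numerical columns: replace NaN with the most frequent value ("Rome" here)
    ("cat", SimpleImputer(strategy="most_frequent"), make_column_selector(dtype_exclude="number")),
])

filled = imputer.fit_transform(df)
print(filled)
```

For numerical columns, sklearn.impute.KNNImputer is an alternative that fills each gap using the k most similar rows instead of a single column statistic.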


Description

In this course, we are going to focus on pre-processing techniques for machine learning.

Pre-processing is the set of manipulations that transform a raw dataset into a form that a machine learning model can use. It is necessary to make our data suitable for certain machine learning models, to reduce dimensionality, to identify the relevant data more easily, and to improve model performance. It's the most important part of a machine learning pipeline and it can strongly affect the success of a project. In fact, if we don't feed a machine learning model correctly shaped data, it won't work at all.
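
To make the "won't work at all" claim concrete: most scikit-learn estimators refuse raw data that contains missing values. The sketch below, on hypothetical toy data, shows LogisticRegression raising a ValueError on a NaN and then training normally once a SimpleImputer is placed in front of it. Mean imputation is an illustrative choice only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one missing value in the feature matrix
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Fitting directly on the raw data fails: LogisticRegression does not accept NaN
try:
    LogisticRegression().fit(X, y)
    raw_fit_failed = False
except ValueError as err:
    raw_fit_failed = True
    print("Raw data rejected:", err)

# With a pre-processing step (mean imputation) the same model can be trained
model = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
model.fit(X, y)
print(model.predict(X))
```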

Sometimes, aspiring Data Scientists start studying neural networks and other complex models and forget to study how to manipulate a dataset so that their algorithms can use it. As a result, they fail to create good models, and only at the end do they realize that good pre-processing would have saved them a lot of time and improved the performance of their algorithms. Mastering pre-processing techniques is therefore a very important skill. That's why I have created an entire course that focuses only on data pre-processing.

In this course, you are going to learn:

  1. Data cleaning

  2. Encoding of the categorical variables

  3. Transformation of the numerical features

  4. Scikit-learn Pipeline and ColumnTransformer objects

  5. Scaling of the numerical features

  6. Principal Component Analysis

  7. Filter-based feature selection

  8. Oversampling using SMOTE
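
The steps listed above are usually chained into a single object. Below is a minimal sketch, on a hypothetical randomly generated dataset, of how data cleaning, one-hot encoding, a power transformation, scaling and PCA can be combined with scikit-learn's Pipeline and ColumnTransformer. The column names, the number of PCA components and the final LogisticRegression classifier are assumptions chosen for illustration; feature selection and SMOTE are left out to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer

# Hypothetical toy dataset: two numerical features and one categorical feature
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "income": rng.lognormal(mean=0.0, sigma=1.0, size=n),  # right-skewed feature
    "age": rng.integers(18, 70, size=n).astype(float),
    "city": rng.choice(["Rome", "Milan", "Turin"], size=n),
})
y = (df["age"] > 40).astype(int)   # hypothetical binary target
df.loc[::7, "income"] = np.nan     # inject some missing values
df.loc[::9, "city"] = np.nan

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    # Yeo-Johnson reduces skewness; with standardize=True (the default) it also
    # rescales each column to zero mean and unit variance
    ("power", PowerTransformer(method="yeo-johnson")),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric, make_column_selector(dtype_include="number")),
        ("cat", categorical, make_column_selector(dtype_exclude="number")),
    ],
    sparse_threshold=0,  # always return a dense array so PCA can consume it
)
model = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA(n_components=3)),  # keep 3 principal components (arbitrary choice)
    ("clf", LogisticRegression()),
])
model.fit(df, y)
print("Training accuracy:", model.score(df, y))
```

Because every step lives inside one Pipeline, fit learns the imputation values, power-transform parameters and principal components from the training data only, and predict reapplies them unchanged to new data.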

All the examples use the Python programming language and its powerful scikit-learn library. The environment is Jupyter, which is a standard in the data science industry. Most sections of the course end with practical exercises, and all the Jupyter notebooks are downloadable.



Content

Introduction
  Introduction to the course
  Numerical and categorical variables
  The dataset
  Required Python packages
  Jupyter notebooks

Data cleaning
  Introduction to data cleaning
  Selecting numerical and categorical variables
  Cleaning the numerical features
  Cleaning the categorical features
  KNN blank filling
  ColumnTransformer and make_column_selector
  Exercises

Encoding of the categorical features
  Introduction to the encoding of categorical variables
  One-hot encoding
  Ordinal encoding
  Label encoding of the target variable
  Exercise

Transformations of the numerical features
  Introduction to transformations
  Power Transformation
  Binning
  Binarizing
  Applying an arbitrary transformation
  Exercise
  About power transformations

Pipelines
  Define a transformation pipeline
  Pipelines and ColumnTransformer together
  Exercises

Scaling
  Introduction to scaling
  Normalization, Standardization, Robust scaling
  Exercise

Principal Component Analysis
  Introduction to PCA
  How to perform PCA
  Exercise

Filter-based feature selection
  Introduction to feature selection
  Numerical features, numerical target
  Numerical features, categorical target
  Categorical features, numerical target
  Categorical features, categorical target
  Feature importance according to a model
  A comment on mutual information
  A comment on feature selection with categorical variables
  Exercises

A complete pipeline
  An example of a complete pipeline

Oversampling
  Introduction to SMOTE
  How to perform SMOTE
  Exercise

General guidelines
  Practical suggestions
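
The Oversampling section covers SMOTE (Synthetic Minority Over-sampling Technique), whose core idea is to create synthetic minority-class points on the line segments between a minority sample and one of its k nearest minority neighbors. The from-scratch sketch below illustrates that idea with NumPy and scikit-learn's NearestNeighbors. The function name smote_sketch and the toy data are hypothetical, and this is not necessarily the implementation used in the course; in practice the imbalanced-learn package provides a ready-made imblearn.over_sampling.SMOTE.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic minority samples.

    X_min holds only the minority-class rows, shape (n_minority, n_features).
    """
    rng = np.random.default_rng(seed)
    # k + 1 neighbors because each point is returned as its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, neighbors = nn.kneighbors(X_min)
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))               # random minority sample
        neighbor = rng.choice(neighbors[base, 1:])    # one of its k nearest minority neighbors
        gap = rng.random()                            # uniform in [0, 1)
        # New point lies on the segment between the sample and its neighbor
        synthetic[i] = X_min[base] + gap * (X_min[neighbor] - X_min[base])
    return synthetic

# Hypothetical imbalanced data: 20 majority vs 5 minority samples
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(20, 2))
X_minor = rng.normal(3.0, 0.5, size=(5, 2))
X_new = smote_sketch(X_minor, n_new=15, k=3)

X_bal = np.vstack([X_major, X_minor, X_new])
y_bal = np.array([0] * 20 + [1] * (5 + 15))
print(np.bincount(y_bal))  # [20 20]
```

Oversampling should be applied to the training split only, so that synthetic points never leak into the data used for evaluation.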


Udemy ID: 4001116
Course created date: 4/23/2021
Course indexed date: 4/28/2021