Machine learning with Scikit-learn

Learn the most important machine learning techniques using the best machine learning library available

Rating: 3.95 (91 reviews)
Platform: Udemy
Language: English
Category: Data Science
Students: 612
Content: 6.5 hours
Last update: Mar 2017
Regular price: $54.99

What you will learn

Load data into scikit-learn and run many machine learning algorithms for both supervised and unsupervised problems

Assess model accuracy and performance

Decide which model is best for each scenario
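As a rough illustration of what assessing accuracy and choosing between models can look like in scikit-learn, here is a minimal sketch. It is not taken from the course material: the synthetic data, the two candidate models, and the 5-fold accuracy metric are all assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data (illustrative only, not course data).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Two candidate models picked arbitrarily for the comparison.
candidates = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# The candidate with the highest mean accuracy on this data.
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Which model wins depends on the data and the metric, so the same comparison should be rerun for each new problem rather than read as a general ranking.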

Description

This course will explain how to use scikit-learn to do advanced machine learning. If you are aiming to work as a professional data scientist, you need to master scikit-learn!

It is expected that you have some familiarity with statistics and Python programming. You don't need to be an expert, but you should understand what a Gaussian distribution is, be able to code loops and functions in Python, and know the basics of maximum likelihood estimation. The course focuses entirely on the Python implementation, and the math behind it is omitted as much as possible.

The objective of this course is to give you a good understanding of scikit-learn, so that you can identify which technique to use for a particular problem. If you follow this course, you should be able to handle a machine learning interview quite well, although for that you will need to study the math in more detail.

We'll start by explaining the machine learning problem, its methodology, and its terminology, along with the differences between AI, machine learning (ML), statistics, and data mining. Scikit-learn, being a Python library, benefits from Python's simplicity and power. We'll then explain how to install scikit-learn and its dependencies, and show how to use Pandas data in scikit-learn while also benefiting from SciPy and NumPy. Finally, we'll show how to create synthetic datasets with scikit-learn, tailored specifically for regression, classification, and clustering.
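The following is a minimal sketch of those two ideas, generating synthetic datasets and passing Pandas data to an estimator. It is an illustration rather than the course's own code; the column names `f1`, `f2`, `f3`, and `target` and all sample sizes are made up for the example.

```python
import pandas as pd
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.linear_model import LinearRegression

# Synthetic datasets tailored to each kind of task.
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)
X_clf, y_clf = make_classification(n_samples=100, n_features=4, random_state=0)
X_blob, y_blob = make_blobs(n_samples=100, centers=3, random_state=0)

# Wrap the regression data in a DataFrame (hypothetical column names).
feature_cols = ["f1", "f2", "f3"]
df = pd.DataFrame(X_reg, columns=feature_cols)
df["target"] = y_reg

# scikit-learn estimators accept DataFrames and Series directly.
model = LinearRegression().fit(df[feature_cols], df["target"])
r2 = model.score(df[feature_cols], df["target"])
print(f"In-sample R^2: {r2:.3f}")
```

The score here is computed on the training data only, so it says nothing about out-of-sample performance.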

In essence, machine learning can be divided into two big groups: supervised and unsupervised learning. In supervised learning we have a target variable (which can be continuous or categorical) and we want to use certain features to predict it. Scikit-learn provides estimators for both classification and regression problems. We will start by discussing one of the simplest classifiers, Naive Bayes. We will then see some powerful regression techniques that, via a technique called regularization, help us get much better linear estimators. We will then analyze Support Vector Machines, a powerful technique for both regression and classification, and use classification and regression trees to estimate very complex models. We will also see how to combine many simple estimators into structures that are more robust out of sample, called "ensemble" methods: in particular random forests, random trees, and boosting methods. These methods win many data science competitions nowadays.
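To make the regularization idea concrete, here is a small sketch comparing ordinary least squares with Ridge and Lasso on synthetic data. It is not the course's example; the `alpha` values and the dataset shape are arbitrary assumptions chosen so that the effect is visible.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# 30 features, only 5 of which actually drive the target (illustrative setup).
X, y = make_regression(
    n_samples=60, n_features=30, n_informative=5, noise=10.0, random_state=0
)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=5.0).fit(X, y)    # L1 penalty: can set coefficients to zero

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
n_zero = int(np.sum(lasso.coef_ == 0))

print(f"OLS coefficient norm:   {ols_norm:.2f}")
print(f"Ridge coefficient norm: {ridge_norm:.2f}")
print(f"Lasso coefficients set exactly to zero: {n_zero} of {lasso.coef_.size}")
```

In practice the penalty strength would be chosen by cross-validation (for example with `RidgeCV` or `LassoCV`) rather than fixed by hand as it is here.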

We will see how to apply all these techniques to online data, image classification, sales data, and more. We also use real datasets from Kaggle, such as spam SMS data and house prices in the United States, to show students what to expect when working with real data.

On the other hand, in unsupervised learning we have a set of features but no outcome or target variable, and we attempt to learn from the data itself: whether it has outliers, whether it can be grouped into clusters, whether some of the features can be removed, and so on. For example, we will see k-means, one of the simplest algorithms for grouping observations into clusters, and we will see that other techniques, such as DBSCAN, sometimes work better. We will then explain how to use principal components to reduce the dimensionality of a dataset. Finally, we will use some very powerful scikit-learn functions that learn the density of the data and are able to flag outliers.
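As a hedged illustration of why DBSCAN can sometimes do better than k-means, the sketch below clusters two interleaved half-moons, a shape that k-means tends to split incorrectly. The dataset, the `eps` and `min_samples` values, and the use of the adjusted Rand index as a score are assumptions made for this example; PCA and outlier detection are not shown.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-moons: clusters that are not convex blobs.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# k-means assumes roughly spherical clusters around centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups points by density; the label -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)

print(f"k-means ARI: {kmeans_ari:.2f}")
print(f"DBSCAN ARI:  {dbscan_ari:.2f}")
```

The true labels are only available because the data is synthetic; on real unsupervised data the comparison would have to rely on internal measures or domain knowledge.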

I try to keep this course as up to date as possible, especially since scikit-learn is constantly being updated. For example, neural networks were added in the latest release. I have tried to keep the examples as simple as possible, with as few observations (samples) and features (variables) as possible. In real situations we will use hundreds of features and thousands of samples, and most of the methods presented here scale really well to those scenarios. I don't want this course to focus on very realistic examples, because I think that obscures what we are trying to achieve in each example. Nevertheless, some more complex examples will be added as additional exercises.

  

Content

Introduction to Scikit-learn

Introduction
Installing scikit-learn
Data manipulation: from Pandas to scikit-learn
Creating synthetic data

Supervised methods

Naive Bayes : Bernoulli - Multinomial
Detecting spam in real SMS Kaggle data
Linear Support Vector Machines (SVM): SVM and LinearSVC
Linear Support Vector Machines (SVM): NuSVM
SVM
Logistic regression
Predicting if income >50k using real US Census Data
Isotonic regression
Linear regression - Lasso - Ridge
Lasso - Ridge
Decision trees
Introduction to ensemble methods
Averaging ensemble methods - Part 1: Bagging
Averaging ensemble methods - Part 2: Random forests
Digit Classification via Random Forests
Boosting ensemble methods
Grid Search Cross Validation
Predicting real house prices in the US using ExtraTreesRegressor

Unsupervised methods

Density Estimation
Principal Components
Principal Components
K-Means
DBScan
Clustering
Clustering and PCA on real countries data from Kaggle
Outlier detection
Novelty detection

Reviews

Enrique
October 1, 2023
Too much theory. It doesn't explain well in which cases it is better to use one technique or another. Nothing about neural networks...
Avram
April 18, 2022
Not updated to the current sklearn version (1.0.2) but mostly consistent. Good overview of the library and usage for those coming from R and other languages before diving into the documentation for a particular purpose.
Javier
January 26, 2022
I feel the course could offer more. Honestly, I didn't like the approach of copying excerpts of code.
R
November 25, 2020
The pace is not steady and the instructor jumps around and injects too many "thoughts" into his speech. I don't care what he thinks, I just want to learn the material.
Ousmane
October 22, 2020
A good course for learning to use scikit-learn pretty well (maybe the best, with a lot of examples to practice), but it is not a course for learning in depth the philosophy and math behind the machine learning algorithms.
Paola
October 11, 2019
He chews while speaking and doesn't go into depth on the topics. The topics are interesting but dealt with superficially.
Kepa
November 6, 2017
The videos and exercises give a good introduction to some relevant algorithms and their implementation in SciKit. Although the course is quite basic and further material is needed, it helps to speed up the learning process.
Vinay
September 16, 2017
He takes ages to find the right words, which isn't his fault because English isn't his first language, but it's incredibly slow and frustrating for a student. The main problem though is the course itself. This is basically going through the documentation, which you can do yourself. Loads of copying and pasting of basic snippets that you can find on the scikit-learn website. I'm massively disappointed because I was really looking forward to this course. I was hoping to have something worth showing to others but there's nothing to it. You're just watching a guy copy and paste and chat without explaining clearly. It's not good enough in my opinion.
Pablo
May 6, 2017
Very poor instructional design. Great for watching copy/paste from right side of screen to left side, and watching a yellow dot move around the screen. The author needs to determine learning objectives, present the objectives, show how to achieve the objectives, and summarize. Author may know Scikit-Learn and Python, but lacks basic instructional design know-how. Suggest author get some help to organize and present his courses. And review other Udacity courses.
Young
April 27, 2017
This course lacks depth and explanations. There are many alternative courses on Udemy that would be much better to learn from.
Tuuber
April 22, 2017
Great course! Would benefit from using Jupyter notebooks for organization, but on the other hand you're forced to pay more attention when no notebooks present so probably better learning outcome. Recommend this to everyone who wishes to jump-start their machine learning with scikit-learn!
Vibhor
March 20, 2017
Excellent! Very well taught. I highly recommend. One of the best teachers to learn Machine Learning at Udemy. The most generous teacher I have ever come across on Udemy!
Sean
March 12, 2017
I am taking a Data Mining course for my MS in Computer Science, and this is already a tremendous help to me. It is full of great 'hands on' practice, with great explanations to go along with it.

Udemy ID: 1019024
Course created date: 11/21/2016
Course indexed date: 11/22/2019
Course submitted by: Bot