Data Wrangling in Pandas for Machine Learning Engineers

The Second Course in a Series for Mastering Python for Machine Learning Engineers

Rating: 4.25 (118 reviews)
Platform: Udemy
Language: English
Category: Data Science
Students: 801
Content: 2 hours
Last update: Jan 2020
Regular price: $44.99

What you will learn

You'll learn data wrangling in Python.

You'll be prepared for interview questions on data wrangling in Python.

Data wrangling is what machine learning engineers do around 70% of the time, and the skills in this course will put you ahead of others in the real world.

You'll be adept at using the most important Python library for data wrangling.

Description

Reviews: 

The examples given and explanation provided by the instructor were great. He is entertaining as well as knowledgeable about the subject. - Prakash Shelke

Spectacular step by step instructions with great examples and labs. -Donato

Great course !!!!! You learn how to use the Pandas library for its own sake and not as a part of some courses devoted to other topics. -Giovanni De Angelis

The course is really impressive. Tons of information, and I learned a great deal. I had no Python background, and now I feel a lot more confident about working with Python than ever. Thanks for the course. - Austin

Honestly Mike your classes speak for themselves. They're informative, concise and just really well put together. They're exactly the kind of courses I look for. -Alex El

I have been a software engineer for more years than I care to admit. I found the presentation, speed and depth fit what I was looking for perfectly. I believe at this point I understand enough about Pandas so that I can move forward with this branch of learning. - Danny

Course Description 

Welcome to Data Wrangling in Pandas for Machine Learning Engineers

This is the second course in a series designed to prepare you for becoming a machine learning engineer.

I'll keep this updated and list only the courses that are live.  Here is a list of the courses that can be taken right now.  Please take them in order. The knowledge builds from course to course. 

  • The Complete Python Course for Machine Learning Engineers 

  • Data Wrangling in Pandas for Machine Learning Engineers (This one) 

  • Data Visualization in Python for Machine Learning Engineers


Learn the single most important skill for the machine learning engineer: Data Wrangling

  • A complete understanding of data wrangling vernacular.

  • Pandas from A-Z. 

  • The ability to completely cleanse a tabular data set in Pandas. 

  • Lab integrated. Please don't just watch. Learning is an interactive event.  Go over every lab in detail. 

  • Real-world interview questions.

The knowledge builds from course to course in a serial nature. Without the first course many students might struggle with this one. Thank you. 

Many people new to machine learning believe that machine learning engineers spend their days building deep neural models in Keras or scikit-learn. I hate to be the bearer of bad news, but that isn’t the case.

A recent study from Kaggle determined that data scientists and machine learning engineers spend 80% of their time cleaning data. In data science circles, the term for cleaning data is data wrangling.

In this course we are going to learn Pandas using a lab integrated approach. Programming is something you have to do in order to master it. You can't read about Python and expect to learn it. 

Pandas is the single most important library for data wrangling in Python

Data wrangling is the process of programmatically transforming data into a format that makes it easier to work with. 

This might mean modifying all of the values in a given column in a certain way, or merging multiple columns together. The necessity for data wrangling is often a byproduct of poorly collected or presented data. 
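The two operations just described can be sketched in a few lines of Pandas. This is only an illustration under stated assumptions: the toy data and the column names (`first_name`, `last_name`, `fare`, `full_name`) are hypothetical and do not come from the course labs.

```python
import pandas as pd

# Hypothetical messy data; values and column names are illustrative only.
df = pd.DataFrame({
    "first_name": [" alice", "BOB ", "carol"],
    "last_name": ["smith", "JONES", " lee "],
    "fare": ["7.25", "71.28", "8.05"],  # numbers stored as text
})

# Modify all of the values in a column in the same way:
# strip stray whitespace and normalize capitalization.
for col in ["first_name", "last_name"]:
    df[col] = df[col].str.strip().str.title()

# Convert the text column to a numeric type so it can be used in calculations.
df["fare"] = pd.to_numeric(df["fare"])

# Merge multiple columns together into a single column.
df["full_name"] = df["first_name"] + " " + df["last_name"]

print(df)
```

The string operations run on whole columns at once through the `.str` accessor, so no explicit loop over rows is needed.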

In the real world, data is messy. Very rarely do you have nicely cleansed data sets to point your supervised models at.

Keep in mind that 99% of all applied machine learning (real world machine learning) is supervised. That simply means models need really clean, nicely formatted data.  Bad data in means bad model results out. 

Five Reasons to Take this Course

1) You Want to be a Machine Learning Engineer

It's one of the most sought-after careers in the world. The growth potential career-wise is second to none. You want the freedom to move anywhere you'd like. You want to be compensated for your efforts. You want to be able to work remotely. The list of benefits goes on. Without a solid understanding of data wrangling in Python, you'll have a hard time securing a position as a machine learning engineer.

2) Most of Machine Learning is Data Wrangling 

If you're new to this space, the one thing many won't tell you is that much of the job of the data scientist and the machine learning engineer is massaging dirty data into a state where it can be modeled. In the real world, data is dirty, and before you can build accurate machine learning models you have to clean it. This process is called data wrangling, and without this skill set you'll never get a job as a machine learning engineer. This course will give you the fundamentals you need to cleanse your data.

3) The Growth of Data is Insane 

Ninety percent of all the world's data has been created in the last two years. Businesses around the world generate approximately 450 billion transactions a day. The amount of data collected by all organizations is approximately 2.5 exabytes a day. That number doubles every month. Almost all real-world machine learning is supervised. That means you point your machine learning models at clean tabular data. Python has libraries that are specific to data cleansing.

4) Machine Learning in Plain English

Machine learning is one of the hottest careers on the planet, and understanding the basics is required to attain a job as a data engineer. Google expects its data engineers and machine learning engineers to be able to build machine learning models.

5) You Want to be Ahead of the Curve

The data engineer and machine learning engineer roles are fairly new.  While you’re learning, building your skills and becoming certified you are also the first to be part of this burgeoning field.  You know that the first to be certified means the first to be hired and first to receive the top compensation package. 

Thanks for your interest in Data Wrangling in Pandas for Machine Learning Engineers.

See you in the course!!

Content

Introduction

Introduction
Is this Course for You?
What is Pandas?
What is Data Wrangling
Summary
Quiz
Common Interview Questions - Section 1

Pandas Dataframe Basics

Download Raw Titanic Data Set
Load a Data Set in Pandas
Data Types
Columns, Rows and Cells
Using Loc
iloc and ix
Subsetting Rows and Columns
Lab: Slicing Dataframes
Grouped and Aggregated Calculations
Grouped frequency counts
Lab: Grouping
Summary
Quiz
Common Interview Questions - Section 2

Pandas data structures

The Series Object
Series Anatomy
Lab: Working with the Series Object
Attributes
An Array Defined
The Series and Numpy Array
Lab: Descriptive Statistics For pandas Dataframe
Boolean Subsetting with the Series
Vectorized Operations
Lab: Row Based Conditional Searches
Replacing Values in Pandas
Lab: Saving A Pandas Dataframe As A CSV
Rename Column Header In Pandas
Sorting Rows in a Pandas Dataframe
Read Excel Files
Regular Expression in Pandas
Binning Data
Normalize Data in Pandas
Lab: Data Normalization
Data Normalization Lab Line by Line
Summary
Quiz
Common Interview Questions - Section 3

Introduction to Plotting

Install Seaborn Via Anaconda
Matplotlib
Lab: Matplotlib Basics
Using Seaborn with a Pandas Dataframe
Lab: Seaborn Basics
Summary
Quiz
Common Interview Questions - Section 4

Data Assembly

File Concatenation
Row Concatenation
Lab: Concatenation
Merging
Right, Left and Outer Joins
Lab: Merge Function
Summary
Quiz
Common Interview Questions - Section 5

Missing Data

Evaluate Missing Data
Finding the NaNs
Dropping out Missing Values
Dropping Specific Cells
NaN Value Differences
Lab: Missing Data
Filling Using Index Values
Interpolation of missing values
Handling Duplicate Data
Lab: Duplicate Data
Mapping
The Replace Function
Using Functions to Create Columns
Lab: Breaking Up Strings
Summary
Quiz
Common Interview Questions - Section 6

Time Series Data

Time Series Basics
Timestamp Objects
Lab: Time Series
Timedelta
The DatetimeIndex
Force Datetime Function with Coerce
The Frequency Parameter
Frequency Table
The DateOffset
Built-in Date Offset Classes
Lab: DateOffset
Anchored Offsets
Period Object
Summary
Quiz
Common Interview Questions - Section 7
Congratulations and Thank You!
Bonus Lecture "Data Visualization in Python"

Reviews

Giovanni
May 12, 2019
Great course !!!!! You learn how to use the Pandas library for its own sake and not as a part of some courses devoted to other topics. In the Labs the same examples and instructions as in the tutorials could be reported before tackling the actual Lab topics. This could be pretty useful. In any case, a great course. Well done !!!!!
Chris
March 25, 2019
Some of the more difficult topics were glossed over. The course is OK as an intro to some of the functionalities of pandas.
Torbjörn
March 19, 2019
Good but basic course on the Pandas dataframe and how to clean data. Also covers datetime and some other stuff.
Danny
March 19, 2019
I have been a software engineer for more years than I care to admit. I found the presentation, speed and depth fit what I was looking for perfectly. I believe at this point I understand enough about Pandas so that I can move forward with this branch of learning. I am comfortable saying I now know what functionality is provided by pandas and have code samples that will enable me to actually tackle some simple projects right now. Thanks!
Isaac
January 30, 2019
This Class is straight to the point - Targets concepts precisely. If you want to learn fast .. this is the class for you!!!
Nho
January 22, 2019
Dataframe not consistent across videos. Content too basic. It is the same as reading Numpy/Pandas/matplotlib documentation. No technical knowledge given, how to do data wrangling? I bet this won't be public.
Mónica
January 8, 2019
Direct and to the point, a very condensed course; nothing feels heavy and you learn a lot. And if you want to review a topic, it's very easy to find the information, since each topic is a video under two minutes long. Highly recommended. A pity it doesn't have Spanish subtitles.
Michal
October 12, 2018
An excellent course for anyone who wants to start with Pandas seriously. Very much condensed, very much focused. Pandas is mentioned in almost any Python course, but usually by explaining the very basic concept and providing only 1 - 2 specific use cases. This course, on the other hand, provides quite a broad overview of Pandas functionality, which can be applied in real-life cases.
Enrique
August 24, 2018
Until now, my experience with machine learning and data wrangling was essentially limited to R. This course lets me carry over what I know from R to Python, which is tremendously useful.
Daniel
July 15, 2018
So far, the course is clear, concise, and highly useful, and yet it's so easy if you already have a background in IT and machine learning in MATLAB or Octave. I'm really starting to love this whole area of study dearly. Well, I completed it and I still feel the same way about it--a very useful course. It got me thinking about work I did in the past and how it could be enhanced greatly given the tools taught here that probably did not exist at that time.
Austin
March 30, 2018
The course is really impressive. Tons of information, and I learned a great deal. I had no Python background, and now I feel a lot more confident about working with Python than ever. Thanks for the course.
Pallavi
March 29, 2018
Well-executed course with a good pace and a lot of detailed examples. The best part is that the instructor explains the functions really well with examples and the lab exercises, which makes it so easy to understand. As I already have experience with pandas, I was able to finish the course in only 2-3 days. Data restructuring and cleaning is one of the most important aspects of machine learning, and this course helped me understand how to take care of each of those, such as missing value drops / imputations or time series. Definitely recommended!
Andrew
March 26, 2018
Mr. West did an excellent job of incorporating the wide scope of data handling methodology with Python-specific examples. However, this is just the beginning; entire careers are spent working on various methods to deal with the problem of missing data.
Joseph
March 4, 2018
Another great course by Mike. He is very knowledgeable, speaks clearly and can impart that information to the student. The lessons are broken down into very short, easy to deal with modules that are easy to understand. Looking forward to part 3 in this series!
Anju
February 14, 2018
Really liked the course. Crisp and to-the-point lectures. Good quizzes and labs. But I think a few more labs would help.

Udemy ID: 1437428
Course created date: 11/18/2017
Course indexed date: 11/20/2019
Course submitted by: Bot