Spark and Python for Big Data with PySpark

Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more!

Rating: 4.51 (23,709 reviews)
Platform: Udemy
Language: English
Category: Data Science
Instructor: (not listed)
Students: 131,555
Content: 10.5 hours
Last update: May 2020
Regular price: $159.99

What you will learn

Use Python and Spark together to analyze Big Data

Learn how to use the new Spark 2.0 DataFrame Syntax

Work on Consulting Projects that mimic real world situations!

Classify Customer Churn with Logistic Regression

Use Spark with Random Forests for Classification

Learn how to use Spark's Gradient Boosted Trees

Use Spark's MLlib to create Powerful Machine Learning Models

Learn about the Databricks Platform!

Get set up on Amazon Web Services EC2 for Big Data Analysis

Learn how to use AWS Elastic MapReduce Service!

Learn how to leverage the power of Linux with a Spark Environment!

Create a Spam filter using Spark and Natural Language Processing!

Use Spark Streaming to Analyze Tweets in Real Time!

Description

Learn the latest Big Data Technology - Spark! And learn to use it with one of the most popular programming languages, Python!

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!

Spark can perform up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you now have the ability to quickly become one of the most knowledgeable people in the job market!

This course will teach the basics with a crash course in Python, continuing on to learning how to use Spark DataFrames with the latest Spark 2.0 syntax! Once we've done that, we'll go through how to use the MLlib Machine Learning Library with the DataFrame syntax and Spark. All along the way you'll have exercises and Mock Consulting Projects that put you right into a real-world situation where you need to use your new skills to solve a real problem!

We also cover the latest Spark Technologies, like Spark SQL, Spark Streaming, and advanced models like Gradient Boosted Trees! After you complete this course you will feel comfortable putting Spark and PySpark on your resume! This course also has a full 30-day money-back guarantee and comes with a LinkedIn Certificate of Completion!

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Content

Introduction to Course

Introduction
Course Overview
Frequently Asked Questions
What is Spark? Why Python?

Setting up Python with Spark

Set-up Overview
Note on Installation Sections

Local VirtualBox Set-up

Local Installation VirtualBox Part 1
Local Installation VirtualBox Part 2
Setting up PySpark

AWS EC2 PySpark Set-up

AWS EC2 Set-up Guide
Creating the EC2 Instance
SSH with Mac or Linux
Installations on EC2

Databricks Setup

Databricks Setup

AWS EMR Cluster Setup

AWS EMR Setup

Python Crash Course

Introduction to Python Crash Course
Jupyter Notebook Overview
Python Crash Course Part One
Python Crash Course Part Two
Python Crash Course Part Three
Python Crash Course Exercises
Python Crash Course Exercise Solutions

Spark DataFrame Basics

Introduction to Spark DataFrames
Spark DataFrame Basics
Spark DataFrame Basics Part Two
Spark DataFrame Basic Operations
Groupby and Aggregate Operations
Missing Data
Dates and Timestamps

Spark DataFrame Project Exercise

DataFrame Project Exercise
DataFrame Project Exercise Solutions

Introduction to Machine Learning with MLlib

Introduction to Machine Learning and ISLR
Machine Learning with Spark and Python with MLlib

Linear Regression

Linear Regression Theory and Reading
Linear Regression Documentation Example
Regression Evaluation
Linear Regression Example Code Along
Linear Regression Consulting Project
Linear Regression Consulting Project Solutions

Logistic Regression

Logistic Regression Theory and Reading
Logistic Regression Example Code Along
Logistic Regression Code Along
Logistic Regression Consulting Project
Logistic Regression Consulting Project Solutions

Decision Trees and Random Forests

Tree Methods Theory and Reading
Tree Methods Documentation Examples
Decision Trees and Random Forest Code Along Examples
Random Forest - Classification Consulting Project
Random Forest Classification Consulting Project Solutions

K-means Clustering

K-means Clustering Theory and Reading
KMeans Clustering Documentation Example
Clustering Example Code Along
Clustering Consulting Project
Clustering Consulting Project Solutions

Collaborative Filtering for Recommender Systems

Introduction to Recommender Systems
Recommender System - Code Along Project

Natural Language Processing

Introduction to Natural Language Processing
NLP Tools Part One
NLP Tools Part Two
Natural Language Processing Code Along Project

Spark Streaming with Python

Introduction to Streaming with Spark!
Spark Streaming Documentation Example
Spark Streaming Twitter Project - Part One
Spark Streaming Twitter Project - Part Two
Spark Streaming Twitter Project - Part Three

Bonus

Bonus Lecture:

Reviews

Cody
October 5, 2023
This was a fantastic course with solid info and lots of great hands-on examples. My only complaint is that the streaming section is fairly outdated (since Twitter changed its API access rules) and therefore I didn't actually get any Twitter streaming practice. Other than that, this was an incredible resource. Thank you!
Javier
October 3, 2023
The setup for the virtual machine with Spark (Section 4) is not up to date. I had to look for the information elsewhere.
R
October 2, 2023
The instructor knows his stuff, explains clearly, and the audio is super clear. Can't beat that. Thank you!
Ahmet
September 19, 2023
I extremely hate the American accent. I can understand almost nothing from a person who speaks with an American accent. I always hear hrrrr, rrrr, wrrr, srrrr and so on. They also swallow some letters and words while talking. Damn it, my ears just could not get used to this accent...
Alexandra
September 14, 2023
Very helpful and informative although slightly outdated. Overall, it's a nice course, but be prepared for a substantial portion dedicated to machine learning.
Bhavna
September 11, 2023
All videos have good content and are informative and short, with good explanations. It's very helpful for my next project. Thanks.
José
September 2, 2023
Poor Spanish subtitles. The coding goes by quickly without detailed explanation, especially for the function parameters. The modeling sections are very rushed.
Jay
August 24, 2023
The section on NLP will be very useful for my next project. All of Jose's classes have been very helpful and informative.
Auxillia
August 15, 2023
It covered a lot of topics in an easily digestible manner for a beginner to the data science field. I really liked how your thought process and lectures are aligned with real-world usage.
Sourav
March 16, 2023
A good course that meets expectations. Would 10/10 recommend to people who have trouble learning from the Spark documentation all by themselves.
KEVIN
March 14, 2023
Good to excellent; it is what I expected regarding Spark. I am not rating the ML or the Python language parts because I was supposed to be prepared for those already. The fundamentals of Spark for big data are explained, and that is why the course is good to excellent.
Chris
March 10, 2023
The course is not up to date and focuses too much on the machine learning application rather than setting up pipelines that may be used in industry. The Spark Streaming section is out of date.
Prakhyaat
March 3, 2023
He has not explained many things. I am hoping he will explain them later, but so far he has not.
Prajjwal
March 1, 2023
This course is good for refreshing your Spark knowledge. It covers all the basic examples and scenarios related to Spark functionality.
Panthea
February 24, 2023
It is very helpful. The only confusing part was setting up the virtual machine and actually trying to work with it; I ended up simply installing Spark directly on my machine. I had a hard time with the VM, and for learning purposes my own machine worked much better.

Udemy ID: 980798
Course created date: 10/10/2016
Course indexed date: 8/7/2019
Course submitted by: Bot