From 0 to 1 : Spark for Data Science with Python

Get your data to fly using Spark for analytics, machine learning and data science

Rating: 4.55 (776 reviews)
Platform: Udemy
Language: English
Category: Data Science
Students: 8,212
Content: 8.5 hours
Last update: Feb 2018
Regular price: $69.99

What you will learn

Use Spark for a variety of analytics and Machine Learning tasks

Implement complex algorithms like PageRank or Music Recommendations

Work with a variety of datasets from Airline delays to Twitter, Web graphs, Social networks and Product Ratings

Use all the different features and libraries of Spark : RDDs, Dataframes, Spark SQL, MLlib, Spark Streaming and GraphX

Description

Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with Java and with billions of rows of data.

Get your data to fly using Spark for analytics, machine learning and data science 

Let’s parse that.

What's Spark? If you are an analyst or a data scientist, you're used to having multiple systems for working with data: SQL, Python, R, Java, etc. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms and then use the same system to productionize your code.

Analytics: Using Spark and Python you can analyze and explore your data in an interactive environment with fast feedback. The course will show how to leverage the power of RDDs and Dataframes to manipulate data with ease. 

Machine Learning and Data Science: Spark's core functionality and built-in libraries make it easy to implement complex algorithms like Recommendations with very few lines of code. We'll cover a variety of algorithms and datasets, including PageRank, MapReduce and graph datasets.
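
To make the PageRank mention concrete, here is a minimal plain-Python sketch of the iterative algorithm (no Spark needed). The tiny four-page graph, the damping factor of 0.85, the iteration count and the function name `pagerank` are illustrative assumptions, not course material. The sketch assumes every link target also appears as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Iteratively estimate PageRank for a dict mapping page -> list of outgoing links."""
    pages = list(links)
    n = len(pages)
    # Start with rank spread evenly across all pages.
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page begins each round with the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical web graph: page -> pages it links to.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)

# Print pages from highest to lowest rank.
for page, rank in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(rank, 3))
```

In this toy graph, page `c` receives links from three pages and ends up ranked highest, while `d`, which nobody links to, keeps only the random-jump share.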

What's Covered:

Lots of cool stuff...

  • Music Recommendations using Alternating Least Squares and the Audioscrobbler dataset
  • Dataframes and Spark SQL to work with Twitter data
  • Using the PageRank algorithm with the Google web graph dataset
  • Using Spark Streaming for stream processing
  • Working with graph data using the Marvel Social network dataset



... and of course all of Spark's basic and advanced features:

  • Resilient Distributed Datasets, Transformations (map, filter, flatMap), Actions (reduce, aggregate) 
  • Pair RDDs, reduceByKey, combineByKey
  • Broadcast and Accumulator variables 
  • Spark for MapReduce 
  • The Java API for Spark 
  • Spark SQL, Spark Streaming, MLlib and GraphFrames (GraphX for Python) 
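
As a rough illustration of what the transformations and actions in this list do, the sketch below mimics `filter`, `map`, `reduceByKey` and `reduce` in plain Python on a small in-memory list. The flight records and the helper `reduce_by_key` are hypothetical stand-ins; in Spark the same steps would run on a distributed RDD rather than a local list.

```python
from functools import reduce

# Hypothetical (airport, delay in minutes) records.
flights = [("JFK", 12.0), ("SFO", -3.0), ("JFK", 30.0), ("ORD", 45.0), ("SFO", 8.0)]


def reduce_by_key(pairs, func):
    """Local stand-in for reduceByKey: merge all values that share a key using func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# filter: keep only the flights that were actually delayed.
delayed = list(filter(lambda rec: rec[1] > 0, flights))

# map: reshape each record to (airport, (delay, 1)) so sums and counts travel together.
with_counts = list(map(lambda rec: (rec[0], (rec[1], 1)), delayed))

# reduceByKey: add up delays and counts per airport.
totals = reduce_by_key(with_counts, lambda a, b: (a[0] + b[0], a[1] + b[1]))

# Per-key post-processing (similar in spirit to mapValues): total / count.
average_delay = {airport: total / count for airport, (total, count) in totals.items()}

# reduce: collapse all delays into a single number.
total_delay = reduce(lambda a, b: a + b, (delay for _, delay in delayed))

print(average_delay)
print(total_delay)
```

Carrying a `(sum, count)` pair per key lets the average be computed in a single pass over the data instead of one pass for sums and another for counts.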

Content

You, This Course and Us

You, This Course and Us
Course Materials

Introduction to Spark

What does Donald Rumsfeld have to do with data analysis?
Why is Spark so cool?
An introduction to RDDs - Resilient Distributed Datasets
Built-in libraries for Spark
Installing Spark
The PySpark Shell
Transformations and Actions
See it in Action : Munging Airlines Data with PySpark - I
[For Linux/Mac OS Shell Newbies] Path and other Environment Variables

Resilient Distributed Datasets

RDD Characteristics: Partitions and Immutability
RDD Characteristics: Lineage, RDDs know where they came from
What can you do with RDDs?
Create your first RDD from a file
Average distance travelled by a flight using map() and reduce() operations
Get delayed flights using filter(), cache data using persist()
Average flight delay in one-step using aggregate()
Frequency histogram of delays using countByValue()
See it in Action : Analyzing Airlines Data with PySpark - II

Advanced RDDs: Pair Resilient Distributed Datasets

Special Transformations and Actions
Average delay per airport, use reduceByKey(), mapValues() and join()
Average delay per airport in one step using combineByKey()
Get the top airports by delay using sortBy()
Lookup airport descriptions using lookup(), collectAsMap(), broadcast()
See it in Action : Analyzing Airlines Data with PySpark - III

Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind The Scenes

Get information from individual processing nodes using accumulators
See it in Action : Using an Accumulator variable
Long running programs using spark-submit
See it in Action : Running a Python script with Spark-Submit
Behind the scenes: What happens when a Spark script runs?
Running MapReduce operations
See it in Action : MapReduce with Spark

Java and Spark

The Java API and Function objects
Pair RDDs in Java
Running Java code
Installing Maven
See it in Action : Running a Spark Job with Java

PageRank: Ranking Search Results

What is PageRank?
The PageRank algorithm
Implement PageRank in Spark
Join optimization in PageRank using Custom Partitioning
See it in Action : The PageRank algorithm using Spark

Spark SQL

Dataframes: RDDs + Tables
See it in Action : Dataframes and Spark SQL

MLlib in Spark: Build a recommendations engine

Collaborative filtering algorithms
Latent Factor Analysis with the Alternating Least Squares method
Music recommendations using the Audioscrobbler dataset
Implement code in Spark using MLlib

Spark Streaming

Introduction to streaming
Implement stream processing in Spark using DStreams
Stateful transformations using sliding windows
See it in Action : Spark Streaming

Graph Libraries

The Marvel social network using Graphs

Reviews

Daniel
June 15, 2022
Bad explained and no data sources. Also you have to quit the video before the end because the level of the music is disgusting
Amrita
January 11, 2022
the best course i have encountered so far on pyspark!!. only request would be to remove the music from the end of the videos, they really are too loud..
Sanjay
December 9, 2020
This course is excellent. Instructor is explaining in the best way, so that students can understand the concepts. But there is one place where improvement can be:- Instructor who is running the code is not presenting in the better way.
Anand
January 22, 2017
One of the best courses out there! It is very well organized and uses extremely effective presentation techniques. The instructors are technically strong. Concepts are developed and reinforced effectively in each chapter while showing the big picture while moving from chapter to chapter. The projects have been selected to give students a good exposure to "real world" problems.
tee
October 22, 2016
This was an excellent introduction to learning Spark with Python! The instruction was clear and the topics were comprehensive. Having the lecture slides, python code, and practice data available to work along on the exercises as you are learning was extremely valuable to the overall learning experience. Terrific!
Andres
September 30, 2016
Simply excellent!! The Looney Team always answer the questions. They have great disposition to help the students.
H2m2
September 25, 2016
The Looney Corn Courses are all great. Great breadth, great detail. Great example coding exercises and drills. I can elaborate on this course as soon as I am deeper in the topics, so stay tuned... (1) advantage is that the class installs on jupyter notebook, local and hadopp ("choose your poison) (2) the presentation does not stop with python - also Java can be enjoyed. A pretty great closure example is demonstrated in detailed - that part is a powerhouse. (3) the entire platform is introduced: including Graphs...
Geetha
September 9, 2016
I think this is the best online course I have taken on Spark so far. Concepts such as the accumulator, as well as sections such as MLib, SparkSQL and GraphFrames are explained in such a simple manner so the user can easily grasp the concept. Brilliantly done, instructor Janani. You explain really well! Anyone working with Spark should take this course. Highly recommend it!
Rakesh
August 26, 2016
-> As of now just gone through the 3 lectures which contains only introduction -> After looking into the remaining courses it seems awesome content and super lectures -> I wolud recommend to take this course -> Thanks to all who has designed this course - Totally Excellent
Ranjitcdev
August 15, 2016
The course is an excellent summary of spark core and related modules. I highly recommend it to anyone who want to learn spark.
Sushil
August 8, 2016
Two more use case with solution needed in each topics. So that each topics would have become much more clear.
Eric
August 7, 2016
Excellent course: 1. very clear explanations 2. explanations are very well illustrated 3. all slides are available in downloadable pdf files 4. very good examples (including Spark SQL, Spark Streaming, MLLib, and GraphX) to learn from
John
August 5, 2016
A very well done course. I especially like the format for each feature of showing the code that will be written in the first part and then writing that code in the second part.
Ansuman
July 7, 2016
On Dice Report: Spark is Fastest-Growing Tech Skills in 2016. http://insights.dice.com/2016/04/12/dice-report-fastest-growing-tech-skills-2/ Very up to date for each and every spark concepts with python and java .A very well made course, lot's of great videos that very well explained and awesome follow up pdf's. Super engaging course from an industry expert ,clearly described each and every concepts and easy practical implementation. I highly recommend this course.Take This Course If You're Serious about learning detailed about Spark.
Pramod
July 5, 2016
every concept is explained in precise manner....coolest explanation which I ever see .....strongly recommend this course for those who are actually willingly to build strong fundamentals in analytics field ..........

Udemy ID: 886024
Course created date: 6/23/2016
Course indexed date: 11/22/2019
Course submitted by: Bot