PySpark & AWS: Master Big Data With PySpark and AWS

Mastering AWS & PySpark: Spark, PySpark, AWS, Spark Ecosystem, Hadoop, and Spark Applications [AWS, Hadoop, Pyspark]

Rating: 4.43 (2021 reviews)
Platform: Udemy
Language: English
Category: Data Science
Students: 14,143
Content: 19 hours
Last update: Apr 2024
Regular price: $89.99

What you will learn

● The introduction and importance of Big Data.

● Practical explanation and live coding with PySpark.

● Spark applications

● Spark EcoSystem

● Spark Architecture

● Hadoop EcoSystem

● Hadoop Architecture

● PySpark RDDs

● PySpark RDD transformations

● PySpark RDD actions

● PySpark DataFrames

● PySpark DataFrames transformations

● PySpark DataFrames actions

● Collaborative filtering in PySpark

● Spark Streaming

● ETL Pipeline

● CDC and ongoing replication

Why take this course?

Comprehensive Course Description:

Python and Apache Spark are two of the most in-demand technologies in Big Data analytics, and PySpark, the Python API for Apache Spark, brings them together. In this course, you’ll start from the basics and progress to advanced data analysis. From cleaning data to building features and implementing machine learning (ML) models, you’ll learn how to execute end-to-end workflows using PySpark.


Throughout the course, you’ll use PySpark to perform data analysis. You’ll explore Spark RDDs, DataFrames, and some Spark SQL queries, along with the transformations and actions that can be performed on data using RDDs and DataFrames. You’ll also explore the Spark and Hadoop ecosystems and their underlying architectures. You’ll run your Spark scripts in the Databricks environment and explore that environment as well.
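
The split between transformations and actions can be illustrated with a plain-Python analogy. The sketch below is not PySpark code and is not taken from the course. It uses lazy iterators to mimic how Spark transformations such as map and filter only describe work, while an action such as collect or count triggers it. The marks data and the log list are hypothetical and exist only to make the evaluation order visible.

```python
# Plain-Python analogy for lazy transformations vs. eager actions.
# Assumption: lazy iterators stand in for Spark's lazy evaluation; no Spark needed.

log = []  # records every element that is actually processed


def double(x):
    log.append(x)  # side effect so we can see when work happens
    return x * 2


marks = [35, 60, 82, 47]  # hypothetical input data

# "Transformations": building the pipeline does no work yet.
doubled = map(double, marks)
large = filter(lambda m: m >= 100, doubled)
work_before_action = len(log)  # nothing has been evaluated so far

# "Action": materialising the result forces the whole pipeline to run.
result = list(large)
work_after_action = len(log)  # every input element has now been processed

print(work_before_action, work_after_action, result)
```

In PySpark, a comparable chain would be built with rdd.map(...) and rdd.filter(...) and executed by an action such as collect().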


Finally, you’ll get a taste of Spark on the AWS cloud. You’ll see how to leverage AWS storage, database, and compute services, and how Spark can communicate with different AWS services to get the data it needs.


How Is This Course Different? 

In this Learning by Doing course, every theoretical explanation is followed by practical implementation.   


The course ‘PySpark & AWS: Master Big Data With PySpark and AWS’ is crafted to reflect the most in-demand workplace skills. It will help you understand all the essential concepts and methodologies of PySpark. The course is:

• Easy to understand. 

• Expressive. 

• Exhaustive. 

• Practical with live coding. 

• Rich with state-of-the-art, up-to-date knowledge of the field.


Because this course is a detailed compilation of all the basics, it will motivate you to make quick progress and go beyond what you have learned. At the end of each concept, you will be assigned homework, tasks, activities, and quizzes, along with solutions, to evaluate and reinforce what you have learned so far. Most of these activities will be coding-based, as the aim is to get you up and running with implementations.

High-quality video content, in-depth course material, evaluation questions, detailed course notes, and informative handouts are some of the perks of this course. You can approach our friendly team with any course-related queries, and we assure you of a fast response.


The course tutorials are divided into 140+ brief videos. You’ll learn the concepts and methodologies of PySpark and AWS along with a lot of practical implementation. The total runtime of the HD videos is around 16 hours.


Why Should You Learn PySpark and AWS? 

PySpark is the Python library that lets you drive Apache Spark from Python code.

PySpark is worth learning because of the huge demand for Spark professionals and the high salaries they command. The usage of PySpark in Big Data processing is increasing at a rapid pace compared to other Big Data tools.   

AWS, launched in 2006, is the fastest-growing public cloud. The right time to cash in on cloud computing skills—AWS skills, to be precise—is now.


Course Content:

The all-inclusive course consists of the following topics:

1. Introduction:

a. Why Big Data?

b. Applications of PySpark

c. Introduction to the Instructor

d. Introduction to the Course

e. Projects Overview

2. Introduction to Hadoop, Spark EcoSystems, and Architectures:

a. Hadoop EcoSystem

b. Spark EcoSystem

c. Hadoop Architecture

d. Spark Architecture

e. PySpark Databricks setup

f. PySpark local setup


3. Spark RDDs:

a. Introduction to PySpark RDDs

b. Understanding underlying Partitions

c. RDD transformations

d. RDD actions

e. Creating Spark RDD

f. Running Spark Code Locally

g. RDD Map (Lambda)

h. RDD Map (Simple Function)

i. RDD FlatMap

j. RDD Filter

k. RDD Distinct

l. RDD GroupByKey

m. RDD ReduceByKey

n. RDD (Count and CountByValue)

o. RDD (saveAsTextFile)

p. RDD (Partition)

q. Finding Average

r. Finding Min and Max

s. Mini project on student data set analysis

t. Total Marks by Male and Female Students

u. Total Passed and Failed Students

v. Total Enrollments per Course

w. Total Marks per Course

x. Average marks per Course

y. Finding Minimum and Maximum marks

z. Average Age of Male and Female Students

4. Spark DFs:

a. Introduction to PySpark DFs

b. Understanding underlying RDDs

c. DFs transformations

d. DFs actions

e. Creating Spark DFs

f. Spark Infer Schema

g. Spark Provide Schema

h. Create DF from RDD

i. Select DF Columns

j. Spark DF with Column

k. Spark DF with Column Renamed and Alias

l. Spark DF Filter rows

m. Spark DF (Count, Distinct, Duplicate)

n. Spark DF (sort, order By)

o. Spark DF (Group By)

p. Spark DF (UDFs)

q. Spark DF (DF to RDD)

r. Spark DF (Spark SQL)

s. Spark DF (Write DF)

t. Mini project on Employees data set analysis

u. Project Overview

v. Project (Count and Select)

w. Project (Group By)

x. Project (Group By, Aggregations, and Order By)

y. Project (Filtering)

z. Project (UDF and With Column)

aa. Project (Write)

5. Collaborative filtering:

a. Understanding collaborative filtering

b. Developing recommendation system using ALS model

c. Utility Matrix

d. Explicit and Implicit Ratings

e. Expected Results

f. Dataset

g. Joining Dataframes

h. Train and Test Data

i. ALS model

j. Hyperparameter tuning and cross-validation

k. Best model and evaluate predictions

l. Recommendations


6. Spark Streaming:

a. Understanding the difference between batch and streaming analysis.

b. Hands-on with spark streaming through word count example

c. Spark Streaming with RDD

d. Spark Streaming Context

e. Spark Streaming Reading Data

f. Spark Streaming Cluster Restart

g. Spark Streaming RDD Transformations

h. Spark Streaming DF

i. Spark Streaming Display

j. Spark Streaming DF Aggregations

7. ETL Pipeline

a. Understanding the ETL

b. ETL pipeline Flow

c. Data set

d. Extracting Data

e. Transforming Data

f. Loading data (Creating RDS)

g. Load data (Creating RDS)

h. RDS Networking

i. Downloading Postgres

j. Installing Postgres

k. Connect to RDS through PgAdmin

l. Loading Data

8. Project – Change Data Capture / Ongoing Replication

a. Introduction to Project

b. Project Architecture

c. Creating RDS MySQL Instance

d. Creating S3 Bucket

e. Creating DMS Source Endpoint

f. Creating DMS Destination Endpoint

g. Creating DMS Instance

h. MySQL Workbench

i. Connecting with RDS and Dumping Data

j. Querying RDS

k. DMS Full Load

l. DMS Replication Ongoing

m. Stopping Instances

n. Glue Job (Full Load)

o. Glue Job (Change Capture)

p. Glue Job (CDC)

q. Creating Lambda Function and Adding Trigger

r. Checking Trigger

s. Getting S3 file name in Lambda

t. Creating Glue Job

u. Adding Invoke for Glue Job

v. Testing Invoke

w. Writing Glue Shell Job

x. Full Load Pipeline

y. Change Data Capture Pipeline
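
The change data capture pipeline that closes the outline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's Glue job. It assumes each change record carries an operation flag (I for insert, U for update, D for delete) and a primary key named id. The sample rows and the apply_changes helper are hypothetical.

```python
# Minimal sketch: apply CDC records on top of a full-load snapshot.
# Assumptions: rows are dicts keyed by "id"; "op" is one of "I", "U", "D".


def apply_changes(snapshot, changes):
    """Return a new snapshot after applying change records in order."""
    table = {row["id"]: dict(row) for row in snapshot}
    for change in changes:
        op, key = change["op"], change["id"]
        if op == "D":
            table.pop(key, None)  # delete; ignore unknown keys
        elif op in ("I", "U"):
            # insert or overwrite the row, dropping the operation flag
            table[key] = {k: v for k, v in change.items() if k != "op"}
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return sorted(table.values(), key=lambda r: r["id"])


full_load = [
    {"id": 1, "name": "Ana", "city": "Lisbon"},
    {"id": 2, "name": "Bo", "city": "Oslo"},
]
cdc_records = [
    {"op": "U", "id": 2, "name": "Bo", "city": "Bergen"},
    {"op": "I", "id": 3, "name": "Cy", "city": "Cairo"},
    {"op": "D", "id": 1},
]

final_rows = apply_changes(full_load, cdc_records)
print(final_rows)
```

As the outline suggests, in the project itself the full load and the change files are produced by DMS and the merge runs in a Glue job. The same insert, update, and delete logic would be expressed there with DataFrame operations rather than dictionaries.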


After the successful completion of this course, you will be able to:

● Relate the concepts and practicals of Spark and AWS with real-world problems

● Implement any project that requires PySpark knowledge from scratch

● Know the theory and practical aspects of PySpark and AWS


Who this course is for:

● People who are beginners and know absolutely nothing about PySpark and AWS

● People who want to develop intelligent solutions

● People who want to learn PySpark and AWS

● People who love to learn the theoretical concepts first before implementing them using Python

● People who want to learn PySpark along with its implementation in realistic projects

● Big Data Scientists

● Big Data Engineers


Enroll in this comprehensive PySpark and AWS course now to master the essential skills in Big Data analytics, data processing, and cloud computing.

Whether you're a beginner or looking to expand your knowledge, this course offers a hands-on learning experience with practical projects. Don't miss this opportunity to advance your career and tackle real-world challenges in the world of data analytics and cloud computing. Join us today and start your journey towards becoming a Big Data expert with PySpark and AWS!


List of keywords:


  • Big Data analytics

  • Data analysis

  • Data cleaning

  • Machine learning (ML)

  • Spark RDDs

  • Dataframes

  • Spark SQL queries

  • Spark ecosystem

  • Hadoop

  • Databricks

  • AWS cloud

  • Spark scripts

  • AWS services

  • PySpark and AWS collaboration

  • PySpark tutorial

  • PySpark hands-on

  • PySpark projects

  • Spark architecture

  • Hadoop ecosystem

  • PySpark Databricks setup

  • Spark local setup

  • Spark RDD transformations

  • Spark RDD actions

  • Spark DF transformations

  • Spark DF actions

  • Spark Infer Schema

  • Spark Provide Schema

  • Spark DF Filter rows

  • Spark DF (Count, Distinct, Duplicate)

  • Spark DF (sort, order By)

  • Spark DF (Group By)

  • Spark DF (UDFs)

  • Spark DF (Spark SQL)

  • Collaborative filtering

  • Recommendation system

  • ALS model

  • Spark Streaming

  • ETL pipeline

  • Change Data Capture (CDC)

  • Replication

  • AWS Glue Job

  • Lambda Function

  • RDS

  • S3 Bucket

  • MySQL Instance

  • Database Migration Service (DMS)

  • PgAdmin

  • Spark Shell Job

  • Full Load Pipeline

  • Change Data Capture Pipeline



Reviews

Davidhleuk
September 23, 2023
The first few sections (sections 1 to 7) are OK, but section 8 was chaotic for me. Some steps are not in the correct sequence, and sometimes the lecturer seemed not well prepared (as in the lecture on creating a parameter group in AWS). At first I followed him in filling in many inputs, but he forgot to create that group, and so did I. Then I needed to re-enter everything after refreshing the page. The same thing happened in the DMS part. I wish he would review and re-upload all such lectures.
Shlok
September 22, 2023
The course was good, but the audio is low and a few topics were not taught in depth. During the practicals a few steps were skipped, so I had to spend extra time figuring out what was wrong. I hope you consider these points and improve on them.
Mohamed
September 7, 2023
I wouldn't recommend this course for beginners without foundational knowledge of Spark and Hadoop. The instructor occasionally glosses over certain concepts without in-depth explanations. Moreover, there are 5-6 hour segments solely on AWS. Some vital concepts are not elucidated well, while on the other hand, some obvious topics are stretched out for as long as 10 minutes. I believe this course would suit those with prior knowledge of Spark, Hadoop, and a basic understanding of AWS. For context, I took a precursor course titled "Spark Starter Kit," which provided an excellent introduction to Spark.
ALEJANDRO
August 25, 2023
It's an excellent course. The teachers are so patient in explaining all the concepts and elements in each part of the course. I highly recommend it.
Sarthak
August 21, 2023
It is a great course to kickstart PySpark and Big Data. It helped me understand the concepts easily. Keep posting amazing tutorials.
Alexander
July 1, 2023
This is a great course on PySpark and AWS. The first sections of the course even helped me prepare for and pass the Databricks Apache Spark 3.0 - Python certification. Every question I asked was answered in detail in a short time. Note: The Databricks section is a bit old, but the AWS sections were recorded not long ago!
Manasa
May 24, 2023
All topics are interesting. I felt like the chatbot creation was a bit different compared to the previous lectures. But overall the course is good for beginners in PySpark and AWS.
Xisca
March 21, 2023
It was a good course. The Hadoop, Spark DF, and CDC project parts were really interesting. The chat project was nice but difficult to relate to the PySpark training presented earlier.
Ahmet
March 19, 2023
The last parts of the course were mostly irrelevant to PySpark. They were more like AWS development courses.
Pinki
February 27, 2023
It is a very clear, crisp course with lots of hands-on work. I am practicing while @ahmed is explaining, which is helpful.
Jesús
February 27, 2023
Great overview of how CDC is implemented using AWS, MySQL and PySpark, as well as an introduction to PySpark
Ishank
January 30, 2023
Only introductory stuff. Most of the video content relates to installation, environment setup, or very basic coding for writing an ETL pipeline. Nothing substantial to learn. Money wasted.
Yanni
January 27, 2023
This course is good for beginners in programming, but it doesn't focus on the needs of more experienced people or those who want to go deeper into Spark logic and optimization.
Siddhanth
January 23, 2023
Great short videos with detailed explanations; I feel like I can take breaks in between when needed.
Carlos
January 22, 2023
Disappointing. Unfocused content that covers Spark and AWS at the same time. Most of it is about AWS processes (resource creation), the videos are poorly edited, and some videos spend 5 minutes explaining and writing 3 lines of imports.

Udemy ID: 4076436
Course created date: 5/25/2021
Course indexed date: 6/1/2021
Course submitted by: Bot