Azure Databricks and Spark SQL (Python)

Master Azure Databricks with PySpark: Your Hands-On Guide to Advanced Data Engineering and Analysis (DP203)

Rating: 4.65 (1515 reviews)
Platform: Udemy
Language: English
Category: Data Science
Students: 14,425
Content: 12.5 hours
Last update: Apr 2024
Regular price: $94.99

What you will learn

Azure Databricks

Data Lakehouse

Delta Lake

Spark SQL

PySpark

Big Data

Real World Scenarios

CI/CD on Databricks

Source Control with Databricks Repos

Why take this course?

Databricks is one of the most in-demand big data tools around. It is a fast, easy, and collaborative Spark-based big data analytics service designed for data science, ML and data engineering workflows.

The course is packed with lectures, code-along videos and dedicated challenge sections. This should be more than enough to keep you engaged and learning! As an added bonus, you will also have lifetime access to all the lectures, and I have provided detailed notebooks as a downloadable asset. The notebooks contain step-by-step documentation with additional resources and links.

I have ensured that the delivery of the course is engaging and concise; the curriculum is extensive yet delivered efficiently. The course will provide you with hands-on training utilising a variety of data sets.

The course is aimed at teaching you PySpark, Spark SQL in Python and the Databricks Lakehouse Architecture.

You will primarily be using Databricks on Microsoft Azure, in addition to other services such as Azure Data Lake Storage Gen2, Azure Repos and Azure DevOps.

The course will cover a variety of areas including:

  • Set Up and Overview

  • Azure Databricks Notebooks

  • Spark SQL

  • Reading and Writing Data

  • Data Analysis and Transformation with Spark SQL in Python

  • Charts and Dashboards in Databricks Notebooks

  • Databricks Medallion Architecture

  • Accessing Data in Cloud Object Storage

  • Hive Metastore

  • Databases, Tables and Views in Databricks

  • Delta Lake / Databricks Lakehouse Architecture

  • Spark Structured Streaming

  • Delta Live Tables

  • Databricks Jobs

  • Access Control Lists (ACLs)

  • Databricks CLI

  • Source Control with Databricks Repos

  • CI/CD on Databricks

Content

Course Overview / Introduction to Spark and Databricks

Course Introduction
Big Data
Hadoop, Spark and Databricks
Apache Spark Architecture
Spark vs Databricks Comparison
Resource: Comparing Apache Spark vs Databricks

Azure and Databricks Set Up

Azure Account Set Up
Azure UI Overview
Resource: Azure Resources
Creating your Databricks Service
Databricks UI Overview
Clusters
Resource: Pricing, Cluster Pools and Runtime Versions
How to use Databricks Notebooks
Mix Languages and add Markdown text in your Notebook
Databricks Utilities Module and FileStore Utilities
Resource: How to use Notebooks
IMPORTANT - Download Course Resource Notebooks
Cost Management and Cancelling your Subscription
Resource: Cancelling your Azure Subscription

Reading and Writing Data

Dataset Download
Databricks FileStore
Resource: File Types
Reading Data
Writing Data
Parquet Files
Deleting Files and Folders

Data Analysis and Transformation with Spark SQL

Selecting and Renaming Columns
Adding New Columns
Changing Data Types
Math Functions and Simple Arithmetic
Sort Functions
String Functions
Datetime Functions
Filtering DataFrames
Conditional Statements
Using SQL Expressions with expr()
Removing Columns
Grouping your DataFrame
Pivot your DataFrame
Joining DataFrames
Union
Unpivot your DataFrame
Pandas

Utilising the Medallion Architecture in Databricks

Medallion Architecture
Resource: Medallion Architecture

Challenge Section: Customer Orders

Dataset Download and DBFS Upload
Assignment 1: Bronze to Silver
Assignment 1 Solutions Walkthrough
Assignment 2: Silver to Gold
Assignment 2 Solutions Walkthrough

Visualizations and Dashboards

Visualizations and Dashboards

Accessing Data from Azure Data Lake Storage (ADLS) with Databricks

Creating an ADLS Gen2 Account
(Optional) Storage Explorer
Accessing via Access Keys
Accessing via SAS Token
Mounting ADLS to DBFS Overview
Mounting ADLS to DBFS Demo
Secret Scopes
End to End Walkthrough Example

Hive Metastore, Databases, Tables and Views

Running SQL on DataFrames
Hive Metastore and Creating Databases
Managed Tables
Specifying a Location for your Underlying Managed Table Data
Unmanaged (External) Tables
Permanent Views

Challenge Section: Employees

Dataset Download and ADLS Upload
Assignment: Employees
Assignment Solutions Walkthrough

Databricks Data Lakehouse / Delta Lake

Databricks Data Lakehouse / Delta Lake Overview
Delta Lake Data Files
Deleting and Updating Records
Merge Into
Table Utility Commands

Modularize Code and Link Notebooks

Running a Notebook from another Notebook
Text Widgets

Reviews

Hadi
October 19, 2023
This course is great, actually. However, you might face some problems because Databricks is changing fast, and it might be impossible to find a solution on the internet. The only thing I don't like about it is that the instructor doesn't respond to questions. I have the same problems as others, and these questions have never been answered by the instructor!
Pawel
May 28, 2023
Thank you very much for this course. When I started it the concept of Spark and Databricks was a dark magic to me. Now I feel like I have a good idea what it is and how to use it. The course is prepared and conducted in a very professional way, well done!
Matias
May 18, 2023
I recently took the Databricks course and it exceeded all my expectations. This course was a delightful blend of fun, interactivity, and practicality. The instructor's passion and dynamic teaching style kept me engaged throughout, while interactive exercises and hands-on projects made learning enjoyable. The well-structured content, supplemented with real-life examples, equipped me with valuable skills that I can apply in my personal and professional life. I highly recommend this course to anyone seeking an engaging and useful learning experience.
Dipankar
May 14, 2023
This course is a must-take for anyone interested in learning about Databricks and Spark. The course is well-structured, engaging, and provides practical examples that help learners apply the concepts in real-world scenarios. The instructor is knowledgeable and provides clear explanations of complex topics. Overall, this course is an excellent resource that provides a solid foundation in Azure Databricks and PySpark.
Saud
April 27, 2023
Section #6 onwards makes very little sense. There is a lot of playing around with various data types, e.g. Parquet, without explanation of their real purpose. The data type conversion is confusing and disappointing if you compare it to any contemporary platform (Power BI, Python Jupyter). Using a standalone Python program like Jupyter Notebook is far better and easier to use than this Azure - Databricks - DFS scheme, plus it keeps you away from the hassle and pain of running / disconnecting a cluster.
Jayesh
April 24, 2023
The course is just perfect: it covers everything, has good documentation and provides very good hands-on experience. Can't thank you enough.
Rem
April 13, 2023
Very good for people who want to learn Databricks. I have tried other Databricks courses from other instructors, but this one is definitely the best. The course is presented in a very engaging, detailed and easy-to-follow structure.
Paul
February 28, 2023
Best Azure Databricks course I have seen on Udemy. Would be completely perfect if this kind of content was integrated with wider Synapse coverage, e.g. scheduling Databricks notebooks within a pipeline, passing variables from Synapse pipelines into notebooks and back into Synapse, error handling with notebooks, etc.
AMANDEEP
February 19, 2023
Very detailed course, especially from a DWH perspective on the implementation of the Medallion architecture. Thanks.
Abhishek
February 10, 2023
Best course, Prashant. The first course on Udemy that I completed within 2 days; very engaging and project oriented. Waiting for your next course.
Aarav
January 27, 2023
A wonderful course, especially for beginners who have no prior knowledge of how a pipeline runs and how data is processed from one layer to another, end to end.
Gregory
January 17, 2023
This course is very appropriate for the kinds of real-world scenarios that a data engineer might face.
Jasen
December 13, 2022
Seems like what I'm looking for, although the captions have a lot of errors. Enjoying the examples along the way.
Igor
November 29, 2022
A very detailed course on using Databricks, just as I hoped it to be. The pace is amazing, everything is explained in detail, step by step. I recommend this course to anyone wanting to improve on Databricks skills.
Alan
November 9, 2022
I'm really enjoying this course and have already learnt more than expected. I especially enjoy the challenge sections, which give me an opportunity to apply my learning.

Coupons

Date        Discount  Status
11/11/2022  100% OFF  expired
11/13/2022  100% OFF  expired
12/6/2022   100% OFF  expired
4/17/2024   91% OFF   working

Udemy ID: 4880594
Course created date: 9/13/2022
Course indexed date: 11/11/2022
Course submitted by: Bot