Azure Databricks & Spark For Data Engineers (PySpark / SQL)

Real World Project on Formula1 Racing using Azure Databricks, Delta Lake, Unity Catalog, Azure Data Factory [DP203]

Rating: 4.64 (15198 reviews)
Platform: Udemy
Language: English
Category: Other
Students: 89,770
Content: 20 hours
Last update: Apr 2024
Regular price: $99.99

What you will learn

You will learn how to build a real world data project using Azure Databricks and Spark Core. This course has been taught using real world data.

You will acquire professional level data engineering skills in Azure Databricks, Delta Lake, Spark Core, Azure Data Lake Gen2 and Azure Data Factory (ADF)

You will learn how to create notebooks, dashboards, clusters, cluster pools and jobs in Azure Databricks

You will learn how to ingest and transform data using PySpark in Azure Databricks

You will learn how to transform and analyse data using Spark SQL in Azure Databricks

You will learn about Data Lake architecture and Lakehouse Architecture. Also, you will learn how to implement a Lakehouse architecture using Delta Lake.

You will learn how to create Azure Data Factory pipelines to execute Databricks notebooks

You will learn how to create Azure Data Factory triggers to schedule pipelines as well as monitor them.

You will gain the skills required around Azure Databricks and Data Factory to pass the Azure Data Engineer Associate certification exam DP203

You will learn how to connect to Azure Databricks from Power BI to create reports

You will gain a comprehensive understanding of Unity Catalog and the data governance capabilities it offers.

You will learn to implement a data governance solution using a Unity Catalog-enabled Databricks workspace.

Why take this course?

Major updates to the course since the launch

May 2023 - New sections 25, 26 and 27 added to include Unity Catalog. Unity Catalog is a recent addition to Databricks that offers a unified data governance solution for a Data Lakehouse. These sections cover all aspects of Unity Catalog and its implementation using a project.

March 2023 - New sections 6 and 7 added. Section 8 updated. These changes reflect the latest Databricks recommendations around accessing Azure Data Lake. They also provide a better way to complete the course project for students using an Azure Student Subscription or corporate subscriptions with limited access to Azure Active Directory.

December 2022 - Sections 3, 4 & 5 updated to reflect recent UI changes to Azure Databricks. Also added lessons on functionality that Databricks recently added to Databricks clusters.


Welcome!

I am looking forward to helping you learn one of the in-demand data engineering tools in the cloud, Azure Databricks! This course is taught by implementing a data engineering solution using Azure Databricks and Spark Core for a real world project of analysing and reporting on Formula1 motor racing data.

This is like no other course on Udemy for Azure Databricks. Once you have completed the course, including all the assignments, I strongly believe that you will be proficient in Azure Databricks and in a position to start a real world data engineering project on your own. I have also included lessons on Azure Data Lake Storage Gen2, Azure Data Factory and Power BI. The primary focus of the course is Azure Databricks and Spark Core, but it also covers the relevant concepts and connectivity to the other technologies mentioned. Please note that the course doesn't cover other aspects of Spark such as Spark Streaming and Spark ML. Also, the course is taught using PySpark and Spark SQL; it doesn't cover Scala or Java.

The course follows a logical progression of a real world project implementation with technical concepts being explained and the Databricks notebooks being built at the same time. Even though this course is not specifically designed to teach you the skills required for passing the Azure Data Engineer Associate Certification Exam DP203, it can greatly help you get most of the necessary skills required for the exam.

I value your time as much as I do mine. So, I have designed this course to be fast-paced and to the point. Also, the course is taught in simple English with no jargon. I start the course from the basics, and by the end of the course you will be proficient in the technologies used.

Currently the course teaches you the following

Azure Databricks

  • Building a solution architecture for a data engineering solution using Azure Databricks, Azure Data Lake Gen2, Azure Data Factory and Power BI

  • Creating and using Azure Databricks service and the architecture of Databricks within Azure

  • Working with Databricks notebooks as well as using Databricks utilities, magic commands etc

  • Passing parameters between notebooks as well as creating notebook workflows

  • Creating, configuring and monitoring Databricks clusters, cluster pools and jobs

  • Mounting Azure Storage in Databricks using secrets stored in Azure Key Vault

  • Working with Databricks Tables, Databricks File System (DBFS) etc

  • Using Delta Lake to implement a solution using Lakehouse architecture

  • Creating dashboards to visualise the outputs

  • Connecting to the Azure Databricks tables from Power BI

Spark (Only PySpark and SQL)

  • Spark architecture, Data Sources API and Dataframe API

  • PySpark - Ingestion of CSV, simple and complex JSON files into the data lake as Parquet files/tables

  • PySpark - Transformations such as Filter, Join, Simple Aggregations, GroupBy, Window functions etc.

  • PySpark - Creating local and temporary views

  • Spark SQL - Creating databases, tables and views

  • Spark SQL - Transformations such as Filter, Join, Simple Aggregations, GroupBy, Window functions etc.

  • Spark SQL - Creating local and temporary views

  • Implementing full refresh and incremental load patterns using partitions
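
As a rough illustration of the last item, the difference between a full refresh and a partition-based incremental load can be sketched in plain Python. This is not Spark code and not the course's implementation; the dictionary keyed by `race_id` stands in for a partitioned table, and the column names `race_id`, `driver` and `points` are hypothetical.

```python
from collections import defaultdict


def full_refresh(table, incoming):
    """Discard everything in the table and rebuild it from the incoming rows."""
    table.clear()
    for row in incoming:
        table[row["race_id"]].append(row)


def incremental_load(table, incoming):
    """Overwrite only the partitions that appear in the incoming batch."""
    batch = defaultdict(list)
    for row in incoming:
        batch[row["race_id"]].append(row)
    for race_id, rows in batch.items():
        # Replacing the whole partition keeps reruns of the same batch idempotent.
        table[race_id] = rows


# Hypothetical data: the table is partitioned by race_id.
table = defaultdict(list)
full_refresh(table, [
    {"race_id": 1, "driver": "A", "points": 25},
    {"race_id": 2, "driver": "A", "points": 18},
])

# A later batch carries a corrected race 2 and a new race 3; race 1 is untouched.
batch = [
    {"race_id": 2, "driver": "A", "points": 15},
    {"race_id": 3, "driver": "B", "points": 25},
]
incremental_load(table, batch)
incremental_load(table, batch)  # rerunning the same batch does not duplicate rows
```

The point of overwriting by partition rather than appending is that a failed or repeated run can simply be executed again without producing duplicates.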

Delta Lake

  • Emergence of Data Lakehouse architecture and the role of Delta Lake

  • Read, Write, Update, Delete and Merge to Delta Lake using both PySpark and SQL

  • History, Time Travel and Vacuum

  • Converting Parquet files to Delta files

  • Implementing incremental load pattern using Delta Lake
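
The Merge operation listed above is an upsert: rows whose key matches an existing row are updated, and rows with a new key are inserted. A minimal plain-Python sketch of those semantics follows. It does not use the Delta Lake API, and the `merge` helper, the `driver_id` key and the sample rows are all hypothetical.

```python
def merge(target, source, key):
    """Upsert: update rows whose key matches, insert rows whose key is new."""
    # Copy each target row so the input list is left unmodified.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched keys get their columns updated; unmatched keys start empty.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical data keyed by driver_id.
target = [
    {"driver_id": 1, "name": "Driver One", "team": "Team X"},
    {"driver_id": 2, "name": "Driver Two", "team": "Team Y"},
]
source = [
    {"driver_id": 2, "team": "Team Z"},  # matched: update
    {"driver_id": 3, "name": "Driver Three", "team": "Team X"},  # not matched: insert
]
result = merge(target, source, "driver_id")
```

Because matching is by key, applying the same source batch twice leaves the result unchanged, which is what makes merge suitable for incremental loads.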

Unity Catalog

  • Overview of Data Governance and Unity Catalog

  • Creating a Unity Catalog Metastore and enabling a Databricks workspace with Unity Catalog

  • Overview of the 3-level namespace and creating Unity Catalog objects

  • Configuring and accessing external data lakes via Unity Catalog

  • Developing a mini project using Unity Catalog and seeing its key data governance capabilities, such as Data Discovery, Data Audit, Data Lineage and Data Access Control

Azure Data Factory

  • Creating pipelines to execute Databricks notebooks

  • Designing robust pipelines to deal with unexpected scenarios such as missing files

  • Creating dependencies between activities as well as pipelines

  • Scheduling the pipelines using data factory triggers to execute at regular intervals

  • Monitoring the triggers/pipelines to check for errors/outputs


Reviews

Elias
October 3, 2023
Good mix of hands-on practice and theoretical overview. Ramesh is a great instructor that is active in helping out in the comments. Very impressed so far!
Bhaskar
October 3, 2023
Explanation is very easy to understand and graphical flow chart helps to understand better, flow of topic is steady and helpful to understand.
SibaPanda
October 1, 2023
Great learning. Each and every subject has been greatly explained and demos are a big plus. Thank you!
Shayan
September 29, 2023
Very in-depth and informative course. My only beef is that some parts need to be majorly updated. Hope the instructor can find time to update the previously recorded videos to include the new layout. As of September 2023, the Data tab in Databricks is renamed to Catalog. Overall, I enjoyed the course and will definitely recommend others to take this course or the other two course about Azure Data Factory and Azure Synapse Analytics by the same instructor.
Pablo
September 28, 2023
The course has really useful contents and I am learning a lot. The reason I am rating it four stars it is because there are sometimes where I find some explanations a bit flat, and, also, one thing that sometimes is uncomfortable for me is that, when coding, the teacher uses fast forward video, so it is difficult to follow. Anyway, I would recommend the course, it is just some comments I wanted to make.
Amaravathi
September 21, 2023
Good. we can easily understand the things. Thank you for your explanation. And also with examples we can easily understand.
Christophe
September 18, 2023
This course is well designed for people who already have even small experiences with cloud platforms.
Ulises
September 13, 2023
Excellent course. However, the inconsistencies between the databricks environment presented and the actual/current environment presents challenges. I don't know what can be done about that given the speed with which databricks seems to evolve, but it's something to consider.
Andi
September 12, 2023
The Azure sign up doesnt work for me, and IDK why. It might help to cover when people are using Databricks community edition as well.
Guy
September 9, 2023
Amazing class !! Everything is crystal clear ! Waw. HIGHLY RECOMMEND FOR ANYONE LOOKING TO HAVE A CAREER AS A DATA ENGINEER.
Pradip
September 9, 2023
I really enjoyed this course. course is brilliantly designed and covers mostly every aspect how real-world project works. Ramesh has a really unique way of teaching which makes things way easier to understand whether you are new learner or experienced professional. I want to recommend this course to everyone who has curiosity about how databricks works in real world projects.
Kavuru
September 9, 2023
The Best course related to Azure data bricks and PySpark. Explains everything. One stop tutorial for azure data bricks
Emily
September 7, 2023
Fantastic course! Ramesh does such a good job at giving an overview, and then using the Formula 1 project to dive into details for practical application. Also appreciated that he would also point out little gotchas along the way. The Q&A is very active and a role issue I had was already asked and answered which shows Ramesh is dedicated to continually supporting the course. Highly recommend this course.
Rujula
September 3, 2023
I'm a newbie to data engineering and this course helped me understand a lot. Thanks to the instructor for creating suck in depth learnings.
Darshan
September 1, 2023
This course is somewhat outdated, there's not anything about Delta Live tables and databricks workflows. He needs to update this course big time.

4182538
udemy ID
7/13/2021
course created date
7/21/2021
course indexed date
Bot
course submitted by