Platform: Udemy

Language: English

Category: Data Science

Mastering Databricks & Apache spark -Build ETL data pipeline

Learn the fundamental concepts of Databricks and process big data by building your first data pipeline on Azure

4.28 (98 reviews)


Content: 4.5 hours

Last update: Apr 2021

What you will learn

Databricks

Build your first data pipeline to process CSV, JSON, XML

Orchestrate data pipelines with Azure Data Factory

Spin up a Spark cluster

Delta tables

Concept of time travel and vacuum on delta tables

Apache Spark SQL

Filtering Dataframe

Renaming, drop, Select, Cast

Aggregation operations SUM, AVERAGE, MAX, MIN

Rank, Row Number, Dense Rank

Building dashboards

Build Complete project

Build End to End data pipeline


Description

Welcome to the course on Mastering Databricks & Apache spark -Build ETL data pipeline

Databricks combines the best of data warehouses and data lakes into a lakehouse architecture. In this course you will learn how to perform various operations in Scala, Python, and Spark SQL. Writing the same commands in each language prepares you to build batch processes in whichever language a client requires and to deliver a solution that creates value. We will build an end-to-end solution in Azure Databricks.


Key Learning Points

  • We will build our own cluster to process the data and, with a one-click operation, load data from different sources into Azure SQL and Delta tables

  • After that, we will use Databricks notebooks to prepare a dashboard that answers business questions

  • We will deploy infrastructure on the Azure cloud as the project requires

  • These scenarios give students 360-degree exposure to the cloud platform and show how to set up various resources

  • All activities are performed in Azure Databricks
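
The idea of loading data from different sources into one target can be sketched in a few lines. This is a minimal, plain-Python illustration using only the standard library, not the Spark or Databricks API the course teaches. The sample records, the `id`/`name`/`amount` columns, and the function names are all hypothetical, since the listing does not show the course datasets.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; the course's real datasets are not shown in the listing.
CSV_TEXT = "id,name,amount\n1,alice,10.5\n"
JSON_TEXT = '[{"id": 2, "name": "bob", "amount": 20.0}]'
XML_TEXT = "<rows><row><id>3</id><name>carol</name><amount>7.25</amount></row></rows>"


def normalize(record):
    """Cast every record to one shared schema, whatever its source format."""
    return {
        "id": int(record["id"]),
        "name": str(record["name"]),
        "amount": float(record["amount"]),
    }


def extract_csv(text):
    # csv.DictReader yields one dict per data row, keyed by the header line.
    return [normalize(row) for row in csv.DictReader(io.StringIO(text))]


def extract_json(text):
    # The JSON sample is a list of flat objects.
    return [normalize(row) for row in json.loads(text)]


def extract_xml(text):
    # Each <row> element becomes a dict of child tag -> child text.
    root = ET.fromstring(text)
    return [
        normalize({child.tag: child.text for child in row})
        for row in root.findall("row")
    ]


def extract_all():
    """Combine the three sources into one list of uniformly typed records."""
    return extract_csv(CSV_TEXT) + extract_json(JSON_TEXT) + extract_xml(XML_TEXT)


if __name__ == "__main__":
    for rec in extract_all():
        print(rec)
```

Normalizing each source to the same schema before loading is what lets a single load step write every record to the same table.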


Fundamentals

  • Databricks

  • Delta tables

  • Concept of versions and vacuum on delta tables

  • Apache Spark SQL

  • Filtering Dataframe

  • Renaming, drop, Select, Cast

  • Aggregation operations SUM, AVERAGE, MAX, MIN

  • Rank, Row Number, Dense Rank

  • Building dashboards

  • Analytics

This course is suitable for data engineers, BI architects, data analysts, ETL developers, and BI managers.
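
The three ranking functions named under Fundamentals differ only in how they number tied rows. The sketch below mimics those semantics in plain Python, ordering by a score in descending order. It is an illustration of the concept, not Spark code. The `rank_rows` helper and the sample rows are hypothetical.

```python
from itertools import groupby


def rank_rows(rows, key):
    """Sort rows by `key` descending and attach row_number, rank, dense_rank."""
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)
    result = []
    row_number = 0  # always increases by 1, ties or not
    dense_rank = 0  # increases by 1 per distinct value, so no gaps
    for _, group in groupby(ordered, key=lambda r: r[key]):
        dense_rank += 1
        rank = row_number + 1  # tied rows share the position of the first one
        for row in group:
            row_number += 1
            result.append(
                {**row, "row_number": row_number, "rank": rank, "dense_rank": dense_rank}
            )
    return result


if __name__ == "__main__":
    sample = [
        {"name": "a", "score": 90},
        {"name": "b", "score": 90},
        {"name": "c", "score": 80},
    ]
    for row in rank_rows(sample, "score"):
        print(row)
```

With two rows tied at the top, row number continues 1, 2, 3, rank skips from 1 to 3, and dense rank moves from 1 to 2 without a gap.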



Content

Getting Started with Databricks

Introduction

What is Databricks

Project

Create Azure Account

Setting up databricks environment

Importing Notebooks

Understanding Distributed Processing

How to create cluster

Notebook

Why Databricks

Create table or dataframe by uploading data

Extraction of Data

Understanding ETL

Extraction of data from Azure account

Adding Schema to data files

Unmanaged tables

Managed tables

Transformation of Data

Window Functions

Scala - Filtering Dataframe

Scala - Common Operations

Scala - Aggregation commands

Scala - Rank, Row Number, Dense Rank

Python - Filtering Dataframe

Python - Common Operations

Python - Aggregation commands

Python - Rank, Row Number, Dense Rank

Spark SQL - Common Operations

Spark SQL - Aggregation Commands

Spark SQL - Rank, Row Number, Dense Rank

Spark SQL - Global View

Spark SQL - Temp View

Joins

Scala - Joins

Python - Joins

Spark SQL - Joins

Processing XML, JSON, Delta tables

Processing Nested XML file

Processing Nested JSON file

Delta Table - Time Travel and Vacuum

Loading data and building ETL data pipeline with dashboard

Project Description

Spinning up Azure SQL

Key Vault

Secret Scopes

Project building and mounting of containers

Reading XML,JSON,CSV and loading to Delta tables & Azure SQL

Move files from one container to another

Dashboard

Azure Data Factory to orchestrate

Congratulations


Reviews

Jhanvi, 17 April 2021

Thank you for this fundamental yet detailed course on Databricks. This was my introduction to this software and the content included was totally appropriate and helpful for my basic understanding of the same. This course depicts the capabilities of this tool in a smooth way, the interactive notebooks and workspaces, highly optimized processing of data definitely motivates me to explore more. The step-by-step demonstration for setting up the resources, databases, containers, processing data using all the supported languages, loading onto delta tables and Azure SQL, developing simple dashboard gives a clear overview of the lifecycle of data analytics project development using Databricks. This course was perfectly paced for a fresher like me. Looking forward for many such courses from the Instructor. It was a great learning experience.

Harshit, 14 April 2021

What a great course for starting my Databricks journey. Kudos to the Instructor for making such a great course on databricks, The course is structured well for better understanding, simple to understand, sessions with clear voice clarity & good resolution. Lectures are very detailed and concepts well explained. The time spent on each module was worth it. The best part was delving to project, its great to see how data can be used to make meaningful insights from dashboard built. I've improved my knowledge not only Databricks but also Apache Spark SQL and Building dashboards. I really recommend this course for everyone!

Donald, 11 April 2021

I am really excited to use this skill to make an impact in my future projects. Simple dataset always helps me in understanding the concept in much better way. Simple and precise are the two words that I have for this course. I really don’t like lengthy course which are 10 hrs or 15 hrs. This course checks all criteria’s that I have and totally worth the money. I like the passion that author have to teach. This course does talk about building something end to end.

Sophia, 9 April 2021

Excellent knowledge sharing. It really helped me implement this in my project. I can now build an ETL pipeline that can run in production.

Sam, 9 April 2021

This is very good course diving into building end to end data pipeline in a most simplest way and I really love the part where author had build dashboards. Looking forward for more courses.

Chadi, 6 April 2021

This course can be a lot better. The instructor at times presented the materials poorly and some times it felt very robotic. The content can be enriched with more useful stuff. I didn't have much trouble because I have some databricks experience and this was more like a refreshing course for me. Overall it's not bad but it definitely can be a lot better...

REINALDO, 26 March 2021

So far (50% through the course) I'm very interested and happy with my choice, because the teacher has good experience and has been adding the information step by step.

Stuart, 26 March 2021

Impossible to see what’s going on. Explanations seem to be missing entirely or very poor. EG magic command don’t get explained. Overall, I’m feeling very disconnected and disappointed.

Shivankar, 21 March 2021

I'm transitioning into a tech career and have found this course helpful. Part of me wants to jump ahead to the other courses in the specialization but I have been learning more about the basics which is good. Good pace in explaining different topics and nice choice of colors while presenting. I will encourage author to create more advanced materials for us. I really like the part of processing same dataset in different language. Keep teaching.

Filipe, 19 March 2021

What a fantastic course! The content is very well organized, and the instructor makes it all easy to understand. The project we develop along the course is really helpful and gives us a good knowledge of Databricks. Totally recommend!

Patricia, 19 March 2021

The course is sensational! Easy to understand and follow. It has helped me a lot to broaden my knowledge.

Anmoldeep, 18 March 2021

Instructor did an excellent job with this course. He has prepared excellent study material and presents the information in a very clear manner. Value for money this course is an easy 5 star rating.

Ratnesh, 18 March 2021

This course provides the in-depth knowledge of the concepts. I feel the modules are divided perfectly so that you don’t get confused. Like the way Priyank has stated everything, easy to listen and understand.

Manika, 18 March 2021

Amazing course! It has an excellent instructor with clear accent. I've improved my knowledges not only Databricks or Apache Spark, but also Scala, SQL and Python!

Mrinal, 18 March 2021

I bought multiple courses for databricks and this is best so far. Covering different aspect about platform and architecting databases, ADF.


Udemy ID: 3902836

Course created date: 3/9/2021

Course indexed date: 4/18/2021

Course submitted by: Bot
