Real Time Spark Project for Beginners: Hadoop, Spark, Docker

Building Real Time Data Pipeline Using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django and Flexmonster on Docker

Rating: 3.80 (90 reviews)
Platform: Udemy
Language: English
Category: Databases
Students: 17,212
Content: 6.5 hours
Last update: Oct 2020
Regular price: $49.99

What you will learn

Complete Development of Real Time Streaming Data Pipeline using Hadoop and Spark Cluster on Docker

Setting up Single Node Hadoop and Spark Cluster on Docker

Features of Spark Structured Streaming using Spark with Scala

Features of Spark Structured Streaming using Spark with Python (PySpark)

How to use PostgreSQL with Spark Structured Streaming

Basic understanding of Apache Kafka

How to build Data Visualisation using Django Web Framework and Flexmonster

Fundamentals of Docker and Containerization

Why take this course?

  • In many data centers, different types of servers generate large amounts of data in real time. Each event in this case is the status of a server in the data center.

  • This data needs to be processed in real time to generate insights for the people who monitor the servers and the data center. They track server status regularly and resolve issues as they occur, which improves server stability.

  • Since the data is huge and arrives in real time, we need to choose the right architecture, with scalable storage and computation frameworks/technologies.

  • Hence we want to build a real-time data pipeline using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django and Flexmonster on Docker to generate insights from this data.

  • The Spark project/data pipeline is built using Apache Spark with Scala and PySpark on an Apache Hadoop cluster running on Docker.

  • Data Visualization is built using Django Web Framework and Flexmonster.

  • Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.

  • Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

  • Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

  • A NoSQL (originally referring to "non-SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
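
As a rough illustration of the kind of real-time insight described in the list above, the sketch below counts server status events per one-minute window in plain Python. It is not the course's code: the event fields, the sample values, and the function names are assumptions made for illustration, and the course builds its pipeline with Kafka and Spark rather than an in-memory loop.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical event shape: (ISO-8601 timestamp, server id, status).
# The actual event schema used in the course is not given in this listing.
SAMPLE_EVENTS = [
    ("2020-10-06T10:00:05+00:00", "server-1", "OK"),
    ("2020-10-06T10:00:40+00:00", "server-2", "ERROR"),
    ("2020-10-06T10:00:55+00:00", "server-2", "ERROR"),
    ("2020-10-06T10:01:10+00:00", "server-1", "WARNING"),
]


def window_start(timestamp: str, window_seconds: int = 60) -> datetime:
    """Return the start of the fixed (tumbling) window containing `timestamp`."""
    epoch = int(datetime.fromisoformat(timestamp).timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)


def count_statuses(events, window_seconds: int = 60) -> Counter:
    """Count events per (window start, server id, status)."""
    counts = Counter()
    for timestamp, server_id, status in events:
        key = (window_start(timestamp, window_seconds), server_id, status)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Print one line per window/server/status combination, oldest window first.
    for (start, server_id, status), n in sorted(count_statuses(SAMPLE_EVENTS).items()):
        print(start.isoformat(), server_id, status, n)
```

In a streaming engine the same grouping would run continuously over incoming events instead of over a fixed list, and the per-window counts would be written to a store such as PostgreSQL for the dashboard to read.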

Reviews

Sushant
April 19, 2022
The instructor's voiceover is too fast, with poor voice modulation. Between most of the chapters, I was forced to increase/decrease the volume depending on the instructor's voice. The concepts are not explained properly and the focus seemed more on the mechanics.

T
January 24, 2021
I liked the course content but the sound quality isn't good, which makes it difficult to understand the content.

Edoardo
October 18, 2020
Every section is described in depth, and it helped tremendously to practice with a proper application of these topics to fully understand them. Also, Pari was totally available for any questions or issues I had!

BIRAJ
October 14, 2020
It's one of the best courses ever, demonstrating an end-to-end ETL pipeline covering the latest big data technologies like Spark Structured Streaming, Apache Kafka, PostgreSQL, Django, Docker, etc. Also, I had an interview recently in which I demonstrated this project, and as expected the interviewer was fully impressed when I showcased it. Thank you so much, Pari sir. I had a great time learning the course content and really enjoyed your detailed explanation covering the use case and working with different components. Furthermore, I had issues setting up Docker and other services on my Windows machine, and Pari Sir offered me personal help: we scheduled a Zoom video call to set up my machine, and after some troubleshooting he managed to set up all the Docker containers and services for me. I would highly recommend this course to anyone who is looking for a Data Engineering role and wants to work on an end-to-end pipeline. PS: I bagged an offer from a top consulting company as a Data Engineer.

Udemy ID: 3494802
Course created date: 9/12/2020
Course indexed date: 10/6/2020
Course submitted by: Bot