A Big Data Hadoop and Spark project for absolute beginners

Data Engineering, Spark, Hive, Python, PySpark, Scala, Coding Framework, Testing, IntelliJ, Maven, Glue, Databricks, Delta Lake

Rating: 4.50 (1323 reviews)
Platform: Udemy
Language: English
Category: IT Certification
Students: 14,532
Content: 12.5 hours
Last update: Jan 2024
Regular price: $84.99

What you will learn

Big Data, Hadoop, and Spark from scratch by solving a real-world use case using Python and Scala

Spark Scala and PySpark real-world coding framework

Real-world coding best practices: logging, error handling, and configuration management using both Scala and Python

Serverless big data solution using AWS Glue, Athena and S3

Why take this course?

This course will prepare you for a real-world Data Engineer role!


Data Engineering is a crucial component of data-driven organizations, as it encompasses the processing, management, and analysis of large-scale data sets, which is essential for staying competitive.


This course lets you get started with Big Data quickly by using a free cloud cluster to solve a practical use case.


You will learn the fundamental concepts of Hadoop, Hive, and Spark, using both Python and Scala. The course aims to develop your Spark Scala and PySpark coding abilities to that of a professional developer, by introducing you to industry-standard coding practices such as logging, error handling, and configuration management.
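As a rough illustration of what logging, error handling, and configuration management can look like together, here is a minimal sketch in plain Python using only the standard library. It is not the course's framework and does not use Spark; the logger name, the settings, and the functions `load_settings` and `filter_rows` are hypothetical names chosen for this example.

```python
import configparser
import logging

# Configure logging once, at application start-up.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")  # hypothetical logger name

# Hypothetical settings; a real project would usually keep these in a file.
SAMPLE_CONFIG = """
[input]
path = /tmp/courses.csv
min_rating = 4.0
"""


def load_settings(text):
    """Parse INI-style settings and return them as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "path": parser.get("input", "path"),
        "min_rating": parser.getfloat("input", "min_rating"),
    }


def filter_rows(rows, min_rating):
    """Keep rows whose rating is at least min_rating; log and skip bad rows."""
    kept = []
    for row in rows:
        try:
            rating = float(row["rating"])
        except (KeyError, ValueError) as exc:
            # A malformed row is logged and skipped instead of stopping the job.
            logger.warning("Skipping malformed row %r: %s", row, exc)
            continue
        if rating >= min_rating:
            kept.append(row)
    logger.info("Kept %d of %d rows", len(kept), len(rows))
    return kept
```

Keeping thresholds and paths in configuration rather than in code means the same job can run against different environments without edits, and the warning-level log entries leave a trace of every record that was skipped.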


Additionally, you will understand the Databricks Lakehouse Platform and learn how to conduct analytics using Python and Scala with Spark, apply Spark SQL and Databricks SQL for analytics, develop a data pipeline with Apache Spark, and manage a Delta table by accessing version history, restoring data, and utilizing time travel features. You will also learn how to optimize query performance using Delta Cache, work with Delta Tables and Databricks File System, and gain insights into real-world scenarios from our experienced instructor.


What you will learn:


  • Big Data, Hadoop concepts

  • How to create a free Hadoop and Spark cluster using Google Dataproc

  • Hadoop hands-on - HDFS, Hive

  • Python basics

  • PySpark RDD - hands-on

  • PySpark SQL, DataFrame - hands-on

  • Project work using PySpark and Hive

  • Scala basics

  • Spark Scala DataFrame

  • Project work using Spark Scala

  • Developing a practical comprehension of Databricks Delta Lake Lakehouse concepts through hands-on experience

  • Learning to operate a Delta table by accessing its version history, recovering data, and utilizing time travel functionality

  • Spark Scala real-world coding framework and development using winutils, Maven, and IntelliJ

  • Python Spark Hadoop Hive coding framework and development using PyCharm

  • Building a data pipeline using Hive, PostgreSQL, and Spark

  • Logging, error handling, and unit testing of PySpark and Spark Scala applications

  • Spark Scala Structured Streaming

  • Applying Spark transformations to data stored in AWS S3 using Glue and viewing the data using Athena

  • How to become a productive data engineer leveraging ChatGPT


Prerequisites:


This course is designed for Data Engineering beginners; no prior knowledge of Python or Scala is required. However, some familiarity with databases and SQL is necessary. Upon completion, you will have the skills and knowledge required to succeed in a real-world Data Engineer role.


Our review

---

**Overview and Course Rating**

The online course on Big Data technologies, including Apache Hadoop, Spark, and related tools on Google Cloud Platform (GCP), has garnered a global rating of 4.41 based on recent reviews. The general consensus is that the content is well-structured and educational, though some users have noted challenges with the pace of teaching and the clarity of explanations.

**Pros:**

- **Comprehensive Content:** Many reviewers appreciated the depth and breadth of the course materials, finding them relevant and informative.
- **Practical Application:** Users reported that they could apply what they learned directly to real-world scenarios, particularly in data engineering tasks.
- **Useful for Beginners:** The course is recommended as a starting point for those new to big data technologies.
- **Hands-On Experience:** The hands-on approach was highly praised for its practicality and effectiveness in learning.
- **Coverage of Relevant Topics:** Users appreciated the focus on useful topics, with some noting it to be the best available course on the subject.
- **Step-by-Step Explanations:** Positive feedback was given for instances where concepts were broken down and explained step by step.

**Cons:**

- **Pace of Teaching:** Several users felt that the instructor moved too quickly through content, making it difficult to keep up and fully grasp the material.
- **Code Execution Clarity:** Some reviewers were frustrated with the way code was executed and explained during the course, with a few expressing confusion due to rapid video cutting or incomplete instructions.
- **Technical Issues:** There were reports of outdated content, particularly with changes in software interfaces, such as IntelliJ GUI and AWS menus, which made it hard to follow along.
- **Quality of Q&A:** A few users mentioned poor or absent responses from the instructor when they sought clarification or help.
- **Technical Help:** Some users found that technical issues with the code examples provided in the course were not adequately addressed, even when following instructions precisely.

**Additional Notes:**

- **Updates Needed:** Certain sections of the course needed to be updated to reflect current software versions and interfaces.
- **Platform Compatibility:** Some users experienced issues if they were using a Mac, suggesting that the course may not be compatible with all platforms.
- **Career Value:** Despite some challenges, many reviewers agreed that the knowledge gained from the course had real value for those seeking to get jobs in big data technologies.

**Recommendation:**

The course is generally recommended for its comprehensive coverage of Big Data technologies and practical application. However, potential students should be aware of the need for careful attention to pace and the necessity of verifying instructions with current software versions. Those who follow along diligently and seek additional support when needed will likely find this course to be a valuable resource in their learning journey.

---

**TL;DR:** The Big Data course is highly educational and practical, especially for beginners. It covers a wide range of topics and provides hands-on experience. However, the pace might be too fast for some, and some content may be outdated or hard to follow due to rapid video editing and software changes. The quality of support during Q&A sessions has been inconsistent. Despite these issues, the course is still considered valuable for understanding and applying Big Data technologies in real-world scenarios.

Udemy ID: 2583632
Course created date: 9/30/2019
Course indexed date: 10/9/2019
Course submitted by: Bot