Taming Big Data using Spark & Scala

Working on Big Data projects and taking the CCA 175 exam made easy, with project scenarios and practice questions for CCA 175

Rating: 3.85 (10 reviews)
Platform: Udemy
Language: English
Category: Development Tools
Students: 39
Content: 38 hours
Last update: Mar 2019
Regular price: $89.99

What you will learn

Big Data and its ecosystem, including Hadoop, Sqoop, Hive, Flume, Kafka, Spark with Scala, Spark SQL, and Spark Streaming

Both concepts (theory and architecture) and hands-on practice

Assignments and project scenarios for real projects

Build, deploy, and run Spark scripts on Hadoop clusters

Transform structured data using SparkSQL and DataFrames

Process continual streams of data with Spark Streaming

Working in IntelliJ and executing the JAR through scripts

Practice questions for CCA 175 Certification

Description

This course is for those who do not know even the ABCs of Big Data and its tools, and who want to learn them well enough to implement them comfortably in projects. It is also for those who have some knowledge of Big Data tools but want to deepen it and be comfortable working on projects. Because of its extensive scenario implementations, the course also suits people preparing for Big Data certifications such as CCA 175, and it includes practice tests for CCA 175.

Because the course focuses on setting up the entire Hadoop platform on your Windows machine (for those with less than 6 GB of RAM) and on working with the fully configured VMs it provides, you rarely need to pay for a cluster to practice the tools. The course is therefore a ONE-TIME INVESTMENT for a secure future.


In the course, we will learn how to use Big Data tools like Hadoop, Flume, Kafka, Spark, and Scala (among the most valuable tech skills on the market today).

In this course I will show you how to:

  1. Use Scala and Spark to analyze Big Data.

  2. Prepare for the CCA 175 exam with the practice test available at the end of the course.

  3. Work through extensive, real-world project scenarios with solutions, as you would write them in REAL PROJECTS.

  4. Use Sqoop to import data from traditional relational databases to HDFS and Hive.

  5. Use Flume and Kafka to process streaming data.

  6. Use Hive to view and store data and to partition tables.

  7. Use Spark Streaming to fetch streaming data from Kafka and Flume.

  8. Work on VMs that are configured so the tools work together and that have Spark 2.2.0 installed. (The standard Cloudera VM has Spark 1.6 installed with NO KAFKA and requires a Spark upgrade, while the VMs provided in the course have Spark 2.2 configured and working along with Kafka.)

Big Data skills are among the most in demand right now, and with this course you can learn them quickly and easily! You will also learn the basic setup components in files such as "hdfs-site.xml" and "core-site.xml", which are good to know when working on a project.

The course focuses on upskilling people who do not know Big Data tools, with the goal of bringing them up to the mark so they can work on Big Data projects without issues.

This course comes with some project scenarios and multiple datasets to work with.

After completing this course you will feel comfortable putting Big Data, Scala, and Spark on your resume, and you will be able to apply them in projects with ease!

Thanks and I will see you inside the course!

Content

Introduction

Introduction
Practice Test Added for CCA 175 Certification

Big Data Platform Setup

Different forms of Big Data Platforms
Installation on Windows or Cloudera
Browse through Shared Course content
Course - Additional Section Info

Use Windows/Cloudera VM provided in the course

Setup VM
Setup IntelliJ on VM
Windows HDFS Error & Fix

Simply set up IntelliJ and Spark and practice only these two

Setup MySQL & Basics
Setup Spark
Setup IntelliJ - Part 1
Setup IntelliJ - Part 2
Possible Issue in IntelliJ
SBT Setup for Scala CLI/REPL
Winutil Setup in Windows for Hadoop-like implementation

Learning Hadoop - Architecture, Concepts & Implementation

Hadoop Architecture - Part 1 - Basics of Hadoop
Hadoop Architecture - Part 2 - Understanding NameNode and DataNode
Hadoop Architecture - Part 3 - Understanding Job Tracker & Task Tracker
Hadoop Refresh & File Systems
Hadoop Terminologies & Configurations in XML Files
Hadoop Commands on Windows or Windows VM - Part 1
Hadoop Commands on Windows or Windows VM - Part 2
Hadoop Commands on Cloudera Quick Start VM

Learning Sqoop - Architecture, Concepts & Implementation

Sqoop Architecture
Sqoop Eval on Windows/Windows VM
Sqoop Eval on Windows - Using -e & --query options
Sqoop List Database and List Tables - Used for creating Generic Code
Sqoop Import Command - Understanding and Analysing the Map-Reduce Functionality
Sqoop Import - Append Mode of Execution
Sqoop Import - Overwrite option & Different File Formats supported
Sqoop Import - Using Where & Columns Options to filter the data import
Sqoop Import - Executing User Specific Query with Where Clause
Sqoop Import - Incremental Load Execution
Sqoop Jobs - Create, List & Execute Sqoop Jobs
Sqoop Import All Option to Import all tables from MySQL to HDFS
Sqoop Import - Import from MySQL To Hive - Basic Import
Sqoop Import - Import from MySQL To Hive - More Options
Sqoop Import All - Import from MySQL to Hive using Import All
Sqoop Import - from Mainframe - A basic know how
Sqoop Export - Bring Data from HDFS to MySQL
Sqoop Assignment for Practice

Learning Hive - Architecture, Concepts & Implementation

Hive - Introduction & Features
Hive - Architecture & Map-Reduce Execution
Hive Tables
Hive Partitioning & Bucketing - Concepts and Difference
Hive Query Language - Overview and Syntax
Hive QL - Practicals - Create Database & Tables & load sample data
Hive QL - Practicals - Load Huge Data to Managed Tables
Hive QL - Practicals - Creating and Loading Managed & External Tables
Hive QL - Practicals - Partitioning in Hive
Hive QL - Practicals - Bucketing in Hive
Hive User Defined Functions
Hive Performance Tuning Methods

Learning Flume - Architecture, Concepts & Implementation

Flume - Concepts, Usage, Features & Advantages
Flume Architecture
Flume Data Flows, Contextual Routing & Other Concepts
Basics of Flume Configurations
Setup of Telnet in Windows
Flume Practicals - Simple Flume Job using NetCat
Flume Practicals - Flume Job using EXEC
Flume Practicals - Flume Job using Sequence Generator
Flume Practicals - Flume Job using Sequence Generator on HDFS
Flume Practicals - Flume Job using Twitter on Windows
Flume Practicals - Flume Job using Twitter on Cloudera
Flume Practicals - Flume Job using Twitter on File Channel
Flume Practicals - Flume Job using Twitter to Hive Sink
Flume Multiplexing - One Source, One Channel & Two Sink - Logger and HDFS Sinks
Industry Usage of Flume

Learning Kafka - Architecture, Concepts & Implementation

Kafka Concepts and Architecture 1
Kafka Concepts and Architecture 2
Kafka Concepts and Architecture 3
Kafka Sample Execution on Cloudera
Flume and Kafka Together

Learning Scala in Command Line Interface (REPL) & IntelliJ

Scala CLI/REPL on Windows & Cloudera with Mutable and Immutable Variables
Scala - Session 2 - Data Types Used & Applicable Functions
Scala - Session 3 - Range
Scala - Session 4 - For Loops
Scala While loops
Functions in Scala
Functions in Scala 2
Functions and Function Overloading in Scala
Object Oriented Programming in Scala using Classes & Objects
Scala Collections
Scala Input Output Files

Learning Spark - Architecture & Concepts

Spark Architecture
Spark Components, Lazy Executions, DAG, SparkSQL, Performance Tuning etc
Spark - Shuffles, Coalesce, Repartition & Shared Variables

Spark RDD - Implementations

Spark-shell execution Mode & RDD creation from HDFS & Local Files
Spark RDD Transformations - Filter, Sample, Union, Intersection, Distinct
Spark RDD Transformations - Map, FlatMap & Reduce
Spark RDD - Joining RDD
Spark RDD - Foreach & Splitting RDD String to Columns
Spark RDD - Removing Header from RDD, CountByKey, ReduceByKey, GroupByKey etc
Spark RDD - SortByKey, Coalesce, Repartition & Shared Variables
Spark RDD - Write the RDD to HDFS
Spark RDD in IntelliJ

Spark SQL, DataFrames & DataSets

Spark SQL - Executing SQL & storing in Dataframes
Spark SQL - Functions & Executions
DataFrames - Read Files in DataFrames & Implement different DataFrames Functions
DataFrames - Read Files in DataFrames & Implement different DF Functions 2
DataFrames - Read from File, Write to File and Convert to SparkSQL Format
DataFrames - Dataframe columns type Conversion
Datasets - Convert/Create Datasets from DataFrames
Spark - Writing & Executing RDD, DataFrames & Datasets in IntelliJ

IntelliJ & Spark-Submit

IntelliJ & Spark-Submit
Execute Spark Submit through Parameterized script
Spark-submit Config Options

Learning Spark Streaming - Concepts & Implementation

Spark Streaming Concepts & DStream
Spark Streaming - Word Count Example on Telnet
Spark Streaming - Twitter Word Count
Spark Streaming - Flume with Spark Streaming - Read files from HDFS and WordCount
Spark Streaming - Flume and Spark Together - Pull Based Module

Additional Information

SPARK - RDD VS DATAFRAME VS DATASETS
Spark - Catalyst Optimizer and Tungsten Engine
Spark - WebUI
Spark - Read JSON Files
Spark - Read & Write to Parquet & ORC Files

Project Scenarios

Overall Big Data Project Structure
Project Scenario - Bring Data from BI Database to Data Lake in Layer1
Project Scenario 2
Project Solution - Scenario 1 & 2
Project Scenario 3 - Bring Files from Local File System to HDFS in Data lake
Project Scenario 4 - Create Generic Jobs to read data from Data lake to layer 2
Project Scenario 5 - Use SparkSQL to read data from layer 2 and write to Layer 3
Project Scenario 5 - Solution
Project Scenario 6 - Merge Multiple Files
Project Scenario 6 - Solution
Project Scenario 7 - Compare two Dataframes Col by Col - Scenario & Solutions

CCA 175 Practice Questions

Practice Test 1 : CCA 175 Spark & Hadoop Developer Exam
Practice Test 2 - CCA 175 Spark & Hadoop Certification

Reviews

Vinay
July 19, 2019
This is a nice course with a responsive instructor; the most common set of topics is covered as part of this course. Apart from that, the VMs provided are all ready to use, which reduces setup time, as this is the most common problem people face. I would recommend this course.
Puja
July 8, 2019
Excellent explanation of material, good pace, wide topic coverage, great course! Sufficient hands-on exercise provides clear understanding of concepts. Definitely recommend it to those who want to learn Scala and Spark.
Amit
January 24, 2019
I find this course very useful & direct to the point. Anshul has provided very good explanations on the topics covered. Also the topics that Anshul has chosen are precisely those which are mostly used in the industry. Overall, this is a great course for anybody who wants to become proficient in the big data technologies.

Udemy ID: 2047289
Course created date: 11/25/2018
Course indexed date: 12/20/2020
Course submitted by: Bot