Big Data on AWS

Learn Big Data on AWS with Hands On Learning

3.70 (279 reviews)

11.5 hours


Last update: May 2019

What you will learn

Learn Big Data on AWS

Redshift, Kinesis Streams, Kinesis Firehose, EMR, Machine Learning, Athena, AWS Glue, AWS IoT, DynamoDB, S3, AWS Snowball, AWS Lambda


A full-length practice exam is included.

This course is a study guide for the AWS Certified Big Data - Specialty exam. The focus is on hands-on learning.

Although this course does not guarantee that you will pass the exam, you will learn many of the services and concepts required to pass it.

Even if you are not planning to take the exam, you will learn a lot from studying the course material.

Covers all the exam domains.

Learn Collection, Storage, Processing, Analysis, Visualization, and Data Security

Redshift and Flight Data Analysis

Kinesis Data Streams and Kinesis Firehose

EMR on AWS including Hive, Presto, Hadoop, MapReduce, HDFS, Spark


Amazon Machine Learning

Amazon QuickSight

AWS Glue


AWS Lambda

AWS Data Pipeline

Amazon DynamoDB

AWS Snowball

S3 for Big Data

Amazon Elasticsearch





Exam Blueprint




What is RedShift?

What is a Columnar Database?

RedShift Nodes and Slices

Cost of RedShift and Node Types

Creating RedShift Cluster

Connecting to Cluster using Query Editor


Connecting to Cluster using SQL Workbench/J

COPY Command


Creating Tables and Loading data into Tables

Splitting Files

Verifying Data

Create Snapshot and Delete Cluster

Restore from Snapshot

Sharing Snapshot

Redshift Distribution Styles

RedShift Users and Groups

What is WLM?

WLM Demo

RedShift Views

RedShift Compression Types

Vacuum Process

RedShift DataTypes

AWS Lambda

Creating a Lambda Function

AWS Lambda Settings

Lambda Triggers - SNS to AWS Lambda

SNS to AWS Lambda Testing




Benefits of Kinesis Streams

Use Cases

Kinesis Create Stream, PutRecord, GetRecord Demo

Shards, Partition Key, Sequence Numbers

Shard Scaling

Kinesis Split Shards

Merge Shards

Kinesis Agent

Kinesis Producer Library Introduction

KPL Key Terms

KPL Code Review

KPL Demo

KPL Demo Commands

KCL Introduction

KCL Architecture

Checkpointing using KCL

KCL Demo

Kinesis Firehose Introduction

Kinesis Firehose Demo

Amazon Machine Learning


What is Machine Learning?

Supervised Learning

Unsupervised Learning

Getting Familiar with Our Data Using QuickSight

Create DataSource

Evaluating Model


Real Time Prediction

Batch Prediction

Prediction Data

Service Limits

Deleting Objects

Exam Tips



Core Hadoop

HDFS Overview


HDFS Write

Write Files to EMRFS Demo

Map Reduce Introduction

Map Reduce Example - Word Count

Map Reduce Demo on EMR

Hive Introduction

Hive Demo Part 1

Hive Demo Part 2

Hive Queries for Demo

SerDe (Serializer/Deserializer)

Presto Introduction

Presto Demo

Hive and DynamoDB

EMR Architecture

EMR Cluster Life Cycle

EMR Autoscaling

Spark On EMR

Introduction to Spark

Hadoop vs Spark

Spark Ecosystem



What is Amazon Athena?

Query S3 Data using Athena

Athena Partitions

Athena Queries


S3 Event Notifications

S3 Server Side Encryption - Part 1

S3 Server Side Encryption - Part 2


Create Data Pipeline

Components of Data Pipeline



What is DynamoDB?

DynamoDB Partition Key and Sort Key

DynamoDB Operations

Scan and GetItem

GSI and Queries

DynamoDB Capacities

DynamoDB Streams

Data Types

DynamoDB Global Tables

Exam Tips



What is QuickSight


Preparing DataSet

Creating Analysis

Modify Visuals

Create Dashboard

What is SPICE?

DataSources Supported by QuickSight

Exam Tips

Amazon Elasticsearch Service

What is Amazon Elasticsearch Service?

Elasticsearch Use Cases

Master Node and Domain in ES

Elasticsearch Demo


SQS Introduction

Autoscaling using SQS



What is Snowball?

Create Job

Transfer Data

Snowball Pricing (Optional)



Creating Trail and Viewing Events

IAM for CloudTrail

CloudTrail logs to CloudWatch

AWS Glue

Crawling S3 Data using AWS Glue

Creating RedShift Cluster, Security Group and VPC Endpoint

Crawling RedShift Data

Crawling RedShift Data - Part 2

Copying Data from S3 to RedShift Using Glue Jobs

Copying Data from S3 to RedShift Using Glue Jobs - Part 2


Lab Part 1 - Setting up IOT

Lab Part 2 - Testing

Lab Part 3 - Creating Rule

Lab Part 4 - Sending Data to AWS IOT

Additional Materials

Practice Exam

re:Invent Video Links

AWS White Papers

More Hands On Labs

Flight Data Analysis - Part 1

Flight Data Analysis - Part 2

Flight Data Analysis - Part 3

Flight Data Analysis - Part 4

Flight Data Analysis - Part 5

Flight Data Analysis - Part 6

Flight Data Analysis - Part 7

RedShift Compression Demo Part 1

RedShift Compression Demo Part 2

RedShift Compression Demo Part 3

RedShift Compression Demo Part 4

KMS Demo

Encrypting Data using KMS


Tetyana, 8 September 2020

So far so good - simple and easy to understand. I will update my rating after I'm done with the course

Laxmi, 26 March 2020

I expected better material and coverage. Having taken other courses, I can say this one does not match up in the way its content is organised. The quality of the "hands on" material is also very poor. The video captures are not clear.

JP, 25 February 2020

A lot of repetition, pulled info directly from AWS documents, no summaries for each part, not broken into AWS sections from the actual exam.

K, 11 January 2020

The explanation is nice. Lots of hands-on demos. I did not have much experience in big data and this course is helping me understand the concepts immensely. Thank you once again for the nice course.

Justin, 15 December 2019

video editing is poor. screen starts out blurry. lots of spelling errors/typos. overall quality is lacking

Sarafudheen, 14 November 2019

A good course if you want to start your career in Big Data using AWS. Each module and video is perfectly organized.

Ken, 5 November 2019

Seemed pretty detailed. I wasn't expecting quite so much programming. I hope it is required on the exam.

Madhu, 23 October 2019

I have just started it and found there could be a better way of explaining some of the concepts, such as row-based versus columnar databases.

David, 1 October 2019

The course was OK, and Arpan knows his stuff. However, there are some serious issues that need to be corrected:

1) The biggest issue is that there is very little supporting documentation, and what there is is strangely organized. For example, lesson 47 (Kinesis demo commands) is a single page with the commands used in lesson 46. While these are copy and pastable, they're too late - by the time you realize they're there, you've already completed lesson 46. Udemy course documentation is supposed to be in files attached to the lesson. Arpan clearly knows this, as a couple of the lessons (such as #42) have files attached. But the vast majority of them don't. Arpan mentions a GitHub repository a couple of times, but doesn't provide a link. He does have a GitHub page (https://github.com/arpansolanki?tab=repositories) but none of the repositories there appear to relate to this course. Also, his web site (http://www.arpansolanki.com/) is down.

2) Another major problem is that Section 5 of the course covers a deprecated service, AWS Machine Learning. This section should be updated to cover SageMaker. AWS no longer allows new users to create ML models, so it's impossible to follow the demo even if you want to.

3) The practice exam at the end has answers with the questions, in some cases on the same line as the question. This makes it impossible to read a question without seeing the answer, so the exam is useless for practice. This is a shame, as there was obviously a lot of effort involved to create it. Ideally, the exam should be interactive, with the solution displayed only after the candidate has selected an answer. At the very least, the answer key should be a separate document from the exam itself.

These are the major issues. There are others throughout:

4) The Redshift copy command gets "invalid operation" unless the Workbench/J connection is set for autocommit. The course should mention this.

5) Lesson 42 (Kinesis Agent) should mention that it's necessary to set the Kinesis endpoint in agent.json.

6) Lesson 58 (QuickSight) should mention that it's necessary to authorize QuickSight to access the specific S3 bucket containing the data. Also, the data used by this demo (file bank-additional-full.csv, a download from UCI or Kaggle) must be massaged somewhat (change semicolons to commas, change "y"/"n" to 1/0) before use.

7) Lesson 73 (Write Files to EMRFS Demo) should mention that it's necessary to add SSH access to the security group (as well as where to get the needed data, which is tricky to find on the Internet).

8) Lesson 106 (GSI and Queries) should also discuss LSI and explain the differences between GSI and LSI.

9) Lesson 150ff (Flight Data Analysis) needs data and code. Some of it is on third-party site https://www.bogotobogo.com/DevOps/AWS/aws-qwiklabs-RedShift.php

I know this is a long review. I hope Arpan takes it in the spirit intended. There's a lot of excellent material in this course and it could be a really good one if these structural issues can be corrected.
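The data preparation step described in the review above (semicolons to commas, "y"/"n" labels to 1/0) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `convert_rows`, the label column name `y`, and the inclusion of "yes"/"no" alongside "y"/"n" in the mapping are illustrative guesses, not something the course or the review specifies, and the sample rows are made up.

```python
import csv
import io

# Assumed label mapping: the review says "y"/"n" -> 1/0; "yes"/"no" are
# included as a guess in case the file spells the labels out.
LABEL_MAP = {"y": "1", "n": "0", "yes": "1", "no": "0"}


def convert_rows(src, dst, label_column="y"):
    """Copy a semicolon-delimited CSV to a comma-delimited one,
    recoding the values of one label column via LABEL_MAP."""
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=",", lineterminator="\n")
    header = next(reader)
    label_index = header.index(label_column)  # hypothetical column name
    writer.writerow(header)
    count = 0
    for row in reader:
        value = row[label_index].strip().lower()
        # Unknown labels are left untouched rather than guessed at.
        row[label_index] = LABEL_MAP.get(value, row[label_index])
        writer.writerow(row)
        count += 1
    return count


# Made-up sample rows in the semicolon-delimited layout.
sample = "age;job;y\n56;housemaid;no\n41;blue-collar;yes\n"
out = io.StringIO()
rows_written = convert_rows(io.StringIO(sample), out)
converted = out.getvalue()
print(converted)
```

For a real file, the same function can be given file objects opened with `newline=""`, as the Python `csv` documentation recommends, in place of the in-memory strings used here.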

Larry, 28 September 2019

I bought this course when it was bundled with other Big Data certification courses (from Frank and Stephane). The rationale for taking this was to go through the exam questions. However, the exam questions in this course are very poor: they are either copied from the official AWS samples or not explanatory enough. Also, the exam questions give the ANSWER right away after the last option, which hurts the exam taker who wants to see and evaluate the answers.

For example, refer to the following question. The answers are provided right on the last line. And, no explanation! In fact the answers here are WRONG (A & D are correct for EVEN distribution style). Looks like these questions are copied from some other website...Google it ;-)

An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.)
A. When the tables are highly denormalized and do NOT participate in frequent joins.
B. When data must be grouped based on a specific key on a defined slice.
C. When data transfer between nodes must be eliminated.
D. When a new table has been loaded and it is unclear how it will be joined to dimension.
Answer: B,D

---- [Updates] See another question about storage for Firehose. None of the options are correct, as KDF does NOT store anything. It has a buffer, but that is not storage. The question is, however, relevant to Kinesis Data Streams (KDS), as it "can store" by default for 24 hours.

How long does each Kinesis firehose delivery stream stores data records in case the delivery destination is unavailable?
a) 12 hours
b) 24 hours
c) 48 hours
d) 72 hours
Answer: b

Though I could not effectively use the exam questions, I updated my review from a 2.5 to a 3 rating just because of the hands-on sections (e.g., IOT, Presto, HIVE-DynamoDB, etc.)

Pavel, 23 August 2019

The course contains most of what you need for the exam. Arpan could work on pronunciation, but there are no parts that would not be understandable. And yes, I completed the exam today :-)

Anu, 10 August 2019

Crisp, short videos, beautifully explained. I never get tired of watching. I think it's because he kept all the videos to a minimum duration, covering each point in its own video. I found this approach the best way to learn.

Naveen, 3 August 2019

Mr. Arpan you have done a Great Job by explaining the whole topic with plenty of Examples and hands-on. Detailed explanation for IOT and Redshift was really useful and was very well explained. Thanks

Keerthan, 4 April 2019

brief overview of each aws components, neatly explained. needs notes or important summary points for each slide explained

Todd, 29 March 2019

this is a great course. thank u for the demo videos those are great. can't wait for new sections on Glue and Spark

