3.70 (279 reviews)
☑ Learn Big Data on AWS
☑ Redshift, Kinesis Streams, Kinesis Firehose, EMR, Machine Learning, Athena, AWS Glue, AWS IoT, DynamoDB, S3, AWS Snowball, AWS Lambda
Full Length Practice Exam is Included
This course is a study guide for the AWS Certified Big Data Specialty exam, with a focus on hands-on learning.
Although this course does not guarantee that you will pass the exam, you will learn many of the services and concepts required to pass it.
Even if you are not planning to take the exam, you will learn a lot from studying the course material.
Covers all the exam domains.
Learn Collection, Storage, Processing, Analysis, Visualization, and Data Security
Redshift and Flight Data Analysis
Kinesis Data Streams and Kinesis Firehose
EMR on AWS including Hive, Presto, Hadoop, MapReduce, HDFS, Spark
Amazon Machine Learning
S3 for Big Data
Exam Blueprint
What is Redshift?
What is a Columnar Database?
Redshift Nodes and Slices
Cost of Redshift and Node Types
Creating a Redshift Cluster
Connecting to the Cluster using Query Editor
Connecting to the Cluster using SQL Workbench/J
COPY Command PART
Creating Tables and Loading Data into Tables
Create Snapshot and Delete Cluster
Restore from Snapshot
Redshift Distribution Styles
Redshift Users and Groups
What is WLM (Workload Management)?
Redshift Compression Types
Creating a Lambda Function
AWS Lambda Settings
Lambda Triggers - SNS to AWS Lambda
SNS to AWS Lambda Testing
Benefits of Kinesis Streams
Kinesis Create Stream, PutRecord, GetRecords Demo
Shards, Partition Key, Sequence Numbers
Kinesis Split Shards
Kinesis Producer Library Introduction
KPL Key Terms
KPL Code Review
KPL Demo Commands
Checkpointing using KCL
Kinesis Firehose Introduction
Kinesis Firehose Demo
Amazon Machine Learning
What is Machine Learning?
Getting Familiar with Our Data using QuickSight
Real Time Prediction
Write Files to EMRFS Demo
Map Reduce Introduction
Map Reduce Example - Word Count
Map Reduce Demo on EMR
Hive Demo Part 1
Hive Demo Part 2
Hive Queries for Demo
SerDe (Serializer/Deserializer)
Hive and DynamoDB
EMR Cluster Life Cycle
Spark On EMR
Introduction to Spark
Hadoop vs Spark
What is Amazon Athena?
Query S3 Data using Athena
S3 Event Notifications
S3 Server Side Encryption - Part 1
S3 Server Side Encryption - Part 2
Components of AWS Data Pipeline
What is DynamoDB?
DynamoDB Partition Key and Sort Key
Scan and GetItem
GSI and Queries
DynamoDB Global Tables
What is QuickSight?
What is SPICE?
Data Sources Supported by QuickSight
Amazon Elasticsearch Service
What is Amazon Elasticsearch Service?
Elasticsearch Use Cases
Master Node and Domain in ES
Autoscaling using SQS
What is Snowball?
Snowball Pricing (Optional)
Creating Trail and Viewing Events
IAM for CloudTrail
CloudTrail logs to CloudWatch
Crawling S3 Data using AWS Glue
Creating a Redshift Cluster, Security Group and VPC Endpoint
Crawling Redshift Data
Crawling Redshift Data - Part 2
Copying Data from S3 to Redshift Using Glue Jobs
Copying Data from S3 to Redshift Using Glue Jobs - Part 2
Lab Part 1 - Setting up IoT
Lab Part 2 - Testing
Lab Part 3 - Creating a Rule
Lab Part 4 - Sending Data to AWS IoT
re:Invent Video Links
AWS Whitepapers
More Hands On Labs
Flight Data Analysis - Part 1
Flight Data Analysis - Part 2
Flight Data Analysis - Part 3
Flight Data Analysis - Part 4
Flight Data Analysis - Part 5
Flight Data Analysis - Part 6
Flight Data Analysis - Part 7
Redshift Compression Demo Part 1
Redshift Compression Demo Part 2
Redshift Compression Demo Part 3
Redshift Compression Demo Part 4
Encrypting Data using KMS
So far so good - simple and easy to understand. I will update my rating after I'm done with the course
I did expect better material and coverage. I have done other courses, so I can say this one does not match up in the way it has organised its content. The quality of the "hands on" material is also very poor. The video captures are not clear.
A lot of repetition, info pulled directly from AWS documents, no summaries for each part, and not broken into sections matching the actual exam.
The explanations are nice. Lots of hands-on demos. I did not have much experience in big data, and this course is helping me understand the concepts immensely. Thank you once again for the nice course.
Video editing is poor. The screen starts out blurry. Lots of spelling errors/typos. Overall quality is lacking.
A good course if you want to start your career in Big Data using AWS. Each module and video is perfectly organized.
Seemed pretty detailed. I wasn't expecting quite so much programming. I hope it is required on the exam.
I have just started it and found there could be a better way of explaining some of the concepts, like the row-based vs. columnar database concept.
The course was OK, and Arpan knows his stuff. However, there are some serious issues that need to be corrected:
1) The biggest issue is that there is very little supporting documentation, and what there is is strangely organized. For example, lesson 47 (Kinesis demo commands) is a single page with the commands used in lesson 46. While these can be copied and pasted, they come too late: by the time you realize they're there, you've already completed lesson 46. Udemy course documentation is supposed to be in files attached to the lesson. Arpan clearly knows this, as a couple of the lessons (such as #42) have files attached, but the vast majority don't. Arpan mentions a GitHub repository a couple of times but doesn't provide a link. He does have a GitHub page (https://github.com/arpansolanki?tab=repositories), but none of the repositories there appear to relate to this course. Also, his web site (http://www.arpansolanki.com/) is down.
2) Another major problem is that Section 5 of the course covers a deprecated service, Amazon Machine Learning. This section should be updated to cover SageMaker. AWS no longer allows new users to create ML models, so it's impossible to follow the demo even if you want to.
3) The practice exam at the end has the answers with the questions, in some cases on the same line as the question. This makes it impossible to read a question without seeing the answer, so the exam is useless for practice. This is a shame, as there was obviously a lot of effort involved in creating it. Ideally, the exam should be interactive, with the solution displayed only after the candidate has selected an answer. At the very least, the answer key should be a separate document from the exam itself.
These are the major issues. There are others throughout:
4) The Redshift COPY command fails with "invalid operation" unless the Workbench/J connection has autocommit enabled. The course should mention this.
5) Lesson 42 (Kinesis Agent) should mention that it's necessary to set the Kinesis endpoint in agent.json.
6) Lesson 58 (QuickSight) should mention that it's necessary to authorize QuickSight to access the specific S3 bucket containing the data. Also, the data used by this demo (file bank-additional-full.csv, a download from UCI or Kaggle) must be massaged somewhat (change semicolons to commas, change "y"/"n" to 1/0) before use.
7) Lesson 73 (Write Files to EMRFS Demo) should mention that it's necessary to add SSH access to the security group (as well as where to get the needed data, which is tricky to find on the Internet).
8) Lesson 106 (GSI and Queries) should also discuss LSI and explain the differences between GSI and LSI.
9) Lessons 150ff (Flight Data Analysis) need data and code. Some of it is on the third-party site https://www.bogotobogo.com/DevOps/AWS/aws-qwiklabs-RedShift.php
I know this is a long review. I hope Arpan takes it in the spirit intended. There's a lot of excellent material in this course, and it could be a really good one if these structural issues are corrected.
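Item 6 in the review above says bank-additional-full.csv needs its semicolons changed to commas and its yes/no labels changed to 1/0 before the QuickSight demo. A minimal sketch of that clean-up using only Python's standard csv module might look like the following. It assumes the label column is named "y" (as in the UCI bank marketing file) and, because the review writes the change as "y"/"n" to 1/0, it accepts both "yes"/"no" and "y"/"n" spellings. The function name semicolons_to_commas is hypothetical, not something from the course.

```python
import csv
import io

# Assumption: the target column holds yes/no style labels. Both the long and
# short spellings are mapped, since the review describes them as "y"/"n".
LABEL_MAP = {"yes": "1", "y": "1", "no": "0", "n": "0"}


def semicolons_to_commas(src: str, label_column: str = "y") -> str:
    """Re-emit a semicolon-delimited CSV as comma-delimited text and map the
    label column's yes/no values to 1/0. Unrecognized labels are left as-is."""
    reader = csv.DictReader(io.StringIO(src), delimiter=";")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = row[label_column].strip().lower()
        row[label_column] = LABEL_MAP.get(value, row[label_column])
        writer.writerow(row)
    return out.getvalue()
```

To convert a local copy, read the file with open(path, newline="") and write the returned string to a new file; csv handles the quoted fields in the source, and the output is only quoted where a value actually needs it.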
I bought this course when it was bundled with another Big Data certification course (from Frank and Stephane). My reason for taking it was to go through the exam questions. However, the exam questions in this course are very poor: they are either copied from the official AWS samples or not explanatory enough. Also, each exam question states the ANSWER right after the last option, which defeats the purpose for an exam taker who wants to see and evaluate the answers. For example, see the following question. The answers are provided right on the last line, with no explanation! In fact, the answers here are WRONG (A & D are correct for the EVEN distribution style). It looks like these questions are copied from some other website... Google it ;-)
An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.)
A. When the tables are highly denormalized and do NOT participate in frequent joins.
B. When data must be grouped based on a specific key on a defined slice.
C. When data transfer between nodes must be eliminated.
D. When a new table has been loaded and it is unclear how it will be joined to dimension.
Answer: B,D
---- [Updates]
See another question, about storage for Firehose. None of the options are correct, as KDF does NOT store anything. It has a buffer, but that is not storage. The question is, however, relevant to Kinesis Data Streams (KDS), as it "can store" data for 24 hours by default.
How long does each Kinesis firehose delivery stream stores data records in case the delivery destination is unavailable?
a) 12 hours
b) 24 hours
c) 48 hours
d) 72 hours
Answer: b
Though I could not use the exam questions effectively, I updated my review from a 2.5 to a 3 rating just because of the hands-on sections (e.g., IoT, Presto, Hive-DynamoDB, etc.).
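The EVEN-distribution question quoted in the review above hinges on how Redshift places rows on slices: with EVEN, the leader node assigns rows round-robin regardless of column values; with KEY, rows with the same distribution-key value land on the same slice. The following is a simplified Python model of that difference, not Redshift's implementation. crc32 is only a stand-in for Redshift's internal hash function, and all function names are hypothetical.

```python
from collections import Counter
from zlib import crc32


def even_distribution(rows: list[dict], num_slices: int) -> list[int]:
    """EVEN (simplified model): assign rows to slices round-robin,
    ignoring every column value, so slice sizes stay balanced."""
    return [i % num_slices for i in range(len(rows))]


def key_distribution(rows: list[dict], key: str, num_slices: int) -> list[int]:
    """KEY (simplified model): hash the distribution-key value so rows with
    equal keys always share a slice. crc32 stands in for Redshift's own hash."""
    return [crc32(str(row[key]).encode("utf-8")) % num_slices for row in rows]


def slice_counts(assignment: list[int], num_slices: int) -> list[int]:
    """Number of rows stored on each slice (0 for empty slices)."""
    counts = Counter(assignment)
    return [counts[s] for s in range(num_slices)]
```

With a heavily skewed key (say, eight rows that all share customer_id 7 across four slices), KEY piles every row onto a single slice while EVEN yields two rows per slice. KEY's payoff is that matching join keys in two tables are co-located, which EVEN cannot guarantee. That is why EVEN suits tables that do not take part in joins, or tables where the choice between KEY and ALL is unclear, matching options A and D.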
The course contains most of what you need for the exam. Arpan could work on his pronunciation, but there are no parts that are hard to understand. And yes, I completed the exam today :-)
Crisp, short videos. Beautifully explained. I never get tired of watching. I think it's because he kept all the videos as short as possible, covering one point in each video. I found this approach the best way to learn.
Mr. Arpan, you have done a great job of explaining the whole topic with plenty of examples and hands-on labs. The detailed explanation of IoT and Redshift was really useful and very well presented. Thanks
Brief overview of each AWS component, neatly explained. Needs notes or key summary points for each slide.
This is a great course. Thank you for the demo videos; they are great. Can't wait for the new sections on Glue and Spark.