3.65 (32 reviews)
☑ You'll understand the core structures of Apache Beam.
☑ You'll know how to author a simple streaming application on Google's Cloud.
☑ You'll be well versed in all the vernacular of streaming.
☑ You'll be ready to handle all the questions on the Google Certified Data Engineering exam that are related to Cloud Dataflow.
Review from a course in this series:
"I like the detail, especially highlighting the specifics of the test. The detail makes this course worth the investment including the summary at the end and the quizzes that test my knowledge." -- Valentina Kibuyaga
Welcome to Streaming Analytics on Google Cloud Platform. This is the fifth and final course in a series designed to help you attain the coveted Google Certified Data Engineer certification.
Additionally, the series of courses is going to show you the role of the data engineer on the Google Cloud Platform.
While this is a short course, the subject matter is dense. You won't have to author Java pipelines for the exam, but you will need to know a lot about how they are created and executed.
At this juncture, the Google Certified Data Engineer is the only real-world certification for data and machine learning engineers.
NOTE: This is NOT a course on programming Apache Beam Pipelines. This is a very targeted course on understanding how Apache Beam and Cloud Dataflow provide us with an infrastructure to build pipelines for streaming data. The course will provide the learner with the nomenclature and process understanding they'll need to pass the Certified Data Engineering Exam.
Streaming data processing is a big deal in big data these days, and for good reasons. Businesses crave ever more timely data, and switching to streaming is a good way to achieve lower latency.
The massive, unbounded data sets that are increasingly common in modern business are more easily tamed using a system designed for such never-ending volumes of data.
Processing data as it arrives spreads workloads out more evenly over time, yielding more consistent and predictable consumption of resources.
In Google Cloud Platform, the main tool we use for building these pipelines is Cloud Dataflow. The product itself is a fusion of the code written by Google developers and that of the Apache foundation. The project that came out of that business cohabitation is Apache Beam.
Apache Beam (Batch + strEAM) is a model and set of APIs for doing both batch and streaming data processing. It was open-sourced by Google (with Cloudera and PayPal) in 2016 via an Apache incubator project.
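To make the "batch + stream" idea concrete, here is a minimal sketch in plain Java. It does not use the Apache Beam SDK; the class name BatchVsStreamSketch and the methods countWords and runBatch are hypothetical names chosen only for this illustration. It shows the same word-count logic applied once to a complete, bounded list (batch) and then element by element as lines "arrive" (streaming), ending with the same counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: plain Java, NOT the Apache Beam API.
public class BatchVsStreamSketch {

    // The shared "transform": fold one line of text into running word counts.
    static void countWords(String line, Map<String, Integer> counts) {
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    // Batch: the whole bounded data set is available up front.
    static Map<String, Integer> runBatch(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            countWords(line, counts);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("beam is batch", "beam is stream");

        Map<String, Integer> batch = runBatch(lines);

        // Streaming: elements arrive one at a time and the result is
        // updated after each arrival instead of once at the end.
        Map<String, Integer> streaming = new TreeMap<>();
        for (String line : lines) {
            countWords(line, streaming);
            System.out.println("after \"" + line + "\": " + streaming);
        }

        System.out.println("batch result: " + batch);
        System.out.println("same answer:  " + batch.equals(streaming));
    }
}
```

In a real Beam pipeline the per-element logic would live in a transform and the runner (for example Cloud Dataflow) would decide how to execute it; the sketch only illustrates why one model can serve both bounded and unbounded input.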
In this course, we are going to learn about Apache Beam and Cloud Dataflow. While this is an entry-level course, streaming will be new to many. As in most of my other courses in this series, I’ll attempt to break down the more complicated topics pictorially.
*Five Reasons to take this Course.*
1) You Want to be a Data Engineer
It's the number one job in the world (not just within the computer space). The career growth potential is second to none. You want the freedom to move anywhere you'd like. You want to be compensated for your efforts. You want to be able to work remotely. The list of benefits goes on.
2) The Google Certified Data Engineer
Google is always ahead of the game. If you were to look back at a timeline of their accomplishments in the data space, you might believe they have a crystal ball. They've been a decade ahead of everyone. Now, they are the first and only cloud vendor to have a data engineering certification. With their track record, I'll go with Google.
3) The Growth of Data is Insane
Ninety percent of all the world's data has been created in the last two years. Businesses around the world generate approximately 450 billion transactions a day. The amount of data collected by all organizations is approximately 2.5 exabytes a day. That number doubles every month.
4) Apache Beam in Plain English
Apache Beam pipelines require basic programming skills. The Google Certified Data Engineering exam will require you to identify the parts of a Beam pipeline and to understand some of the vernacular and nuances behind streaming data.
5) You want to be ahead of the Curve
The data engineer role is new. While you’re learning, building your skills, and becoming certified, you are also among the first to be part of this burgeoning field. You know that the first to be certified are the first to be hired and the first to receive the top compensation packages.
Thank you for your interest in Streaming Analytics on Google Cloud Platform and we will see you in the course!!
Is this Course for You?
What is Streaming?
The 3 Vs of Big Data
The Beam Pipeline
Definition and History
Beam Object Model
Pipeline Object Review
Object Review Answer Key
Event Time and Processing Time
The Mobile App
Handling Data Tensions
FlumeJava and Batch Patterns
The Dataflow Model
Cloud Dataflow: The SDK and the Runner
The 4 Core Questions of Dataflow
Lab: Building a Dataflow Pipeline
Dataflow Job Monitoring UI
Stackdriver and Dataflow
Lab: Monitoring Dataflow
UPDATING THE COURSE. I was not able to launch a Dataflow job. Running mvn compile exec:java -Dexec.mainClass=com.example.WordCount -Dexec.args="--project=psyched-choir-315112 --stagingLocation=gs://some-bucket/daryna-test/ --output=gs://some-bucket/daryna-test/ --runner=DataflowRunner --jobName=dataflow-intro" produces an error because the SDK version is too old and no longer supported.
The course is very theoretical, and the labs don't explain what the code is doing or why each step is important.
The course doesn't go into much detail on how to set up the pipeline. The code for fetching and transforming the data was not provided. It's only for people who want a high-level understanding of the concept. WASTE OF MONEY !!