4.10 (91 reviews)
☑ At the end of the course you'll understand Cloud Dataproc
☑ You'll also know how to craft machine learning projects at scale on GCP.
☑ You'll also know how to integrate Cloud Dataproc with other core services like BigQuery.
☑ Additionally, you'll learn how to migrate on premise Hadoop and Spark jobs to Cloud Dataproc.
Welcome to Managing Big Data on Google's Cloud Platform. This is the second course in a series designed to help you attain the coveted Google Certified Data Engineer certification.
Additionally, the series of courses is going to show you the role of the data engineer on the Google Cloud Platform.
At this juncture, the Google Certified Data Engineer is the only real-world certification for data and machine learning engineers.
NOTE: This is NOT a course on Big Data. This is a course on a specific cloud service called Google Cloud Dataproc. The course was designed to be part of a series for those who want to become data engineers on Google's Cloud Platform.
This course is all about Google's Cloud and migrating on-premise Hadoop jobs to GCP. In reality, Big Data is simply about unstructured data. There are two core types of data in the real world. The first is structured data: the kind of data found in a relational database. The second is unstructured data: a file sitting on a file system. Approximately 90% of all data in the enterprise is unstructured, and our job is to give it structure.
Why do we want to give it structure? We want to give it structure so we can analyze it. Recall that 99% of all applied machine learning is supervised learning. That simply means we have a labeled data set and we point our machine learning models at that data set in order to gain insight into that data.
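To make "giving structure to unstructured data" concrete, here is a minimal sketch in plain Python. The log format, the field names (timestamp, level, message) and the sample lines are assumptions invented for illustration; they are not taken from the course material, and a real Dataproc job would typically do this kind of parsing with Spark rather than on a single machine.

```python
import re
from typing import Optional

# Hypothetical line format, assumed only for this example:
#   <timestamp> <LEVEL> <free-text message>
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw text line into a structured record, or None if it does not fit."""
    match = LINE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Unstructured input: just lines of text, as they might sit in a file.
raw_lines = [
    "2024-01-01T10:00:00Z INFO job started",
    "2024-01-01T10:00:05Z ERROR disk full",
    "not a log line",
]

# Structured output: a list of records with named fields, ready to analyze.
records = [r for r in (parse_line(l) for l in raw_lines) if r is not None]

for record in records:
    print(record["timestamp"], record["level"], record["message"])
```

Once the text has named fields like these, it can be loaded into a table and queried or used as input to a model, which is the point of the paragraph above.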
In the course we will spend much of the time working in Cloud Dataproc. This is Google’s managed Hadoop and Spark platform.
Recall the end goal of big data is to get that data into a state where it can be analyzed and modeled. Therefore, we are also going to cover how to work on machine learning projects with big data at scale.
Please keep in mind this course alone will not give you the knowledge and skills to pass the exam. The course will provide you with the big data knowledge you need for working with Cloud Dataproc and for moving existing projects to the Google Cloud Platform.
*Five Reasons to take this Course.*
1) The Top Job in the World
The data engineer role is the single most needed role in the world. Many believe that it's the data scientist, but several studies have broken down the job descriptions, and the most needed position is that of the data engineer.
2) Google's the World Leader in Data
Amazon's AWS is the most used cloud and Azure has the best UI, but no cloud vendor in the world understands data like Google. They are the world leader in open source artificial intelligence. You can't be the leader in AI without being the leader in data.
3) 90% of all Organizational Data is Unstructured
The study of big data is the study of unstructured data. As the data in companies grows, most will need to scale to unprecedented levels. Without the cloud, this won't be possible unless they make a significant investment in infrastructure and talent.
4) The Data Revolution is Now
We are in a data revolution. Data used to be viewed as a simple necessity and lower on the totem pole. Now it is more widely recognized as the source of truth. As we move into more complex systems of data management, the role of the data engineer becomes extremely important as a bridge between the DBA and the data consumer. Having moved beyond the ubiquitous spreadsheet and graduated from the RDBMS (which will always have a place in the data stack), we now work with NoSQL and Big Data technologies.
5) Data is the Foundation
Data engineers are the plumbers building a data pipeline, while data scientists are the painters and storytellers giving meaning to an otherwise static entity. Simply put, data engineers clean, prepare and optimize data for consumption. Once the data becomes useful, data scientists can perform a variety of analyses and visualization techniques to truly understand the data, and eventually, tell a story from the data.
Thank you for your interest in Managing Big Data on Google's Cloud Platform. We will see you in the course!
Is this Course for You?
Instructor Course Q&A
Why Cloud Dataproc
Why Use GCP for Big Data?
On-Premise Hadoop Build
Scaling up or Scaling Out
Zones and Regions
Separating Compute and Storage
Cloud Dataproc Architecture
Cloud Dataproc in Action
Create Cluster Screen
Create Dataproc Cluster in GCP Console
Create a Cluster using the Shell
The Three Dataproc Configurations
Using Preemption on Cloud Dataproc
How GCP Handles Preemption
Image Version Options
Creating a Custom Image
3 Steps to Install Additional Software on Clusters
The Submit Jobs Screen
Submitting Spark Job - Console
Submitting Spark Job - Google Cloud Shell
Submitting PySpark Job - SSH
Moving from On-Premise to Google Cloud Dataproc
Python and Scala Code Reference Change
You're the Data Engineer
White Boarding: Difference between On-prem and Cloud Dataproc
White Boarding: Moving Jobs to GCP
White Boarding: Data Near Clusters
White Boarding: Defining Preemptibles
White Boarding: On-Premise Architecture to GCP
White Boarding: Add Software to Nodes
The course gives an overview of the GCP architecture for Dataproc, but some details are not covered in depth for those who work with Hadoop. It is not just a matter of an architecture change; there are also changes regarding data consumption and data ingestion. The course only covers the processing pipeline.
Great and well planned course from start to finish - especially the last section on Whiteboarding. I just wish it had been a little longer. Thank you for a great course.
10% education, 90% advertisement, which would be fine had I not paid for the privilege of listening to it.
I thought the class was very informative on how to set up and manage Dataproc for Hadoop clusters. As someone without a lot of exposure to Hadoop and Spark, I only wish there had been a little more detailed and hands-on information on the types of jobs and things you can actually do with it.
This is going through really basic info that any even semi-technical Hadoop engineer knows. It isn't very valuable so far.
A good intro to the topic. But to pass the exam you need to dive much deeper into every single piece of detail, including troubleshooting and choosing the best option out of several possible options.
The course is very good at introducing topics to the student. However, as with most Udemy courses, the explanations are shallow, and most of the time you need to look up the concepts in more depth on your own.
Not a good value. Superficial coverage of a small number of topics. Looney Corn videos are a much better value. Much more content.
The course is quite succinct and comprehensive; I just think it lacks more printed material (PDF) with more detail for later study.
Nice explanations, but slower talking would be much better. In addition, I expected more visuals, graphs or written important points during the lessons. It was sometimes difficult to follow. The summary at the end of each chapter was a plus point.
The topics are interesting and well explained; the only problem I noticed is that some subtitles are incorrect with respect to the presentation.
The courses are explained with a lot of detail that includes the history of "why" the tool was created to the "present" of how it is used. Emphasizing the concepts with images scaled the learning curve and made understanding the concepts much easier. As well, the rewind feature is helpful. Finally, the summary page makes the learning comprehensive.
Good continuation of GCP Overview. The benefits of moving from on-premise Hadoop to GCP are well explained.
I found that the lectures made Cloud Dataproc, which is a managed Hadoop service in GCP, very easy to understand.
This is an amazing and first-ever course on Udemy about Big Data on Google Cloud Platform that will support all your Google Cloud needs. Mike is a very good and talented instructor who presented this quality course. Lastly, thank you Mike West for this wonderful course. You are the best and this course is worth any price.