From 0 to 1: Hive for Processing Big Data

End-to-End Hive: HQL, Partitioning, Bucketing, UDFs, Windowing, Optimization, Map Joins, Indexes

Rating: 4.44 (847 reviews)
Platform: Udemy
Language: English
Category: Data Science
Students: 7,372
Content: 15.5 hours
Last update: Jan 2018
Regular price: $84.99

What you will learn

Write complex analytical queries on data in Hive and uncover insights

Leverage partitioning and bucketing to optimize queries in Hive

Customize Hive with user-defined functions in Java and Python

Understand what goes on under the hood of Hive with HDFS and MapReduce

Description

Prerequisites: Hive requires knowledge of SQL. The course includes an SQL primer at the end. Please do that first if you don't know SQL. You'll need to know Java if you want to follow the sections on custom functions.

Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with large-scale data.

Hive is like a new friend with an old face (SQL). This course is an end-to-end, practical guide to using Hive for Big Data processing.

Let's parse that.

A new friend with an old face: Hive helps you leverage the power of distributed computing and Hadoop for analytical processing. Its interface is like an old friend: the very SQL-like HiveQL. This course will fill in all the gaps between SQL and what you need to use Hive.

End-to-End: The course is an end-to-end guide for using Hive: whether you are an analyst who wants to process data or an engineer who needs to build custom functionality or optimize performance, everything you'll need is right here. New to SQL? No need to look elsewhere. The course has a primer on all the basic SQL constructs.

Practical: Everything is taught using real-life examples, working queries and code.

What's Covered: 

Analytical Processing: Joins, Subqueries, Views, Table Generating Functions, Explode, Lateral View, Windowing and more

Tuning Hive for better functionality: Partitioning, Bucketing, Join Optimizations, Map Side Joins, Indexes, Writing custom User Defined Functions in Java (UDF, UDAF, GenericUDF, GenericUDTF), Custom functions in Python, Implementation of MapReduce for Select, Group By and Join

For SQL Newbies: SQL In Great Depth

Content

You, Us & This Course

You, Us & This Course

Introducing Hive

Hive: An Open-Source Data Warehouse
Hive and Hadoop
Hive vs Traditional Relational DBMS
HiveQL and SQL

Hadoop and Hive Install

Hadoop Install Modes
Hadoop Install Step 1 : Standalone Mode
Hadoop Install Step 2 : Pseudo-Distributed Mode
Hive install
Code-Along: Getting started

Hadoop and HDFS Overview

What is Hadoop?
HDFS or the Hadoop Distributed File System

Hive Basics

Primitive Datatypes
Collections_Arrays_Maps
Structs and Unions
Create Table
Insert Into Table
Insert into Table 2
Alter Table
HDFS
HDFS CLI - Interacting with HDFS
Code-Along: Create Table
Code-Along : Hive CLI

Built-in Functions

Three types of Hive functions
The Case-When statement, the Size function, the Cast function
The Explode function
Code-Along : Hive Built-in functions

Sub-Queries

Quirky Sub-Queries
More on subqueries: Exists and In
Inserting via subqueries
Code-Along : Use Subqueries to work with Collection Datatypes
Views

Partitioning

Indices
Partitioning Introduced
The Rationale for Partitioning
How Tables are Partitioned
Using Partitioned Tables
Dynamic Partitioning: Inserting data into partitioned tables
Code-Along : Partitioning

Bucketing

Introducing Bucketing
The Advantages of Bucketing
How Tables are Bucketed
Using Bucketed Tables
Sampling

Windowing

Windowing Introduced
Windowing - A Simple Example: Cumulative Sum
Windowing - A More Involved Example: Partitioning
Windowing - Special Aggregation Functions

Understanding MapReduce

The basic philosophy underlying MapReduce
MapReduce - Visualized and Explained
MapReduce - Digging a little deeper at every step

MapReduce logic for queries: Behind the scenes

MapReduce Overview: Basic Select-From-Where
MapReduce Overview: Group-By and Having
MapReduce Overview: Joins

Join Optimizations in Hive

Improving Join performance with tables of different sizes
The Where clause in Joins
The Left Semi Join
Map Side Joins: The Inner Join
Map Side Joins: The Left, Right and Full Outer Joins
Map Side Joins: The Bucketed Map Join and the Sorted Merge Join

Custom Functions in Python

Custom functions in Python
Code-Along : Custom Function in Python

Custom functions in Java

Introducing UDFs - you're not limited by what Hive offers
The Simple UDF: The standard function for primitive types
The Simple UDF: Java implementation for replacetext()
Generic UDFs, the Object Inspector and DeferredObjects
The Generic UDF: Java implementation for containsstring()
The UDAF: Custom aggregate functions can get pretty complex
The UDAF: Java implementation for max()
The UDAF: Java implementation for Standard Deviation
The Generic UDTF: Custom table generating functions
The Generic UDTF: Java implementation for namesplit()

SQL Primer - Select Statements

Select Statements
Select Statements 2
Operator Functions

SQL Primer - Group By, Order By and Having

Aggregation Operators Introduced
The Group By Clause
More Group By Examples
Order By
Having

SQL Primer - Joins

Introduction to SQL Joins
Cross Joins aka Cartesian Joins
Inner Joins
Left Outer Joins
Right, Full Outer Joins, Natural Joins, Self Joins

Appendix

[For Linux/Mac OS Shell Newbies] Path and other Environment Variables
Setting up a Virtual Linux Instance - For Windows Users

Reviews

Narashimulu
August 6, 2023
I'm so glad I enrolled in this course. The instructor's deep knowledge and engaging teaching style made the content easy to grasp. best course.
Subhash
April 27, 2023
Amazing, since it is last updated few years back, you would need to figure out installation ways on newer versions of software packages. Otherwise, content is really awesome!
Adrián
March 14, 2021
This is a very well course about Hive, but I think it needs some organization because it seems that only has divided the topics into each integrate of the team. I think, first they have to teach about basic SELECT queries and go into depth with advanced topics.
Amarnadh
April 22, 2017
This course really awesome, I felt there should be some more videos for in depth knowledge on SERDEs, ANALYZE, pros and cons different storage types & also compression types at different levels.
Ravi
April 6, 2017
Trainer is not in so hurry and explains things in easy way to understand the actual flow behind this mechanism
Logasubramani
February 19, 2017
This course is nice.But many places I was not comfortable. It would have been awesome if this course talked literally about Hive instead of just talking about SQL.
Ian
February 2, 2017
This is a great course so far. It moves at a steady pace and provides enough detail to reduce the amount of times the viewer might need clarification. Hopefully, this course will help me close some gaps in my knowledge.
Avinash
January 26, 2017
The theoretical part of explaining the course is good. But as a student I am expecting more 'code along' chapters along with the course. This course is like reading a book. Please put more time in practical training classes. Thanks.
Vikranth
December 10, 2016
Very thoughtful explanation using real time scenarios and situations to explain the concepts precisely. Very systematic approach in teaching. I especially love the sarcasm in video 3.
Fidel
November 25, 2016
Great samples and explanation, however, it would be even better if the code along samples would match the samples on its specific section.
Bachir
October 31, 2016
course very clear and explanations are perfect. I only take half a star off because the first installation video where it says window in the title is a complete miss in my opinion. I recommend the course greatly
Denise
September 29, 2016
too long of an introduction.....should be crisper, the middle sections are really good and in depth knowledge is shared.
Sayooj
September 26, 2016
As a veteran database programmer, my primary focus while entering this course was to understand how HQL compares to SQL. This course touches upon that in every section I have covered so far and I feel fairly confident about taking on a new project that leverages Hive.
Srinivas
September 22, 2016
Would love to see more examples on UDF's and some complex real world scenarios. Also like to see some information about types of logic that is suitable to MAP and Reduce phases. May be an advanced course on HIVE.
Krishnakumar
September 3, 2016
The content is really good, it would be best if we use Virtualbox Sandbox (Hortonworks), which is a open source. This will take care of all the trouble you through in installing Hadoop and hive to your virtual machine.

Udemy ID: 857298
Course created date: 5/23/2016
Course indexed date: 11/22/2019
Course submitted by: Bot