Use the Pentaho Data Integration tool for ETL and data warehousing. Do ETL development using PDI 9.0 without a coding background.

9 hours

Last updated: Nov 2020

What you will learn

Understanding of the entire data integration process using PDI

Extracting data from popular data sources, including Excel, JSON, zipped files, TXT files and even cloud storage

Cleaning the data using Pentaho Data Integration

Applying business rules on the data in PDI

Different types of data transformations

Loading the data into different formats

Managing SQL databases using PDI

Metadata Injection - a powerful tool offered by PDI

Understanding of the concepts of data marts and data warehouses


What is ETL?

The ETL (extract, transform, load) process is one of the most widely used methods for collecting data from multiple sources, cleaning and reshaping it, and loading it into a centralized data warehouse. ETL is an essential component of data warehousing and analytics.
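To make the three stages concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration of the ETL idea, not how PDI works internally, and the source data, column names and `sales` table are hypothetical: two in-memory CSV sources are extracted into one stream, cleaned and validated, then loaded into an in-memory SQLite table that stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: two small CSV "files" held in memory.
SOURCE_A = "order_id,region,amount\n1,north,120.50\n2,South ,80\n"
SOURCE_B = "order_id,region,amount\n3,north,n/a\n4,west,45.25\n"


def extract(sources):
    """Extract: read rows from every source into one stream of dicts."""
    rows = []
    for text in sources:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def transform(rows):
    """Transform: tidy the region text and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would route this row to an error stream
        clean.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a central (hypothetical) sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract -> transform -> load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract([SOURCE_A, SOURCE_B])), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    for region, total in conn.execute(query):
        print(region, total)
```

In PDI the same flow is built graphically by connecting input, transformation and output steps in Spoon rather than by writing code.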

Why Pentaho for ETL?

Pentaho has excellent ETL, data analysis, metadata management and reporting capabilities. It is faster than other ETL tools (including Talend), and its GUI is easier to use and quicker to learn, which makes Pentaho a great choice for beginners.

How much can I earn?

In the US, the median salary of an ETL developer is $74,835 per year; in India, the average salary is Rs. 7,06,902 per year. Major recruiters of people skilled in ETL tools include Accenture, Tata Consultancy Services, Cognizant Technology Solutions, Capgemini, IBM and Infosys.

Welcome to the course

Course Resources

Pentaho Data Integration (PDI) Installation and Setup

Setting up environment and installing PDI

Opening Spoon - The Graphical UI

A Simple ETL Demonstration

The example problem statement

Demonstration of a PDI transformation

Demonstration of a PDI Job

The ETL process: The practical part begins here

Data and the ETL process

DATA EXTRACTION: Extracting tabular data

Manually entering data into PDI

Inputting Data from a TXT (text) file

Input from multiple CSV files at the same time

Inputting Data from an Excel file

Extracting Data from Zipped files

DATA EXTRACTION: Extracting non-tabular data

Extracting from XML

Extracting from JSON

Extracting from an SQL table

Plan for importing sales data

Creating Sales table in SQL

Extracting from an SQL table

Storing and Retrieving Data from Cloud storage

Storing Data on AWS S3

Reading data from AWS S3

Merging Data Streams

Concepts: Merging Data Streams

Sorted Merge Step

Data Cleansing

Introduction to Data Cleansing

Value Mapper Step

Replace in String Step

Fuzzy Match concepts

Fuzzy Match Step in PDI

Fuzzy Match Algorithms

Formula Step and changing data format

Common Data Cleaning Steps

Data Validation

Introduction to Data validation

Data validation 1 - String-to-Int and integer range validations

Data validation 2 - Checking Reference Values using stream look-up

Data validation 3 - Order date < shipping date using calculator step

Common Data Validation steps

Error Handling

Correcting the errors and merging with main stream

Writing the errors to the log

Writing the errors to a separate file

Transformation and Analytics steps

Concatenating Address Fields

Data Aggregation using Group-by

Normalization and Denormalization

Number Range Step


Jonathan, 29 December 2020

Easy-to-follow lectures. I do wish the discussion of regex had been applied in the transformation itself, for example using regex to filter filenames, but other than that the course was great! Thanks!

Gabriel, 27 December 2020

Very good course. The instructor teaches in a very simple way and has good knowledge of the topics he is teaching. Thanks.

Andrius, 21 December 2020

Good for beginners who have never worked with this tool before. If you have worked with the tool before, it might not offer much new information.

Cristian, 12 December 2020

The course was wonderful because I learned several things I didn't know about Pentaho. Thank you very much!

Ali, 21 November 2020

Pentaho is a very powerful tool for implementing ETL pipelines, but due to the scarcity of documentation, it is quite daunting to start learning it, especially for people (like me) who are on ETL's learning curve. Not only does this course have a very well-defined structure, but it also follows a step-by-step approach that helps beginners develop a firm foundation in ETL processes. I would wholeheartedly recommend this course to students and professionals who wish to learn Pentaho in depth.

Pritha, 30 September 2020

Very clear and interesting course. It covers a lot of functionalities, a beginner can follow this course easily.

Giomar, 27 September 2020

I had a lot of trouble installing Pentaho because it could not find Java even though I had installed it... I think the explanation should do a bit more to address possible problems with the Java Virtual Machine. In the end I had to create a separate Windows virtual machine to be able to take the course.

Rafael, 9 September 2020

A very complete course for getting started in the world of ETL in general and going deeper into Pentaho in particular. It gives you solid knowledge to keep developing on your own afterwards.

Sajid, 7 September 2020

Overall a good course; however, the content could be better organized, as some information and video editing is incomplete.

Khalid, 8 August 2020

Excellent course on the ETL tool Pentaho, which is very useful for data migration and data loading. Nice and easy explanation of various functions. Exactly what I was looking for.


