Pentaho for ETL & Data Integration Masterclass 2022 - PDI 9

Use the Pentaho Data Integration tool for ETL and data warehousing. Do ETL development using PDI 9.0 without a coding background.

Data & Analytics
9.5 hours
Last updated: Jan 2022
What you will learn

Understanding of the entire data integration process using PDI

Extracting data from all popular data sources, including Excel, JSON, zipped files, TXT files and even cloud storage

Cleaning the data using Pentaho Data Integration

Applying business rules on the data in PDI

Different types of data transformations

Loading the data into different formats

Managing SQL databases using PDI

Metadata Injection - a powerful tool offered by PDI

Understanding of the concepts of data marts and data warehouses


What is ETL?

The ETL (extract, transform, load) process is the most popular method of collecting data from multiple sources and loading it into a centralized data warehouse. ETL is an essential component of data warehousing and analytics.
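The three ETL stages can be illustrated outside PDI with a minimal Python sketch. It assumes a small hypothetical CSV of orders (the `RAW_CSV` string and the `sales` table are invented for illustration) and uses an in-memory SQLite database as a stand-in for the data warehouse. In PDI the same flow is assembled graphically from steps rather than written as code.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an exported CSV file.
RAW_CSV = """order_id,customer,amount
1, Alice ,120.50
2,Bob,not_a_number
3, Carol,75
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, cast types, and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((int(row["order_id"]), row["customer"].strip(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    # An in-memory database stands in for the centralized data warehouse.
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    for row in conn.execute("SELECT * FROM sales ORDER BY order_id"):
        print(row)
```

Running it prints `(1, 'Alice', 120.5)` and `(3, 'Carol', 75.0)`: the row with a non-numeric amount is dropped during the transform stage, which loosely mirrors the cleansing and validation work the course covers in PDI.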

Why Pentaho for ETL?

Pentaho has phenomenal ETL, data analysis, metadata management and reporting capabilities. Pentaho is faster than other ETL tools (including Talend). Pentaho has a user-friendly GUI that is easy to learn and takes less time to master, which makes it great for beginners. Pentaho Data Integration (PDI) is also an important skill in the data analytics field.

How much can I earn?

In the US, the median salary of an ETL developer is $74,835 per year; in India, the average salary is Rs. 7,06,902 per year. Accenture, Tata Consultancy Services, Cognizant Technology Solutions, Capgemini, IBM, Infosys and others are major recruiters of people skilled in ETL tools, and Pentaho ETL is one of the most sought-after skills that recruiters look for. Demand for Pentaho Data Integration (PDI) skills is increasing steadily.

Welcome to the course
Course Resources

Pentaho Data Integration (PDI) Installation and Setup

Setting up environment and installing PDI
Opening Spoon - The Graphical UI

A Simple ETL Demonstration

The example problem statement
Demonstration of a PDI transformation
Demonstration of a PDI Job

The ETL process: The practical part begins here

Data and the ETL process

DATA EXTRACTION: Extracting tabular data

Manually entering data into PDI
Inputting Data from a TXT (text) file
Input from multiple CSV files at the same time
Inputting Data from an Excel file
Extracting Data from Zipped files

DATA EXTRACTION: Extracting non-tabular data

Extracting from XML
Extracting from JSON

Extracting from an SQL table

Plan for importing sales data
Creating the Sales table in SQL
Extracting from an SQL table

Storing and Retrieving Data from Cloud storage

Storing Data on AWS S3
Reading data from AWS S3

Merging Data Streams

Concepts: Merging Data Streams
Sorted Merge Step

Data Cleansing

Introduction to Data Cleansing
Value Mapper Step
Replace in String Step
Fuzzy Match concepts
Fuzzy Match Step in PDI
Fuzzy Match Algorithms
Formula Step and changing data format
Common Data Cleaning Steps

Data Validation

Introduction to Data validation
Data validation 1 - String-to-Int and integer range validations
Data validation 2 - Checking reference values using Stream lookup
Data validation 3 - Order date < shipping date using the Calculator step
Common Data Validation steps

Error Handling

Correcting the errors and merging with the main stream
Writing the errors to the log
Writing the errors to a separate file

Transformation and Analytics steps

Concatenating Address Fields
Data Aggregation using Group-by
Normalization and Denormalization
Number Range Step


July 7, 2022
Everything you need to start practising; let's see how the rest of the course turns out. A little later we encounter very useful processes, like merging, deduplication and fuzzy matching to correct typing errors. To be honest, a surprisingly useful course. Aside from violating best practices in attribute naming (attribute names containing spaces, or named in a way that some SQL systems interpret as commands), a really good course.
June 24, 2022
The course materials could be improved with more information in the PDF. Also, I faced issues when copy-pasting queries from the PDF. Please add a separate Notepad file for the SQL queries.
June 13, 2022
This course leaves me simply speechless! It covers a wide range of topics with very detailed explanations, and by watching each video just once, I can understand the concepts and apply them to practical problems! I highly recommend this course to anyone who wants to be an expert.
June 12, 2022
The instructions are very clear. I was able to learn from basic to more advanced content. I would like to request a full Pentaho course that includes dashboards and reporting. Thank you very much for the knowledge. I don't rate it 5 stars because the reporting and dashboard parts are missing.
May 25, 2022
So far the course is exciting, and I have the impression that I can successfully complete it.
May 21, 2022
As someone who works with PDI regularly, I still learned something new from this course. The only reason this is not a full 5 stars is that I could not get the email function to work, no matter what I did. I hope it is possible to get those emails working to report a failed job or a newly created fallout file. My colleagues would love it, too.
May 20, 2022
It is a topic of great personal interest to me and I think the course is very good; besides, this is not the first time I have followed a Udemy course. The material and the content are very detailed.
May 13, 2022
Uses a very powerful tool to explain the basic ETL process and gives people a simple, basic idea of what data engineering work involves.
May 5, 2022
The download instructions for the software are not clear, so people cannot get hands-on; we will just be silent spectators. Please update the course so people can make use of it. I request the Udemy team to please take a look at the comments so that things are modified accordingly.
May 3, 2022
A very nice introductory course! It teaches basic content and also some more advanced content. I would enjoy seeing an intermediate to advanced course with real-world cases and maybe a project. Who knows, maybe you could think about creating a new course for those who completed this one.
May 2, 2022
A lot of basic information that is important for people starting on this path, but I need more information about how to load data in parallel and about good practices for building good transformations. The course is very monotonous.
April 30, 2022
The course provided a really good overview and the instructor was very clear in their explanations and methods. I had, however, hoped for more complex jobs, perhaps something around unit testing. In addition, I do not think there were enough questions or practical exercises to test my knowledge.
April 21, 2022
Good content, but I would have liked to go deeper into job parameterization and more challenging scripting tasks, among other things. The instructor speaks perfectly and absolutely everything is understandable, despite a strong Indian accent. I will look for other courses from Start-Tech Academy; I recommend it!
April 18, 2022
The structure of this course on an ETL tool is well thought out, and the material is consistently explained clearly. Focused on non-spatial data.
April 17, 2022
For anyone who wants to get started with Pentaho, or who wants to refresh their understanding, this course will give you the right content. Practise while watching the videos; after 10 hours you'll be ready to take on moderately complicated transformations, and practice will make you capable of building complex transformations and jobs.



