4.36 (197 reviews)
☑ Understanding of the entire data integration process using PDI
☑ Extracting data from all popular data sources including Excel, JSON, Zipped files, TXT files and even cloud storage
☑ Cleaning the data using Pentaho Data Integration
☑ Applying business rules on the data in PDI
☑ Different types of Data transformations
☑ Loading the data into different formats
☑ Managing SQL database using PDI
☑ Metadata Injection - a powerful tool offered by PDI
☑ Understanding of the concepts of data marts and data warehouse
What is ETL?
The ETL (extract, transform, load) process is the most popular method of collecting data from multiple sources and loading it into a centralized data warehouse. ETL is an essential component of data warehousing and analytics.
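The three stages described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PDI works internally: in PDI you build the same flow graphically from steps. The sample data, the `sales` table, and the function names are all assumptions made up for this example.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for two separate source systems.
STORE_A = "order_id,amount\n1, 10.50\n2, 4.00\n"
STORE_B = "order_id,amount\n3,7.25\n4,not_a_number\n"


def extract(text):
    """Extract: read the rows of one CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast fields to proper types and drop rows that fail."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["order_id"]), float(row["amount"])))
        except ValueError:
            continue  # a real pipeline would log or route rejected rows
    return clean


def load(conn, rows):
    """Load: append the cleaned rows to one central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform, and load for every source in turn."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
    for source in (STORE_A, STORE_B):
        load(conn, transform(extract(source)))
    return conn


if __name__ == "__main__":
    conn = run_etl()
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())
```

The row with a non-numeric amount is rejected during the transform stage, so only valid rows from both sources reach the central table.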
Why Pentaho for ETL?
Pentaho has phenomenal ETL, data analysis, metadata management and reporting capabilities. Pentaho is faster than other ETL tools (including Talend). Its GUI is easier to use and takes less time to learn, which makes Pentaho great for beginners.
How much can I earn?
In the US, the median salary of an ETL developer is $74,835 per year, and in India the average salary is Rs. 7,06,902 per year. Major recruiters for people skilled in ETL tools include Accenture, Tata Consultancy Services, Cognizant Technology Solutions, Capgemini, IBM and Infosys.
What makes us qualified to teach you?
The course is taught by Abhishek and Pukhraj, who have been teaching Data Science and Machine Learning for over a decade.
We are also the creators of some of the most popular online courses - with over 150,000 enrollments and thousands of 5-star reviews like these ones:
I had an awesome time taking this course. It broadened my knowledge of the power of Excel as an analytical tool. Kudos to the instructor! - Sikiru
Very insightful, learning very nifty tricks and enough detail to make it stick in your mind. - Armand
Teaching our students is our job and we are committed to it. If you have any questions about the course content, practice sheet or anything related to any topic, you can always post a question in the course or send us a direct message.
Download Practice files, take Quizzes, and complete Assignments
Each lecture comes with a practice sheet for you to follow along. You can also take quizzes to check your understanding of the concepts. Each section contains a practice assignment so that you can apply what you have learned. Solutions to the assignments are also shared so that you can review your performance.
By the end of this course, your confidence in using Pentaho Data Integration will soar. You'll have a thorough understanding of how to use PDI for study or as a career opportunity.
Go ahead and click the enroll button, and I'll see you in lesson 1!
Welcome to the course
Pentaho Data Integration (PDI) Installation and Setup
Setting up environment and installing PDI
Opening Spoon - The Graphical UI
A Simple ETL Demonstration
The example problem statement
Demonstration of a PDI transformation
Demonstration of a PDI Job
The ETL process: The practical part begins here
Data and the ETL process
DATA EXTRACTION: Extracting tabular data
Manually entering data into PDI
Inputting Data from a TXT (text) file
Input from multiple CSV files at the same time
Inputting Data from an Excel file
Extracting Data from Zipped files
DATA EXTRACTION: Extracting non-tabular data
Extracting from XML
Extracting from JSON
Extracting from an SQL table
Plan for importing sales Data
Creating Sales table in SQL
Extracting from an SQL table
Storing and Retrieving Data from Cloud storage
Storing Data on AWS S3
Reading data from AWS S3
Merging Data Streams
Concepts: Merging Data Streams
Sorted Merge Step
Introduction to Data Cleansing
Value Mapper Step
Replace in String Step
Fuzzy Match concepts
Fuzzy Match Step in PDI
Fuzzy Match Algorithms
Formula Step and changing data format
Common Data Cleaning Steps
Introduction to Data validation
Data validation 1 - String-to-Int and integer range validations
Data validation 2 - Checking Reference Values using stream look-up
Data validation 3 - Order date < shipping date using calculator step
Common Data Validation steps
Correcting the errors and merging with main stream
Writing the errors to the log
Writing the errors to a separate file
Transformation and Analytics steps
Concatenating Address Fields
Data Aggregation using Group-by
Normalization and Denormalization
Number Range Step
Easy-to-follow lectures. I do wish the discussion of regex had been applied in the transformation itself, for example using regex to filter filenames, but other than that the course was great! Thanks!
Very good course. The instructor teaches in a very simple way and has good knowledge of the topics he is teaching. Thanks.
Good for beginners who have never worked with this tool before; if you have worked with the tool before, the course might not offer much new information.
The course was wonderful because I learned several things that I didn't know about Pentaho. Thank you very much!
Pentaho is a very powerful tool for implementing ETL pipelines but due to scarcity of documentation, it is quite daunting to start learning it, especially for people (like me) who are on ETL's learning curve. Not only does this course have a very well-defined structure but it also follows a step-by-step approach to facilitate beginners in developing a firm foundation of ETL processes. I would wholeheartedly recommend this course to students and professionals who wish to learn Pentaho in depth.
Very clear and interesting course. It covers a lot of functionality, and a beginner can follow it easily.
I had a lot of trouble installing Pentaho because it could not find Java even though I had installed it ... I think the explanation should do more to address possible problems with the Java virtual machine. In the end I had to set up a separate Windows virtual machine to be able to take the course.
A very complete course for getting started in the world of ETL in general and going deeper into Pentaho in particular. It gives you solid knowledge to keep developing on your own afterwards.
Overall a good course; however, the content could be better organized, as some of the information and video editing is incomplete.
Excellent course on ETL tool Pentaho which is very useful for data migration and data loading. Nice and easy explanation of various functions. Exactly what I was looking for.