Pentaho Data Integration (PDI) - ETL data processing module (advanced level) Training Course
Pentaho is a product distributed under an Open Source license that provides a full range of business solutions in the area of Business Intelligence, including reporting, data analysis, dashboards and data integration.
Thanks to the Pentaho platform, individual business units gain access to a wide range of valuable information, ranging from sales and profitability analyses of individual customers or products, through reporting for the needs of HR and financial departments, to providing aggregate information for the needs of senior management.
The training is addressed to programmers, architects and application administrators who want to create or maintain data extraction, transformation and loading (ETL) processes using Pentaho Data Integration (PDI).
After the training, the participant will acquire skills related to:
- installation and configuration of the Pentaho environment,
- designing, implementing, monitoring, launching and tuning ETL processes,
- working with data in PDI,
- loading various types of data and various data formats,
- filtering, grouping and combining data,
- task scheduling,
- running transformations,
- creating clusters.
The course is designed to take the participant from a basic to an advanced level.
Course Outline
Day one
- Installing and configuring Pentaho Data Integration
- Creating a repository
- Getting to know the Spoon user interface
- Creating transformations
- Reading and writing to a file
- Working with databases (SQL query generator)
- Filtering, grouping and combining data
- Working with XLS
Day two
- Creating jobs
- Defining parameters and variables
- Data versioning (support for validity periods)
- Database transactions in transformations
- Using JavaScript
- Mapping transformations
- Data type conversion and column order in the stream
- Logging of processing
Day three
- Running transformations and jobs from the command line (kitchen.bat, pan.bat)
- Task scheduling
- Running transformations in parallel
- Remote startup (carte.bat)
- Clustering and partitioning
- Versioning and teamwork
Open Training Courses require 5+ participants.
Testimonials (2)
That the trainer had a lot of trouble with us, since we kept interrupting with questions all the time.
Oleksandr Muliar - Bank Gospodarstwa Krajowego
Course - Pentaho Data Integration (PDI) - ETL data processing module (advanced level)
Prepared material. Full professionalism. Very good contact with the trainer. Full engagement and openness to changing the planned training format (very valuable open discussions on the topics we prepared)
Kamil Trebacz - Bank Gospodarstwa Krajowego
Course - Pentaho Data Integration (PDI) - ETL data processing module (advanced level)
Related Courses
Cluster Analysis with R and SAS
14 Hours
This instructor-led, live training in Poland (online or onsite) is aimed at data analysts who wish to program with R in SAS for cluster analysis.
By the end of this training, participants will be able to:
- Use cluster analysis for data mining.
- Master R syntax for clustering solutions.
- Implement hierarchical and non-hierarchical clustering.
- Make data-driven decisions that help improve business operations.
From Data to Decision with Big Data and Predictive Analytics
21 Hours
Audience
If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter or LinkedIn), this course is for you.
It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing.
It is not aimed at people configuring the solution, although they will benefit from the big picture.
Delivery Mode
During the course, delegates will be presented with working examples of mostly open-source technologies.
Short lectures will be followed by presentations and simple exercises carried out by the participants.
Content and Software used
All software used is updated each time the course is run, so the newest available versions are used.
The course covers the whole process, from obtaining, formatting, processing and analysing the data to automating the decision-making process with machine learning.
Data Mining and Analysis
28 Hours
Objective:
Delegates will be able to analyse big data sets, extract patterns, and choose the right variables affecting the results, so that a new model can be built to produce predictive results.
Data Mining
21 Hours
The course can be delivered with any tools, including free open-source data mining software and applications.
Data Mining with Python
14 Hours
This instructor-led, live training (online or onsite) is aimed at data analysts and data scientists who wish to implement more advanced data analytics techniques for data mining using Python.
By the end of this training, participants will be able to:
- Understand important areas of data mining, including association rule mining, text sentiment analysis, automatic text summarization, and data anomaly detection.
- Compare and implement various strategies for solving real-world data mining problems.
- Understand and interpret the results.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Data Mining with R
14 Hours
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Data Vault: Building a Scalable Data Warehouse
28 Hours
In this instructor-led, live training in Poland, participants will learn how to build a Data Vault.
By the end of this training, participants will be able to:
- Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
- Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
- Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
- Build and deploy highly scalable and repeatable warehouses.
Data Visualization
28 Hours
This course is intended for engineers and decision makers working in data mining and knowledge discovery.
You will learn how to create effective plots and how to present your data in a way that appeals to decision makers and helps them understand hidden information.
Data Mining with Excel
14 Hours
This instructor-led, live training in Poland (online or onsite) is aimed at data scientists who wish to use Excel for data mining.
By the end of this training, participants will be able to:
- Explore data with Excel to perform data mining and analysis.
- Use Microsoft algorithms for data mining.
- Understand concepts in Excel data mining.
Data Mining with Weka
14 Hours
This instructor-led, live training in Poland (online or onsite) is aimed at beginner to intermediate-level data analysts and data scientists who wish to use Weka to perform data mining tasks.
By the end of this training, participants will be able to:
- Install and configure Weka.
- Understand the Weka environment and workbench.
- Perform data mining tasks using Weka.
Data Mining & Machine Learning with R
14 Hours
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Data Science for Big Data Analytics
35 Hours
Big data refers to data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.
Pentaho Business Intelligence (PBI) - reporting modules
28 Hours
The "Pentaho Business Intelligence (PBI) - reporting modules" training allows you to gain knowledge in the field of Business Intelligence, focusing on the reporting modules of the Pentaho platform. Participants will learn to use Report Designer to create reports from basic to advanced levels, including advanced data formatting, using parameters, PDI transformations and JavaScript queries. Additionally, the training covers the use of Business Intelligence Server, scheduling, sharing reports and the basics of creating transformations in Pentaho Data Integration.
Pentaho Open Source BI Suite Community Edition (CE)
28 Hours
Pentaho Open Source BI Suite Community Edition (CE) is a business intelligence package that provides data integration, reporting, dashboards, and load capabilities.
In this instructor-led, live training, participants will learn how to maximize the features of Pentaho Open Source BI Suite Community Edition (CE).
By the end of this training, participants will be able to:
- Install and configure Pentaho Open Source BI Suite Community Edition (CE)
- Understand the fundamentals of Pentaho CE tools and their features
- Build reports using Pentaho CE
- Integrate third party data into Pentaho CE
- Work with big data and analytics in Pentaho CE
Audience
- Programmers
- BI Developers
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- To request a customized training for this course, please contact us to arrange.
Pentaho Data Integration Fundamentals
21 Hours
Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations.
In this instructor-led, live training, participants will learn how to use Pentaho Data Integration's powerful ETL capabilities and rich GUI to manage an entire big data lifecycle and maximize the value of data within their organization.
By the end of this training, participants will be able to:
- Create, preview, and run basic data transformations containing steps and hops
- Configure and secure the Pentaho Enterprise Repository
- Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format.
- Provide results to third-party applications for further processing
Audience
- Data analysts
- ETL developers
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice