Course Outline
Section 1: Data Management in HDFS
- Various Data Formats (JSON / Avro / Parquet)
- Compression Schemes
- Data Masking
- Labs : analyzing different data formats; enabling compression
Section 2: Advanced Pig
- User-defined Functions
- Introduction to Pig Libraries (ElephantBird / Data-Fu)
- Loading Complex Structured Data using Pig
- Pig Tuning
- Labs : advanced Pig scripting, parsing complex data types
Section 3: Advanced Hive
- User-defined Functions
- Compressed Tables
- Hive Performance Tuning
- Labs : creating compressed tables, evaluating table formats and configuration
Section 4: Advanced HBase
- Advanced Schema Modelling
- Compression
- Bulk Data Ingest
- Wide-table / Tall-table comparison
- HBase and Pig
- HBase and Hive
- HBase Performance Tuning
- Labs : tuning HBase; accessing HBase data from Pig & Hive; using Phoenix for data modeling
Requirements
- comfortable with the Java programming language (most programming exercises are in Java)
- comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi / nano)
- a working knowledge of Hadoop.
Lab environment
Zero Install: There is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.
Students will need the following:
- an SSH client (Linux and Mac already include SSH clients; for Windows, PuTTY is recommended)
- a browser to access the cluster (we recommend Firefox)
Testimonials (5)
Trainer's preparation & organization, and quality of materials provided on github.
Mateusz Rek - MicroStrategy Poland Sp. z o.o.
Course - Impala for Business Intelligence
Project for independent preparation, an interesting example of a DevOps-node Hadoop cluster with Ambari, trainer support (logging into a virtual machine, good and direct communication)
Bartlomiej Krasinski - Rossmann SDP
Course - HBase for Developers
practical things of doing, also theory was served good by Ajay
Dominik Mazur - Capgemini Polska Sp. z o.o.
Course - Hadoop Administration on MapR
Intercollegial communication with training participants.
Andrzej Szewczuk - Izba Administracji Skarbowej w Lublinie
Course - Apache NiFi for Administrators
I liked the VM very much. The teacher was very knowledgeable regarding the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.