FirstConsulting: Big Data Cloudera Course

BIG DATA CLOUDERA

Cloudera is a leading provider of software and services based on Apache Hadoop. Its data platform lets companies and organizations examine all of their data, structured and unstructured, and ask broader questions of it. This course is intended for system administrators and IT managers with basic Linux experience. No prior knowledge of Apache Hadoop is required.

Duration

40 hours

Syllabus

Hadoop Fundamentals

The Motivation for Hadoop
Hadoop Overview
Data Storage: HDFS
Distributed Data Processing: YARN, MapReduce, and Spark
Data Processing and Analysis: Pig, Hive, and Impala
Database Integration: Sqoop
Other Hadoop Data Tools
Exercise Scenarios

Introduction to Pig

What is Pig?
Pig’s Features
Pig Use Cases
Interacting with Pig

Basic Data Analysis with Pig

Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly Used Functions

Processing Complex Data with Pig

Storage Formats
Complex/Nested Data Types
Grouping
Built-In Functions for Complex Data
Iterating Grouped Data

Multi-Dataset Operations with Pig

Techniques for Combining Datasets
Joining Datasets in Pig
Set Operations
Splitting Datasets

Pig Troubleshooting and Optimization

Troubleshooting Pig
Logging
Using Hadoop’s Web UI
Data Sampling and Debugging
Performance Overview
Understanding the Execution Plan
Tips for Improving the Performance of Pig Jobs

Introduction to Hive and Impala

What is Hive?
What is Impala?
Why Use Hive and Impala?
Schema and Data Storage
Comparing Hive and Impala to Traditional Databases
Use Cases

Querying with Hive and Impala

Databases and Tables
Basic Hive and Impala Query Language Syntax
Data Types
Using Hue to Execute Queries
Using Beeline (Hive’s Shell)
Using the Impala Shell

Hive and Impala Data Management

Data Storage
Creating Databases and Tables
Loading Data
Altering Databases and Tables
Simplifying Queries with Views
Storing Query Results
Data Storage and Performance

Relational Data Analysis with Hive and Impala

Joining Datasets
Common Built-In Functions
Aggregation and Windowing

Complex Data with Hive and Impala

Complex Data with Hive
Complex Data with Impala
Analyzing Text with Hive and Impala
Using Regular Expressions with Hive and Impala
Processing Text Data with SerDes in Hive
Sentiment Analysis and n-grams

Hive Optimization

Understanding Query Performance
Bucketing
Indexing Data
Hive on Spark

Impala Optimization

How Impala Executes Queries
Improving Impala Performance

Extending Hive and Impala

Custom SerDes and File Formats in Hive
Data Transformation with Custom Scripts in Hive
User-Defined Functions
Parameterized Queries

Choosing the Best Tool for the Job

Comparing Pig, Hive, Impala, and Relational Databases
Which to Choose?

Objectives

By the end of the course, participants will have learned:

The Cloudera Manager features that simplify cluster management, such as aggregated logging, configuration management, resource management, reporting, alerts, and service management
The internals of YARN, MapReduce, Spark, and HDFS
How to determine the correct hardware and infrastructure for a cluster
How to configure and deploy a cluster properly so that it integrates with the data center
How to load data into the cluster from dynamically generated files using Flume and from an RDBMS using Sqoop
How to configure the FairScheduler to provide service-level agreements for multiple users of a cluster
Best practices for preparing and maintaining Apache Hadoop in production
How to troubleshoot, diagnose, tune, and resolve Hadoop issues

Certificate of Attendance

At the end of the course, each participant will receive a certificate of attendance.
