
Big Data Cloudera Platform Certification Path

The "Big Data Cloudera Platform" course focuses on the Big Data solutions that can be built with the Cloudera Platform, in its recent evolution and its deployment in Cloud Computing environments.
First Consulting designed the course as a synergistic "Combo" curriculum covering both the Data Analyst and the Data Engineer roles on Cloudera,
so that it can also train Big Data teams that are naturally split by area of responsibility.

DURATION

Minimum 10 days; the training plan can be customized

CERTIFICATES

First Consulting Certificate of Attendance

CERTIFICATIONS

CCP preparatory course

KEY POINTS

Cloudera Hadoop 
Spark Big Data 
Cloud Analytics 

Program

Cloudera Certified Professional (CCP)

Data Analyst Exam (CCA159) and Data Engineer Exam (DE575), Combo mode

Audience and Prerequisites

Candidates for CCA Data Analyst can be SQL developers, data analysts, business intelligence specialists, developers, system architects, and database administrators.

 

Data Ingest

  • Import and export data between an external RDBMS and your cluster, including the ability to import specific subsets, change the delimiter and file format of imported data during ingest, and alter the data access pattern or privileges.

  • Ingest real-time and near-real time (NRT) streaming data into HDFS, including the ability to distribute to multiple data sources and convert data on ingest from one format to another.

  • Load data into and out of HDFS using the Hadoop File System (FS) commands.
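
A minimal PySpark sketch of the ingest tasks above (in practice the same work is often done with dedicated ingest tools; PySpark keeps the example self-contained). The JDBC URL, credentials, table names, and HDFS paths are hypothetical placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest-demo").getOrCreate()

    # Import a specific subset of an external RDBMS table over JDBC.
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:mysql://dbhost:3306/shop")  # hypothetical host
              .option("dbtable", "(SELECT id, total, ts FROM orders WHERE total > 100) AS t")
              .option("user", "etl")
              .option("password", "secret")
              .load())

    # Change file format on ingest: land the rows in HDFS as Parquet
    # instead of the source's row format.
    orders.write.mode("overwrite").parquet("hdfs:///data/raw/orders")

    # Plain Hadoop FS commands cover bulk loads from an edge node, e.g.:
    #   hdfs dfs -put /local/exports/orders.csv /data/raw/
    #   hdfs dfs -get /data/raw/orders /local/backup/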
     

Transform, Stage, Store

  • Convert data from one file format to another

  • Write your data with compression

  • Convert data from one set of values to another (e.g., Lat/Long to Postal Address using an external library)

  • Change the data format of values in a data set

  • Purge bad records from a data set, e.g., null values

  • Deduplicate and merge data

  • Denormalize data from multiple disparate data sets

  • Evolve an Avro or Parquet schema

  • Partition an existing data set according to one or more partition keys

  • Tune data for optimal query performance
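
A minimal PySpark sketch covering several of the transform/stage/store tasks above; the column names, keys, and paths are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("transform-demo").getOrCreate()

    # Read delimited text with a header row.
    raw = (spark.read.option("header", "true")
           .option("inferSchema", "true")
           .csv("hdfs:///data/raw/orders.csv"))

    cleaned = (raw
               .dropna(subset=["customer_id"])   # purge bad (null-key) records
               .dropDuplicates(["order_id"])     # de-duplicate on the business key
               .withColumn("total", F.col("total").cast("double")))  # change a value's data format

    # Convert to another file format, compress, and partition by a key
    # so that later queries can prune irrelevant data.
    (cleaned.write.mode("overwrite")
            .option("compression", "snappy")
            .partitionBy("order_date")
            .parquet("hdfs:///data/curated/orders"))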
     

Provide Structure to the Data

Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

  • Create tables using a variety of data types, delimiters, and file formats

  • Create new tables using existing tables to define the schema

  • Improve query performance by creating partitioned tables in the metastore

  • Alter tables to modify existing schema

  • Create views in order to simplify queries
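
A sketch of the corresponding DDL, issued here through PySpark's Hive-enabled SQL interface; the table and column names continue the hypothetical examples above:

    from pyspark.sql import SparkSession

    # enableHiveSupport() lets spark.sql() create objects in the Hive
    # metastore, where both Hive and Impala can see them.
    spark = (SparkSession.builder.appName("ddl-demo")
             .enableHiveSupport().getOrCreate())

    # Table with explicit types, delimiter, and file format over existing data.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw_orders (
            order_id BIGINT, customer_id BIGINT, total DOUBLE, order_date STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION 'hdfs:///data/raw/orders'
    """)

    # Partitioned table in the metastore for better query performance.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS orders_by_day (order_id BIGINT, total DOUBLE)
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
    """)

    # Define a new table's schema from an existing table, then alter it.
    spark.sql("CREATE TABLE IF NOT EXISTS orders_copy LIKE raw_orders")
    spark.sql("ALTER TABLE orders_copy ADD COLUMNS (channel STRING)")

    # A view to simplify downstream queries.
    spark.sql("""
        CREATE OR REPLACE VIEW big_orders AS
        SELECT order_id, total FROM raw_orders WHERE total > 100
    """)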
     

Data Analysis

  • Prepare reports using SELECT commands including unions and subqueries

  • Calculate aggregate statistics, such as sums and averages, during a query

  • Create queries against multiple data sources by using join commands

  • Transform the output format of queries by using built-in functions

  • Perform queries across a group of rows using windowing functions

  • Write a query to aggregate multiple rows of data

  • Write a query to calculate aggregate statistics (e.g., average or sum)

  • Write a query to filter data

  • Write a query that produces ranked or sorted data

  • Write a query that joins multiple data sets

  • Read and/or create a Hive or an HCatalog table from existing data in HDFS
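
A single query can exercise most of these analysis objectives at once. The sketch below uses PySpark SQL with the hypothetical tables from the previous sections, plus an assumed customers table; it filters, joins, aggregates, and ranks with a windowing function:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("analysis-demo")
             .enableHiveSupport().getOrCreate())

    report = spark.sql("""
        SELECT c.region,
               SUM(o.total)                             AS revenue,      -- aggregate
               AVG(o.total)                             AS avg_order,
               RANK() OVER (ORDER BY SUM(o.total) DESC) AS revenue_rank  -- windowing
        FROM raw_orders o
        JOIN customers c ON o.customer_id = c.customer_id                -- join
        WHERE o.total > 0                                                -- filter
        GROUP BY c.region
        ORDER BY revenue_rank                                            -- ranked output
    """)
    report.show()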

Workflow

The ability to create and execute various jobs and actions that move data towards greater value and use in a system.

  • Create and execute a linear workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.

  • Create and execute a branching workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.

  • Orchestrate a workflow to execute regularly at predefined times, including workflows that have data dependencies
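
On Cloudera clusters such workflows are classically described in Oozie workflow and coordinator definitions. As a minimal sketch, the Python snippet below drives the standard oozie command-line client; the Oozie URL and properties file are hypothetical:

    import subprocess

    OOZIE_URL = "http://oozie-host:11000/oozie"  # hypothetical endpoint

    def run_job(properties_file: str) -> str:
        """Submit and start an Oozie job; returns the CLI's job-id line."""
        out = subprocess.run(
            ["oozie", "job", "-oozie", OOZIE_URL,
             "-config", properties_file, "-run"],
            capture_output=True, text=True, check=True)
        return out.stdout.strip()

    # A linear or branching workflow is declared in workflow.xml (actions for
    # MapReduce, Hive, Pig, or custom code; fork/join nodes for branches),
    # while a coordinator definition adds a schedule and data dependencies.
    # The properties file points oozie.wf.application.path (or
    # oozie.coord.application.path) at the definition stored in HDFS.
    print(run_job("job.properties"))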


WANT TO KNOW MORE?

Phone

Administrative Office Email

Sales Email
