
Big Data Cloudera Platform Certification Path

The "Big Data Cloudera Platform" course focuses on the Big Data solutions that can be built with the Cloudera Platform, in its recent evolution and its deployment in Cloud Computing environments.
First Consulting designed the course as a synergistic "Combo" curriculum covering both the Data Analyst and the Data Engineer roles on Cloudera,
so that it can also train Big Data teams that are naturally split by area of responsibility.

DURATION

Minimum 10 days; the training plan can be customized

CERTIFICATES

First Consulting Certificate of Attendance

CERTIFICATIONS

CCP preparatory course

KEY POINTS

Cloudera Hadoop 
Spark Big Data 
Cloud Analytics 

Program

Cloudera Certified Professional (CCP)

Data Analyst Exam (CCA159) and Data Engineer Exam (DE575), Combo mode

Audience and Prerequisites

Candidates for CCA Data Analyst can be SQL developers, data analysts, business intelligence specialists, developers, system architects, and database administrators.

 

Data Ingest

  • Import and export data between an external RDBMS and your cluster, including the ability to import specific subsets, change the delimiter and file format of imported data during ingest, and alter the data access pattern or privileges.

  • Ingest real-time and near-real time (NRT) streaming data into HDFS, including the ability to distribute to multiple data sources and convert data on ingest from one format to another.

  • Load data into and out of HDFS using the Hadoop File System (FS) commands.
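
A minimal PySpark sketch of the ingest tasks above (in practice the same work is often done with dedicated ingest tools; PySpark keeps the example self-contained). The JDBC URL, credentials, table names, and HDFS paths are hypothetical placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest-demo").getOrCreate()

    # Import a specific subset of an external RDBMS table over JDBC.
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:mysql://dbhost:3306/shop")  # hypothetical host
              .option("dbtable", "(SELECT id, total, ts FROM orders WHERE total > 100) AS t")
              .option("user", "etl")
              .option("password", "secret")
              .load())

    # Change file format on ingest: land the rows in HDFS as Parquet
    # instead of the source's row format.
    orders.write.mode("overwrite").parquet("hdfs:///data/raw/orders")

    # Plain Hadoop FS commands cover bulk loads from an edge node, e.g.:
    #   hdfs dfs -put /local/exports/orders.csv /data/raw/
    #   hdfs dfs -get /data/raw/orders /local/backup/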
     

Transform, Stage, Store

  • Convert data from one file format to another

  • Write your data with compression

  • Convert data from one set of values to another (e.g., Lat/Long to Postal Address using an external library)

  • Change the data format of values in a data set

  • Purge bad records from a data set, e.g., null values

  • Deduplicate and merge data

  • Denormalize data from multiple disparate data sets

  • Evolve an Avro or Parquet schema

  • Partition an existing data set according to one or more partition keys

  • Tune data for optimal query performance
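
A minimal PySpark sketch covering several of the transform/stage/store tasks above; the column names, keys, and paths are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("transform-demo").getOrCreate()

    # Read delimited text with a header row.
    raw = (spark.read.option("header", "true")
           .option("inferSchema", "true")
           .csv("hdfs:///data/raw/orders.csv"))

    cleaned = (raw
               .dropna(subset=["customer_id"])   # purge bad (null-key) records
               .dropDuplicates(["order_id"])     # de-duplicate on the business key
               .withColumn("total", F.col("total").cast("double")))  # change a value's data format

    # Convert to another file format, compress, and partition by a key
    # so that later queries can prune irrelevant data.
    (cleaned.write.mode("overwrite")
            .option("compression", "snappy")
            .partitionBy("order_date")
            .parquet("hdfs:///data/curated/orders"))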
     

Provide Structure to the Data

Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

  • Create tables using a variety of data types, delimiters, and file formats

  • Create new tables using existing tables to define the schema

  • Improve query performance by creating partitioned tables in the metastore

  • Alter tables to modify existing schema

  • Create views in order to simplify queries
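
A sketch of the corresponding DDL, issued here through PySpark's Hive-enabled SQL interface; the table and column names continue the hypothetical examples above:

    from pyspark.sql import SparkSession

    # enableHiveSupport() lets spark.sql() create objects in the Hive
    # metastore, where both Hive and Impala can see them.
    spark = (SparkSession.builder.appName("ddl-demo")
             .enableHiveSupport().getOrCreate())

    # Table with explicit types, delimiter, and file format over existing data.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw_orders (
            order_id BIGINT, customer_id BIGINT, total DOUBLE, order_date STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION 'hdfs:///data/raw/orders'
    """)

    # Partitioned table in the metastore for better query performance.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS orders_by_day (order_id BIGINT, total DOUBLE)
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
    """)

    # Define a new table's schema from an existing table, then alter it.
    spark.sql("CREATE TABLE IF NOT EXISTS orders_copy LIKE raw_orders")
    spark.sql("ALTER TABLE orders_copy ADD COLUMNS (channel STRING)")

    # A view to simplify downstream queries.
    spark.sql("""
        CREATE OR REPLACE VIEW big_orders AS
        SELECT order_id, total FROM raw_orders WHERE total > 100
    """)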
     

Data Analysis

  • Prepare reports using SELECT commands including unions and subqueries

  • Calculate aggregate statistics, such as sums and averages, during a query

  • Create queries against multiple data sources by using join commands

  • Transform the output format of queries by using built-in functions

  • Perform queries across a group of rows using windowing functions

  • Write a query to aggregate multiple rows of data

  • Write a query to calculate aggregate statistics (e.g., average or sum)

  • Write a query to filter data

  • Write a query that produces ranked or sorted data

  • Write a query that joins multiple data sets

  • Read and/or create a Hive or an HCatalog table from existing data in HDFS
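
A single query can exercise most of these analysis objectives at once. The sketch below uses PySpark SQL with the hypothetical tables from the previous sections, plus an assumed customers table; it filters, joins, aggregates, and ranks with a windowing function:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("analysis-demo")
             .enableHiveSupport().getOrCreate())

    report = spark.sql("""
        SELECT c.region,
               SUM(o.total)                             AS revenue,      -- aggregate
               AVG(o.total)                             AS avg_order,
               RANK() OVER (ORDER BY SUM(o.total) DESC) AS revenue_rank  -- windowing
        FROM raw_orders o
        JOIN customers c ON o.customer_id = c.customer_id                -- join
        WHERE o.total > 0                                                -- filter
        GROUP BY c.region
        ORDER BY revenue_rank                                            -- ranked output
    """)
    report.show()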

Workflow

The ability to create and execute various jobs and actions that move data towards greater value and use in a system.

  • Create and execute a linear workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.

  • Create and execute a branching workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.

  • Orchestrate a workflow to execute regularly at predefined times, including workflows that have data dependencies
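
On Cloudera clusters such workflows are classically described in Oozie workflow and coordinator definitions. As a minimal sketch, the Python snippet below drives the standard oozie command-line client; the Oozie URL and properties file are hypothetical:

    import subprocess

    OOZIE_URL = "http://oozie-host:11000/oozie"  # hypothetical endpoint

    def run_job(properties_file: str) -> str:
        """Submit and start an Oozie job; returns the CLI's job-id line."""
        out = subprocess.run(
            ["oozie", "job", "-oozie", OOZIE_URL,
             "-config", properties_file, "-run"],
            capture_output=True, text=True, check=True)
        return out.stdout.strip()

    # A linear or branching workflow is declared in workflow.xml (actions for
    # MapReduce, Hive, Pig, or custom code; fork/join nodes for branches),
    # while a coordinator definition adds a schedule and data dependencies.
    # The properties file points oozie.wf.application.path (or
    # oozie.coord.application.path) at the definition stored in HDFS.
    print(run_job("job.properties"))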


WANT TO KNOW MORE?

Phone

Administrative Office Email

Sales Email
