Big Data Cloudera Platform Certification Path
The "Big Data Cloudera Platform" course focuses on the Big Data solutions that can be implemented with the "Cloudera Platform" in its recent evolution and its deployment in the Cloud Computing space.
The course was designed by First Consulting to build a synergistic "Combo" body of knowledge covering both the Cloudera Data Analyst and Data Engineer roles,
which also makes it suitable for training Big Data teams that are naturally split by area of responsibility.
DURATION
Minimum 10 days; the training plan can be customized
CERTIFICATES
First Consulting Certificate of Attendance
CERTIFICATIONS
Preparatory course for the CCP
KEY POINTS
Cloudera Hadoop
Spark Big Data
Cloud Analytics
Program
Cloudera Certified Professional (CCP)
Data Analyst Exam (CCA159) and Data Engineer Exam (DE575), Combo mode
Audience and Prerequisites
Candidates for the CCA Data Analyst exam include SQL developers, data analysts, business intelligence specialists, developers, system architects, and database administrators.
Data Ingest
- Import and export data between an external RDBMS and your cluster, including the ability to import specific subsets, change the delimiter and file format of imported data during ingest, and alter the data access pattern or privileges.
- Ingest real-time and near-real-time (NRT) streaming data into HDFS, including the ability to distribute to multiple data sources and convert data on ingest from one format to another.
- Load data into and out of HDFS using the Hadoop File System (FS) commands.
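On the exam these ingest tasks are typically carried out with tools such as Sqoop and Flume; purely as an illustration, the hedged PySpark sketch below shows the same idea of pulling a table from an external RDBMS over JDBC and changing its delimiter and file format on write. All hostnames, credentials, table names, and paths are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

# Import a table from an external RDBMS over JDBC
# (assumes the matching JDBC driver jar is on the cluster classpath).
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/sales")  # assumed endpoint
    .option("dbtable", "customers")                    # assumed table
    .option("user", "etl_user")
    .option("password", "secret")
    .load()
)

# Change delimiter and file format on ingest: write as tab-delimited text...
customers.write.option("sep", "\t").csv("/data/staging/customers_tsv")

# ...or as Parquet for downstream analytics.
customers.write.parquet("/data/staging/customers_parquet")

# Plain HDFS loads/extracts use the Hadoop FS shell, e.g.:
#   hdfs dfs -put local_file.csv /data/raw/
#   hdfs dfs -get /data/raw/part-00000 ./local_copy
```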
Transform, Stage, Store
- Convert data from one file format to another
- Write your data with compression
- Convert data from one set of values to another (e.g., Lat/Long to Postal Address using an external library)
- Change the data format of values in a data set
- Purge bad records from a data set, e.g., null values
- Deduplicate and merge data
- Denormalize data from multiple disparate data sets
- Evolve an Avro or Parquet schema
- Partition an existing data set according to one or more partition keys
- Tune data for optimal query performance
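As a minimal sketch of several of these objectives, and continuing the hypothetical staging paths above, the PySpark snippet below purges null records, deduplicates, changes the data format of a value, and stores the result compressed and partitioned by a key. Column names are illustrative only.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transform-sketch").getOrCreate()

# Hypothetical input: tab-delimited orders produced by an earlier ingest step.
orders = (
    spark.read.option("sep", "\t").option("header", "true")
    .csv("/data/staging/orders_tsv")
)

cleaned = (
    orders
    .dropna(subset=["order_id", "amount"])                 # purge bad (null) records
    .dropDuplicates(["order_id"])                          # deduplicate
    .withColumn("amount", F.col("amount").cast("double"))  # change value format
)

# Stage and store: compressed and partitioned for optimal query performance.
(cleaned.write
    .partitionBy("country")             # assumed partition key column
    .option("compression", "snappy")
    .parquet("/data/warehouse/orders"))
```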
Provide Structure to the Data
Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.
- Create tables using a variety of data types, delimiters, and file formats
- Create new tables using existing tables to define the schema
- Improve query performance by creating partitioned tables in the metastore
- Alter tables to modify existing schema
- Create views in order to simplify queries
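A hedged sketch of these DDL objectives, issued as Hive-compatible SQL through a SparkSession with Hive support; the sales database and its tables are illustrative names only.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("ddl-sketch")
         .enableHiveSupport().getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales")

# Create a partitioned table in the metastore, choosing types and file format.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders (
        order_id BIGINT,
        amount   DOUBLE,
        ts       TIMESTAMP
    )
    PARTITIONED BY (country STRING)
    STORED AS PARQUET
""")

# Define a new table from an existing table's schema.
spark.sql("CREATE TABLE IF NOT EXISTS sales.orders_copy LIKE sales.orders")

# Alter a table to modify its existing schema.
spark.sql("ALTER TABLE sales.orders ADD COLUMNS (channel STRING)")

# Create a view to simplify downstream queries.
spark.sql("""
    CREATE VIEW IF NOT EXISTS sales.big_orders AS
    SELECT * FROM sales.orders WHERE amount > 1000
""")
```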
Data Analysis
- Prepare reports using SELECT commands including unions and subqueries
- Calculate aggregate statistics, such as sums and averages, during a query
- Create queries against multiple data sources by using join commands
- Transform the output format of queries by using built-in functions
- Perform queries across a group of rows using windowing functions
- Write a query to aggregate multiple rows of data
- Write a query to calculate aggregate statistics (e.g., average or sum)
- Write a query to filter data
- Write a query that produces ranked or sorted data
- Write a query that joins multiple data sets
- Read and/or create a Hive or an HCatalog table from existing data in HDFS
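The sketch below, again against the hypothetical sales tables introduced above (a sales.customers table with a customer_id key is assumed), combines several of these analysis objectives in one Hive-compatible query: a join, a filter, grouping with aggregate statistics, and a ranking window function over sorted output.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("analysis-sketch")
         .enableHiveSupport().getOrCreate())

report = spark.sql("""
    SELECT c.country,
           SUM(o.amount)                             AS total_sales,
           AVG(o.amount)                             AS avg_order,
           RANK() OVER (ORDER BY SUM(o.amount) DESC) AS sales_rank
    FROM sales.orders o
    JOIN sales.customers c                -- join multiple data sets
      ON o.customer_id = c.customer_id
    WHERE o.amount > 0                    -- filter data
    GROUP BY c.country                    -- aggregate multiple rows
    ORDER BY sales_rank                   -- ranked/sorted output
""")
report.show()
```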
Workflow
The ability to create and execute various jobs and actions that move data towards greater value and use in a system.
- Create and execute a linear workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
- Create and execute a branching workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc.
- Orchestrate a workflow to execute regularly at predefined times, including workflows that have data dependencies
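On a Cloudera cluster these objectives map to Oozie: a workflow.xml on HDFS describes the linear or branching action graph, and a coordinator schedules it. As a minimal sketch, the snippet below submits such a workflow with the standard Oozie CLI from Python; the Oozie URL and property file are assumed placeholders.

```python
import subprocess

# Assumed Oozie server endpoint; workflow.xml and job.properties are
# presumed to exist already (the workflow definition itself lives on HDFS).
OOZIE_URL = "http://oozie-host:11000/oozie"

# Submit and start the workflow job described by job.properties.
subprocess.run(
    ["oozie", "job", "-oozie", OOZIE_URL, "-config", "job.properties", "-run"],
    check=True,
)

# Scheduling at predefined times, including data dependencies, works the same
# way: point -config at a coordinator's properties file so Oozie triggers the
# workflow on the coordinator's frequency and dataset availability.
```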