Cloudera Data Analyst Training: Using Pig, Hive and Impala with Hadoop (CDAPHIH)

Cloudera
Certifications: Cloudera Certified Professional: Data Scientist (CCP:DS)

New Age Technologies has been delivering Authorized Training since 1996. We offer Cloudera's full suite of authorized courses, including courses on Spark, Apache Hadoop, HBase, MapReduce, Data Science, Cloudera Data Analyst, and more. If you have any questions or cannot find the Cloudera class you are interested in, contact one of our Cloudera Training Specialists. Invest in your future today with Cloudera training from New Age Technologies.

✉ Cloudera Training Specialists | ☏ 502.909.0819

Enter code "CLOUDERA10" at checkout to receive 10% off, or request a gift card equivalent.

Cloudera Data Analyst Training: Using Pig, Hive and Impala with Hadoop (CDAPHIH) Overview:

The Cloudera Data Analyst Training: Using Pig, Hive and Impala with Hadoop (CDAPHIH) is a hands-on course for anyone who wants to access, manipulate, transform, and analyze massive data sets in a Hadoop cluster using SQL and familiar scripting languages. It focuses on Apache Pig, Apache Hive, and Cloudera Impala, and teaches you to apply traditional data analytics and business intelligence skills to big data.

Hive, Pig, and Impala in Cloudera's Hadoop Ecosystem:

Apache Hive makes multi-structured data accessible, through a SQL-like query language, to analysts, database administrators, and others without Java programming expertise
Apache Pig brings a familiar scripting approach to the Hadoop cluster through Pig Latin, its high-level data-flow language
Cloudera Impala enables interactive, low-latency SQL analysis of data stored in Hadoop

Cloudera Data Analyst Training: Using Pig, Hive and Impala with Hadoop (CDAPHIH) Prerequisites:

This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Before attending, note the following:

Knowledge of SQL is assumed, as is basic familiarity with the Linux command line
Knowledge of at least one scripting language (e.g., Bash, Perl, Python, Ruby) is helpful but not essential
Prior knowledge of Apache Hadoop is not required

Cloudera Data Analyst Training: Using Pig, Hive and Impala with Hadoop (CDAPHIH) Objectives:

After successfully completing this course, you will be able to:

Understand the features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
Apply Apache Hadoop fundamentals to data ETL (extract, transform, load), ingestion, and processing with Hadoop tools
Use Pig, Hive, and Impala to improve productivity for typical analysis tasks
Join diverse datasets to gain valuable business insight
Run complex, interactive queries on large datasets

Cloudera Data Analyst Training: Using Pig, Hive and Impala with Hadoop (CDAPHIH) Outline:

Module 1: Hadoop Fundamentals

The Motivation for Hadoop
Hadoop Overview
Data Storage: HDFS
Distributed Data Processing: YARN, MapReduce, and Spark
Data Processing and Analysis: Pig, Hive, and Impala
Data Integration: Sqoop
Other Hadoop Data Tools
Exercise Scenarios Explanation

Module 2: Introduction to Pig

What Is Pig?
Pig’s Features
Pig Use Cases
Interacting with Pig

Module 3: Basic Data Analysis with Pig

Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly Used Functions

Module 4: Processing Complex Data with Pig

Storage Formats
Complex/Nested Data Types
Grouping
Built-In Functions for Complex Data
Iterating Grouped Data

Module 5: Multi-Dataset Operations with Pig

Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets

Module 6: Pig Troubleshooting and Optimization

Troubleshooting Pig
Logging
Using Hadoop’s Web UI
Data Sampling and Debugging
Performance Overview
Understanding the Execution Plan
Tips for Improving the Performance of Your Pig Jobs

Module 7: Introduction to Hive and Impala

What Is Hive?
What Is Impala?
Schema and Data Storage
Comparing Hive to Traditional Databases
Hive Use Cases

Module 8: Querying with Hive and Impala

Databases and Tables
Basic Hive and Impala Query Language Syntax
Data Types
Differences Between Hive and Impala Query Syntax
Using Hue to Execute Queries
Using the Impala Shell

Module 9: Data Management

Data Storage
Creating Databases and Tables
Loading Data
Altering Databases and Tables
Simplifying Queries with Views
Storing Query Results

Module 10: Data Storage and Performance

Partitioning Tables
Choosing a File Format
Managing Metadata
Controlling Access to Data

Module 11: Relational Data Analysis with Hive and Impala

Joining Datasets
Common Built-In Functions
Aggregation and Windowing

Module 12: Working with Impala

How Impala Executes Queries
Extending Impala with User-Defined Functions
Improving Impala Performance

Module 13: Analyzing Text and Complex Data with Hive

Complex Values in Hive
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Conclusion

Module 14: Hive Optimization

Understanding Query Performance
Controlling the Job Execution Plan
Bucketing
Indexing Data

Module 15: Extending Hive

SerDes
Data Transformation with Custom Scripts
User-Defined Functions
Parameterized Queries

Module 16: Choosing the Best Tool for the Job

Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
Which to Choose?

Class Price and Schedules
$2,995.00
- 05/24/2016 - 05/27/2016
  09:00 AM - 05:00 PM (EST)
  Online Live
- 05/31/2016 - 06/03/2016
  09:00 AM - 05:00 PM (MST)
  Phoenix, AZ - North First Ave
  HD Telepresence
- 05/31/2016 - 06/03/2016
  09:00 AM - 05:00 PM (PST)
  Online Live
- 05/31/2016 - 06/03/2016
  09:00 AM - 05:00 PM (PST)
  San Francisco, CA - Sansome
  Instructor Onsite
- 06/21/2016 - 06/24/2016
  08:00 AM - 04:00 PM (CST)
  Dallas, TX - LBJ Freeway
  HD Telepresence
- 06/21/2016 - 06/24/2016
  09:00 AM - 05:00 PM (EST)
  Atlanta, GA - Abernathy Rd
  Instructor Onsite
- 06/21/2016 - 06/24/2016
  09:00 AM - 05:00 PM (EST)
  Online Live
- 06/21/2016 - 06/24/2016
  09:00 AM - 05:00 PM (EST)
  King of Prussia, PA - First Avenue
  HD Telepresence
- 06/21/2016 - 06/24/2016
  09:00 AM - 05:00 PM (EST)
  Burlington, MA - Burlington Mall Rd
  HD Telepresence
- 06/21/2016 - 06/24/2016
  09:00 AM - 05:00 PM (EST)
  Edison, NJ - Fieldcrest Avenue
  HD Telepresence
- 06/28/2016 - 07/01/2016
  09:00 AM - 05:00 PM (PST)
  San Jose, CA - W. St. John Street
  HD Telepresence
- 06/28/2016 - 07/01/2016
  09:00 AM - 05:00 PM (PST)
  El Segundo, CA - N. Sepulveda Blvd
  HD Telepresence
- 06/28/2016 - 07/01/2016
  09:00 AM - 05:00 PM (PST)
  Online Live
- 06/28/2016 - 07/01/2016
  09:00 AM - 05:00 PM (MST)
  Phoenix, AZ - North First Ave
  HD Telepresence
- 06/28/2016 - 07/01/2016
  09:00 AM - 05:00 PM (PST)
  Sacramento, CA - Cal Center Drive
  HD Telepresence
- 06/28/2016 - 07/01/2016
  09:00 AM - 05:00 PM (PST)
  San Francisco, CA - Sansome
  Instructor Onsite
- 07/12/2016 - 07/15/2016
  09:00 AM - 05:00 PM (MST)
  Phoenix, AZ - North First Ave
  HD Telepresence
- 07/12/2016 - 07/15/2016
  09:00 AM - 05:00 PM (PST)
  San Francisco, CA - Sansome
  Instructor Onsite
- 07/12/2016 - 07/15/2016
  09:00 AM - 05:00 PM (PST)
  Online Live
- 07/12/2016 - 07/15/2016
  09:00 AM - 05:00 PM (PST)
  Sacramento, CA - Cal Center Drive
  HD Telepresence
- 07/12/2016 - 07/15/2016
  09:00 AM - 05:00 PM (PST)
  El Segundo, CA - N. Sepulveda Blvd
  HD Telepresence
- 07/12/2016 - 07/15/2016
  09:00 AM - 05:00 PM (PST)
  San Jose, CA - W. St. John Street
  HD Telepresence
- 07/26/2016 - 07/29/2016
  09:00 AM - 05:00 PM (EST)
  Online Live
- 07/26/2016 - 07/29/2016
  09:00 AM - 05:00 PM (EST)
  Burlington, MA - Burlington Mall Rd
  HD Telepresence
- 07/26/2016 - 07/29/2016
  08:00 AM - 04:00 PM (CST)
  Dallas, TX - LBJ Freeway
  HD Telepresence
- 07/26/2016 - 07/29/2016
  08:00 AM - 04:00 PM (CST)
  Chicago, IL - W. Monroe
  Instructor Onsite
- 07/26/2016 - 07/29/2016
  09:00 AM - 05:00 PM (EST)
  King of Prussia, PA - First Avenue
  HD Telepresence
- 07/26/2016 - 07/29/2016
  09:00 AM - 05:00 PM (EST)
  Atlanta, GA - Abernathy Rd
  HD Telepresence
- 07/26/2016 - 07/29/2016
  09:00 AM - 05:00 PM (EST)
  Edison, NJ - Fieldcrest Avenue
  HD Telepresence
- 08/02/2016 - 08/05/2016
  09:00 AM - 05:00 PM (PST)
  San Francisco, CA - Sansome
  Instructor Onsite
- 08/02/2016 - 08/05/2016
  09:00 AM - 05:00 PM (MST)
  Phoenix, AZ - North First Ave
  HD Telepresence
- 08/02/2016 - 08/05/2016
  09:00 AM - 05:00 PM (PST)
  Online Live
- 08/02/2016 - 08/05/2016
  09:00 AM - 05:00 PM (PST)
  El Segundo, CA - N. Sepulveda Blvd
  HD Telepresence
- 08/02/2016 - 08/05/2016
  09:00 AM - 05:00 PM (PST)
  Sacramento, CA - Cal Center Drive
  HD Telepresence
- 08/02/2016 - 08/05/2016
  09:00 AM - 05:00 PM (PST)
  San Jose, CA - W. St. John Street
  HD Telepresence
- 08/09/2016 - 08/12/2016
  08:00 AM - 04:00 PM (CST)
  Dallas, TX - LBJ Freeway
  HD Telepresence
- 08/09/2016 - 08/12/2016
  09:00 AM - 05:00 PM (EST)
  Online Live
- 08/09/2016 - 08/12/2016
  09:00 AM - 05:00 PM (EST)
  Atlanta, GA - Abernathy Rd
  Instructor Onsite
- 08/09/2016 - 08/12/2016
  09:00 AM - 05:00 PM (EST)
  Burlington, MA - Burlington Mall Rd
  HD Telepresence
- 08/09/2016 - 08/12/2016
  09:00 AM - 05:00 PM (EST)
  Edison, NJ - Fieldcrest Avenue
  HD Telepresence
- 08/09/2016 - 08/12/2016
  09:00 AM - 05:00 PM (EST)
  King of Prussia, PA - First Avenue
  HD Telepresence
- 08/23/2016 - 08/26/2016
  09:00 AM - 05:00 PM (PST)
  Online Live
- 08/23/2016 - 08/26/2016
  09:00 AM - 05:00 PM (MST)
  Phoenix, AZ - North First Ave
  HD Telepresence
- 08/23/2016 - 08/26/2016
  09:00 AM - 05:00 PM (PST)
  San Jose, CA - W. St. John Street
  HD Telepresence
- 08/23/2016 - 08/26/2016
  09:00 AM - 05:00 PM (PST)
  San Francisco, CA - Sansome
  Instructor Onsite
- 09/13/2016 - 09/16/2016
  09:00 AM - 05:00 PM (PST)
  Sacramento, CA - Cal Center Drive
  HD Telepresence
- 09/13/2016 - 09/16/2016
  09:00 AM - 05:00 PM (PST)
  San Jose, CA - W. St. John Street
  HD Telepresence
- 09/13/2016 - 09/16/2016
  09:00 AM - 05:00 PM (PST)
  San Francisco, CA - Sansome
  HD Telepresence
- 09/13/2016 - 09/16/2016
  09:00 AM - 05:00 PM (PST)
  Online Live
- 09/13/2016 - 09/16/2016
  09:00 AM - 05:00 PM (MST)
  Phoenix, AZ - North First Ave
  Instructor Onsite
- 09/13/2016 - 09/16/2016
  09:00 AM - 05:00 PM (PST)
  El Segundo, CA - N. Sepulveda Blvd
  HD Telepresence
- 09/20/2016 - 09/23/2016
  09:00 AM - 05:00 PM (EST)
  Edison, NJ - Fieldcrest Avenue
  HD Telepresence
- 09/20/2016 - 09/23/2016
  09:00 AM - 05:00 PM (EST)
  King of Prussia, PA - First Avenue
  HD Telepresence
- 09/20/2016 - 09/23/2016
  09:00 AM - 05:00 PM (EST)
  Burlington, MA - Burlington Mall Rd
  HD Telepresence
- 09/20/2016 - 09/23/2016
  09:00 AM - 05:00 PM (EST)
  Online Live
- 09/20/2016 - 09/23/2016
  08:00 AM - 04:00 PM (CST)
  Dallas, TX - LBJ Freeway
  HD Telepresence
- 09/20/2016 - 09/23/2016
  09:00 AM - 05:00 PM (EST)
  Atlanta, GA - Abernathy Rd
  Instructor Onsite