Course Duration in Hours
80
Hadoop Developer and Administration Program
The Motivation for Hadoop (1:00 hours)
Problems with traditional large-scale systems
The 3 Vs of Big Data
About distributed cluster computing frameworks
Requirements for a new approach
Real-time Use Cases
Hadoop: Basic Concepts (3:00 hours)
An Overview of Hadoop
Hadoop Distributed File System
Hands-On Exercise
How Map Reduce Works
Hands-On Exercise
Anatomy of a Hadoop Cluster
Other Hadoop Ecosystem Components
Practical: Hadoop Node Cluster Setup and Installation in Pseudo-distributed Mode
Using Cloudera VMs
Work Assignments
Deep Dive into HDFS (3:00 hours)
HDFS Design
Fundamentals of HDFS (Blocks, Name Node, Data Node, Secondary Name Node)
Rack Awareness
Reading from and Writing to HDFS
HDFS Federation and High Availability (Hadoop 2.x.x)
HDFS Command Line Interface
File Read Cycle from HDFS
File Write Cycle to HDFS
Failure or Error Handling When a File Read Fails
Failure or Error Handling When a File Write Fails
Work Assignments
Map Reduce Components (3:00 hours)
The Map Reduce Flow
Examining a Sample Map Reduce Program
Basic Map Reduce API Concepts
About Drivers
About Mappers
About Reducers
Hands-on exercise
Work Assignments
Map Reduce Program Internals (3:00 hours)
How Map Reduce Works
Anatomy of Map Reduce Job
Submission & Initialization of Map Reduce Job (What Happens?)
Assigning & Execution of Tasks
Monitoring & Progress of Map Reduce Job
Completion of Job
Handling Failures in a Map Reduce Job
Task Failure
Task Tracker Failure
Job Tracker Failure
Reducing Intermediate Data With Combiners
Writing Partitioners for Better Load Balancing
Using the Distributed Cache
Operations on Map Reduce Dealing with Joins
Commonly Used Map Reduce Algorithms
Map Reduce Pitfalls and Recovery Strategies
Work Assignments
Map Reduce DataTypes and Formats (1:00 hours)
Serialization In Hadoop
Hadoop Writable and Comparable
Hadoop RawComparator and Custom Writable
MapReduce Types and Formats
Understand Difference Between Block and InputSplit
Role of RecordReader
FileInputFormat
CombineFileInputFormat and Processing a Whole File in a Single Mapper
Text/KeyValue/NLine InputFormat
BinaryInput processing
MultipleInputs Format; Database Input and Output
Text/Binary/Multiple/Lazy OutputFormat
Advanced Map Reduce Concepts (3:00 hours)
Job Scheduling
In Depth Shuffle and Sorting
Speculative Execution
Configuration and Performance Tuning
Joins in Map Reduce
Map Reduce Joining
Map Reduce Job Chaining
Using Distributed Cache
Work Assignments
Hive - A Warehouse for Hadoop Infrastructure (3:00 hours)
What is Hive?
Architecture of Hive
Hive Services
Hive Clients
How Hive Differs from Traditional RDBMS
Introduction to HiveQL
Data Types and File Formats in Hive
File Encoding
Common problems while working with Hive
Hands-on Assignments
Practical Hive Installation and Setup
Using Cloudera VMs
Advanced Hive Concepts (3:00 hours)
HiveQL
Managed and External Tables
Understand Storage Formats
Querying Data
Sorting and Aggregation
Map Reduce In Query
Joins, Sub Queries and Views
Writing User Defined Functions (UDFs)
Data types and schemas
Hive ODBC
Apache Pig - Transformation Framework for Hadoop (4:00 hours)
What is Pig?
Introduction to Pig Data Flow Engine
Pig and Map Reduce in Detail
When Should Pig Be Used?
Pig and Hadoop Cluster
Pig Interpreter and Map Reduce
Pig Relations and Data Types
Pig Latin Example in Detail
Debugging and Generating Examples in Apache Pig
Using Cloudera VMs
Apache Sqoop - SQL <---> Hadoop (3:00 hours)
What is Apache Sqoop
Sqoop Architecture
Sqoop JDBC Driver and Connectors
Sqoop Importing Data
Various Options to Import Data
Table Import
Binary Data Import
Speeding Up the Import
Filtering Import
Full Database Import
Introduction to Sqoop
Hands-On Sqoop Use Case
Work Assignments
Apache Flume - Real-time Stream Processing (3:00 hours)
What is Apache Flume
Flume Architecture
Sample Twitter Feed Configuration
Flume Channel
Memory Channel
File Channel
Sinks and Sink Processors
Sources
Channel Selectors
Interceptors
Hands-On Flume Use Case
Work Assignments
Practical Installation and Setup
Using Cloudera VMs
NoSQL Systems - HBase and Other Distributions (3:00 hours)
What is HBase?
HBase Architecture
HBase API
Managing large data sets with HBase
Using HBase in Hadoop applications
Overview of Cloudera Manager and Hue
Overview of Impala
Overview of Oozie
Hands-on exercise
Practical Installation and Setup
Using Cloudera VMs
Optional
Spark - Next Generation MapReduce Framework (3:00 hours)
What is Apache Spark
How Spark differs from Hadoop's Map Reduce
Spark Architecture
Spark Internals
RDD
Actions and Transformations
Hands-on Assignments
Important Note
Module Assignments
Individual assignments for each module will be assigned and evaluated
Final Assignments
Candidates will complete a proof of concept (POC) based on a real-time use case, covering all the Hadoop components.
Data Sets will be included for all the assignments.
Key Notes
Support and assistance will be provided whether candidates work on Cloudera VMs, a standalone installation, or any other open source software.
Complete guidance and support for all candidates (online/offline), 24x7.
Hadoop interview questions will be shared with candidates once the course is completed. A WhatsApp group will be created where interview questions, solutions, and other updates are posted regularly.
Training content will also cover material from Hadoop certification programs such as Cloudera/Hortonworks/EMC.
Any UG or PG
Tecfactory, Indira Nagar (Bangalore), Bangalore, IN