Course Duration in Hours: 60
1
Introduction to Hadoop and its Architecture
Limitations of traditional large scale systems
Compare Hadoop with traditional systems
Understanding Hadoop Architecture
Hadoop Daemons: NameNode, DataNode, JobTracker, TaskTracker
2
Setting up Hadoop Single-Node and Multi-Node Clusters using Oracle VirtualBox
Linux VM installation on Windows / Mac / Linux for a Hadoop cluster using Oracle VirtualBox
Preparing nodes for Hadoop and VM settings (Java, passwordless SSH, network settings, etc.)
Basic Linux commands
Hadoop Deployment: Single Node
Hadoop configuration files and running Hadoop services
Important Web URLs and Logs for Hadoop
Run HDFS and Linux commands
Hadoop Deployment: Clustered Mode
3
Understanding Hadoop Distributed File System
Design Goals
Blocks, FsImage and Edit Logs
Rack-Awareness in Hadoop
Replica Placement and Selection Policies
Hadoop File System Shell Commands
Safe Mode in HDFS
Hadoop DFSAdmin Commands
File Read / Write Anatomy in HDFS
Hadoop NameNode and DataNodes Directory Structure
Name and Space Quota in HDFS
HDFS Trash Concept
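As an illustration of the Blocks and Replica Placement topics above, the arithmetic behind HDFS storage can be sketched in a few lines of Python. This is a minimal sketch, not part of the course material: the function name hdfs_footprint is hypothetical, and the 128 MB block size and replication factor of 3 are assumed values (they match the usual Hadoop 2.x defaults, but any cluster can override them).

```python
# Assumed values for illustration; a real cluster may configure these differently.
BLOCK_SIZE_BYTES = 128 * 1024 * 1024  # assumed HDFS block size (128 MB)
REPLICATION = 3                       # assumed replication factor


def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE_BYTES, replication=REPLICATION):
    """Return (number of blocks, raw bytes stored across the cluster).

    hdfs_footprint is a hypothetical helper. The last block of a file only
    occupies as much space as the data it holds, so raw usage is simply
    file size multiplied by the replication factor.
    """
    # Integer ceiling division: how many blocks are needed to hold the file.
    blocks = -(-file_size_bytes // block_size)
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    size = 300 * 1024 * 1024  # a 300 MB file
    blocks, raw = hdfs_footprint(size)
    print(f"{blocks} blocks, {raw // (1024 * 1024)} MB of raw storage")
```

With these assumed settings, a 300 MB file is split into two full 128 MB blocks plus one 44 MB block, and each block is stored three times.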
4
Understanding HDFS 2.x Concepts
HDFS High Availability
Configuring HDFS HA with two NameNodes
Automatic and Manual Failover Techniques in HA
5
MapReduce Programming Framework: Part 1
MapReduce Architecture
Understand the concepts of Mappers and Reducers
Anatomy of a MapReduce Program and its phases
MapReduce Components: Mapper Class, Reducer Class
Splits, Blocks and Record Readers
Understand the concept of, and need for, the Combiner and Partitioner
Running and Monitoring MapReduce Jobs
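The phases listed in this module (map, combine, partition, shuffle, reduce) can be sketched as a word count in plain Python. This is a minimal, single-process simulation for intuition only; it does not use the Hadoop API, and all function names (mapper, combiner, partition, reducer, run_job) are hypothetical. The partition function uses a simple byte sum as a stand-in for a hash, in the same spirit as a hash-modulo partitioner.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine phase: pre-aggregate counts locally, per input split."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partition phase: choose a reducer for a key.

    A stable byte sum is used here (an assumption for illustration), because
    Python's built-in hash() of strings varies between processes.
    """
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce phase: sum all partial counts for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Simulate one job: each split is a list of lines handled by one mapper."""
    # Shuffle target: one {key: [values]} bucket per reducer.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        pairs = [kv for line in split for kv in mapper(line)]
        for key, value in combiner(pairs):
            partitions[partition(key, num_reducers)][key].append(value)

    result = {}
    for bucket in partitions:
        # Each reducer sees its keys in sorted order.
        for key in sorted(bucket):
            word, total = reducer(key, bucket[key])
            result[word] = total
    return result


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["the fox"]]
    print(run_job(splits))
```

The combiner reduces the data sent through the shuffle (the first split emits one ("the", 2) pair instead of two ("the", 1) pairs), while the partitioner guarantees that all values for a given key reach the same reducer.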
6
MapReduce Programming Framework: Part 2
MapReduce Internals
Understanding Input and Output Formats in Hadoop
MapReduce API
Hadoop Data Types
Writing your own MapReduce job
7
YARN Concepts (MRv2)
Hadoop 1.x Limitations
Design Goals for YARN
YARN Architecture
Components: ResourceManager / NodeManager / ApplicationMaster
Classic MapReduce (MRv1) vs. YARN
Application Execution Flow
Life-Cycle Management
Schedulers and Queues
Running and Monitoring YARN applications
Job History Server and Web Application Proxy
8
Apache Hive
What is Hive?
Hive Architecture & Components
Hive Installation
Hive Metastore
Hive Data Model and Data Units
Hive DDL: Create/Show/Drop Database
Hive DDL: Create/Show/Drop Tables
Hive DML: Load Files into Tables
Hive DML: Inserting Data into Tables
Hive SQL: Select, Filter, Join, Group By
Multi-Table Inserts and Joins
Introduction to SerDe, UDF and UDAF
9
Apache Pig
Pig Installation
Pig Data Types
Pig Architecture
Pig Latin
Pig Relational Operators
Pig Functions
Pig UDFs
10
Apache ZooKeeper
What is ZooKeeper?
Installation: Standalone/Clustered Mode
ZooKeeper Command Line, ZNodes and Watches
HDFS HA automatic failover using ZooKeeper
11
Apache Sqoop
Sqoop Architecture and Installation
Import/Export Data using Sqoop
12
Apache Flume
Flume Architecture and Installation
Flume Use Cases
ProTechSkills, Sector 3 (Noida), Noida, IN