Course Duration in Hours
90
Topic 1 # Introduction to Hadoop and Big Data
Introduction to Big Data
Introduction to Hadoop
Business problems / Challenges with Big Data
Scenarios where Hadoop is used
Overview of batch processing and real-time data analytics using Hadoop
Hadoop vendors - Apache, Cloudera, Hortonworks
Hadoop versions - Hadoop 1.x and Hadoop 2.x
Hadoop services - HDFS, MapReduce, YARN
Introduction to Hadoop ecosystem components (Hive, HBase, Pig, Sqoop, Flume, Zookeeper, Oozie, Kafka, Spark)
Topic 2 # Cluster setup (Hadoop 1.x)
Linux VM installation on system for Hadoop cluster using Oracle VirtualBox
Preparing nodes for Hadoop and VM settings
Install Java and configure passwordless SSH across nodes
Basic Linux commands
Hadoop 1.x Single node deployment
Hadoop Daemons - NameNode, JobTracker, DataNode, TaskTracker, Secondary NameNode
Hadoop configuration files and running
Important Web URLs and Logs for Hadoop
Run HDFS and Linux commands
Hadoop 1.x multi-node deployment
Run sample jobs in Hadoop single and multi-node clusters
Topic 3 # HDFS Concepts
HDFS Design Goals
Understand Blocks and how to configure block size
Block replication and replication factor
Understand Hadoop Rack Awareness and configure racks in Hadoop
File read and write anatomy in HDFS
Enable HDFS Trash
Configure HDFS Name and Space Quota
Configure and use WebHDFS (REST API for HDFS)
Health monitoring using FSCK command
Understand NameNode Safemode, File system Image and Edits
Configure Secondary NameNode and use checkpointing process to provide NameNode failover
HDFS DFSAdmin and File system shell Commands
Hadoop NameNode / DataNode directory Structure
HDFS permissions model
HDFS Offline Image Viewer
Topic 4 # MapReduce Concepts
Introduction to MapReduce
MapReduce Architecture
Understanding the concept of Mappers & Reducers
Anatomy of a MapReduce Program
Phases of a MapReduce program
Data types in Hadoop MapReduce
Driver, Mapper and Reducer classes
InputSplit and RecordReader
InputFormat and OutputFormat in Hadoop
Concepts of Combiner and Partitioner
Running and Monitoring MapReduce jobs
Writing your own MapReduce job using MapReduce API
Topic 5 # Cluster setup (Hadoop 2.x)
Hadoop 1.x Limitations
Design Goals for Hadoop 2.x
Introduction to Hadoop 2.x
Introduction to YARN
Components of YARN - ResourceManager, NodeManager, ApplicationMaster
Deprecated properties
Hadoop 2.x Single node deployment
Hadoop 2.x multi-node deployment
Topic 6 # HDFS High Availability and Federation
Introduction to HDFS Federation
Understand Nameservice ID and block pools
Introduction to HDFS High Availability
Failover mechanisms in Hadoop 1.x
Concept of Active and Standby NameNode
Configuring JournalNodes and avoiding split-brain scenarios
Automatic and manual failover techniques in HA using Zookeeper and ZKFC
HDFS haadmin commands
Topic 7 # YARN - Yet Another Resource Negotiator
YARN Architecture
YARN Components - ResourceManager, NodeManager, JobHistoryServer, Application TimelineServer, MRApplicationMaster
YARN Application execution flow
Running and Monitoring YARN Applications
Understand and configure Capacity/Fair Schedulers in YARN
Define and configure Queues
JobHistory Server / Application Timeline Server
YARN REST API
Writing and executing YARN applications
Topic 8 # Apache Zookeeper
Introduction to Apache Zookeeper
Zookeeper stand-alone installation
Zookeeper clustered installation
Understand Znodes and Ephemeral nodes
Manage Znodes using Java API
Zookeeper four-letter-word commands
Topic 9 # Apache Hive
Introduction to Hive
Hive Architecture
Components - Metastore, HiveServer2, Beeline, HiveCli, Hive WebInterface
Installation and configuration
Metastore service
DDLs and DMLs
SQL - Select, Filter, Join, Group By
Partitions and buckets in Hive
Hive User Defined Functions
Introduction to HCatalog
Install and configure HCatalog services
Topic 10 # Apache Pig
Introduction to Pig
Pig installation
Accessing Pig Grunt shell
Pig Data Types
Pig commands
Pig Relational Operators
Pig User Defined Functions
Configure Pig to use HCatalog
Topic 11 # Apache Sqoop
Introduction to Sqoop
Sqoop Architecture and Installation
Import data into HDFS using Sqoop
Import all tables in Sqoop
Import tables directly into Hive
Export data from HDFS
Topic 12 # Apache Flume
Introduction to Flume
Flume Architecture and Installation
Define Flume agent - Source, Channel and Sink
Flume Use Cases
Topic 13 # Apache Oozie
Introduction to Oozie
Oozie Architecture
Oozie server installation and configurations
Design Workflows, Coordinator Jobs, Bundle Jobs in Oozie
Topic 14 # Apache HBase
Introduction to HBase
HBase Architecture
HBase components - HBase Master and RegionServers
HBase installation and configurations
Create sample tables and queries on HBase
Topic 15 # (Overview) Apache Spark / Storm / Kafka
Real-time data Analytics
Introduction to Spark / Storm / Kafka
Topic 16 # Cluster Monitoring and Management tools
Cloudera Manager
Apache Ambari
Ganglia
JMX monitoring and JConsole
Hadoop User Experience (HUE)
BCA, MCA, B-Tech, BSC-IT, PGDCA, etc.
VisionHook, Sector 15 (Noida), Noida, IN