Course Duration in Hours
90
HADOOP COURSE CONTENT
VirtualBox/VMware
a. Basics
b. Installations
c. Backups
d. Snapshots
Linux
a. Basics
b. Installations
c. Commands
Hadoop
a. Why Hadoop?
b. Scaling
c. Distributed Framework
d. Hadoop vs. RDBMS
e. Brief history of Hadoop
Setup Hadoop
a. Pseudo mode
b. Cluster mode
c. IPv6
d. SSH
e. Installation of Java and Hadoop
f. Configurations of Hadoop
g. Hadoop Processes (NN, SNN, JT, DN, TT)
h. Temporary directory
i. UI
j. Common errors when running a Hadoop cluster, and their solutions
HDFS: Hadoop Distributed File System
a. HDFS Design and Architecture
b. HDFS Concepts
c. Interacting with HDFS using the command line
d. Interacting with HDFS using Java APIs
e. Dataflow
f. Blocks
g. Replica
Hadoop Processes
a. NameNode
b. Secondary NameNode
c. JobTracker
d. TaskTracker
e. DataNode
MapReduce
a. Developing a MapReduce Application
b. Phases in the MapReduce Framework
c. MapReduce Input and Output Formats
d. Advanced Concepts
e. Sample Applications
f. Combiner
g. HAR
Joining datasets in MapReduce jobs
a. Map-side join
b. Reduce-side join
MapReduce customization
a. Custom Input format class
b. Hash Partitioner
c. Custom Partitioner
d. Sorting techniques
e. Custom Output format class
Hadoop Programming Languages:
PIG
a. Introduction
b. Installation and Configuration
c. Interacting with HDFS using PIG
d. MapReduce Programs through PIG
e. PIG Commands
f. Loading, Filtering, Grouping.
g. Data types, Operators.
h. Joins, Groups.
i. Sample programs in PIG
Hive
a. Basics
b. Installation and Configurations
c. Commands.
NoSQL Database Concepts
Specialties:
ETL tool (PDI) (Data Warehousing BI Tools)
a. Introduction
b. Creating RDBMS database
c. Establishing a connection between PDI and the RDBMS database
d. Creating data in Hadoop
e. Establishing a connection between PDI and Hadoop data
f. Summarization
OVERVIEW HADOOP DEVELOPER
Introduction
The Motivation for Hadoop
Problems with traditional large-scale systems
Requirements for a new approach
Hadoop: Basic Concepts
An Overview of Hadoop
The Hadoop Distributed File System
Hands-On Exercise
How MapReduce Works
Hands-On Exercise
Anatomy of a Hadoop Cluster
Other Hadoop Ecosystem Components
Writing a MapReduce Program
The MapReduce Flow
Examining a Sample MapReduce Program
Basic MapReduce API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API
Using Eclipse for Rapid Development
Hands-on exercise
The New MapReduce API
Delving Deeper Into The Hadoop API
More about ToolRunner
Testing with MRUnit
Reducing Intermediate Data With Combiners
The configure and close methods for Map/Reduce Setup and Teardown
Writing Partitioners for Better Load Balancing
Hands-On Exercise
Directly Accessing HDFS
Using the Distributed Cache
Hands-On Exercise.
Common MapReduce Algorithms
Sorting and Searching
Indexing
Machine Learning With Mahout
Term Frequency-Inverse Document Frequency (TF-IDF)
Word Co-Occurrence
Hands-On Exercise.
Using HBase:
What is HBase?
HBase Architecture
HBase API
Managing large data sets with HBase
Using HBase in Hadoop applications
Hands-on exercise.
Using Hive and Pig
Hive Basics
Pig Basics
Hands-on exercise.
Practical Development Tips and Techniques
Debugging MapReduce Code
Using LocalJobRunner Mode For Easier Debugging
Retrieving Job Information with Counters
Logging
Splittable File Formats
Determining the Optimal Number of Reducers
Map-Only MapReduce Jobs
Hands-On Exercise.
More Advanced MapReduce Programming
Custom Writable and WritableComparable
Saving Binary Data using SequenceFiles and Avro Files
Creating InputFormats and OutputFormats
Hands-On Exercise
Joining Data Sets in MapReduce
Map-Side Joins
The Secondary Sort
Reduce-Side Joins
Hadoop Ecosystem Overview
Oozie
HBase
Pig
Sqoop
Cassandra
Chukwa
Mahout
ZooKeeper
Flume
Case Study Discussions
Certification Guidance
Real-Time Certification and Interview Questions and Answers
Resume Preparation
Providing all Materials and Links
Real-Time Project Explanation and Practice
IBM IT SOLUTIONS, BTM IInd Stage (Bangalore),Bangalore,IN