Course Duration in Hours
36
Course Contents
The course covers the following topics:
Introduction
What is Cloud Computing
What is Grid Computing
What is Virtualization
How the Above Three Are Interrelated
Hadoop Solutions - Big Picture
Amazon, NetApp, Dell, EMC, IBM, Oracle, etc.
Data Intensive Technologies for Cloud Computing
Introduction
Characteristics
Architectures
Comparing Hadoop vs. HPCC
Conclusion
Comparison with Other Systems
RDBMS
Grid Computing
Volunteer Computing
Data Retrieval - Random Access vs. Sequential Access
NoSQL Databases
The Motivation For Hadoop
Problems with traditional large-scale systems
Requirements for a new approach
Hadoop: Basic Concepts
What is Hadoop?
The Hadoop Distributed File System
How MapReduce Works
Anatomy of a Hadoop Cluster
Joining Data Sets in MapReduce Jobs
Map-Side Joins
Reduce-Side Joins
Programming Practices & Performance Tuning
Developing MapReduce Programs
> Local Mode
> Pseudo-distributed Mode
Monitoring and debugging on a Production Cluster
> Counters
> Skipping Bad Records
> Rerunning failed tasks with IsolationRunner
Tuning for Performance
> Reducing network traffic with a combiner
> Reducing the amount of input data
> Using Compression
> Reusing the JVM
> Running with speculative execution
> Refactoring code and rewriting algorithms
Parameters Affecting Performance
Other Performance Aspects
Hadoop with Analytics using R
Introduction to Big Data analytics
Using statistics on big data with R
Introduction to R
Using R to create an API that interacts with Hadoop ecosystem components
Integration of Java, R, Hadoop, Hive, etc.
Graph Manipulation in Hadoop
Introduction to graph techniques
Representing Graphs in Hadoop
Implementing a sample algorithm: Single Source Shortest Path
Writing a MapReduce Program
Examining a Sample MapReduce Program
Basic API Concepts
The Driver Code
Anatomy of File Read and Write
Basic Record Reader Anatomy
Input and Output Format Classes
The Mapper
The Reducer
Hadoop's Streaming API
Integrating Hadoop Into The Workflow
Relational Database Management Systems
Storage Systems
Importing Data from RDBMSs With Sqoop
Importing Real-Time Data with Flume
Delving Deeper Into The Hadoop API
Using Combiners
The configure and close Methods
SequenceFiles
Partitioners
Custom RecordReader
Custom Input and Output Class
Counters
Directly Accessing HDFS
ToolRunner
Using The Distributed Cache
Common MapReduce Algorithms
Sorting and Searching
Indexing
Classification/Machine Learning
Term Frequency - Inverse Document Frequency
Word Co-Occurrence
Using Hive and Pig
Hive Basics
Pig Basics
Debugging MapReduce Programs
Testing with MRUnit
Logging
Other Debugging Strategies
Advanced MapReduce Programming
A Recap of the MapReduce Flow
Custom Writables and WritableComparables
The Secondary Sort
Creating InputFormats and OutputFormats
Pipelining Jobs With Oozie
ANY
Three Spear, BTM Ist Stage (Bangalore), Bangalore, IN