Apache Hadoop Online Intermediate Level by Tecfactory

Start Date

May 4, 2025 - June 26, 2025

Course Duration in Hours

80

Course Category

Apache Hadoop

Course Details

Hadoop Developer and Administration Program

The Motivation for Hadoop (1:00 hours)

Problems with traditional large-scale systems
3 Vs in Big Data
About Distributed Cluster Computing Frameworks
Requirements for a New Approach
Real-Time Use Cases

Hadoop: Basic Concepts (3:00 hours)

An Overview of Hadoop
Hadoop Distributed File System
Hands-On Exercise
How Map Reduce Works
Hands-On Exercise
Anatomy of a Hadoop Cluster
Other Hadoop Ecosystem Components
Practical Hadoop Node Cluster Setup and Installation in Pseudo-Distributed Mode
Using Cloudera VMs
Work Assignments

Deep Dive into HDFS (3:00 hours)

HDFS Design
Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)
Rack Awareness
Read/Write from HDFS
HDFS Federation and High Availability (Hadoop 2.x.x)
HDFS Command Line Interface
File Read Cycle from HDFS
File Write Cycle to HDFS
Failure or Error Handling When a File Read Fails
Failure or Error Handling When a File Write Fails
Work Assignments

Map Reduce Components (3:00 hours)

The Map Reduce Flow
Examining a Sample Map Reduce Program
Basic Map Reduce API Concepts
About Drivers
About Mappers
About Reducers
Hands-on exercise
Work Assignments

Map Reduce Program Internals (3:00 hours)

How Map Reduce Works
Anatomy of Map Reduce Job
Submission & Initialization of Map Reduce Job (What Happens?)
Assigning & Execution of Tasks
Monitoring & Progress of Map Reduce Job
Completion of Job
Handling of Map Reduce Job
Task Failure
Task Tracker Failure
Job Tracker Failure
Reducing Intermediate Data With Combiners
Writing Partitioners for Better Load Balancing
Using the Distributed Cache
Operations on Map Reduce Dealing with Joins
Commonly Used Map Reduce Algorithms
Map Reduce Pitfalls and Recovery Strategies
Work Assignments

Map Reduce DataTypes and Formats (1:00 hours)

Serialization In Hadoop
Hadoop Writable and Comparable
Hadoop RawComparator and Custom Writable
MapReduce Types and Formats
Understand Difference Between Block and InputSplit
Role of RecordReader
FileInputFormat
CombineFileInputFormat and Processing a Whole File in a Single Mapper
Text/KeyValue/NLine InputFormat
BinaryInput processing
MultipleInputs Format, Database Input and Output
Text/Binary/Multiple/Lazy OutputFormat

Advanced Map Reduce Concepts (3:00 hours)

Job Scheduling
In Depth Shuffle and Sorting
Speculative Execution
Configuration and Performance Tuning
Joins in Map Reduce
Map Reduce Job Chaining
Using Distributed Cache
Work Assignments

Hive: A Warehouse for Hadoop Infrastructure (3:00 hours)

What is Hive?
Architecture of Hive
Hive Services
Hive Clients
How Hive Differs from Traditional RDBMS
Introduction to HiveQL
Data Types and File Formats in Hive
File Encoding
Common problems while working with Hive
Hands-on Assignments
Practical Hive Installation and Setup
Using Cloudera VMs

Advanced Hive Concepts (3:00 hours)

HiveQL
Managed and External Tables
Understand Storage Formats
Querying Data
Sorting and Aggregation
Map Reduce In Query
Joins, Sub Queries and Views
Writing User-Defined Functions (UDFs)
Data Types and Schemas
Hive ODBC

Apache Pig - Transformation Framework for Hadoop (4:00 hours)

What is Pig?
Introduction to Pig Data Flow Engine
Pig and Map Reduce in Detail
When Should Pig Be Used?
Pig and Hadoop Cluster
Pig Interpreter and Map Reduce
Pig Relations and Data Types
Pig Latin Example in Detail
Debugging and Generating Examples in Apache Pig
Using Cloudera VMs

Apache Sqoop: SQL <---> Hadoop (3:00 hours)

What is Apache Sqoop
Sqoop Architecture
Sqoop JDBC Driver and Connectors
Sqoop Importing Data
Various Options to Import Data
Table Import
Binary Data Import
Speeding Up the Import
Filtering Import
Full Database Import
Introduction to Sqoop
Hands-On Sqoop Use Case
Work Assignments

Apache Flume: Real-Time Stream Processing (3:00 hours)

What is Apache Flume
Flume Architecture
Sample Twitter Feed Configuration
Flume Channel
Memory Channel
File Channel
Sinks and Sink Processors
Sources
Channel Selectors
Interceptors
Hands-On Flume Use Case
Work Assignments
Practical Installation and Setup
Using Cloudera VMs

NoSQL Systems: HBase and Other Distributions (3:00 hours)

What is HBase?
HBase Architecture
HBase API
Managing Large Data Sets with HBase
Using HBase in Hadoop Applications
Overview of Cloudera Manager and Hue
Overview of Impala
Overview of Oozie
Hands-on exercise
Practical Installation and Setup
Using Cloudera VMs

Optional

Spark: Next-Generation MapReduce Framework (3:00 hours)

What is Apache Spark
How Spark Differs from Hadoop's Map Reduce
Spark Architecture
Spark Internals
RDD
Actions and Transformations
Hands-on Assignments

Important Note

Module Assignments
Individual assignments for each module will be assigned and evaluated.
Final Assignments
Candidates will complete a proof of concept (POC) based on a real-time use case that covers all the Hadoop components.
Data sets will be included for all the assignments.

Key Notes
Support and assistance will be provided whether candidates work on Cloudera VMs, a standalone installation, or any other open source software.
Complete guidance and support for all candidates (online/offline), 24x7.
Hadoop interview questions will be shared with candidates once the course is completed. A WhatsApp group will be created where interview questions, solutions, and other updates will be posted regularly.
Training also covers material based on Hadoop certification programs such as Cloudera, Hortonworks, and EMC.

Contact by Phone

+918041611647

Mode of Learning

Online

Who can Attend?

Any UG or PG

Location

Tecfactory, Indira Nagar, Bangalore, IN
