Big Data – Hadoop – CtrIAI Software Technologies

BIG DATA & HADOOP

Introduction to Hadoop

What is Big Data
Need and significance of innovative technologies
What is Hadoop
3 Vs (Characteristics): Volume, Velocity, Variety
History of Hadoop and its Uses
Different Components of Hadoop
Various Hadoop Distributions
Traditional Database vs Hadoop

HDFS (Hadoop Distributed File System)

Significance of HDFS in Hadoop
HDFS Features
Daemons of Hadoop and their functionalities
NameNode
DataNode
JobTracker
TaskTracker
Secondary NameNode
Data Storage in HDFS
Blocks
Heartbeats
Data Replication
HDFS Federation
High Availability
Accessing HDFS
CLI (Command Line Interface): Unix and Hadoop Commands
Java-Based Approach
Data Flow
Anatomy of a File Read
Anatomy of a File Write
Hadoop Archives

MapReduce

Introduction to MapReduce
MapReduce Architecture
MapReduce Programming Model
MapReduce Algorithm and Phases
Data Types
Input Splits and Records
Blocks Vs Splits
Basic MapReduce Program
Driver Code
Mapper Code
Reducer Code
Combiner and Shuffle
Creating Input and Output formats in MapReduce Jobs
File Input / Output Format
Text Input / Output Format
Sequence File Input / Output Format, etc.
Data Localization in MapReduce
Distributed Cache
A Sample MapReduce Program
Identity Mapper
Identity Reducer

Pig

Introduction to Apache Pig
MapReduce Vs. Apache Pig
SQL Vs. Apache Pig
Different Data types in Apache Pig
Modes of Execution in Apache Pig
Local Mode
MapReduce or Distributed Mode
Execution Mechanism
Grunt shell
Script
Embedded
Data Processing Operators
Loading and Storing Data
Filtering Data
Grouping and Joining Data
Sorting Data
Combining and Splitting Data
How to Write a Simple Pig Script
UDFs in Pig

Sqoop

Introduction to Sqoop
Sqoop Architecture and Internals
MySQL client and server installation
How to connect to a relational database using Sqoop
Sqoop Commands
Export
Hive Imports

Hive

The Metastore
Comparison with Traditional Databases
Schema on Read Versus Schema on Write
Updates, Transactions, and Indexes
HiveQL
Data Types
Operators and Functions
Tables
Managed Tables and External Tables
Static Partitions and Dynamic Partitions
Partitions and Buckets
Storage Formats
Importing Data
Altering Tables
Dropping Tables
Querying Data
Sorting and Aggregating
Hive Query Language
MapReduce Scripts
Joins
Subqueries
Views
User-Defined Functions
Writing a UDF
Writing a UDAF
Limitations of Hive
Hive vs Pig

HBase

Introduction to HBase
HBase vs HDFS
Use Cases
Basic Concepts
Column families
Scans
HBase Architecture
ZooKeeper
SQL databases vs NoSQL databases
Clients
REST
Thrift
Java-Based
Avro
MapReduce integration
MapReduce over HBase
Schema definition
Basic CRUD Operations

Flume

Introduction to Flume
Uses of Flume
Flume Architecture
Flume Master
Flume Collectors
Flume Agents

Oozie, HCatalog

Introduction to Oozie
Uses of Oozie
Oozie workflow basics

Mahout

Introduction to Mahout

Sample profiles on Big Data & Hadoop will be provided

Mock Interviews

Introduction to R (Analytical Tool)
Introduction to Tableau (BI Tool)
Project:
Social Media Analytics
(Twitter and Facebook Data Processing)
Three Mock Interviews