Introduction to Hadoop
- What is Big Data
- Need and significance of innovative technologies
- What is Hadoop
- 3 Vs of Big Data (Characteristics)
- History of Hadoop and its Uses
- Different Components of Hadoop
- Various Hadoop Distributions
- Traditional Database vs Hadoop
HDFS (Hadoop Distributed File System)
- Significance of HDFS in Hadoop
- HDFS Features
- Daemons of Hadoop and their functionalities
- NameNode
- DataNode
- JobTracker
- TaskTracker
- Secondary NameNode
- Data Storage in HDFS
- Blocks
- Heartbeats
- Data Replication
- HDFS Federation
- High Availability
- Accessing HDFS
- CLI (Command Line Interface): Unix and Hadoop Commands
- Java Based Approach
- Data Flow
- Anatomy of a File Read
- Anatomy of a File Write
- Hadoop Archives
MapReduce
- Introduction to MapReduce
- MapReduce Architecture
- MapReduce Programming Model
- MapReduce Algorithm and Phases
- Data Types
- Input Splits and Records
- Blocks vs Splits
- Basic MapReduce Program
- Driver Code
- Mapper Code
- Reducer Code
- Combiner and Shuffler
- Creating Input and Output formats in MapReduce Jobs
- File Input / Output Format
- Text Input / Output Format
- Sequence File Input / Output Format, etc.
- Data Localization in MapReduce
- Distributed Cache
- A Sample MapReduce Program
- Identity Mapper
- Identity Reducer
Pig
- Introduction to Apache Pig
- MapReduce vs Apache Pig
- SQL vs Apache Pig
- Different Data types in Apache Pig
- Modes of Execution in Apache Pig
- Local Mode
- MapReduce or Distributed Mode
- Execution Mechanism
- Grunt shell
- Script
- Embedded
- Data Processing Operators
- Loading and Storing Data
- Filtering Data
- Grouping and Joining Data
- Sorting Data
- Combining and Splitting Data
- How to write a simple Pig Script
- UDFs in Pig
Sqoop
- Introduction to Sqoop
- Sqoop Architecture and Internals
- MySQL client and server installation
- How to connect relational database using Sqoop
- Sqoop Commands
- Export
- Hive imports
Hive
- The Metastore
- Comparison with Traditional Databases
- Schema on Read Versus Schema on Write
- Updates, Transactions, and Indexes
- HiveQL
- Data Types
- Operators and Functions
- Tables
- Managed Tables and External Tables
- Static Partitions and Dynamic Partitions
- Partitions and Buckets
- Storage Formats
- Importing Data
- Altering Tables
- Dropping Tables
- Querying Data
- Sorting and Aggregating
- Hive Query Language
- MapReduce Scripts
- Joins
- Subqueries
- Views
- User-Defined Functions
- Writing a UDF
- Writing a UDAF
- Limitations of Hive
- Hive vs Pig
HBase
- Introduction to HBase
- HBase vs HDFS
- Use Cases
- Basic Concepts
- Column families
- Scans
- HBase Architecture
- ZooKeeper
- SQL databases vs NoSQL databases
- Clients
- REST
- Thrift
- Java Based
- Avro
- MapReduce integration
- MapReduce over HBase
- Schema definition
- Basic CRUD Operations
Flume
- Introduction to Flume
- Uses of Flume
- Flume Architecture
- Flume Master
- Flume Collectors
- Flume Agents
Oozie, HCatalog
- Introduction to Oozie
- Uses of Oozie
- Oozie workflow basics
Mahout
- Introduction to Mahout
Sample Profiles will be provided on Big Data & Hadoop
Mock Interviews
- Introduction to R (Analytical Tool)
- Introduction to Tableau (BI Tool)
- Project:
- Social Media Analytics
- (Twitter, Facebook Data Processing)
- Three Mock Interviews