Gyan Factory

Hadoop / Big Data


Hadoop Developer / Analyst / SPARK + SCALA / Hadoop (Java + Non-Java) Track
Best Big Data Hadoop training with 2 real-time projects and a 1 TB data set
Duration of the training: 8 to 10 weekends


How We Are Different from Others: Every topic is covered with real-time examples. The course includes 8 real-time projects and more than 72 assignments, divided into basic, intermediate, and advanced levels. The trainer comes from industry, with 9 years of experience in data warehousing (DWH), and currently works as a BI and Hadoop consultant with 3+ years in real-time Big Data and Hadoop implementations and migrations.
 
This is completely hands-on training, with 90% practical work and 10% theory. Here at Gyan Factory, we cover all the prerequisites, such as Java and SQL, that are needed to build Hadoop developer and analytics skills. This lets us accommodate complete beginners and technical experts in the same session, and by the end of the training both gain the confidence that they have up-skilled to a different level.
 
  • 8 Domain-Based Projects with Real-Time Data
  • 5 POCs
  • 72 Assignments
  • 25 Real-Time Scenarios on a 16-Node Cluster
  • Smart Class
  • Basic Java
  • DWH Concepts
  • Pig | Hive | MapReduce | NoSQL | HBase | ZooKeeper | Sqoop | Flume | Oozie | YARN | Hue | Spark | Scala
42 Hours of Classroom Sessions
30 Hours of Assignments
25 Hours for One Project and 50 Hours for Two Projects
350+ Interview Questions
Administration and manual installation of Hadoop, along with other domain-based projects, are covered on a regular basis apart from our normal batch schedule.
We have projects from Healthcare, Finance, Automotive, Insurance, Banking, Retail, etc., which are given to our students as per their requirements.
Hadoop Certifications: Gyan Factory is accredited with Pearson VUE and Kryterion. We conduct exams every month and have a 100% passing record for all students who completed the course with Gyan Factory. The most in-demand Hadoop exams are the Hortonworks and Cloudera certifications.
Exam Preparation: After the course, we provide all of our candidates a free exam preparation session, which guides them in passing the respective Hadoop exam modules.


Registration Process: We never take any registration fee from candidates before they have experienced our training quality. Once you are satisfied with the demo, you can register with full payment and avail a discount. We also offer an installment facility.
 


Who Is Hadoop For?
IT folks who want to move into a technology that is in demand with almost all clients in all domains, for the reasons below:
  • Hadoop is open source (cost saving / cheaper)
  • Hadoop solves Big Data problems that are very difficult or impossible to solve using the highly priced tools on the market
  • It can process distributed data, with no need to store the entire data set in centralized storage the way other tools require
  • Nowadays there are job cuts in many existing tools and technologies, because clients are moving towards a cheaper and more efficient solution named Hadoop
  • There will be almost 4.4 million Hadoop jobs in the market by next year

Can I Learn Hadoop If I Don't Know Java?
Yes.
It is a big myth that someone who doesn't know Java can't learn Hadoop. The truth is that only the MapReduce framework needs Java; all the other components are based on different paradigms: Hive is similar to SQL, HBase is similar to an RDBMS, and Pig is script-based.
Only MapReduce requires Java, and many organizations have started hiring for specific skill sets too, such as HBase developers or Pig- and Hive-specific roles. Knowing MapReduce as well simply makes you an all-rounder in Hadoop, ready for any requirement.
Why Hadoop?
  • Solution for the Big Data problem
  • Open source technology
  • Based on open source platforms
  • Contains several tools for an entire ETL data processing framework
  • It can process distributed data, with no need to store the entire data set in centralized storage the way SQL-based tools require
 
Training Syllabus
Big Data
  • Distributed computing
  • Data management – Industry Challenges
  • Overview of Big Data
  • Characteristics of Big Data
  • Types of data
  • Sources of Big Data
  • Big Data examples
  • What is streaming data?
  • Batch vs Streaming data processing
  • Overview of Analytics
  • Big data Hadoop opportunities
Hadoop
  • Why we need Hadoop
  • Data centres and Hadoop Cluster overview
  • Overview of Hadoop Daemons
  • Hadoop Cluster and Racks
  • Learning Linux required for Hadoop
  • Hadoop ecosystem tools overview
  • Understanding Hadoop configuration and installation
HDFS (Storage)
  • HDFS
  • HDFS Daemons – Namenode, Datanode, Secondary Namenode
  • Hadoop FS and processing environment UIs
  • Fault Tolerance
    • High Availability
    • Block Replication
  • How to read and write files
  • Hadoop FS shell commands (a short read/write sketch follows this list)
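To give a flavour of the read/write topic above, here is a minimal Scala sketch using the Hadoop FileSystem API; the path /tmp/hello.txt is an illustrative assumption, not part of the course material.

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  object HdfsReadWrite {
    def main(args: Array[String]): Unit = {
      // Picks up fs.defaultFS from core-site.xml on the classpath
      val fs = FileSystem.get(new Configuration())

      // Write: create() returns an FSDataOutputStream
      val out = fs.create(new Path("/tmp/hello.txt"))   // illustrative path
      out.writeUTF("hello hdfs")
      out.close()

      // Read: open() returns an FSDataInputStream
      val in = fs.open(new Path("/tmp/hello.txt"))
      println(in.readUTF())
      in.close()
    }
  }

The same round trip from the FS shell would be hdfs dfs -put hello.txt /tmp/ followed by hdfs dfs -cat /tmp/hello.txt.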


YARN (Hadoop Processing Framework)
  • YARN
  • YARN Daemons – ResourceManager, NodeManager, etc.
  • Job assignment & Execution flow
 Apache Hive
  • Data warehouse basics
  • OLTP vs OLAP Concepts
  • Hive
  • Hive Architecture
  • Metastore DB and Metastore Service
  • Hive Query Language (HQL)
  • Managed and External Tables
  • Partitioning & Bucketing
  • Query Optimization
  • Hiveserver2 (Thrift server)
  • JDBC and ODBC connections to Hive (a JDBC sketch follows this list)
  • Hive Transactions
  • Hive UDFs
  • Working with Avro Schema and AVRO file format
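As a taste of the HiveServer2 and JDBC topics above, here is a minimal Scala sketch; the host, port, database, and the customers table are placeholders.

  import java.sql.DriverManager

  object HiveJdbcDemo {
    def main(args: Array[String]): Unit = {
      Class.forName("org.apache.hive.jdbc.HiveDriver")
      // 10000 is HiveServer2's default Thrift port; adjust host and database as needed
      val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
      val stmt = conn.createStatement()
      val rs = stmt.executeQuery("SELECT name, city FROM customers LIMIT 10")  // placeholder table
      while (rs.next()) println(rs.getString(1) + "\t" + rs.getString(2))
      rs.close(); stmt.close(); conn.close()
    }
  }
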
Apache Pig
  • Apache Pig
  • Advantage of Pig over MapReduce
  • Pig Latin (Scripting language for Pig)
  • Schema and Schema-less data in Pig
  • Structured and semi-structured data processing in Pig
  • Pig UDFs
  • HCatalog
  • Pig vs Hive Use case
Sqoop
  • Sqoop commands (an import example follows this list)
  • Sqoop practical implementation
    • Importing data to HDFS
    • Importing data to Hive
    • Exporting data to RDBMS
  • Sqoop connectors
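A hedged sketch of the import flow listed above; the JDBC URL, credentials, table, and target directory are all placeholders.

  sqoop import \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /data/sales/orders \
    --num-mappers 4

Adding --hive-import would load the result into a Hive table instead, which is the second import variant in the list above.
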
Flume
  • Flume commands
  • Configuration of Source, Channel and Sink
  • Fan-out flume agents
  • How to load data into Hadoop coming from a web server or other storage
  • How to load streaming Twitter data into HDFS using Flume (a sample agent configuration follows this list)
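A minimal single-agent configuration sketch for the source/channel/sink topic above; the agent name a1, the netcat source, and the HDFS path are illustrative assumptions.

  # flume-conf.properties: one agent (a1) with one source, channel, and sink
  a1.sources  = r1
  a1.channels = c1
  a1.sinks    = k1

  # netcat is a toy source for demos; exec or spooldir sources are more typical
  a1.sources.r1.type = netcat
  a1.sources.r1.bind = localhost
  a1.sources.r1.port = 44444

  a1.channels.c1.type = memory

  # illustrative target directory in HDFS
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = /data/flume/events

  # wire the source and sink to the channel
  a1.sources.r1.channels = c1
  a1.sinks.k1.channel = c1

Such an agent would be started with flume-ng agent --conf conf --conf-file flume-conf.properties --name a1.
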
Oozie
  • Oozie
  • Action Node and Control Flow node
  • Designing workflow jobs
  • How to schedule jobs using Oozie
  • How to schedule time-based jobs
  • Oozie configuration file (a minimal workflow sketch follows this list)
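A skeletal workflow.xml for the action-node and control-flow topic above; the shell action and script name are placeholders, and jobTracker and nameNode are assumed to come from job.properties.

  <workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="import-step"/>
    <action name="import-step">
      <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>run_import.sh</exec>
      </shell>
      <ok to="end"/>
      <error to="fail"/>
    </action>
    <kill name="fail">
      <message>Import step failed</message>
    </kill>
    <end name="end"/>
  </workflow-app>

For the time-based scheduling item, a coordinator.xml with a frequency attribute wraps a workflow like this one.
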
Scala
  • Scala
    • Syntax, data types, and variables
  • Classes and Objects
  • Basic Types and Operations
  • Functional Objects
  • Built-in Control Structures
  • Functions and Closures
  • Composition and Inheritance
  • Scala’s Hierarchy
  • Traits
  • Packages and Imports
  • Working with Lists, Collections
  • Abstract Members
  • Implicit Conversions and Parameters
  • For Expressions Revisited
  • The Scala Collections API
  • Extractors
  • Modular Programming Using Objects (a short Scala example follows this list)
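A small Scala sketch touching several of the topics above: a trait, a class using inheritance, a higher-order function over a List, and a for expression.

  trait Greeter {
    def greet(name: String): String = s"Hello, $name"   // trait with a concrete method
  }

  class Course(val title: String, val weeks: Int) extends Greeter

  object ScalaBasics {
    def main(args: Array[String]): Unit = {
      val course = new Course("Hadoop", 10)
      println(course.greet("student"))        // method inherited from the trait

      val durations = List(8, 9, 10)
      val doubled = durations.map(_ * 2)      // higher-order function on a List
      println(s"doubled = $doubled, total = ${durations.sum}")

      // for expression: desugars to map/flatMap/filter calls
      val evens = for (d <- durations if d % 2 == 0) yield d
      println(evens)                          // List(8, 10)
    }
  }
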
Spark
  • Spark
  • Architecture and Spark APIs
  • Spark components
    • Spark master
    • Driver
    • Executor
    • Worker
    • Significance of Spark context
  • Concept of Resilient distributed datasets (RDDs)
  • Properties of RDD
  • Creating RDDs
  • Transformations in RDD
  • Actions in RDD
  • Saving data through RDD
  • Key-value pair RDD
  • Invoking Spark shell
  • Loading a file in shell
  • Performing some basic operations on files in Spark shell
  • Spark application overview
  • Job scheduling process
  • DAG scheduler
  • RDD graph and lineage
  • Life cycle of a Spark application
  • How to choose between the different persistence levels for caching RDDs
  • Submit in cluster mode
  • Web UI – application monitoring
  • Important Spark configuration properties
  • Spark SQL overview
  • Spark SQL demo
  • SchemaRDD and DataFrames
  • Joining, Filtering and Sorting Dataset
  • Spark SQL example program demo and code walkthrough (a compact sketch follows this list)
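To close, a compact Scala sketch covering RDD creation, transformations, actions, and a Spark SQL query over a DataFrame; the input path is a placeholder, and local[*] is only for trying it outside a cluster.

  import org.apache.spark.sql.SparkSession

  object SparkWordCount {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("wordcount-demo")
        .master("local[*]")                   // on a real cluster, pass the master via spark-submit
        .getOrCreate()
      val sc = spark.sparkContext

      // RDD: transformations are lazy; the action take() triggers the job
      val lines = sc.textFile("/data/input.txt")          // placeholder path
      val counts = lines
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1))                           // key-value pair RDD
        .reduceByKey(_ + _)
      counts.take(5).foreach(println)

      // Spark SQL over a DataFrame built from the RDD
      import spark.implicits._
      val df = counts.toDF("word", "cnt")
      df.createOrReplaceTempView("wc")
      spark.sql("SELECT word, cnt FROM wc ORDER BY cnt DESC LIMIT 5").show()

      spark.stop()
    }
  }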



