Big Data


Quad Core Processor (VT-x or AMD-V support recommended), 30GB Storage, 8GB RAM


Linux OS, Hadoop & its Ecosystem tools

Course Duration

30 hours


Fee: ₹14,500 (regular price ₹18,000)

What Will I Learn?

  • Introduction to big data
  • Common big data domain challenges
  • Limitations of traditional solutions
  • What is Hadoop?
  • Hadoop Architecture
  • Hadoop 1.0 and Hadoop 2.x ecosystems and their core components
  • Hadoop components
  • Distributed File System
  • MapReduce Framework
  • YARN
  • Replication Rules
  • Rack Awareness Theory
  • Virtual Machine
  • Linux OS
  • Linux File System
  • Prerequisites for Installing Hadoop
  • Installation & Configuration of a Hadoop Cluster
  • Hadoop in Pseudo-Distributed Mode
  • Hadoop Configuration Files
  • Log Files in Hadoop
  • Deploying a multi-node Hadoop cluster
  • Understanding HDFS
  • Different phases in MapReduce
  • Application Workflow in YARN
  • YARN Metrics
  • Understanding Hive
  • Hive Setup
  • Hive Configuration
  • Working with Hive
  • Pig setup
  • Working with Pig
  • What is a NoSQL Database?
  • Difference between SQL and NoSQL
  • HBase data model
  • HBase Architecture
  • MemStore, WAL, BlockCache
  • HBase HFile
  • Compactions
  • HBase Read and Write
  • HBase balancer and hbck
  • HBase setup
  • Working with HBase
  • Installing ZooKeeper
  • Sqoop Architecture
  • Sqoop installation and configuration
  • Import data from RDBMS into HDFS
  • Flume architecture
  • Flume installation and configuration
  • Ingest data from External Sources
  • Kafka Installation
  • Import data using Kafka
  • Best practices
  • Spark Installation
  • Spark Architecture
  • Building Blocks of Spark
  • Tools for working with Spark (pyspark, spark-submit)
  • Spark Components (Unified Stack)
  • Spark Core
  • Spark SQL
  • Spark Streaming
  • Spark MLlib
  • GraphX

Why Take This Course?

Many people talk about Big Data and Hadoop, but how many of them actually understand it?

Big Data Hadoop and its ecosystem are vast and intimidating. Hundreds of different projects, predominantly open source, make up the Hadoop ecosystem; together, these projects are used to build data pipelines in Big Data environments.

The program for Big Data Hadoop developers will upskill you in Apache Hadoop and its architecture, the MapReduce framework, HDFS, YARN, Pig, Hive, ZooKeeper, HBase, Spark and more, with hands-on experience in each of these technologies. The Big Data Analytics program will enable you to:
  • Develop and implement Hadoop applications, including setting up a cluster environment.
  • Load datasets to work with Big Data.
  • Process Big Data residing in a distributed file system using data warehousing and ETL tools.
  • Build high-performance applications with Apache Spark.
  • Move data from traditional RDBMS databases (MySQL) into Big Data stores (HDFS, HBase) using the data ingestion tool Sqoop.

The program assumes no prior experience and starts from the basics of the Linux OS, laying a sound foundation for building skills in Big Data environments. By the end of the program, you will have a clear understanding of Big Data and Hadoop concepts. It is a foundation program designed to help you begin a career in Big Data, either administering Hadoop clusters or developing MapReduce applications that run on them. Happy learning!


Copyright © Instilit.