HADOOP/BIG DATA :
Topics covered in this content :
1 .
Introduction
to Hadoop / Big Data
2 .
Advantages
of Hadoop / Big Data
3 .
Information
on Hadoop 2 – latest version of hadoop
4 .
Advantages
for Hadoop 2
1.
Introduction
to Hadoop / Big Data :
The Hadoop framework
itself is mostly written in the Java programming language, with some native code in C and command line utilities written as
shell-scripts. Any programming language can be used with "Hadoop
Streaming" to implement the "map" and "reduce" parts
of the user's program.
Big data/Hadoop is
among the most marketed trends in IT field right now, and Hadoop stands front
and center in the discussion of how to implement a big data strategy.
Hadoop base framework is composed of the following modules:
·
Hadoop Common – contains
libraries and utilities needed by other Hadoop modules;
·
Hadoop Distributed File System (HDFS) – a distributed file-system that stores data providing
very high aggregate bandwidth across the cluster;
·
Hadoop YARN – a
resource-management platform responsible for managing compute resources in
clusters and using them for scheduling of users' applications; and
·
Hadoop MapReduce – a programming
model for large scale data processing.
Hadoop
splits files into large blocks (default 64MB or 128MB) and distributes the
blocks amongst the nodes in the cluster.
From its earliest
days, Hadoop has attracted software developers looking to create add-on tools
to fill in gaps in its functionality. Other supporting actors that have become
Hadoop subprojects or Apache projects in their own right include ; Cassandra, aNOSQL database; that helps in maintaining configuration data and
synchronizes distributed operations across clusters.
2 Advantages
of Hadoop/Big Data :
·
Hadoop has the ability
to process huge volumes of data in real time.
·
Hadoop's big data processing features are used
by people for learning new ways to mine data and discover new relationships in
their systems.
·
Hadoop plays
major role in big data management and analytics initiatives.
·
Hadoop is
very good at analysis of very huge, static unstructured data sets which include
many terabytes or even petabytes of
information
·
Because of its ability to handle data
"with very light structure, Hadoop
applications can take advantage of new information sources that don't
lend themselves to traditional databases.
3.
Information
on Hadoop 2 – latest version of Hadoop :
·
Hadoop’s
ongoing evolution : Hadoop is
continually evolving to meet shifting big data management needs and business
goals.
·
Hadoop 2
-- originally known as Hadoop 2.0 -- is entering the market. Main in Hadoop2 is
YARN, an overhauled resource
manager that allows applications other than MapReduce programs to work with
HDFS. By doing so, YARN (a good-natured acronym for Yet Another Resource
Negotiator) is meant to free Hadoop from its reliance on batch processing while
still providing backward compatibility with existing application programming
interfaces.
What is YARN ?
YARN is the key
difference for Hadoop 2.0 - it allows for multiple workloads to run
concurrently.
·
Hadoop 2,
will eventually take the framework far beyond its current core configuration.
·
It
relates the Hadoop Distributed File System (HDFS) with Java-based MapReduce
programs.
·
This
pairing is used by the early-adopter companies to help them deal with large
amounts of transaction data as well as various types of unstructured
and semi-structured data, including server and
network log files, sensor data, social media feeds, text documents and image
files.
4
Advantages
for Hadoop 2 :
·
Hadoop is
not useful in real-time analysis of live data sets -- although that could
change, thanks to the combination of Hadoop 2 and new
query engines recently introduced by some vendors looking to support ad
hoc analysis of Hadoop data.
·
Hadoop 2
also shows high availability improvements, through a new feature that allows
the users to create federated name (or master) node architecture in HDFS
instead of relying on a single node to control an entire cluster. Additionally,
it provides support for running Hadoop on Windows. Meanwhile, commercial
vendors are brewing up additional management-tool elixirs -- new job schedulers
and cluster provisioning software, for example -- in an effort to further boost
Hadoop's enterprise readiness.
Reasons to Learn Hadoop
1.
Better
Career :
2.
Better
Salary
3.
Better
Job Oppurtunities
4.
Big
Companies hiring