Hadoop:
Let’s have a look at Hadoop’s MapReduce features: MapReduce can be considered the soul of Hadoop. It is a programming paradigm that permits massive scalability across thousands of servers in a Hadoop cluster. Users familiar with clustered data-processing solutions can easily understand the MapReduce concept.
New users may find it harder to grasp, since the clustered structure itself takes some effort to understand. Readers new to the topic can go through the information below to understand MapReduce.
MapReduce jobs executed by Hadoop programs fall into two categories:
1. The map job: it takes a set of data and transforms it into another set of data, in which individual elements are broken down into key/value pairs.
2. The reduce job: it takes the output of the map job as its input and combines those key/value pairs into a smaller set of tuples. As the name MapReduce indicates, the map job is always performed first, followed by the reduce job.
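The two jobs above can be sketched in plain Python. This is a minimal word-count illustration, not Hadoop's actual API: the `map_job` and `reduce_job` names and the dictionary standing in for the shuffle phase are assumptions made for the example.

```python
from collections import defaultdict

def map_job(document):
    """Map: break a chunk of text into (key, value) pairs."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_job(key, values):
    """Reduce: combine all values seen for one key into a summary."""
    return (key, sum(values))

# Simulate the shuffle step: group map output by key.
documents = ["the quick brown fox", "the lazy dog", "the fox"]
grouped = defaultdict(list)
for doc in documents:
    for key, value in map_job(doc):
        grouped[key].append(value)

counts = dict(reduce_job(k, v) for k, v in grouped.items())
print(counts["the"])  # "the" appears 3 times across the documents
```

In a real cluster the grouping step is performed by the framework across the network; here a local dictionary plays that role.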
MapReduce is a programming model, and an associated implementation, for
processing and generating huge data sets with a distributed algorithm on a cluster.
·
A MapReduce program consists of a Map method, which
performs filtering and sorting of the data, and a Reduce method, which performs a summary
operation. The "MapReduce System" (or framework) orchestrates the processing by marshalling the distributed servers, running the various
tasks in parallel, managing all communications and data transfers between the
various parts of the system, and providing for redundancy and fault tolerance.
·
The model is inspired by the map and reduce functions commonly used in functional programming, though their purpose in the MapReduce
framework is not the same as in their original forms.
·
Scalability and fault tolerance are the main
features of a MapReduce program. They come not from the map and reduce functions themselves, but
are gained for a variety of applications by
optimizing the execution engine once. MapReduce delivers its real benefit
only when these two properties come into play.
For a good MapReduce algorithm, it is essential to optimize the
communication cost. MapReduce libraries have been written in many programming languages,
with different levels of optimization. The most commonly used open-source implementation is Apache Hadoop.
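One common way to cut communication cost is a combiner: pre-aggregating map output on the mapper's own node so fewer (key, value) pairs cross the network during the shuffle. The sketch below is illustrative only; the `combine` function is an assumption for the example, not Hadoop's API.

```python
from collections import Counter

def map_job(document):
    """Map: emit one (word, 1) pair per word."""
    for word in document.split():
        yield (word.lower(), 1)

def combine(pairs):
    """Combiner: pre-aggregate map output locally, so far fewer
    (key, value) pairs have to travel across the network."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

raw = list(map_job("to be or not to be"))
combined = combine(raw)
print(len(raw), len(combined))  # 6 pairs shrink to 4 before the shuffle
```

The reduce logic is unchanged; the combiner only shrinks the intermediate data, which is why it helps exactly when communication is the bottleneck.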
MapReduce allows for distributed processing of the map and reduction
operations. Provided that each mapping operation is independent of the
others, all maps can be executed in parallel. While this process can often
appear inefficient compared to algorithms that are more sequential (because
multiple rather than one instance of the reduction process must be run),
MapReduce is specifically useful for larger datasets than
"commodity" servers can handle – a huge server farm can use
MapReduce to sort a petabyte of data in a
fraction of the time a single machine would need. The parallelism also provides some possibility of recovering
from partial failure of servers or storage during the operation: if one mapper
or reducer fails, the work can be rescheduled – assuming the input data is
still available.
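The independence of map tasks described above can be shown with an ordinary thread pool. This is a local sketch of the idea, not Hadoop's execution engine: the chunk data and the `count_words` mapper are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def count_words(document):
    # Each map task works on its own chunk, independently of the others.
    return Counter(document.split())

chunks = ["alpha beta", "beta gamma", "alpha alpha"]

# Because the map tasks are independent, they can all run at once.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(count_words, chunks))

# Reduce: merge the partial results into one summary.
total = sum(partials, Counter())
print(total["alpha"])  # 3
```

If one task here failed, only its chunk would need to be resubmitted, which mirrors how the framework reschedules a failed mapper as long as the input data survives.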
Uses:
1. Distributed pattern-based searching.
2. Reversal of the web-link graph; singular value decomposition.
3. Distributed sorting.
4. Clustering of documents and statistical machine translation.
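Use 2 above (reversing the web-link graph) fits the model naturally: map emits a (target, source) pair for every outgoing link, and reduce collects all sources that point at a given target. The sketch below is a local simulation under assumed data; the function names and the tiny `web` graph are invented for illustration.

```python
from collections import defaultdict

def map_links(page, targets):
    """Map: for each outgoing link, emit (target, source)."""
    for target in targets:
        yield (target, page)

def reduce_links(target, sources):
    """Reduce: collect every page that links to this target."""
    return (target, sorted(sources))

web = {"a.com": ["b.com", "c.com"], "b.com": ["c.com"]}

# Simulate the shuffle: group (target, source) pairs by target.
grouped = defaultdict(list)
for page, targets in web.items():
    for key, value in map_links(page, targets):
        grouped[key].append(value)

reversed_graph = dict(reduce_links(t, s) for t, s in grouped.items())
print(reversed_graph["c.com"])  # pages linking to c.com: ['a.com', 'b.com']
```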