Description:
With the enormous increase in data, there is an urgent need to process it effectively. MapReduce-style frameworks take huge blocks of data, convert them into simple key-value pairs, and make them easy and modular to analyze. The interesting part of their implementation is that the framework can perform these tasks in parallel on multiple nodes, balancing the load to reduce the overhead on any single node.
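As a minimal sketch of this key-value processing model, the classic word-count example can be written in plain Python; the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, not part of any framework's API:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) key-value pair for every word in the block.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word independently.
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["the"])  # 2
```

Because each map call and each reduce key is independent, the framework is free to run them in parallel on different nodes.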
Why Iterative
MapReduce frameworks such as Hadoop and Dryad have been very successful at meeting the need to analyze huge files and solve data-intensive problems. Although they handle many workloads well, many data analysis techniques require iterative computations, including PageRank, HITS (Hypertext-Induced Topic Search), recursive relational queries, clustering, neural-network analysis, social network analysis, and network traffic analysis.
These techniques share a common trait: data are processed iteratively until the computation satisfies a convergence or stopping condition. In most iterative algorithms, each pass operates on the output of the previous pass to produce the next result. A program of this type terminates only when a fixed point is reached, i.e., the result does not change from one iteration to the next.
The MapReduce framework does not directly support these iterative data analysis applications. Instead, programmers must implement iterative programs by manually issuing multiple MapReduce jobs and orchestrating their execution with a driver program, in which the data flow takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs.
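The driver-program pattern can be sketched for PageRank, one of the iterative algorithms named above. This is a hedged, single-machine illustration, not a real Hadoop job: the `driver` loop stands in for the code that would re-submit a MapReduce job each iteration, and the graph, tolerance, and damping values are made up for the example:

```python
def map_contributions(ranks, links):
    # Map: each page emits a share of its rank to every outgoing link.
    for page, rank in ranks.items():
        out = links[page]
        for target in out:
            yield target, rank / len(out)

def reduce_ranks(pairs, pages, damping=0.85):
    # Reduce: sum the contributions per page and apply the damping factor.
    totals = {p: 0.0 for p in pages}
    for target, share in pairs:
        totals[target] += share
    return {p: (1 - damping) / len(pages) + damping * t
            for p, t in totals.items()}

def driver(links, tol=1e-6, max_iters=100):
    # Driver: issue one map/reduce pass per iteration until a fixed
    # point is reached, i.e. the ranks stop changing.
    pages = list(links)
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(max_iters):
        new_ranks = reduce_ranks(map_contributions(ranks, links), pages)
        if max(abs(new_ranks[p] - ranks[p]) for p in pages) < tol:
            return new_ranks
        ranks = new_ranks
    return ranks

links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = driver(links)
```

In a real MapReduce deployment each loop iteration is a separate job submission, so the static input graph is re-read and re-shuffled every pass; this repeated job-launch overhead is exactly the cost that motivates iterative extensions of the model.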