P2P-Tuple
P2P-Tuple is a middleware for distributed computing on the Internet. It addresses the following problem:
given X jobs and Y unreliable computing nodes, how can all X jobs be processed and eventually completed
when any of the Y nodes may become unavailable at any time?
P2P-Tuple supports both a proprietary programming model and Google's MapReduce
programming model.
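The problem statement above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not P2P-Tuple's scheduler: the function name `run_jobs`, the random choice of node, and the fixed per-attempt failure probability are all hypothetical. It shows the basic idea that a job is put back in the queue whenever the node running it disappears, so every job eventually completes as long as nodes succeed some of the time.

```python
import random
from collections import deque


def run_jobs(num_jobs, num_nodes, failure_prob, seed=0):
    """Simulate finishing num_jobs on nodes that may fail during any attempt.

    Returns (results, attempts): results maps job id -> node that finished it,
    attempts is the total number of job attempts, including failed ones.
    """
    rng = random.Random(seed)          # seeded so a run is repeatable
    pending = deque(range(num_jobs))   # jobs that still need a successful run
    results = {}
    attempts = 0
    while pending:
        job = pending.popleft()
        node = rng.randrange(num_nodes)    # hypothetical policy: any node
        attempts += 1
        if rng.random() < failure_prob:
            pending.append(job)            # node became unavailable: re-queue
        else:
            results[job] = node            # job completed on this node
    return results, attempts


if __name__ == "__main__":
    done, tries = run_jobs(num_jobs=10, num_nodes=4, failure_prob=0.3)
    print(f"{len(done)} jobs completed in {tries} attempts")
```

The number of attempts grows with the failure probability, but the set of completed jobs is always the full set as long as the failure probability is below 1.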
P2P-Tuple Overview
P2P-Tuple features a P2P architecture, so unlike systems such as
BOINC or Google's MapReduce it has no centralized server.
Each machine in P2P-Tuple is:
- a storage node that stores input and output data.
- a computation node that processes the input data and generates outputs.
- a manager node that indexes data and schedules jobs.
On each machine, the software follows a simple modular design.
P2P-Tuple's core design is detailed in a paper. Google MapReduce support was added
on top of the core design as an extension. P2P-Tuple's MapReduce implementation is heavily influenced by
an early version of Hadoop, but is completely decentralized.
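For readers unfamiliar with the MapReduce programming model, the following word-count sketch shows its three steps: map, group by key, and reduce. It runs in a single process and is not P2P-Tuple's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    intermediate = []
    for doc in documents:               # each map call is independent
        intermediate.extend(map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each map call and each reduce call depends only on its own input, the calls can be spread across many machines, which is what makes the model a natural fit for a system in which every node both stores and processes data.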
Technologies
- All resources are indexed by the FreePastry DHT.
- Fully distributed group membership management.
- Erasure coding and replication for high data availability.
- Cross-platform C++/Boost (Linux and Windows).
- Extensive performance evaluation on a Cloud platform.
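The trade-off between the replication and erasure coding mentioned in the list above can be made concrete with the standard availability formulas. The sketch below assumes each node is reachable independently with probability p, which is a simplification, and the parameters are illustrative; the page does not state which replication factor or code parameters P2P-Tuple actually uses.

```python
from math import comb


def replication_availability(p, replicas):
    """Probability that at least one of `replicas` full copies is reachable."""
    return 1 - (1 - p) ** replicas


def erasure_availability(p, n, k):
    """Probability that at least k of n coded fragments are reachable.

    Assumes an (n, k) code in which any k fragments rebuild the data.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p = 0.9  # assumed per-node availability, for illustration only
    print("3 replicas:", replication_availability(p, 3))
    print("(6, 3) code:", erasure_availability(p, 6, 3))
```

Storage cost differs between the two schemes: keeping r replicas uses r times the original data size, while an (n, k) erasure code uses n/k times the original size.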