P2P-Tuple

P2P-Tuple is middleware for distributed computing on the Internet. It addresses the following problem: given X jobs and Y unreliable computing nodes, how can all X jobs be processed and eventually completed when every one of the Y nodes may become unavailable at any time?
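
The problem statement above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not P2P-Tuple's actual scheduler: the names `RunStats`, `run_until_done` and the `node_fails` callback are hypothetical, nodes are picked round-robin, and a failed attempt simply puts the job back on the queue.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical summary of one run (not part of P2P-Tuple).
struct RunStats {
    std::size_t completed = 0;  // jobs that finished successfully
    std::size_t attempts = 0;   // all attempts, including failed ones
};

// Callback that models unreliability: returns true when the given node
// becomes unavailable while running the given job on the given attempt.
using FailureModel =
    std::function<bool(std::size_t node, std::size_t job, std::size_t attempt)>;

// Process jobs 0..num_jobs-1 on num_nodes nodes (num_nodes must be > 0).
// A job whose node fails is re-queued and later retried on another node.
// The loop terminates only if node_fails eventually returns false for
// every job, which is an assumption of this sketch.
RunStats run_until_done(std::size_t num_jobs, std::size_t num_nodes,
                        const FailureModel& node_fails) {
    std::deque<std::size_t> pending;
    for (std::size_t job = 0; job < num_jobs; ++job) pending.push_back(job);

    std::vector<std::size_t> tries(num_jobs, 0);  // attempts made per job
    RunStats stats;
    std::size_t next_node = 0;

    while (!pending.empty()) {
        const std::size_t job = pending.front();
        pending.pop_front();

        const std::size_t node = next_node;       // round-robin node choice
        next_node = (next_node + 1) % num_nodes;

        ++stats.attempts;
        const std::size_t attempt = tries[job]++;
        if (node_fails(node, job, attempt)) {
            pending.push_back(job);               // node vanished: retry later
        } else {
            ++stats.completed;
        }
    }
    return stats;
}
```

Calling `run_until_done(5, 3, model)` with a model that fails every first attempt completes all 5 jobs after 10 attempts. A real deployment has no global queue or central loop; the sketch only shows the retry-until-done property that the middleware has to provide.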

P2P-Tuple supports both its own proprietary programming model and Google's MapReduce programming model.

P2P-Tuple Overview

P2P-Tuple features a P2P architecture and thus, unlike systems such as BOINC or Google's MapReduce, has no centralized server.

Each machine in P2P-Tuple is:

  • a storage node that stores input and output data.
  • a computation node that processes the input data and generates outputs.
  • a manager node that indexes data and schedules jobs.

On each machine, the software follows a simple modular design.

P2P-Tuple's core design is detailed in a paper. Support for Google's MapReduce is added on top of the core design as an extension. P2P-Tuple's MapReduce implementation is heavily influenced by an early version of Hadoop, but is completely decentralized.
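
For readers unfamiliar with the MapReduce programming model mentioned above, the following is a minimal single-process word-count sketch. It illustrates the generic map, group-by-key and reduce steps only; it is not P2P-Tuple's API, and all names (`map_words`, `group_by_key`, `reduce_counts`, `word_count`) are hypothetical.

```cpp
#include <map>
#include <numeric>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, int>;

// Map step: emit (word, 1) for every whitespace-separated word in one split.
std::vector<KeyValue> map_words(const std::string& split) {
    std::vector<KeyValue> out;
    std::istringstream in(split);
    std::string word;
    while (in >> word) out.emplace_back(word, 1);
    return out;
}

// Shuffle step: collect all values emitted for the same key.
std::map<std::string, std::vector<int>> group_by_key(
        const std::vector<std::vector<KeyValue>>& mapped) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& part : mapped)
        for (const auto& kv : part) groups[kv.first].push_back(kv.second);
    return groups;
}

// Reduce step: sum the values of one key.
int reduce_counts(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// Driver: run map on every split, group the results, then reduce per key.
// In a distributed system each map and reduce call would be a separate job.
std::map<std::string, int> word_count(const std::vector<std::string>& splits) {
    std::vector<std::vector<KeyValue>> mapped;
    for (const auto& split : splits) mapped.push_back(map_words(split));

    std::map<std::string, int> result;
    for (const auto& group : group_by_key(mapped))
        result[group.first] = reduce_counts(group.second);
    return result;
}
```

For the two splits `"a b a"` and `"b c"`, `word_count` yields a count of 2 for `a`, 2 for `b` and 1 for `c`. In a decentralized setting, the map and reduce calls are independent units of work that can be scheduled on whichever nodes are currently available.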

Technologies

  • All resources are indexed by FreePastry DHT.
  • Fully distributed group membership management.
  • Erasure code and replication for high data availability.
  • Cross-platform C++/Boost (Linux and Windows).
  • Extensive performance evaluation on a Cloud platform.
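
The page does not say which erasure code P2P-Tuple uses. As an illustration of the general idea behind the erasure-coding item above, the sketch below uses the simplest possible scheme, a single XOR parity block, which can rebuild any one lost block from the survivors. The names `Block`, `xor_blocks`, `make_parity` and `recover_lost` are hypothetical, and all blocks are assumed to have the same size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::vector<std::uint8_t>;

// XOR equally sized blocks together, byte by byte.
// Assumption: every block has the same length as the first one.
Block xor_blocks(const std::vector<Block>& blocks) {
    Block out(blocks.empty() ? 0 : blocks.front().size(), 0);
    for (const Block& block : blocks)
        for (std::size_t i = 0; i < out.size(); ++i) out[i] ^= block[i];
    return out;
}

// Encode: the parity block is the XOR of all data blocks.
Block make_parity(const std::vector<Block>& data) {
    return xor_blocks(data);
}

// Decode: XOR of the surviving data blocks and the parity block
// reproduces the single missing data block.
Block recover_lost(const std::vector<Block>& survivors, const Block& parity) {
    std::vector<Block> all = survivors;
    all.push_back(parity);
    return xor_blocks(all);
}
```

With k data blocks this stores k + 1 blocks and tolerates the loss of any one of them, whereas plain replication needs 2k blocks to tolerate one loss per data block. Production systems typically use stronger codes that survive several simultaneous losses.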