Flexible Paxos: An Industry Perspective

June 18, 2017

Background

In June 2017, I completed my bachelor in Software Engineering at Blekinge Institute of Technology. For my thesis, I collaborated with Ericsson AB. I had expressed my wish to do something in the area of distributed systems. After some discussion about ideas for the thesis, my supervisors brought the Flexible Paxos paper to my attention. I was very intrigued by Flexible Paxos as it is cutting-edge work in the area of distributed consensus.

Thesis

Paxos is an algorithm for implementing fault-tolerant distributed systems. In late 2016, the Flexible Paxos paper was published. The authors make the observation that a restriction in the Paxos algorithm can safely be removed. This allows end users to have more control over the choice between performance or availability. In this paper, l look at three different distributed services: ZooKeeper, Etcd and Consul, and evaluate which one is most suitable for an adaptation of Flexible Paxos. I concluded that ZooKeeper is the best fit and successfully modified a ZooKeeper implementation to support Flexible Paxos. In the results section, I demonstrate the advantages by comparing the latency and throughput of requests of the original version and the prototype, and we see that the overall performance is improved. Zookeeper is used a lot in the industry, some examples of companies that use it are: Facebook, Yahoo, Rackspace and Ericsson. Also, popular open source projects such as Mesos, Spark and Kafka rely on ZooKeeper for coordination.

This work is a contribution to the community as the prototype is released as open source on Github. As far as I know, this is the first publicly released work on adding Flexible Paxos to ZooKeeper.

Github repository of the ZooKeeper fork

The full paper can be found here.

Discussion, links, and tweets

PhD student at KTH Royal Institute of Technology.