Skip to main content

A library for modelling, analyzing, and optimizing quorum systems

Project description

Quorums

Installation

TODO(mwhittaker): Make this package pip'able. For now, you have to clone and install the dependencies yourself:

pip install -r requirements.txt

Tutorial

Given a set of nodes X, a read-write quorum system is a pair (R, W) where R is a set of subsets of X called read quorums and W is a set of subsets of X called write quorums. A read-write quorum system satisfies the property that every read quorum intersects every write quorum. This library allows us to construct and analyze arbitrary read-write quorum systems. First, we import the library.

from quorums import *

Next, we specify the nodes in our quorum system. Our nodes can be strings, integers, IP addresses, anything!

a = Node('a')
b = Node('b')
c = Node('c')
d = Node('d')
e = Node('e')
f = Node('f')

Here, we construct a two by three grid of nodes. Every row is read quorum, and one element from every row is a write quorum. Note that when we construct a quorum system, we only have to specify the set of read quorums. The library figures out the optimal set of write quorums automatically.

grid = QuorumSystem(reads=a*b*c + d*e*f)

This prints {'a', 'b', 'c'} and {'d', 'e', 'f'}.

for r in grid.read_quorums():
    print(r)

This prints {'a', 'd'}, {'a', 'e'}, {'b', 'f'}, {'b', 'd'}, ...

for w in grid.write_quorums():
    print(w)

Alternatively, we could specify the write quorums...

QuorumSystem(writes=(a + b + c) * (d + e + f))

or both the read and write quorums.

QuorumSystem(reads=a*b*c + d*e*f, writes=(a + b + c) * (d + e + f))

We can check whether a given set is a read or write quorum. Note that any superset of a quorum is also considered a quorum.

grid.is_read_quorum({'a', 'b', 'c'})       # True
grid.is_read_quorum({'a', 'b', 'c', 'd'})  # True
grid.is_read_quorum({'a', 'b', 'd'})       # False

grid.is_write_quorum({'a', 'd'})      # True
grid.is_write_quorum({'a', 'd', 'd'}) # True
grid.is_write_quorum({'a', 'b'})      # False

The read resilience of our quorum system is the largest number f such that despite the failure of any f nodes, we still have at least one read quorum. Write resilience is defined similarly, and resilience is the minimum of read and write resilience.

grid.read_resilience()  # 1
grid.write_resilience() # 2
grid.resilience()       # 1

A strategy is a discrete probability distribution over the set of read and write quorums. A strategy gives us a way to pick quorums at random. The load of a node is the probability that the node is selected by the strategy, and the load of a strategy is the load of the most heavily loaded node. Using the strategy method, we get a load-optimal strategy, i.e. the strategy with the lowest possible load.

Typically in a distributed system, a read quorum of nodes is contacted to perform a read, and a write quorum of nodes is contacted to perform a write. Though we get to pick a strategy, we don't get to pick the fraction of operations that are reads and the fraction of operations that are writes. This is determined by the workload. When constructing a strategy, we have to specify the workload. The returned strategy is optimal only against this workload. Here, we construct a strategy assuming that 75% of all operations are reads.

strategy = grid.strategy(read_fraction=0.75)

We can use the strategy to sample read and write quorums.

print(strategy.get_read_quorum())
print(strategy.get_read_quorum())
print(strategy.get_read_quorum())
print(strategy.get_write_quorum())
print(strategy.get_write_quorum())
print(strategy.get_write_quorum())

We can query the strategy's load.

strategy.load(read_fraction=0.75) # 0.458

We can query the strategy's load on other workloads as well, though the strategy may not be optimal.

strategy.load(read_fraction=0)   # 0.333
strategy.load(read_fraction=0.5) # 0.416
strategy.load(read_fraction=1)   # 0.5

This is a shorthand for grid.strategy(read_fraction=0.25).load(read_fraction=0.25).

grid.load(read_fraction=0.25) # 0.375

In the real world, we don't often have a fixed workload. Workloads change over time. Instead of specifying a fixed read fraction, we can provide a discrete probability distribution of read fractions. Here, we say that the read fraction is 10% half the time and 75% half the time. strategy will return the strategy that minimizes the expected load according to this distribution.

distribution = {0.1: 0.5, 0.75: 0.5}
strategy = grid.strategy(read_fraction=distribution)
strategy.load(read_fraction=distribution) # 0.404

We can also specify the write fraction instead of the read fraction, if we prefer.

strategy = grid.strategy(write_fraction=0.75)
strategy.load(write_fraction=distribution) # 0.429

In the real world, not all nodes are equal. We often run distributed systems on heterogenous hardware, so some nodes might be faster than others. To model this, we instatiate every node with its capacity. Here, nodes a, c, and e can process 1000 commands per second, while nodes b, d, and f can only process 500 requests per second.

a = Node('a', capacity=1000)
b = Node('b', capacity=500)
c = Node('c', capacity=1000)
d = Node('d', capacity=500)
e = Node('e', capacity=1000)
f = Node('f', capacity=500)

Now, load can be interpreted as the inverse of the peak throughput of the quorum system. We can also call capacity to get this inverse directly. Here, our quorum system is capable of processing 1333 commands per second for a workload of 75% reads.

grid = QuorumSystem(reads=a*b*c + d*e*f)
strategy = grid.strategy(read_fraction=0.75)
strategy.load(read_fraction=0.75)     # 0.00075
strategy.capacity(read_fraction=0.75) # 1333

Nodes might also process reads and writes at different speeds. We can specify the peak read and write throughput of every node separately. Here, we assume reads are ten times as fast as writes.

a = Node('a', write_capacity=1000, read_capacity=10000)
b = Node('b', write_capacity=500, read_capacity=5000)
c = Node('c', write_capacity=1000, read_capacity=10000)
d = Node('d', write_capacity=500, read_capacity=5000)
e = Node('e', write_capacity=1000, read_capacity=10000)
f = Node('f', write_capacity=500, read_capacity=5000)

With 100% reads, our quorum system can process 10,000 commands per second. This throughput decreases as we increase the fraction of writes.

grid = QuorumSystem(reads=a*b*c + d*e*f)
grid.capacity(read_fraction=1)   # 10,000
grid.capacity(read_fraction=0.5) # 3913
grid.capacity(read_fraction=0)   # 2000

Another real world complication is the fact that machines sometimes fail and are sometimes slow. If we contact a quorum of nodes, some of them may fail, and we'll get stuck waiting to hear back from them. Or, some of them may be stragglers, and we'll wait longer than we'd like. We can address this problem by contacting more than the bare minimum number of nodes.

Formally, we say a read quorum (or write quorum) q is f-resilient if despite the failure of any f nodes, q still forms a read quorum (or write quorum). A strategy is f-resilient if it only selects f-resilient quorums. By default, strategy returns 0-resilient quorums. We can pass in the f argument to get more resilient strategies.

strategy = grid.strategy(read_fraction=0.5, f=1)

These sets are quorums even if 1 machine fails.

strategy.get_read_quorum()
strategy.get_write_quorum()

Putting everything together, we can use this library to pick quorum systems that are well suited to our workload. For example, say we're implementing a distributed file system and want to pick a 5 node quorum system with a resilience of 1 that has a good load on workloads that are 90% reads 90% of the time and 10% reads 10% of the time. We can try out three quorum systems: a simple majority quorum system, a crumbling walls quorum system, and a paths quorum system.

simple_majority = QuorumSystem(reads=majority([a, b, c, d, e]))
crumbling_walls = QuorumSystem(reads=a*b + c*d*e)
paths = QuorumSystem(reads=a*b + a*c*e + d*e + d*c*b)

We make sure we have the desired resilience.

assert(simple_majority.resilience() >= 1)
assert(crumbling_walls.resilience() >= 1)
assert(paths.resilience() >= 1)

We check the loads and see that the crumbling walls quorum system has the highest load, so we use the crumbling walls quorum system to implement our file system.

distribution = {0.9: 0.9, 0.1: 0.1}
simple_majority.capacity(read_fraction=distribution) # 5089
crumbling_walls.capacity(read_fraction=distribution) # 5837
paths.capacity(read_fraction=distribution)           # 5725

Maybe some time later, we experiencing high latency because of stragglers and want to switch to a 1-resilient strategy. We again compute the loads, but now see that the simple majority quorum system has the highest load, so we switch from the crumbling walls quorum system to the simple majority quorum system.

simple_majority.capacity(read_fraction=distribution, f=1) # 3816
crumbling_walls.capacity(read_fraction=distribution, f=1) # 1908
paths.capacity(read_fraction=distribution, f=1)           # 1908

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quoracle-0.0.1.tar.gz (23.7 kB view details)

Uploaded Source

Built Distribution

quoracle-0.0.1-py3-none-any.whl (39.2 kB view details)

Uploaded Python 3

File details

Details for the file quoracle-0.0.1.tar.gz.

File metadata

  • Download URL: quoracle-0.0.1.tar.gz
  • Upload date:
  • Size: 23.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/36.4.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.6.2

File hashes

Hashes for quoracle-0.0.1.tar.gz
Algorithm Hash digest
SHA256 b3ebc4aa3cfb1c503df0b6cf82053b2bf1c90a4264a01958b325a380252ff468
MD5 f12da80b579c70ff33d7424684b724a2
BLAKE2b-256 fc1bf4038479d9586077fb8f338b95d462943f6783df22581d9055d340404e63

See more details on using hashes here.

File details

Details for the file quoracle-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: quoracle-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 39.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/36.4.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.6.2

File hashes

Hashes for quoracle-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8ce3e0a16dda06bd815c6fc962cc3991b201a8bbd8b0bec2172ada25636e665a
MD5 88021e3cfca7abebe92af81048433ceb
BLAKE2b-256 e3909702ae5a556f046ebbbebc80301c0b365f9a322801630352d1e6b319e6f0

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page