Distributed key-value storage in Python stdlib

Pyshard

Pyshard is a complete distributed key-value data store written in Python using only the standard library. Pyshard uses hash-based sharding: the shard that stores a value is selected according to the hash of its key (with regards to lgiordani/pyshard). This project is experimental and is intended for use in another project, pdx, a distributed web indexing service.

Installation

pip install pyshard

Quick start

Bootstrap

To run a 'hello world' service, you first need running shard servers. For example:

# test_server.py

import sys
import asyncio
from pyshard import ShardServer

if __name__ == '__main__':
    loop = asyncio.get_event_loop()

    server = ShardServer(host=sys.argv[1], port=int(sys.argv[2]), start=.0, end=1.0)
    try:
        loop.run_until_complete(server._do_run())
    finally:
        loop.close()

python test_server.py localhost 5050 & \
python test_server.py localhost 5051
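
Before moving on, it can help to confirm that both shard servers are accepting connections. The helper below is a minimal sketch that uses only the standard library. `wait_for_port` is a hypothetical name and not part of Pyshard, and the sketch assumes the shard servers listen on plain TCP sockets.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper, not part of Pyshard. Assumes a plain TCP listener.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)
    return False
```

For the two servers started above, `wait_for_port('localhost', 5050)` and `wait_for_port('localhost', 5051)` should both return True once the processes are up.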

Once the shard servers are running, start the bootstrap server to map the shards. Currently the bootstrap server needs a config file with the shard markers:

{
  "shards": [
    {
      "name": "shard0-0.5",
      "start": 0.0,
      "end": 0.5,
      "size": 1024,
      "host": "127.0.0.1",
      "port": 5050
    },
    {
      "name": "shard0.5-1",
      "start": 0.5,
      "end": 1.0,
      "size": 1024,
      "host": "127.0.0.1",
      "port": 5051
    }
  ]
}

Every shard has the following parameters: name, a unique string name for the shard; start and end, the numeric limits of the key hash; size, the memory limit for this shard; host and port, the shard address. The start and end limits mean that the shard stores values whose key hash falls in the range [start, end].
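
The range lookup described above can be sketched as follows. This is an illustration under stated assumptions, not Pyshard's implementation: `key_hash` and `select_shard` are hypothetical names, and the SHA-256 based hash is a stand-in, since the hash function Pyshard actually uses is not described here. Because both ends of a range are inclusive, a hash that sits exactly on a boundary (0.5 in the example config) matches two shards, and the sketch simply takes the first one.

```python
import hashlib

# Shard markers copied from the example config above.
SHARDS = [
    {"name": "shard0-0.5", "start": 0.0, "end": 0.5, "host": "127.0.0.1", "port": 5050},
    {"name": "shard0.5-1", "start": 0.5, "end": 1.0, "host": "127.0.0.1", "port": 5051},
]


def key_hash(key: str) -> float:
    """Map a key to a float in [0, 1). Stand-in hash, not Pyshard's own."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the top 53 bits of the first 8 bytes so the division is exact.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2 ** 53


def select_shard(key: str, shards=SHARDS) -> dict:
    """Return the first shard whose [start, end] range contains the key hash."""
    h = key_hash(key)
    for shard in shards:
        if shard["start"] <= h <= shard["end"]:
            return shard
    raise LookupError(f"no shard covers hash {h}")
```

A client would then connect to `select_shard(key)["host"]` and `select_shard(key)["port"]` to read or write that key.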

# test_bootstrap_server.py

import asyncio

from pyshard import BootstrapServer

from pyshard.settings import settings


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    server = BootstrapServer(*settings.BOOTSTRAP_SERVER, config_path='config_example.json',
                             buffer_size=1024, loop=loop)
    try:
        loop.run_until_complete(server._do_run())
    finally:
        loop.close()

python test_bootstrap_server.py

The shards have now received their configuration from the bootstrap service and are ready.

App

>>> from pyshard import Pyshard
>>> from pyshard.settings import settings
>>> 
>>> app = Pyshard(bootstrap_server=settings.BOOTSTRAP_SERVER)
>>> app.create_index('test_index')
>>> app.write(index='test_index', key='test', doc='hello world')
60
>>> app.read(index='test_index', key='test')
{'hash_': 0.1671936, 'record': 'hello world'}
>>> app.write('test_index', 'test1', {'hello': 'world'})
54
>>> app.read('test_index', 'test1')
{'hash_': 0.8204544, 'record': {'hello': 'world'}}
>>> app.pop('test_index', 'test1')
{'hash_': 0.8204544, 'record': {'hello': 'world'}}

Utilities

Since version 0.2.0, Pyshard ships several console utilities. They simplify operations such as cat or bulk writes.

Let's create a file with data. The row format is {key}|{value}:

printf '1|test\n2|{"test": "test"}\n3|42\n4|0.9\n' > test_write.txt
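
For reference, here is a minimal Python sketch of how such rows could be split and decoded. `parse_row` is a hypothetical helper, not a Pyshard function. It assumes a value is decoded as JSON when it parses and kept as a plain string otherwise, which is inferred from the `pyshard cat` output shown later in this section rather than from Pyshard's code.

```python
import json


def parse_row(line: str):
    """Split a '{key}|{value}' row and decode the value.

    Assumption: the value is JSON when it parses, a plain string otherwise.
    """
    # Split on the first '|' only, so values may contain '|' themselves.
    key, _, raw = line.rstrip("\n").partition("|")
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        value = raw
    return key, value


rows = ['1|test', '2|{"test": "test"}', '3|42', '4|0.9']
parsed = [parse_row(row) for row in rows]
```

With the four rows above, `parsed` holds a string, a dict, an int and a float, keyed by '1' to '4'.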

We can add these rows to the storage using the pyshard write command.

cat test_write.txt | pyshard write test_index --force

The --force option creates the index test_index if it does not exist.

Now let's cat the storage for index test_index:

pyshard cat test_index

The command writes the results to stdout:

2|{"hash_": 0.2258304, "record": {"test": "test"}}
3|{"hash_": 0.1904896, "record": 42}
1|{"hash_": 0.8102784, "record": "test"}
4|{"hash_": 0.7252864, "record": 0.9}
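
Each output line is the key, a `|`, and a JSON object with `hash_` and `record`. The sketch below parses that output and groups the keys by the shard ranges from the example bootstrap config. `group_by_shard` is a hypothetical helper, and the grouping assumes the cluster is running with that example config.

```python
import json

# Output of `pyshard cat test_index`, copied from above.
CAT_OUTPUT = """\
2|{"hash_": 0.2258304, "record": {"test": "test"}}
3|{"hash_": 0.1904896, "record": 42}
1|{"hash_": 0.8102784, "record": "test"}
4|{"hash_": 0.7252864, "record": 0.9}
"""

# Shard ranges from the example bootstrap config (assumed to be in use).
SHARD_RANGES = {"shard0-0.5": (0.0, 0.5), "shard0.5-1": (0.5, 1.0)}


def group_by_shard(output: str) -> dict:
    """Map each shard name to the keys whose hash_ falls in its range."""
    groups = {name: [] for name in SHARD_RANGES}
    for line in output.splitlines():
        key, _, payload = line.partition("|")
        hash_ = json.loads(payload)["hash_"]
        for name, (start, end) in SHARD_RANGES.items():
            if start <= hash_ <= end:
                groups[name].append(key)
                break
    return groups


print(group_by_shard(CAT_OUTPUT))
# → {'shard0-0.5': ['2', '3'], 'shard0.5-1': ['1', '4']}
```

Under that assumption, keys 2 and 3 live on the first shard and keys 1 and 4 on the second.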

TODO

  • Index (data tables equivalent)
  • Connection id for shard servers (now it is an address)
  • App utils (pyshard read, pyshard write)
  • Nice run methods for services
  • Makefile
