
A Python-based Distributed Database


Sidewinder



Note: Sidewinder is experimental and is not intended for production workloads.

Sidewinder is a proof-of-concept distributed database written in Python (using asyncio). The server distributes shards of data to a number of workers in order to "divide and conquer" OLAP database workloads.

It consists of a server, workers, and a client (where you can run interactive SQL commands).

Sidewinder will NOT distribute queries that do not contain aggregates; it runs those on the server instead.
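The routing rule above can be illustrated with a small sketch. This is not Sidewinder's code: the `should_distribute` helper and the list of aggregate names are hypothetical, and keyword matching is only a crude stand-in for the parse-tree inspection a real implementation would do. The sample queries are the ones used later in this README.

```python
import re

# Hypothetical, deliberately crude list of aggregate functions to look for.
AGGREGATE_CALL = re.compile(r"\b(count|sum|avg|min|max)\s*\(", re.IGNORECASE)


def should_distribute(sql: str) -> bool:
    """Return True if the query appears to contain an aggregate call.

    Keyword matching is only a stand-in: it can be fooled by string literals
    or comments, which a real SQL parser would handle correctly.
    """
    return AGGREGATE_CALL.search(sql) is not None


if __name__ == "__main__":
    for query in (
        "SELECT COUNT(*) FROM lineitem;",
        "SELECT * FROM region;",
        "SELECT * FROM lineitem LIMIT 5;",
    ):
        target = "workers" if should_distribute(query) else "server"
        print(f"{query} -> {target}")
```

Only the first query contains an aggregate, so under this sketch it would go to the workers, while the other two would run on the server.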

Sidewinder uses Apache Arrow over WebSockets, secured with TLS, for communication between the server, worker(s), and client(s).

It uses DuckDB as its SQL execution engine, and the PostgreSQL parser to work out how to combine the results returned by the distributed workers.
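Why the query has to be parsed before results can be combined is easiest to see with partial aggregates. The sketch below is a minimal illustration of the general technique, not Sidewinder's implementation: it assumes each worker returns one row of partial results for its shard, and the `Partial` type and `merge_partials` helper are hypothetical. COUNT and SUM partials are added, MIN and MAX take the extreme value, and AVG is rebuilt from the merged SUM and COUNT rather than by averaging the workers' averages.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Hypothetical per-worker partial result for one aggregate query."""
    row_count: int   # COUNT(*) over the worker's shard
    total: float     # SUM(x) over the worker's shard
    minimum: float   # MIN(x) over the worker's shard
    maximum: float   # MAX(x) over the worker's shard


def merge_partials(partials: list[Partial]) -> dict[str, float]:
    """Combine worker partials into final COUNT, SUM, MIN, MAX and AVG values."""
    count = sum(p.row_count for p in partials)
    total = sum(p.total for p in partials)
    return {
        "count": count,
        "sum": total,
        "min": min(p.minimum for p in partials),
        "max": max(p.maximum for p in partials),
        # AVG must be derived from the merged SUM and COUNT;
        # averaging the workers' averages would weight shards incorrectly.
        "avg": total / count if count else float("nan"),
    }


if __name__ == "__main__":
    workers = [
        Partial(row_count=3, total=30.0, minimum=5.0, maximum=15.0),
        Partial(row_count=1, total=2.0, minimum=2.0, maximum=2.0),
    ]
    print(merge_partials(workers))
```

For the two sample workers, the merged average is 32 / 4 = 8.0, whereas averaging the per-worker averages (10.0 and 2.0) would give 6.0.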

Setup (to run locally)

Install package

You can install sidewinder-db from PyPI or from source.

Option 1 - from PyPI

# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

pip install sidewinder-db

Option 2 - from source - for development

git clone https://github.com/prmoore77/sidewinder

cd sidewinder

# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel

# Install Sidewinder-DB - in editable mode with dev dependencies
pip install --editable ".[dev]"

Note

For the following commands, if you are running from source in --editable mode (for development purposes), you will need to set the PYTHONPATH environment variable as follows:

export PYTHONPATH=$(pwd)/src

Bootstrap the environment by creating a security user list (password file), a TLS certificate key pair, and a sample TPC-H dataset with 11 shards.

(The passwords shown are just examples; it is recommended that you use more secure passwords.)

. .venv/bin/activate
sidewinder-bootstrap \
    --client-username=scott \
    --client-password=tiger \
    --worker-password=united \
    --tpch-scale-factor=1 \
    --shard-count=11

Run Sidewinder locally from the root of the repo (use the --help option on the executables below for option details).

1) Server:

Open a terminal, then:

. .venv/bin/activate
sidewinder-server

2) Worker:

Open another terminal, then start a single worker (using the same worker password you used in the bootstrap command above) with command:

. .venv/bin/activate
sidewinder-worker --tls-roots=tls/server.crt --password=united
Note: you can run up to 11 workers for this example configuration. To do that, run this instead of starting a single worker:
. .venv/bin/activate
for x in {1..11}
do
  sidewinder-worker --tls-roots=tls/server.crt --password=united &
done

To kill the workers later, run this from the same terminal:

kill $(jobs -p)

3) Client:

Open another terminal, then connect with the client - using the same client username/password you used in the bootstrap command above:

. .venv/bin/activate
sidewinder-client --tls-roots=tls/server.crt --username=scott --password=tiger
Then, while in the client, you can run a sample query that will be distributed to the worker(s) (if you have at least one running), for example:

SELECT COUNT(*) FROM lineitem;

Note: if you are running fewer than 11 workers, your answer will reflect only n/11 of the data (where n is the worker count). We will add delta processing at a later point.
A query that won't be distributed (because it does not contain aggregates) would be:
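The n/11 effect can be illustrated with a small simulation. This sketch assumes, as the note implies, that each connected worker holds exactly one of the 11 shards; the round-robin split and the `make_shards` and `distributed_count` helpers are hypothetical and do not describe how `sidewinder-bootstrap` actually partitions the TPC-H data.

```python
SHARD_COUNT = 11  # matches --shard-count=11 in the bootstrap example


def make_shards(rows: list[int], shard_count: int = SHARD_COUNT) -> list[list[int]]:
    """Hypothetically split rows round-robin into shard_count shards."""
    return [rows[i::shard_count] for i in range(shard_count)]


def distributed_count(shards: list[list[int]], worker_count: int) -> int:
    """Sum COUNT(*) over the shards held by the first worker_count workers.

    Assumes one shard per worker, so shards without a worker are simply missed.
    """
    covered = shards[:worker_count]
    return sum(len(shard) for shard in covered)


if __name__ == "__main__":
    rows = list(range(1100))  # 1,100 sample rows -> 100 rows per shard
    shards = make_shards(rows)
    for n in (1, 5, 11):
        print(n, distributed_count(shards, n))
```

With 5 of 11 workers running, the simulated count comes back as 500 of 1,100 rows, i.e. 5/11 of the data; only with all 11 workers does it cover every row.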

SELECT * FROM region;

or:

SELECT * FROM lineitem LIMIT 5;

Note: there are TPC-H queries in the tpc-h_queries folder that you can run.
To turn distributed mode OFF in the client:

.set distributed = false;

To turn summarization mode OFF in the client (so that Sidewinder does NOT summarize the workers' results; this applies only to distributed mode):

.set summarize = false;

Optional DuckDB CLI (use for data QA purposes, etc.)

Install DuckDB CLI version 1.0.0 and make sure the executable is on your PATH.

Platform Downloads:
Linux x86-64
Linux arm64 (aarch64)
macOS Universal

Handy development commands

Version management

Bump the version of the application (you must have installed from source with the [dev] extras):
bumpver update --patch



Download files

Source distribution: sidewinder_db-0.0.71.tar.gz (37.5 kB)

Built distribution: sidewinder_db-0.0.71-py3-none-any.whl (42.7 kB, Python 3)
