static_ondisk_kv
A simple and fast implementation of a static on-disk key-value (kv) store, in Python.
Why this lib?
LevelDB, RocksDB and LMDB all have drawbacks for a static collection of keys and values:
- slow to build (many hours): 3h for RocksDB compared to 1h for this lib (for a collection of 5B entries, each with 1 long key and 2 float16 values)
- use more space than necessary (100GB for RocksDB versus 60GB for this lib)
- no faster to read than this much simpler lib: about 5k samples/s on an NVMe drive
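The 60GB figure can be checked from the record layout. The following is a minimal sketch, assuming records are packed without padding using the `struct` format characters shown in the example further down (`q` for the key, `ee` for the values); the variable names are illustrative only.

```python
import struct

KEY_FORMAT = "q"     # one signed 64-bit integer key
VALUE_FORMAT = "ee"  # two float16 values

# "<" selects standard sizes with no padding; that the file is laid out
# this way is an assumption, not something the library documents here.
record_size = struct.calcsize("<" + KEY_FORMAT + VALUE_FORMAT)

n_records = 5_000_000_000
total_gb = record_size * n_records / 1e9

print(record_size, "bytes per record")
print(total_gb, "GB for the whole collection")
```

With 8 bytes for the key and 2 bytes per float16 value, each record takes 12 bytes, so 5B records need no more than the raw data size.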
What this lib does not support:
- non-static collections
- variable-length keys and values
Install
pip install static_ondisk_kv
Python examples
Check out this example:
from static_ondisk_kv import OnDiskKV
from tqdm import tqdm
import random
kv = OnDiskKV(file='/media/nvme/mybigfile', key_format="q", value_format="ee")
print("length", kv.length)
k = kv.get_key(100)
v = kv.get_value(100)
print(k)
print(v)
print(kv[k])
API
OnDiskKV(file, key_format="q", value_format="ee")
Creates an on-disk kv from file, using key_format and value_format for decoding.
get_key(i)
Returns the key at position i.
get_value(i)
Returns the value at position i.
__getitem__(k)
Returns the value for the key k, as in kv[k].
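To illustrate why fixed-length keys and values make these three calls cheap, here is a minimal sketch of a reader over a file of key-sorted, fixed-width records. It is not this library's implementation: the class name `TinyStaticKV`, the interleaved key-then-values layout, and the use of binary search are all assumptions made for the example.

```python
import os
import struct
import tempfile


class TinyStaticKV:
    """Hypothetical reader for a file of fixed-width records sorted by key."""

    def __init__(self, file, key_format="q", value_format="ee"):
        # "<" means little-endian, standard sizes, no padding (assumed layout).
        self.key_struct = struct.Struct("<" + key_format)
        self.value_struct = struct.Struct("<" + value_format)
        self.record_size = self.key_struct.size + self.value_struct.size
        self.f = open(file, "rb")
        self.length = os.path.getsize(file) // self.record_size

    def _read(self, offset, size):
        self.f.seek(offset)
        return self.f.read(size)

    def get_key(self, i):
        # Record i starts at a fixed offset, so no index is needed.
        raw = self._read(i * self.record_size, self.key_struct.size)
        return self.key_struct.unpack(raw)[0]

    def get_value(self, i):
        offset = i * self.record_size + self.key_struct.size
        return self.value_struct.unpack(self._read(offset, self.value_struct.size))

    def __getitem__(self, k):
        # Binary search over positions; valid because keys are sorted.
        lo, hi = 0, self.length - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            mid_key = self.get_key(mid)
            if mid_key == k:
                return self.get_value(mid)
            if mid_key < k:
                lo = mid + 1
            else:
                hi = mid - 1
        raise KeyError(k)

    def close(self):
        self.f.close()


# Build a small demo file: keys 0, 10, ..., 990, each with two float16 values.
path = os.path.join(tempfile.mkdtemp(), "demo.kv")
with open(path, "wb") as out:
    for key in range(0, 1000, 10):
        out.write(struct.pack("<qee", key, 0.5, float(key)))

kv = TinyStaticKV(path)
print("length", kv.length)
print(kv.get_key(42), kv.get_value(42), kv[420])
kv.close()
```

Positional access is a single seek and read, and a lookup by key costs about log2(length) such reads, which is roughly 33 for a collection of 5B entries.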
sort_parquet(input_collection, key_column, value_columns, output_folder)
Sorts the parquet files of the collection input_collection by key_column and writes the result to output_folder.
parquet_to_file(input_collection, key_column, value_columns, output_file, key_format, value_format)
Reads the parquet files of the sorted input_collection and writes the keys and values to output_file, encoded using key_format and value_format.
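The sort-then-write pipeline can be sketched without parquet. In the example below, plain Python tuples stand in for parquet rows, and `write_sorted_records` is a hypothetical helper, not part of this library; the unpadded little-endian layout is also an assumption.

```python
import os
import struct
import tempfile


def write_sorted_records(rows, output_file, key_format="q", value_format="ee"):
    """Sort (key, values) rows by key and write them as fixed-width records."""
    record = struct.Struct("<" + key_format + value_format)
    with open(output_file, "wb") as out:
        for key, values in sorted(rows, key=lambda row: row[0]):
            out.write(record.pack(key, *values))
    return record.size


# Unsorted input rows: one integer key and two float16-representable values.
rows = [(30, (0.25, 1.0)), (10, (0.5, 2.0)), (20, (0.75, 3.0))]
output_file = os.path.join(tempfile.mkdtemp(), "sorted.kv")
record_size = write_sorted_records(rows, output_file)

# Read the keys back to confirm they are stored in sorted order.
with open(output_file, "rb") as f:
    data = f.read()
keys = [struct.unpack_from("<q", data, i * record_size)[0] for i in range(len(rows))]
print(keys)
```

Sorting once at build time is what allows the reader to look up a key without any separate index structure.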
For development
Either locally, or in gitpod (do export PIP_USER=false there).
Set up a virtualenv:
python3 -m venv .env
source .env/bin/activate
pip install -e .
To run the tests, first install the test requirements:
pip install -r requirements-test.txt
then run:
make lint
make test
You can use make black to reformat the code.
To run a specific test:
python -m pytest -x -s -v tests -k "dummy"
Hashes for static_ondisk_kv-1.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 311f953c53bbf41e5736a8714b016be3dde6e0aaad6849aa4913e85e03d192a9
MD5 | 8a7ea5a22bb0c36ea943718de683c89c
BLAKE2b-256 | d730d3912bd4e0f7c8d70bad458774ac2439a1935c54bed9604bdbe4314e77d5