
A python template

Project description

static_ondisk_kv


A simple and fast implementation of a static on-disk key-value store (kv) in Python.

Why this lib?

LevelDB, RocksDB and LMDB all have drawbacks for static collections of keys and values:

  • they are slow to build (many hours): 3 h for RocksDB compared to 1 h for this lib (for a collection of 5B records, each made of 1 long and 2 float16)
  • they use more space than necessary (100 GB for RocksDB versus 60 GB for this lib)
  • they are no faster than this much simpler lib: about 5k samples/s on an NVMe drive
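The 60 GB figure can be checked with a little arithmetic. The sketch below assumes that each record is stored as a packed key followed by its packed values with no per-record overhead, and that "long" and "float16" correspond to the Python struct codes "q" (8 bytes) and "e" (2 bytes); neither assumption is confirmed by this README, but together they reproduce the stated size.

```python
import struct

# Assumption: one record = packed key + packed values, no extra index or padding.
# "=" selects standard sizes: "q" is an 8-byte signed integer, "e" a 2-byte float16.
key_size = struct.calcsize("=q")      # 8 bytes
value_size = struct.calcsize("=ee")   # 2 x 2 bytes = 4 bytes
record_size = key_size + value_size   # 12 bytes

n_records = 5_000_000_000             # the 5B collection from the benchmark
total_bytes = n_records * record_size
print(f"{record_size} bytes/record, {total_bytes / 1e9:.0f} GB total")
# → 12 bytes/record, 60 GB total
```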

What this lib does not support:

  • non-static collections (no inserts, updates or deletes after the file is built)
  • variable-length keys and values

Install

pip install static_ondisk_kv

Python examples

Check out this example:

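from static_ondisk_kv import OnDiskKV

# Open an existing on-disk kv; keys are decoded with "q", values with "ee"
kv = OnDiskKV(file='/media/nvme/mybigfile', key_format="q", value_format="ee")
print("length", kv.length)  # number of (key, value) pairs in the file
k = kv.get_key(100)    # key stored at position 100
v = kv.get_value(100)  # value stored at position 100
print(k)
print(v)
print(kv[k])  # value looked up by key; equals v when keys are unique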

API

OnDiskKV(file, key_format="q", value_format="ee")

Creates an on-disk kv from file, using key_format and value_format to decode keys and values.
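The default formats "q" and "ee" look like Python struct format codes: "q" is a signed 64-bit integer and "e" an IEEE 754 half-precision float. Assuming that is what OnDiskKV expects (the README does not say so explicitly), this sketch shows how such formats encode and decode a key and its values. The "<" prefix (little-endian, no padding) is a choice made here; the byte order the library actually uses is not documented above.

```python
import struct

# Assumption: key_format / value_format are Python struct format strings.
key_format = "q"     # one signed 64-bit integer
value_format = "ee"  # two float16 numbers

key_bytes = struct.pack("<" + key_format, 42)
value_bytes = struct.pack("<" + value_format, 0.5, -1.25)
print(len(key_bytes), len(value_bytes))  # → 8 4

(key,) = struct.unpack("<" + key_format, key_bytes)
values = struct.unpack("<" + value_format, value_bytes)
print(key, values)  # → 42 (0.5, -1.25)
```

Note that float16 only has about 3 significant decimal digits; 0.5 and -1.25 are chosen because they round-trip exactly.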

get_key(i)

Returns the key at position i.

get_value(i)

Returns the value at position i.
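Because every key and value has a fixed size, positional access can be a single offset computation. The following is a minimal sketch of get_key(i) and get_value(i) over a memory-mapped file, assuming a hypothetical layout (packed key immediately followed by packed values, little-endian, no header); FixedRecordReader is an illustrative name, not the library's class.

```python
import mmap
import os
import struct
import tempfile


class FixedRecordReader:
    """Hypothetical reader for a file of fixed-size (key, value) records.

    Assumed layout: packed key, then packed values, little-endian, no header.
    """

    def __init__(self, path, key_format="q", value_format="ee"):
        self.key_struct = struct.Struct("<" + key_format)
        self.value_struct = struct.Struct("<" + value_format)
        self.record_size = self.key_struct.size + self.value_struct.size
        with open(path, "rb") as f:
            # Memory-map the file so each read only touches the pages it needs.
            self.mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self.length = len(self.mm) // self.record_size

    def get_key(self, i):
        # Record i starts at byte i * record_size; the key comes first.
        return self.key_struct.unpack_from(self.mm, i * self.record_size)[0]

    def get_value(self, i):
        # The values follow the key inside record i.
        offset = i * self.record_size + self.key_struct.size
        return self.value_struct.unpack_from(self.mm, offset)

    def close(self):
        self.mm.close()


# Usage: write three records, then read record 1 back by position.
records = [(3, (0.5, 1.0)), (7, (2.0, -0.25)), (11, (0.125, 4.0))]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.kv")
    with open(path, "wb") as f:
        for key, values in records:
            f.write(struct.pack("<q", key) + struct.pack("<ee", *values))
    reader = FixedRecordReader(path)
    length, key1, value1 = reader.length, reader.get_key(1), reader.get_value(1)
    reader.close()

print(length, key1, value1)  # → 3 7 (2.0, -0.25)
```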

__getitem__(k), i.e. kv[k]

Returns the value for the key k.
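The README does not describe how kv[k] finds a key. Since the build step sorts the collection by key, one plausible approach is a binary search over the fixed-size records; the sketch below assumes that, along with a hypothetical "<q" / "<ee" encoding and an in-memory buffer standing in for the file. With 5B records a binary search needs at most 33 key reads, because 2^33 is larger than 5 × 10^9.

```python
import struct

# Hypothetical encoding, mirroring key_format="q" and value_format="ee".
KEY = struct.Struct("<q")
VALUE = struct.Struct("<ee")
RECORD_SIZE = KEY.size + VALUE.size


def build(records):
    """Pack (key, values) pairs into one buffer of fixed-size records, sorted by key."""
    return b"".join(KEY.pack(k) + VALUE.pack(*v) for k, v in sorted(records))


def lookup(buf, key):
    """Binary search (lower bound) over the sorted records; KeyError if absent."""
    n = len(buf) // RECORD_SIZE
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if KEY.unpack_from(buf, mid * RECORD_SIZE)[0] < key:
            lo = mid + 1
        else:
            hi = mid
    # lo is now the first position whose key is >= the requested key.
    if lo < n and KEY.unpack_from(buf, lo * RECORD_SIZE)[0] == key:
        return VALUE.unpack_from(buf, lo * RECORD_SIZE + KEY.size)
    raise KeyError(key)


buf = build([(11, (0.125, 4.0)), (3, (0.5, 1.0)), (7, (2.0, -0.25))])
print(lookup(buf, 7))  # → (2.0, -0.25)
```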

sort_parquet(input_collection, key_column, value_columns, output_folder)

Sorts the parquet files of the collection input_collection by key_column and writes the sorted files to output_folder.
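How sort_parquet sorts a collection spread over many files is not described here. As an illustration of the general idea only, this sketch swaps in a classic external merge sort pattern: sort each shard on its own, then lazily k-way merge the sorted shards with heapq.merge. Lists of row dicts stand in for parquet files, and sort_collection is a hypothetical name.

```python
import heapq
from operator import itemgetter


def sort_collection(shards, key_column):
    """Hypothetical stand-in for sorting a multi-file collection by key_column.

    Each shard (a list of row dicts standing in for one parquet file) is sorted
    independently, then heapq.merge lazily interleaves the sorted shards. In a
    real external sort the sorted shards would be streamed back from disk.
    """
    by_key = itemgetter(key_column)
    sorted_shards = [sorted(shard, key=by_key) for shard in shards]
    return heapq.merge(*sorted_shards, key=by_key)


shards = [
    [{"id": 9, "a": 0.5}, {"id": 2, "a": 1.0}],
    [{"id": 5, "a": 2.0}, {"id": 1, "a": 4.0}],
]
keys = [row["id"] for row in sort_collection(shards, "id")]
print(keys)  # → [1, 2, 5, 9]
```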

parquet_to_file(input_collection, key_column, value_columns, output_file, key_format, value_format)

Reads the sorted parquet files of input_collection and writes their keys and values to output_file, encoded with key_format and value_format.
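The write step can be pictured as packing each already-sorted row into one fixed-size record. This is a minimal sketch under stated assumptions: rows are dicts standing in for parquet rows, the formats are struct format strings, and the "<" prefix and the header-less layout are choices of this sketch; write_records is a hypothetical name, not the library's parquet_to_file.

```python
import os
import struct
import tempfile


def write_records(rows, key_column, value_columns, output_file,
                  key_format="q", value_format="ee"):
    """Hypothetical write step: pack key-sorted rows into fixed-size records.

    Formats are treated as struct format strings; the "<" prefix
    (little-endian, no padding) is an assumption of this sketch.
    """
    key_struct = struct.Struct("<" + key_format)
    value_struct = struct.Struct("<" + value_format)
    previous = None
    with open(output_file, "wb") as f:
        for row in rows:
            key = row[key_column]
            # A binary-search lookup would rely on sorted input, so refuse unsorted rows.
            if previous is not None and key < previous:
                raise ValueError("rows must be sorted by key_column")
            previous = key
            f.write(key_struct.pack(key))
            f.write(value_struct.pack(*(row[c] for c in value_columns)))


rows = [{"id": 1, "a": 4.0, "b": 0.5}, {"id": 2, "a": 1.0, "b": -2.0}]
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "sorted.kv")
    write_records(rows, "id", ["a", "b"], out)
    size = os.path.getsize(out)
    with open(out, "rb") as f:
        data = f.read()

print(size)  # → 24
print(struct.unpack_from("<qee", data, 12))  # → (2, 1.0, -2.0)
```

Each record here takes 8 + 2 + 2 = 12 bytes, so two rows produce a 24-byte file and the second record starts at offset 12.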

For development

Work either locally or in Gitpod (in Gitpod, run export PIP_USER=false first).

Set up a virtualenv:

python3 -m venv .env
source .env/bin/activate
pip install -e .

To run the tests, first install the test requirements:

pip install -r requirements-test.txt

then run:

make lint
make test

You can use make black to reformat the code.

To run a specific test:

python -m pytest -x -s -v tests -k "dummy"



Download files


Source Distribution

static_ondisk_kv-1.1.2.tar.gz (5.1 kB)

Uploaded Source

Built Distribution


static_ondisk_kv-1.1.2-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file static_ondisk_kv-1.1.2.tar.gz.

File metadata

  • Download URL: static_ondisk_kv-1.1.2.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.12

File hashes

Hashes for static_ondisk_kv-1.1.2.tar.gz
  • SHA256: 0bdbe46080cf5069eaaf7a51490052076706c03f59d6bc9d489774632e29155b
  • MD5: 492e43b7beb4e601928153483c88c3b5
  • BLAKE2b-256: 10e8764716680ee9582f8e72938f1cae752adc116db608ad6000c198e08ef6e8


File details

Details for the file static_ondisk_kv-1.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for static_ondisk_kv-1.1.2-py3-none-any.whl
  • SHA256: 0a9bfe32a1a284aa93e84177addd3de211ff12185dffeb66560b39db98778dd2
  • MD5: 96541fdef4696a8211ae988e18b7839a
  • BLAKE2b-256: d441b08bcfcc728aef0475a886f0eb1de1675e12223c4a12c9be428e3f57dcf4

