
A python template


static_ondisk_kv


A simple and fast implementation of a static on-disk key-value store (kv), in Python.

Why this lib?

leveldb, rocksdb and lmdb all have issues when used for a static collection of keys and values:

  • they are slow to build (many hours): 3h for rocksdb compared to 1h for this lib (for a collection of 5B entries, each made of 1 long and 2 float16)
  • they use more space than necessary (100GB for rocksdb versus 60GB for this lib)
  • they are no faster than this much simpler lib: about 5k samples/s on an nvme drive

What this lib does not support:

  • non-static collections (the file cannot be modified once built)
  • variable-length keys and values
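
The fixed-length constraint is what makes the space figure above easy to check. A minimal sketch of the arithmetic, assuming the formats are struct-style format strings (as the "q" and "ee" used in the examples suggest) and that each entry takes exactly the key bytes plus the value bytes, with no extra index:

```python
import struct

# Assumption: formats follow Python's struct module ("q" key, "ee" value).
# "<" disables native alignment so sizes are the plain sum of the fields.
key_size = struct.calcsize("<q")     # one signed 64-bit integer
value_size = struct.calcsize("<ee")  # two float16
record_size = key_size + value_size

n_entries = 5_000_000_000            # the 5B collection mentioned above
total_bytes = n_entries * record_size

print(record_size)        # 12
print(total_bytes / 1e9)  # 60.0
```

Under these assumptions, 12 bytes per entry times 5B entries gives 60GB, which matches the size quoted for this lib.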

Install

pip install static_ondisk_kv

Python examples

Check out this example:

from static_ondisk_kv import OnDiskKV

# Open an existing kv file: "q" keys (64-bit integers) and "ee" values (two float16)
kv = OnDiskKV(file='/media/nvme/mybigfile', key_format="q", value_format="ee")
print("length", kv.length)

# Positional access: key and value stored at position 100
k = kv.get_key(100)
v = kv.get_value(100)
print(k)
print(v)

# Lookup by key
print(kv[k])

API

OnDiskKV(file, key_format="q", value_format="ee")

Creates an ondisk kv from file using key_format and value_format for decoding.

get_key(i)

Returns the key at position i.

get_value(i)

Returns the value at position i.

__getitem__(k)

Returns the value for the key k. This is what kv[k] calls.
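
To illustrate how positional access and key lookup can work on a static file, here is a minimal sketch. It is not the library's implementation: it assumes records are sorted by key and stored as fixed-width "key then value" bytes, and the names read_key, read_value and lookup are hypothetical. Under those assumptions, position i is a simple offset computation and a key lookup is a binary search.

```python
import struct

# Hypothetical layout: each record is a little-endian key followed by its value.
KEY_FMT, VALUE_FMT = "<q", "<ee"
KEY_SIZE = struct.calcsize(KEY_FMT)
RECORD_SIZE = KEY_SIZE + struct.calcsize(VALUE_FMT)


def read_key(buf, i):
    """Decode the key of the record at position i."""
    return struct.unpack_from(KEY_FMT, buf, i * RECORD_SIZE)[0]


def read_value(buf, i):
    """Decode the value tuple of the record at position i."""
    return struct.unpack_from(VALUE_FMT, buf, i * RECORD_SIZE + KEY_SIZE)


def lookup(buf, key):
    """Binary search over records sorted by key; raises KeyError if absent."""
    n = len(buf) // RECORD_SIZE
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if read_key(buf, mid) < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < n and read_key(buf, lo) == key:
        return read_value(buf, lo)
    raise KeyError(key)


# A tiny in-memory stand-in for the on-disk file, with keys already sorted.
records = [(3, (0.5, 1.0)), (7, (2.0, 4.0)), (42, (8.0, 16.0))]
buf = b"".join(
    struct.pack(KEY_FMT, k) + struct.pack(VALUE_FMT, *v) for k, v in records
)

print(read_key(buf, 1))  # 7
print(lookup(buf, 42))   # (8.0, 16.0)
```

The same functions would work unchanged on an mmap of a real file, since struct.unpack_from accepts any bytes-like buffer.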

sort_parquet(input_collection, key_column, value_columns, output_folder)

Sorts the parquet files of the collection input_collection by key_column and writes the result to output_folder.

parquet_to_file(input_collection, key_column, value_columns, output_file, key_format, value_format)

Reads the parquet files of the sorted input_collection and writes the keys and values to output_file, encoded with key_format and value_format.
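
The two functions above form a sort-then-write build pipeline. The sketch below shows the same idea with the standard library only. It does not read parquet and is not the library's code: write_fixed_width is a hypothetical helper, and the "key then value" record layout is an assumption.

```python
import os
import struct
import tempfile


def write_fixed_width(pairs, path, key_format="q", value_format="ee"):
    """Hypothetical helper: sort pairs by key, then write fixed-width records."""
    key_struct = struct.Struct("<" + key_format)
    value_struct = struct.Struct("<" + value_format)
    with open(path, "wb") as f:
        for key, values in sorted(pairs, key=lambda pair: pair[0]):
            f.write(key_struct.pack(key))
            f.write(value_struct.pack(*values))
    # Size in bytes of one record, useful for computing offsets later.
    return key_struct.size + value_struct.size


# Unsorted input, standing in for the rows of a parquet collection.
pairs = [(42, (8.0, 16.0)), (3, (0.5, 1.0)), (7, (2.0, 4.0))]
path = os.path.join(tempfile.mkdtemp(), "tiny_kv.bin")
record_size = write_fixed_width(pairs, path)

print(os.path.getsize(path) == record_size * len(pairs))  # True
```

Sorting once at build time is what allows a reader to find a key without any separate index.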

For development

Either locally, or in gitpod (do export PIP_USER=false there)

Set up a virtualenv:

python3 -m venv .env
source .env/bin/activate
pip install -e .

To run tests:

pip install -r requirements-test.txt

then

make lint
make test

You can use make black to reformat the code

python -m pytest -x -s -v tests -k "dummy" to run a specific test
