A python template

static_ondisk_kv

A simple and fast implementation of a static on-disk key-value store, in Python.

Why this lib?

leveldb, rocksdb and lmdb all have drawbacks for a static collection of keys and values:

  • slow to build (many hours): 3h for rocksdb compared to 1h for this lib (for a collection of 5B entries, each with 1 long and 2 float16)
  • use more space than necessary (100GB for rocksdb versus 60GB for this lib)
  • no faster than this much simpler lib: about 5k samples/s on an NVMe drive
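The 60GB figure is consistent with storing each entry as a fixed-size record with no overhead. The following is a back-of-the-envelope check, not code from the library; it assumes records are packed without padding (the "<" prefix) and uses the "q" and "ee" formats from the example in this README.

```python
import struct

# Formats taken from the README example: one 64-bit integer key ("q")
# and two float16 values ("ee"). The "<" prefix (little-endian, no
# padding) is an assumption made for this estimate.
KEY_FORMAT = "<q"
VALUE_FORMAT = "<ee"

record_size = struct.calcsize(KEY_FORMAT) + struct.calcsize(VALUE_FORMAT)
num_entries = 5_000_000_000  # the "5B" collection mentioned above

total_bytes = record_size * num_entries
print(record_size, "bytes per entry")
print(total_bytes / 1e9, "GB in total")
```

Under these assumptions each entry takes 12 bytes, so 5B entries come to 60GB.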

What this lib does not support:

  • non-static collections
  • variable-length values and keys

Install

pip install static_ondisk_kv

Python examples

Check out this example:

from static_ondisk_kv import OnDiskKV

# "q" decodes one 64-bit integer key, "ee" decodes two float16 values
kv = OnDiskKV(file='/media/nvme/mybigfile', key_format="q", value_format="ee")
print("length", kv.length)

# Access by position
k = kv.get_key(100)
v = kv.get_value(100)
print(k)
print(v)

# Access by key
print(kv[k])

API

OnDiskKV(file, key_format="q", value_format="ee")

Creates an on-disk kv from file, using key_format and value_format to decode keys and values.

get_key(i)

Returns the key at position i.

get_value(i)

Returns the value at position i.

__getitem__(k)

Returns the value for the key k, as in kv[k].
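To illustrate how positional access and key lookup can work on a file of fixed-size records, here is a minimal sketch. It is not the library's implementation: the TinyKV class, the unpadded little-endian layout, and the use of binary search over keys sorted in ascending order are all assumptions made for illustration.

```python
import os
import struct
import tempfile

KEY_FORMAT, VALUE_FORMAT = "<q", "<ee"  # assumed little-endian, unpadded
KEY_SIZE = struct.calcsize(KEY_FORMAT)
RECORD_SIZE = KEY_SIZE + struct.calcsize(VALUE_FORMAT)


class TinyKV:
    """Hypothetical reader for a file of fixed-size records sorted by key."""

    def __init__(self, path):
        self.f = open(path, "rb")
        self.length = os.path.getsize(path) // RECORD_SIZE

    def _read(self, i, offset, fmt):
        # Fixed-size records make the byte position a simple multiplication.
        self.f.seek(i * RECORD_SIZE + offset)
        return struct.unpack(fmt, self.f.read(struct.calcsize(fmt)))

    def get_key(self, i):
        return self._read(i, 0, KEY_FORMAT)[0]

    def get_value(self, i):
        return self._read(i, KEY_SIZE, VALUE_FORMAT)

    def __getitem__(self, k):
        # Binary search over positions: O(log n) small reads per lookup.
        lo, hi = 0, self.length - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            mid_key = self.get_key(mid)
            if mid_key == k:
                return self.get_value(mid)
            if mid_key < k:
                lo = mid + 1
            else:
                hi = mid - 1
        raise KeyError(k)


# Build a tiny sorted file to try it out.
path = os.path.join(tempfile.mkdtemp(), "tiny.kv")
with open(path, "wb") as out:
    for key in (3, 10, 42):
        out.write(struct.pack(KEY_FORMAT, key))
        out.write(struct.pack(VALUE_FORMAT, key / 2, key * 2))

kv = TinyKV(path)
print(kv.length, kv.get_key(1), kv[42])
```

Because every record has the same size, no index is needed: the file itself is the sorted array, which is why variable-length keys and values do not fit this design.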

sort_parquet(input_collection, key_column, value_columns, output_folder)

Sorts the parquet files of the collection input_collection by key_column and writes the result to output_folder.

parquet_to_file(input_collection, key_column, value_columns, output_file, key_format, value_format)

Reads the parquet files of the sorted input_collection and writes the keys and values to output_file, encoded with key_format and value_format.
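The build step boils down to sorting by key and then packing each entry into a fixed-size record. The sketch below shows that idea on an in-memory list instead of parquet files. The rows_to_file function is a hypothetical stand-in, not part of this library, and it merges the sort and write steps that the library exposes as two separate functions; the unpadded little-endian layout is also an assumption.

```python
import os
import struct
import tempfile


def rows_to_file(rows, output_file, key_format="<q", value_format="<ee"):
    """Hypothetical stand-in for the build step.

    rows is an iterable of (key, values) pairs. They are sorted by key
    and packed back to back as fixed-size binary records.
    """
    count = 0
    with open(output_file, "wb") as out:
        for key, values in sorted(rows, key=lambda row: row[0]):
            out.write(struct.pack(key_format, key))
            out.write(struct.pack(value_format, *values))
            count += 1
    return count


rows = [(42, (0.5, 1.0)), (7, (2.0, 4.0)), (19, (0.25, 8.0))]
output_file = os.path.join(tempfile.mkdtemp(), "example.kv")
written = rows_to_file(rows, output_file)
print(written, "records,", os.path.getsize(output_file), "bytes")
```

For a collection too large for memory, the sort has to happen out of core, which is presumably why sorting is a separate step that writes to its own output folder.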

For development

Either locally, or in gitpod (run export PIP_USER=false there).

Set up a virtualenv:

python3 -m venv .env
source .env/bin/activate
pip install -e .

To run tests:

pip install -r requirements-test.txt

Then:

make lint
make test

You can use make black to reformat the code.

Use python -m pytest -x -s -v tests -k "dummy" to run a specific test.
