A dictionary-like object that is friendly with multiprocessing and uses key-value databases (e.g., RocksDB) as the underlying storage.

hugedict

hugedict provides a drop-in replacement for dictionary objects that are too big to fit in memory. Its dictionary-like objects implement the typing.Mapping and typing.MutableMapping interfaces, using key-value databases (e.g., RocksDB) as the underlying storage. They also work well with Python's multiprocessing.
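
To make the "drop-in replacement" idea concrete, here is a minimal sketch of what implementing the MutableMapping interface over a byte-oriented store involves. It is not hugedict's implementation: the class name BytesBackedDict is hypothetical, and a plain dict stands in for the key-value database.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator


class BytesBackedDict(MutableMapping):
    """Hypothetical str -> str mapping that stores everything as bytes."""

    def __init__(self) -> None:
        # Assumption: a plain dict stands in for the key-value database.
        self._store: Dict[bytes, bytes] = {}

    def __getitem__(self, key: str) -> str:
        return self._store[key.encode("utf-8")].decode("utf-8")

    def __setitem__(self, key: str, value: str) -> None:
        self._store[key.encode("utf-8")] = value.encode("utf-8")

    def __delitem__(self, key: str) -> None:
        del self._store[key.encode("utf-8")]

    def __iter__(self) -> Iterator[str]:
        return (k.decode("utf-8") for k in self._store)

    def __len__(self) -> int:
        return len(self._store)


d = BytesBackedDict()
d["city"] = "Hanoi"
d.update({"country": "Vietnam"})  # update() is inherited from MutableMapping
```

Only the five methods above are written by hand; get, pop, update, items and the in operator are supplied by the abstract base class, which is why code written against a regular dict can usually use such an object unchanged.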

Installation

From PyPI (using pre-built binaries):

pip install hugedict

To compile from source, run maturin build -r inside the project directory. You need Rust, Maturin, CMake, and Clang (to build Rust-RocksDB).

Features

  1. Create a mutable mapping backed by RocksDB
from functools import partial
from typing import MutableMapping

from hugedict.prelude import RocksDBDict, RocksDBOptions

dbpath = "/tmp/example.db"  # placeholder: path (str) to the db file
create_if_missing = True  # placeholder: create the database if it does not exist

# replace [str, str] with the key and value types you want,
# and adjust deser_key, deser_value, ser_value to match
mapping: MutableMapping[str, str] = RocksDBDict(
    path=dbpath,  # path (str) to db file
    options=RocksDBOptions(create_if_missing=create_if_missing),  # see RocksDBOptions for the other options
    deser_key=partial(str, encoding="utf-8"),  # decode the key from memoryview
    deser_value=partial(str, encoding="utf-8"),  # decode the value from memoryview
    ser_value=str.encode,  # encode the value to bytes
    readonly=False,  # set to True to open the database in read-only mode
    secondary_mode=False,  # set to True to open the database in secondary mode
    secondary_path=None,  # when secondary_mode is True, a string pointing to a directory for storing data required to operate in secondary mode
)
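
The serializer and deserializer arguments are ordinary callables, so they can be checked without opening a database. The sketch below reuses the same callables as the example above and, as an assumption for illustration, wraps the encoded bytes in a memoryview to stand in for the buffer passed to the deserializer on a read.

```python
from functools import partial

deser = partial(str, encoding="utf-8")  # same callable as deser_key / deser_value
ser = str.encode  # same callable as ser_value

raw = ser("köln")  # bytes that would be written to the database
view = memoryview(raw)  # assumption: stand-in for the buffer handed back on a read
restored = deser(view)  # str() accepts any bytes-like object when an encoding is given
```

For other value types, a matching pair of callables is needed, for example one that packs an int into bytes and one that unpacks it from the memoryview.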
  2. Load huge data from files into RocksDB in parallel: from hugedict.prelude import rocksdb_load. This function creates SST files in parallel, ingests them into the database, and (optionally) compacts them.

  3. Cache a function when doing parallel processing

import time

from hugedict.prelude import Parallel

pp = Parallel()

# results of heavy_computing are cached in the database at /tmp/test.db
@pp.cache_func("/tmp/test.db")
def heavy_computing(seconds: float):
    time.sleep(seconds)
    return seconds * 2


# run the cached function over the inputs using 3 processes
output = pp.map(heavy_computing, [0.5, 1, 0.7, 0.3, 0.6], n_processes=3)
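
The general idea of caching a function's results on disk can be sketched with the standard library alone. This is not how hugedict implements cache_func, and it does not run anything in parallel: the disk_cache decorator is hypothetical, and SQLite with pickle is assumed here only to keep the example self-contained.

```python
import functools
import os
import pickle
import sqlite3
import tempfile


def disk_cache(path: str):
    """Hypothetical decorator: persist results in a SQLite file at `path`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = pickle.dumps(args)  # positional arguments identify the call
            conn = sqlite3.connect(path)
            try:
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS cache (key BLOB PRIMARY KEY, value BLOB)"
                )
                row = conn.execute(
                    "SELECT value FROM cache WHERE key = ?", (key,)
                ).fetchone()
                if row is not None:
                    return pickle.loads(row[0])  # cache hit: skip the computation
                result = func(*args)
                conn.execute(
                    "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
                    (key, pickle.dumps(result)),
                )
                conn.commit()
                return result
            finally:
                conn.close()

        return wrapper

    return decorator


calls = []  # records which inputs were actually computed
cache_path = os.path.join(tempfile.mkdtemp(), "cache.db")


@disk_cache(cache_path)
def double(x: float) -> float:
    calls.append(x)
    return x * 2


results = [double(x) for x in [0.5, 1, 0.5]]
```

The third call repeats an earlier input, so its result is read back from the file and the function body does not run again. Because the results live on disk, they also survive across runs of the program.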
  4. Create a dictionary backed by SQLite: hugedict.sqlite.SqliteDict
  5. Chain multiple dictionaries: hugedict.chained_mapping.ChainedMapping
  6. Cache a dictionary so that previously accessed keys are kept in memory: use hugedict.cachedict.CacheDict or call hugedict.types.HugeMapping.cache
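
The last two items (chaining dictionaries and keeping accessed keys in memory) can be illustrated with standard-library stand-ins. The sketch below is an analogy under stated assumptions, not hugedict's code: collections.ChainMap plays the role of a chained mapping, the ReadThroughCache class is hypothetical, and plain dicts stand in for on-disk dictionaries.

```python
from collections import ChainMap
from collections.abc import Mapping
from typing import Dict, Iterator


class ReadThroughCache(Mapping):
    """Hypothetical wrapper: remembers values already read from `backend`."""

    def __init__(self, backend: Mapping) -> None:
        self.backend = backend
        self.cache: Dict[str, str] = {}
        self.backend_reads = 0  # counts lookups that reached the backend

    def __getitem__(self, key: str) -> str:
        if key not in self.cache:
            self.backend_reads += 1
            self.cache[key] = self.backend[key]  # raises KeyError if absent
        return self.cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self.backend)

    def __len__(self) -> int:
        return len(self.backend)


# Assumption: plain dicts stand in for two on-disk dictionaries.
primary = {"a": "1"}
fallback = {"a": "old", "b": "2"}

chained = ChainMap(primary, fallback)  # lookups try `primary` first, then `fallback`
cached = ReadThroughCache(chained)

first = cached["b"]  # reaches the backend
second = cached["b"]  # served from memory
```

A cache of this kind trades memory for speed and can return stale values if the backend changes after a key has been read, so it suits read-mostly workloads.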

Download files

Source Distribution

hugedict-2.12.10.tar.gz (15.4 MB)

Built Distributions (hugedict 2.12.10)

PyPy 3.8, 3.9, 3.10: manylinux glibc 2.17+ x86-64 (5.8 MB each)

CPython 3.8, 3.9, 3.10, 3.11, 3.12: manylinux glibc 2.35+ x86-64 (5.9 MB each); manylinux glibc 2.17+ x86-64 (5.8 MB each); macOS 10.14+ universal2 (ARM64, x86-64) (8.2 MB each)

CPython 3.8 only: Windows x86-64 (3.3 MB)
