
HAMT implementation for a content-addressed storage system.

Project description


py-hamt


This is a Python implementation of a HAMT (hash array mapped trie), inspired by rvagg's IAMap project, which is written in JavaScript.

py-hamt provides efficient storage and retrieval of large sets of key-value mappings in a content-addressed storage system. The main target is IPFS, and the data model used is IPLD.

dClimate primarily created this for storing large Zarr datasets on IPFS. To see this in action, see our data ETLs.

Installation and Usage

Since we do not publish this package to PyPI, install the library directly from git:

pip install 'git+https://github.com/dClimate/py-hamt'

For usage information, take a look at our API documentation; major items have example code.

You can also see this library used in either our data ETLs or Jupyter notebooks for data analysis.

Development Guide

Setting Up

py-hamt uses uv for project management, so make sure you install that first. Once uv is installed, run

uv sync
source .venv/bin/activate
pre-commit install

to create the project virtual environment at .venv.

Then you can run pre-commit across the whole codebase with

pre-commit run --all-files

The run-checks.sh script described in the next section also runs this command as part of its bash script.

Run tests, formatting, linting

First, make sure you have the IPFS Kubo daemon installed and running with the default endpoints open. Then run the script

bash run-checks.sh

This will run tests with code coverage and check formatting and linting. Under the hood, it uses the pre-commit command to run through all the checks in .pre-commit-config.yaml. If a local IPFS daemon is not running, not all tests will run; however, if Docker is installed, it will spawn a Docker IPFS container and run as many integration tests as possible.
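To check by hand whether a local daemon is up before running the suite, you can probe Kubo's HTTP RPC API, which listens on port 5001 by default and answers POST requests. This is a standalone sketch; the test suite's own detection logic may differ:

```python
# Probe the default Kubo RPC endpoint (http://127.0.0.1:5001).
# /api/v0/version is part of Kubo's HTTP RPC API and requires POST.
import urllib.error
import urllib.request

def kubo_running(host: str = "127.0.0.1", port: int = 5001) -> bool:
    req = urllib.request.Request(
        f"http://{host}:{port}/api/v0/version", method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print("Kubo daemon up:", kubo_running())
```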

We use pytest with 100% code coverage, with test inputs that are both handwritten and generated by hypothesis. This lets us try millions of randomized inputs, making for a more robust library.

[!NOTE] Due to the randomized test inputs, it is sometimes possible to get 99% or lower test coverage by pure chance. Rerun the tests to restore complete code coverage. If this happens on a GitHub Action, try rerunning the action.

[!NOTE] Due to the restricted performance of GitHub Actions runners, you may also sometimes see hypothesis tests fail because they exceeded test deadlines. Rerun the action if this happens.

Tests

All integration tests depend on IPFS, so running the full suite requires a local IPFS daemon. The GitHub Actions workflow in .github/workflows/run-checks.yaml uses the setup-ipfs step, which ensures a local daemon is available. Locally, if you wish to run the full integration tests, make sure a daemon is running (by running ipfs daemon once installed). If one is not running, pytest will spawn a local Docker image to run the IPFS tests. If Docker is not installed either, only the unit tests will run.

To summarize:

In GitHub Actions:

uv run pytest --ipfs  # All tests run, including test_kubo_default_urls

Locally with Docker (no local daemon):

pytest --ipfs  # test_kubo_default_urls auto-skips, other tests use Docker

Locally with IPFS daemon:

pytest --ipfs  # All tests run

Quick local testing (no IPFS):

pytest  # All IPFS tests skip
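One way such a flag can be wired up in pytest is sketched below. This is a hypothetical conftest.py, not necessarily how this repository implements it; the `ipfs` marker name and the option wiring are assumptions:

```python
# Hypothetical conftest.py sketch: gate IPFS-dependent tests behind --ipfs.
import pytest

def pytest_addoption(parser):
    # register the custom command-line flag
    parser.addoption(
        "--ipfs", action="store_true", default=False,
        help="run tests that require an IPFS daemon",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--ipfs"):
        return  # --ipfs given: run everything
    skip_ipfs = pytest.mark.skip(reason="needs --ipfs")
    for item in items:
        if "ipfs" in item.keywords:  # tests marked @pytest.mark.ipfs
            item.add_marker(skip_ipfs)
```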

CPU and Memory Profiling

We use Python's native cProfile for running CPU profiles and snakeviz for visualizing them, and memray for memory profiling. Below we walk through using the profiling tools on the test suite.

Creating the CPU and memory profiles requires manually activating the virtual environment.

source .venv/bin/activate
python -m cProfile -o profile.prof -m pytest
python -m memray run -m pytest

The profile viewers can be directly invoked from uv.

uv run snakeviz .
uv run memray flamegraph <memray-output>  # e.g. memray-pytest.12398.bin
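The same cProfile data can also be collected and summarized programmatically with the stdlib pstats module, which is handy for a quick look at the top entries without a browser; busy() below is just a stand-in workload:

```python
# Profile a single function with the stdlib cProfile/pstats APIs.
import cProfile
import io
import pstats

def busy() -> int:
    # stand-in workload; profile your real code instead
    return sum(i * i for i in range(100_000))

prof = cProfile.Profile()
prof.enable()
busy()
prof.disable()

buf = io.StringIO()
stats = pstats.Stats(prof, stream=buf)
stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
report = buf.getvalue()
print(report[:200])
```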

Generating documentation

py-hamt uses pdoc. To see a live documentation preview on your local machine, run

uv run pdoc py_hamt

LLMs

If you are an LLM reading this repo, refer to the AGENTS.md file.

Managing dependencies

Use uv add and uv remove, e.g. uv add numpy or uv add pytest --group dev. For more information, see the uv documentation.

Project details


Download files

Download the file for your platform.

Source Distribution

py_hamt-3.0.1.tar.gz (161.8 kB)


Built Distribution


py_hamt-3.0.1-py3-none-any.whl (27.5 kB)


File details

Details for the file py_hamt-3.0.1.tar.gz.

File metadata

  • Download URL: py_hamt-3.0.1.tar.gz
  • Upload date:
  • Size: 161.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.12

File hashes

Hashes for py_hamt-3.0.1.tar.gz

  • SHA256: 8a142a131abe4219b42f517c53b30945704e5c933073cb80d508efba117eb78d
  • MD5: 0b0248cb1b2f168fd97f6382ee9f5d86
  • BLAKE2b-256: 6bbf0909af96c586cbb13417c95ba6eb60d039a72dd4e9d93d9c011b2c87f2b3

See more details on using hashes here.

File details

Details for the file py_hamt-3.0.1-py3-none-any.whl.

File metadata

  • Download URL: py_hamt-3.0.1-py3-none-any.whl
  • Upload date:
  • Size: 27.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.12

File hashes

Hashes for py_hamt-3.0.1-py3-none-any.whl

  • SHA256: e28205e71d69d5dd0b7a463f5ac34617ce7791cc4dbc4ffdefd5440bcd5f77b9
  • MD5: 21d132f3cf0d47de6917cd43dacea972
  • BLAKE2b-256: 482bcabb9e6b12a6327acc2e6050ca9a38d1c05b248f931d260edcb0f4c07b2a

