Storage and retrieval of object-derived, decomposable recursive unique identifiers.

Henge

Henge is a Python package for building data storage and retrieval interfaces for arbitrary data. Henge is based on the idea of decomposable recursive unique identifiers (DRUIDs), which are hash-based unique identifiers derived from the data itself. For data of any structure, Henge can mint a unique DRUID to identify it, store the data in a key-value database of your choice, and provide lookup functions to retrieve the data in its original structure using its DRUID.

Henge was intended as a building block for sequence collections, but is generic enough to use for any data type that needs content-derived identifiers with database lookup capability.

Install

pip install henge

Quick Start

Create a Henge object by providing a database and a data schema. The database can be a Python dict or backed by persistent storage. Data schemas are JSON-schema descriptions of data types, and can be hierarchical.

import henge

schemas = ["path/to/json_schema.yaml"]
h = henge.Henge(database={}, schemas=schemas)

Insert items into the henge. Upon insert, henge returns the DRUID (digest/checksum/unique identifier) for your object:

druid = h.insert({"name": "Pat", "age": 38}, item_type="person")

Retrieve the original object using the DRUID:

h.retrieve(druid)
# {'age': '38', 'name': 'Pat'}

Tutorial

For a comprehensive walkthrough covering basic types, arrays, nested objects, and advanced features, see the tutorial notebook.

What are DRUIDs?

DRUIDs are a special type of unique identifier with two powerful properties:

  • Decomposable: Identifiers in henge automatically retrieve structured data (tuples, arrays, objects). The structure is defined by a JSON schema, so henge can be used as a back-end for arbitrary data types.

  • Recursive: Individual elements retrieved by henge can be tagged as recursive, meaning these attributes contain their own DRUIDs. Henge can recurse through these, allowing you to mint unique identifiers for arbitrary nested data structures.
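The two properties above can be illustrated with a small standard-library sketch. It does not use henge and is not henge's implementation: the `insert`, `retrieve`, and `digest` helpers, the record layout, and the choice of SHA-256 over JSON are all assumptions made for illustration. Nested dicts are treated as the "recursive" attributes, stored under their own identifiers, and rebuilt on lookup.

```python
import hashlib
import json


def digest(obj) -> str:
    """Hash a canonical JSON form of obj (an illustrative choice, not henge's)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def insert(db: dict, item: dict) -> str:
    """Store item in db and return its identifier, recursing into nested dicts."""
    flat, refs = {}, []
    for key, value in item.items():
        if isinstance(value, dict):
            # Recursive attribute: store the child first, keep only its identifier.
            flat[key] = insert(db, value)
            refs.append(key)
        else:
            flat[key] = value
    record = {"fields": flat, "refs": sorted(refs)}
    identifier = digest(record)
    db[identifier] = json.dumps(record)
    return identifier


def retrieve(db: dict, identifier: str) -> dict:
    """Rebuild the original structure by following stored child identifiers."""
    record = json.loads(db[identifier])
    item = dict(record["fields"])
    for key in record["refs"]:
        item[key] = retrieve(db, item[key])
    return item


db = {}
family = {"name": "Smith", "parent": {"name": "Pat", "age": 38}}
top = insert(db, family)
restored = retrieve(db, top)
```

In this sketch the nested `parent` object gets its own entry in `db`, so it can be looked up on its own or as part of the enclosing object.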

A DRUID is ultimately the result of a digest operation (such as md5 or sha256) on some data. Because DRUIDs are computed deterministically from the item, they serve as globally unique identifiers for its content. Inserting the same item repeatedly always produces the same DRUID, and this holds across henges as long as they share a data schema.
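As a rough illustration of why a content-derived digest is deterministic, the sketch below hashes a canonical serialization of an item. The function name `sketch_digest`, the JSON canonicalization, and the use of SHA-256 are assumptions for this example; henge's own serialization and digest function may differ.

```python
import hashlib
import json


def sketch_digest(item: dict) -> str:
    """Return a hex digest of the item's canonical JSON form (illustrative only)."""
    # Sorting keys makes the serialization independent of insertion order.
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = sketch_digest({"name": "Pat", "age": 38})
second = sketch_digest({"age": 38, "name": "Pat"})  # same content, different key order
changed = sketch_digest({"name": "Pat", "age": 39})  # different content
```

Here `first` and `second` are equal because the content is identical, while `changed` differs because one value changed.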

Persisting Data

In-memory (default)

Use a Python dict as the database for testing or ephemeral use:

h = henge.Henge(database={}, schemas=schemas)

SQLite backend

For persistent storage with SQLite:

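import henge
from sqlitedict import SqliteDict

# autocommit=True writes each insert to disk immediately
mydict = SqliteDict('./my_db.sqlite', autocommit=True)
# schemas is the list of schema paths from the Quick Start
h = henge.Henge(mydict, schemas=schemas)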

Requires: pip install sqlitedict
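Both backends so far are dict-like objects, which suggests that other mapping-style stores could be swapped in. The sketch below shows a hypothetical `JsonFileDict` class built on `collections.abc.MutableMapping` that persists its contents to a JSON file. It is an assumption that henge needs only the mapping interface, and this class is not part of henge or sqlitedict, so treat it as an illustration of the idea, not a tested backend.

```python
import json
import os
import tempfile
from collections.abc import MutableMapping


class JsonFileDict(MutableMapping):
    """Hypothetical dict-like store that rewrites a JSON file on every change."""

    def __init__(self, path: str):
        self.path = path
        self._data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                self._data = json.load(handle)

    def _flush(self) -> None:
        # Rewrite the whole file; simple, but slow for large stores.
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(self._data, handle)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._flush()

    def __delitem__(self, key):
        del self._data[key]
        self._flush()

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Usage: values written through one instance are visible after reopening the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.json")
    store = JsonFileDict(path)
    store["abc123"] = "value"
    reopened = JsonFileDict(path)
    reopened_value = reopened["abc123"]
    reopened_len = len(reopened)
```

Rewriting the file on every change keeps the sketch short; a real backend such as SQLite or MongoDB handles incremental writes and concurrency far better.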

MongoDB backend

For production use with MongoDB:

  1. Start MongoDB with Docker:
docker run --network="host" mongo

For persistent storage, mount a volume to /data/db:

docker run -it --network="host" -v /path/to/data:/data/db mongo
  2. Connect henge to MongoDB:
import henge

h = henge.Henge(henge.connect_mongo(), schemas=schemas)

Requires: pip install pymongo mongodict
