Stash: stable hash and object stash

Stash assigns a stable hash to arbitrary Python objects, mutable or immutable, based on their state at the time of hashing.

>>> import stash
>>> d = [{1: 2}, 3]
>>> h = stash.hash(d)
>>> h.hex()
'acb6b358dde6ee740b18dff8232cce8f'

Why not use Python's built-in hash?

Python's hash differs from the stash hash in three important ways: 1. it is supported only by a limited class of immutable objects; 2. it is not stable between restarts, i.e. after restarting Python the same object may be assigned a different hash; and 3. Python promises only that equal objects have equal hashes. Stash also guarantees the converse: equal hashes imply some notion of object equality.
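
The first point is easy to check in a plain interpreter. The sketch below uses only Python built-ins and makes no stash calls; the helper name is_hashable is made up for this illustration.

```python
def is_hashable(obj) -> bool:
    """Return True if Python's built-in hash() accepts obj."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Mutable containers, and immutable containers that hold them, are rejected.
results = {
    "tuple": is_hashable((1, 2)),
    "list": is_hashable([1, 2]),
    "dict": is_hashable({1: 2}),
    "nested": is_hashable(({1: 2}, 3)),
}
print(results)
```

Only the plain tuple is accepted; the list from the opening example, [{1: 2}, 3], has no built-in hash at all.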

Don't make promises you cannot keep.

You're referring to the fact that since we are mapping an infinite space of potential objects to a finite space of hashes, there are bound to be collisions. This is true. But by keeping track of previously seen objects we can guarantee uniqueness within that set. If a collision happens we raise an exception rather than return a colliding hash.
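
A minimal sketch of this idea, assuming objects have already been serialized to bytes. The names CheckedHasher and HashCollision are hypothetical and are not part of stash's API; the deliberately weak hash function exists only to force a collision for the demonstration.

```python
class HashCollision(Exception):
    """Raised when two different payloads map to the same digest."""


class CheckedHasher:
    """Remember every payload seen so far and refuse to return a colliding digest."""

    def __init__(self, hash_func):
        self._hash_func = hash_func
        self._seen = {}  # digest -> payload that first produced it

    def hash(self, payload: bytes) -> bytes:
        digest = self._hash_func(payload)
        previous = self._seen.setdefault(digest, payload)
        if previous != payload:
            raise HashCollision(f"{payload!r} collides with {previous!r}")
        return digest


def weak_hash(data: bytes) -> bytes:
    """A four-valued toy hash, chosen so that collisions are easy to trigger."""
    return bytes([sum(data) % 4])


hasher = CheckedHasher(weak_hash)
first = hasher.hash(b"a")
again = hasher.hash(b"a")  # same payload, same digest, no error

try:
    hasher.hash(b"e")  # different payload that lands on the same digest
    collided = False
except HashCollision:
    collided = True
```

Within the set of payloads the hasher has seen, equal digests therefore always mean equal payloads; the price is that the payloads (or something equivalent) have to be kept around.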

When a collision happens.

This is the question. By using a 128-bit hash function with good distribution properties (we use cityhash), the chance of a collision occurring is exceedingly small. To quantify this: at 128 bits it takes an input set on the order of 18 quintillion (2^64) objects for the expected number of collisions to approach 1. This makes it permissible to make collisions an unrecoverable error in most applications.
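
The order of magnitude can be checked with the usual birthday estimate: for n objects and a b-bit hash that behaves like a uniform random function (an idealizing assumption), the expected number of colliding pairs is n(n-1)/2 divided by 2^b. The function name below is chosen for this illustration only.

```python
from fractions import Fraction


def expected_collisions(n_objects: int, hash_bits: int) -> Fraction:
    """Expected number of colliding pairs for an ideal uniform hash."""
    pairs = n_objects * (n_objects - 1) // 2  # number of distinct pairs
    return Fraction(pairs, 2 ** hash_bits)


for exponent in (32, 48, 64):
    estimate = expected_collisions(2 ** exponent, 128)
    print(f"2^{exponent} objects: {float(estimate):.3e} expected colliding pairs")
```

Under this estimate 2^64 objects give about 0.5 expected colliding pairs, the same order of magnitude as the figure quoted above, while a more realistic 2^32 objects give a value below 2^-64.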

What is the main use case?

Caching. If the output of a function is determined entirely by its arguments, then it may be worthwhile to hold on to this value in case the function is called with the same set of arguments later. However, this means having to make potentially expensive deep comparisons against all previously seen arguments every time we call the function. Worse, it also means having to make deep copies of all the arguments to protect against future external mutations. All of this is solved by making a hash of the arguments and comparing it against earlier hashes, which is precisely what stash provides.
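
A rough sketch of such a cache follows. To stay runnable without stash installed it takes the hash function as a parameter and uses a placeholder, stand_in_hash, built on hashlib and repr. That placeholder is an assumption of this example and is not stash's algorithm; in real use one would presumably pass stash.hash instead. The decorator name cached is likewise hypothetical.

```python
import functools
import hashlib


def stand_in_hash(obj) -> bytes:
    """Placeholder for a stable state-based hash; NOT stash's algorithm."""
    return hashlib.blake2b(repr(obj).encode(), digest_size=16).digest()


def cached(hash_func):
    """Cache results keyed on a hash of the arguments' state at call time."""
    def decorator(func):
        results = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = hash_func((args, sorted(kwargs.items())))
            if key not in results:
                results[key] = func(*args, **kwargs)
            return results[key]

        return wrapper
    return decorator


calls = []


@cached(stand_in_hash)
def total(values):
    calls.append(1)  # record each real evaluation
    return sum(values)


data = [1, 2, 3]
first = total(data)        # computed
data.append(4)             # external mutation after the call
second = total(data)       # new state, so computed again
third = total([1, 2, 3])   # same state as the first call, served from cache
```

No deep copy of data is stored and no deep comparison is made: the mutation simply produces a different key, and an equal-looking fresh list reproduces the old one.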

How does it work?

In short, stash serializes an object to bytes and hashes the serialization.

Wait, can't we just hash a pickle stream then?

Well, yes. But pickle stores more than what you are likely interested in, such as the insertion order of dictionaries, so that {'a': 1, 'b': 2} and {'b': 2, 'a': 1} would end up receiving different hashes, resulting in a cache miss. Likewise, an object that contains multiple references to the same object receives a different hash than one that references multiple equal copies. Stash loosely follows Python's equality operator to decide which objects are assigned a unique hash.
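
Both effects can be observed with the standard library alone. The snippet below uses only pickle with its default protocol and makes no stash calls.

```python
import pickle

# Equal dictionaries built in a different insertion order.
a = {'a': 1, 'b': 2}
b = {'b': 2, 'a': 1}
same_dict_value = a == b
same_dict_pickle = pickle.dumps(a) == pickle.dumps(b)

# Two references to one list versus two equal but separate lists.
item = [1]
shared = [item, item]
copies = [[1], [1]]
same_list_value = shared == copies
same_list_pickle = pickle.dumps(shared) == pickle.dumps(copies)

print(same_dict_value, same_dict_pickle)
print(same_list_value, same_list_pickle)
```

In both cases the objects compare equal while their pickle streams differ, so a hash of the stream would treat them as different cache keys.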

Loosely?

There is a fundamental problem with objects that do not test equal to themselves, such as float('nan'): since the assigned hash is equal to itself, we cannot identify object equality with hash equality. It is also not possible to honour user-defined __eq__ methods, so we go by state instead. Lastly, there is an issue with True, 1 and 1.0 all testing equal. This one is not fundamental, as we could very well assign all these objects the same hash, but doing so adds some overhead to no clear benefit, as it is not at all given that functions treat these objects the same. So here we make the pragmatic choice of not doing the extra work.
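
The three cases are plain Python behaviour and can be reproduced without stash. The class Label is a made-up example of a user-defined __eq__ that disagrees with the object's state.

```python
nan = float("nan")
nan_equals_itself = nan == nan  # NaN never compares equal, not even to itself

numbers_equal = True == 1 == 1.0
builtin_hashes_equal = hash(True) == hash(1) == hash(1.0)


class Label:
    """Compares case-insensitively, so equal objects can hold different state."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, Label) and self.text.lower() == other.text.lower()


x, y = Label("Stash"), Label("stash")
labels_equal = x == y
states_equal = vars(x) == vars(y)
```

Here x and y test equal although their states differ, which is exactly the situation in which a state-based hash and the equality operator part ways.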

Can you say a bit more about how this works internally?

Stash works by recursively reducing an object and stashing the components, which directly explains how common values are deduplicated: stashing the same object twice simply returns a reference to an existing hash entry. The resulting collection of hashes is bundled and hashed to form the hash of the object, Merkle tree-style. A detailed overview of the protocol can be found here.
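
The general Merkle-style idea can be sketched in a few lines. This is not stash's protocol: the type tags, the use of repr for leaves, the choice of BLAKE2b from hashlib, and the function names are all assumptions made for this illustration, and only a handful of built-in types are handled.

```python
import hashlib


def _node(tag: str, payload: bytes) -> bytes:
    """Hash one tree node: a type tag plus the bytes that describe it."""
    return hashlib.blake2b(tag.encode() + b":" + payload, digest_size=16).digest()


def merkle_hash(obj) -> bytes:
    """Hash an object from the hashes of its components."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        # Leaves: the type tag keeps True, 1 and 1.0 apart.
        return _node(type(obj).__name__, repr(obj).encode())
    if isinstance(obj, (list, tuple)):
        # Sequences: concatenate the fixed-size child hashes in order.
        return _node(type(obj).__name__, b"".join(merkle_hash(x) for x in obj))
    if isinstance(obj, dict):
        # Mappings: sort the entry hashes so insertion order is irrelevant.
        entries = sorted(merkle_hash(k) + merkle_hash(v) for k, v in obj.items())
        return _node("dict", b"".join(entries))
    raise TypeError(f"unsupported type: {type(obj).__name__}")


h1 = merkle_hash({'a': 1, 'b': 2})
h2 = merkle_hash({'b': 2, 'a': 1})
h3 = merkle_hash([{1: 2}, 3])
```

Because each container is hashed from its children's hashes rather than from their bytes, a component that occurs in many places only ever contributes one small fixed-size value.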

Reducing objects recursively sounds slow. Is it slow?

Stash is implemented in Rust for minimum overhead. It also keeps track of object ids seen before during serialization, to avoid recursing into the same object several times over. Without collision checks, stash is faster than pickle; with in-memory collision checks it is roughly half as fast.
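
The id-tracking idea can be illustrated in pure Python. This is a simplified sketch, not stash's Rust implementation: the function name hash_tree, the memo layout and the repr-based leaf hashing are assumptions, only lists and tuples are traversed, and cyclic structures are not handled.

```python
import hashlib


def hash_tree(obj, memo, visits):
    """Hash obj recursively, reusing results for objects already seen by id()."""
    key = id(obj)
    if key in memo:
        return memo[key]  # seen before in this traversal: no recursion
    visits.append(key)    # record that real work was done for this object
    if isinstance(obj, (list, tuple)):
        payload = b"".join(hash_tree(item, memo, visits) for item in obj)
    else:
        payload = repr(obj).encode()
    tag = type(obj).__name__.encode()
    digest = hashlib.blake2b(tag + b":" + payload, digest_size=16).digest()
    memo[key] = digest
    return digest


shared = ["a", "b", "c"]
root = [shared, shared, shared]  # three references to one list

visits = []
digest = hash_tree(root, {}, visits)
print(len(visits))
```

The shared list and its three strings are processed once rather than three times. The memo is only valid for a single traversal, while the root keeps every component alive, since Python may reuse an id after an object is freed.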

This all sounds great. Can I use it yet?

Better not. The project is under active development and the protocol is not finalized, so none of the stability guarantees are worth much yet. Hopefully soon though! Watch this space for releases to stay up to date.


