Stash: stable hash and object stash
Stash assigns a stable hash to arbitrary Python objects, mutable or immutable, based on their state at the time of hashing.
>>> import stash
>>> d = [{1: 2}, 3]
>>> h = stash.hash(d)
>>> h.hex()
'acb6b358dde6ee740b18dff8232cce8f'
Why not use Python's built-in hash?
Python's hash differs from the stash hash in three important ways: 1. it is supported only by a limited class of immutable objects; 2. it is not stable between restarts, i.e. after restarting Python the same object may be assigned a different hash; and 3. Python promises only that equal objects have equal hashes. Stash also guarantees the converse: equal hashes imply some notion of object equality.
Don't make promises you cannot keep.
You're referring to the fact that since we are mapping an infinite space of potential objects to a finite space of hashes, there are bound to be collisions. This is true. But by keeping track of previously seen objects we can guarantee uniqueness within that set. If a collision happens we raise an exception rather than return a colliding hash.
When a collision happens.
This is the question. By using a 128 bit hash function with good distribution properties (we use cityhash) the chance of a collision occurring is exceedingly small. To quantify this: at 128 bits it takes an input set on the order of 18 quintillion (2^64) objects for the expected number of collisions to approach 1. This makes it permissible to make collisions an unrecoverable error in most applications.
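The collision estimate can be checked with the standard birthday approximation: for n objects hashed uniformly into 2^bits values, the expected number of colliding pairs is n(n-1)/2 divided by 2^bits. The sketch below assumes an ideally uniform hash, and the function name is made up for this illustration. At n = 2^64 the formula gives about one half, which is the same order of magnitude as the figure quoted above.

```python
from fractions import Fraction


def expected_collisions(n: int, bits: int) -> Fraction:
    """Expected number of colliding pairs among n uniformly hashed objects."""
    pairs = n * (n - 1) // 2  # number of distinct pairs of objects
    return Fraction(pairs, 2**bits)  # each pair collides with probability 2^-bits


# A billion cached objects: about 1.5e-21 expected collisions.
print(float(expected_collisions(10**9, 128)))

# 2^64 objects: about 0.5 expected collisions.
print(float(expected_collisions(2**64, 128)))
```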
What is the main use case?
Caching. If the output of a function is determined entirely by its arguments, then it may be worthwhile to hold on to this value in case the function is called with the same set of arguments later. However, this means having to make potentially expensive deep comparisons to all previously seen arguments every time we call the function. Worse, it also means having to make deep copies of all the arguments to protect against future external mutations. All of this is solved by making a hash of the arguments and comparing it against earlier hashes, which is precisely what stash provides.
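A minimal sketch of this caching pattern follows. To keep it runnable without stash installed, it uses a crude stand-in hash built from repr and hashlib; the names stand_in_hash and cached are hypothetical and not part of stash. With stash available, one would presumably pass stash.hash as hash_fn instead.

```python
import hashlib
from typing import Any, Callable, Dict


def stand_in_hash(obj: Any) -> bytes:
    """Hypothetical placeholder for a stable state hash such as stash.hash.

    It hashes repr(obj), which is only adequate for simple built-in values.
    """
    return hashlib.blake2b(repr(obj).encode(), digest_size=16).digest()


def cached(func: Callable, hash_fn: Callable[[Any], bytes] = stand_in_hash) -> Callable:
    """Cache results keyed on a hash of the arguments' state at call time."""
    results: Dict[bytes, Any] = {}

    def wrapper(*args: Any) -> Any:
        key = hash_fn(args)  # no deep copy or deep comparison of args needed
        if key not in results:
            results[key] = func(*args)
        return results[key]

    return wrapper


calls = []


def total(values):
    calls.append(len(values))  # record each real evaluation
    return sum(values)


cached_total = cached(total)
data = [1, 2, 3]
first = cached_total(data)        # computed
second = cached_total([1, 2, 3])  # cache hit: same state, different object
data.append(4)
third = cached_total(data)        # recomputed: the state has changed
```

Because the key captures the arguments' state when the call is made, later mutation of data cannot corrupt an earlier cache entry.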
How does it work?
In short, stash serializes an object to bytes and hashes the serialization.
Wait, can't we just hash a pickle stream then?
Well, yes. But pickle stores more than what you are likely interested in, such as the insertion order of dictionaries, so that {'a': 1, 'b': 2} and {'b': 2, 'a': 1} would end up receiving different hashes, resulting in a cache miss. Likewise, an object that contains multiple references to the same object receives a different hash than one that references multiple copies. Stash loosely follows Python's equality operator to decide which objects are assigned a unique hash.
Loosely?
There is a fundamental problem with objects that do not test equal to themselves, such as float('nan'): since the assigned hash is equal to itself, we cannot identify object equality with hash equality. It is also not possible to honour user-defined __eq__ methods, so we go by state instead. Lastly, there is an issue with True, 1 and 1.0 all testing equal. This one is not fundamental, as we could very well assign all these objects the same hash, but it adds some overhead to no clear benefit, as it is by no means a given that functions treat these objects the same. So here we make the pragmatic choice of not doing the extra work.
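The equality quirks mentioned here are plain Python behaviour and can be confirmed without stash:

```python
nan = float('nan')
nan_equals_itself = nan == nan  # False: NaN never tests equal to itself

# True, 1 and 1.0 test equal and share a built-in hash, so they collapse
# into a single dictionary key.
all_equal = True == 1 == 1.0
builtin_hashes_equal = hash(True) == hash(1) == hash(1.0)
merged = {True: 'bool', 1: 'int', 1.0: 'float'}

# Yet a function can easily tell them apart.
rendered = [str(x) for x in (True, 1, 1.0)]

print(nan_equals_itself, all_equal, builtin_hashes_equal)  # False True True
print(len(merged), rendered)  # 1 ['True', '1', '1.0']
```

The last line is the reason a cache keyed on equality alone could return a wrong result: str(True) and str(1) differ even though True == 1.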
Can you say a bit more about how this works internally?
Stash works by recursively reducing an object and stashing the components, which directly explains how common values are deduplicated: stashing the same object twice simply returns a reference to an existing hash entry. The resulting collection of hashes is bundled and hashed to form the hash of the object, Merkle tree-style. A detailed overview of the protocol can be found here.
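To make the Merkle tree idea concrete, here is a toy recursive hash for a handful of built-in types. It is not stash's protocol: the function names, the use of BLAKE2b from hashlib in place of cityhash, the type tags and the byte layout are all assumptions made for this sketch. Each container's hash is computed from the hashes of its components, and dictionary entries are sorted by hash so that insertion order does not matter.

```python
import hashlib


def _digest(tag: bytes, payload: bytes) -> bytes:
    """128-bit digest of a type tag plus payload (BLAKE2b as a stand-in)."""
    return hashlib.blake2b(tag + b':' + payload, digest_size=16).digest()


def toy_hash(obj) -> bytes:
    """Hash an object from the hashes of its components, Merkle tree-style."""
    tag = type(obj).__name__.encode()
    if obj is None or isinstance(obj, (bool, int, float, str)):
        # Leaf: the type tag keeps True, 1 and 1.0 apart.
        return _digest(tag, repr(obj).encode())
    if isinstance(obj, bytes):
        return _digest(tag, obj)
    if isinstance(obj, (list, tuple)):
        # Ordered container: concatenate the child hashes in order.
        return _digest(tag, b''.join(toy_hash(item) for item in obj))
    if isinstance(obj, dict):
        # Unordered container: sort entry hashes to ignore insertion order.
        entries = sorted(toy_hash(k) + toy_hash(v) for k, v in obj.items())
        return _digest(tag, b''.join(entries))
    raise TypeError(f'unsupported type: {type(obj).__name__}')


print(toy_hash([{1: 2}, 3]).hex())
```

In this sketch, {'a': 1, 'b': 2} and {'b': 2, 'a': 1} receive the same hash, and a list holding the same inner list twice hashes the same as one holding two equal copies, since only the component hashes enter the parent.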
Reducing objects recursively sounds slow. Is it slow?
Stash is implemented in Rust for minimum overhead. It also keeps track of object ids seen before during serialization, to avoid recursing into the same object several times over. Stash is faster than pickle without collision checks, or roughly half as fast with in-memory collision checks.
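The effect of tracking object ids can be illustrated in pure Python. This is a sketch of the general technique only, with hypothetical names, and says nothing about how the Rust implementation is organised. A memo keyed on id(obj) is valid here because every object stays alive and unmodified for the duration of the traversal.

```python
import hashlib


def memo_hash(obj, memo: dict, visits: list) -> bytes:
    """Toy hash of nested lists that hashes each distinct object only once."""
    key = id(obj)
    if key in memo:
        return memo[key]  # seen before: reuse the digest, do not recurse
    visits.append(key)    # record that this object was actually processed
    if isinstance(obj, list):
        children = b''.join(memo_hash(item, memo, visits) for item in obj)
        payload = b'list:' + children
    else:
        payload = type(obj).__name__.encode() + b':' + repr(obj).encode()
    digest = hashlib.blake2b(payload, digest_size=16).digest()
    memo[key] = digest    # stored after recursion, so cycles are not handled
    return digest


shared = list(range(1000))
nested = [shared] * 50  # fifty references to the same list

memo, visits = {}, []
memo_hash(nested, memo, visits)
print(len(visits))  # 1002: the outer list, the shared list, and 1000 integers
```

Without the memo, the same traversal would process 1 + 50 * 1001 = 50051 objects.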
This all sounds great. Can I use it yet?
Better not. The project is under active development and the protocol is not finalized, so none of the stability guarantees are worth much yet. Hopefully soon though! Watch this space for releases to stay up to date.