
Provides Python hashers (MD5, SHA-1, SHA-2, ssdeep) with serializable internal state that can be persisted and recovered to continue hashing in another context.

Project description

rehashes

Provides Python hashers with serializable internal state that can be persisted and recovered to continue hashing in another context.

Supported algorithms:

  • MD5 (rehashes.PyMd5)
  • SHA1 (rehashes.PySha1)
  • SHA256 (rehashes.PySha256)
  • SHA512 (rehashes.PySha512)
  • ssdeep (rehashes.PySsdeep, basic fuzzy hash evaluation only)

All hashers share the same minimal interface (update/finalize/serialize/deserialize).

This library is powered by Rust. The hash implementations are provided by rustcrypto/hashes (MD5, SHA-1, SHA-2) and an embedded, slightly modified fuzzyhash-rs (ssdeep).

Motivation

Unlike hashlib, rehashes hashers support state serialization, allowing you to persist and resume hashing across processes or sessions. Standard hashlib objects cannot be pickled or serialized.
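
A quick way to see the hashlib limitation for yourself, using only the standard library: a partially fed hashlib object cannot be pickled, so its state cannot be stored between requests. The helper name below is illustrative only.

```python
import hashlib
import pickle


def hashlib_state_is_picklable() -> bool:
    """Return True if a partially fed hashlib object can be pickled."""
    hasher = hashlib.sha256()
    hasher.update(b"chunk 1 data")
    try:
        pickle.dumps(hasher)
    except (TypeError, pickle.PicklingError):
        # CPython refuses, e.g. "cannot pickle '_hashlib.HASH' object"
        return False
    return True


print(hashlib_state_is_picklable())  # False on CPython
```

hashlib's `copy()` can clone a hasher within a single process, but there is no supported way to export its state and resume it elsewhere.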

Existing libraries try to achieve this by serializing OpenSSL's opaque structures, which is unsafe: OpenSSL's internal structures are not stable and may vary across versions and platforms. rehashes instead wraps rustcrypto/hashes, which natively supports serialization.

The intended use case of rehashes is chunked uploads in the MWDB Core and Drakvuf Sandbox projects. The idea is to stream uploaded chunks to S3 storage and compute hashes of the whole file without having to re-read it afterward. This is achieved by serializing the internal state of the hasher and storing it in a shared database (e.g. Redis). For each incoming chunk, the state is recovered, updated with the chunk's data and stored again; after the last chunk, the hash is finalized.
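
The per-chunk flow described above can be sketched as follows. This is a minimal sketch, not MWDB Core's actual code: `handle_chunk`, the `hash-state:` key scheme and the dict standing in for Redis are all hypothetical. So that the sketch runs without rehashes installed, it uses a hypothetical stand-in, `Crc32Hasher`, which exposes the same update/finalize/serialize/deserialize methods on top of `zlib.crc32`. CRC-32 is a checksum, not a cryptographic hash; in real code you would pass a rehashes class such as `PySha256` instead.

```python
import zlib
from typing import MutableMapping, Optional


class Crc32Hasher:
    """Hypothetical stand-in with the same four methods as the rehashes hashers.

    CRC-32 is used only because its running value (a 32-bit integer) is its
    entire state, so it serializes trivially.
    """

    def __init__(self, value: int = 0) -> None:
        self._value = value

    def update(self, data: bytes) -> None:
        # zlib.crc32 accepts the previous value to continue a running checksum
        self._value = zlib.crc32(data, self._value)

    def finalize(self) -> str:
        return f"{self._value:08x}"

    def serialize(self) -> bytes:
        return self._value.to_bytes(4, "big")

    @staticmethod
    def deserialize(data: bytes) -> "Crc32Hasher":
        return Crc32Hasher(int.from_bytes(data, "big"))


def handle_chunk(store: MutableMapping[str, bytes], upload_id: str,
                 chunk: bytes, is_last: bool, hasher_cls) -> Optional[str]:
    """Fold one uploaded chunk into the persisted hash state (hypothetical helper).

    Returns the final digest after the last chunk, otherwise None.
    """
    key = f"hash-state:{upload_id}"
    state = store.get(key)
    # First chunk starts a fresh hasher; later chunks resume from the stored state
    hasher = hasher_cls() if state is None else hasher_cls.deserialize(state)
    hasher.update(chunk)
    if is_last:
        store.pop(key, None)  # the intermediate state is no longer needed
        return hasher.finalize()
    store[key] = hasher.serialize()
    return None


# Demo with a plain dict standing in for Redis
store: dict = {}
chunks = [b"chunk 1 data", b"chunk 2 data", b"chunk 3 data"]
for i, chunk in enumerate(chunks):
    digest = handle_chunk(store, "upload-1", chunk, i == len(chunks) - 1, Crc32Hasher)
print(digest)
```

The sketch assumes each upload's chunks arrive in order and are handled one at a time; concurrent workers would need a lock around the read-modify-write of the stored state.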

As rehashes was made for use in MWDB Core, it also supports ssdeep (libfuzzy) computation by embedding the fuzzyhash-rs implementation, slightly modified to support serialization.

Installation

pip install rehashes

Pre-built wheels are available for:

  • Linux x86_64 and aarch64, both manylinux2014 (glibc 2.17+) and musllinux 1.2+
  • CPython 3.10+ (abi3 stable ABI: a single wheel covers every CPython version from 3.10 onward)

Usage

Basic hashing

from rehashes import PySha256

hasher = PySha256()
hasher.update(b"Hello, ")
hasher.update(b"World!")
print(hasher.finalize())
# "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f"
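
Because successive `update()` calls are equivalent to hashing the concatenated input, the digest above can be cross-checked against the standard library. A minimal sketch using only hashlib; the helper name is illustrative.

```python
import hashlib
from typing import Iterable


def reference_hexdigest(algorithm: str, chunks: Iterable[bytes]) -> str:
    """Hash `chunks` incrementally with hashlib to cross-check a rehashes digest."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


# Same chunks as the PySha256 example; should print the digest shown above
print(reference_hexdigest("sha256", [b"Hello, ", b"World!"]))
```

With rehashes installed, `PySha256()` fed the same chunks should return the same string; the hashlib names `"md5"`, `"sha1"` and `"sha512"` cover the other non-fuzzy algorithms.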

All hashers follow the same interface:

from rehashes import PyMd5, PySha1, PySha256, PySha512, PySsdeep

data = b"example data"
hasher = PySha256()         # or PyMd5(), PySha1(), PySha512(), PySsdeep()
hasher.update(data)         # Feed data (bytes) into the hasher
result = hasher.finalize()  # Get the digest as a string (hex for MD5/SHA, ssdeep signature for PySsdeep)
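
Since `update()` can be called any number of times, a large file never has to be loaded into memory at once. A minimal sketch of a helper that feeds a binary stream in fixed-size chunks; `feed_stream` and the 64 KiB default are hypothetical choices. It relies only on `update()`, so it accepts a rehashes hasher or a hashlib object alike; the demo uses hashlib so it runs without rehashes installed.

```python
import hashlib
import io
from typing import BinaryIO


def feed_stream(hasher, stream: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Feed `stream` into `hasher` in chunks of at most `chunk_size` bytes.

    Only hasher.update() is called. Returns the number of bytes consumed.
    """
    total = 0
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
        total += len(chunk)
    return total


# Demo with hashlib; with rehashes you would call finalize() instead of hexdigest()
payload = b"x" * 200_000
h = hashlib.sha256()
consumed = feed_stream(h, io.BytesIO(payload), chunk_size=4096)
print(consumed, h.hexdigest() == hashlib.sha256(payload).hexdigest())
```

For a real file, pass the object returned by `open(path, "rb")` as `stream`.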

Serializable state

Warning! Make sure you use the same version of the library for serializing and deserializing state. Serialized state should be kept server-side and never exposed to clients (e.g. in JWT tokens). Under the hood we use:

The key feature of rehashes is the ability to serialize and restore the internal hasher state. This enables use cases like chunked file uploads where you need to compute hashes across multiple sessions without re-reading the entire file.

from rehashes import PySha256

# Session 1: Process first chunk of data
hasher = PySha256()
hasher.update(b"chunk 1 data")

# Serialize and persist the state (e.g., to Redis, database, etc.)
state = hasher.serialize()

# Session 2: Recover state and continue hashing
hasher = PySha256.deserialize(state)
hasher.update(b"chunk 2 data")

# Finalize when all data has been processed
print(hasher.finalize())

This works for all supported algorithms including ssdeep:

from rehashes import PySsdeep

hasher = PySsdeep()
hasher.update(b"chunk 1 data")
state = hasher.serialize()  # Persist state to shared storage

# ... later, in another process ...
hasher = PySsdeep.deserialize(state)
hasher.update(b"chunk 2 data")
ssdeep_hash = hasher.finalize()
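
The warning at the top of this section requires the same rehashes version on both ends. One way to enforce that, sketched under the assumption that you control the storage format, is to store the producing library version next to the state and refuse to resume on a mismatch. `pack_state`, `unpack_state` and the NUL-separated layout are hypothetical, not part of rehashes.

```python
def pack_state(state: bytes, version: str) -> bytes:
    """Prefix serialized hasher state with the library version that produced it."""
    if "\x00" in version:
        raise ValueError("version string must not contain NUL")
    return version.encode("utf-8") + b"\x00" + state


def unpack_state(blob: bytes, expected_version: str) -> bytes:
    """Return the raw state, refusing blobs written by another library version."""
    # Split at the first NUL only; the state itself may contain NUL bytes
    version, sep, state = blob.partition(b"\x00")
    if not sep:
        raise ValueError("malformed state blob: missing version header")
    found = version.decode("utf-8")
    if found != expected_version:
        raise ValueError(f"state written by rehashes {found}, running {expected_version}")
    return state


blob = pack_state(b"\x01\x00raw-state", "0.2.0")
print(unpack_state(blob, "0.2.0"))
```

At runtime the installed version can be read with `importlib.metadata.version("rehashes")` and passed as `expected_version` before calling e.g. `PySha256.deserialize(...)`; on a mismatch, the hash would have to be recomputed from the stored data.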

API Reference

Each hasher class (PyMd5, PySha1, PySha256, PySha512, PySsdeep) exposes:

Method                            Description
__init__()                        Create a new hasher instance
update(data: bytes)               Feed data into the hasher
finalize() -> str                 Return the digest as a string (hex for MD5/SHA hashers, ssdeep signature for PySsdeep)
serialize() -> bytes              Serialize the internal state to bytes
deserialize(data: bytes) -> Self  Restore a hasher from serialized state (static method)
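
For type-checking helper code that should accept any of these hashers, the interface in the table above can be expressed as a structural type. A minimal sketch; `ResumableHasher` is a hypothetical name and is not shipped by rehashes, and the `-> None` return of `update()` is an assumption based on the table listing no return value.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class ResumableHasher(Protocol):
    """Structural type mirroring the rehashes hasher interface (hypothetical)."""

    def update(self, data: bytes) -> None: ...

    def finalize(self) -> str: ...

    def serialize(self) -> bytes: ...

    @staticmethod
    def deserialize(data: bytes) -> "ResumableHasher": ...


# runtime_checkable only checks that the methods exist, not their signatures
print(isinstance(hashlib.sha256(), ResumableHasher))  # False: no finalize/serialize
```

Annotating a parameter as `ResumableHasher` lets a type checker such as mypy accept PyMd5, PySha1, PySha256, PySha512 or PySsdeep instances, provided their actual signatures match the table.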

Development

Setup

# Create virtual environment and install maturin
python -m venv .venv
source .venv/bin/activate
pip install maturin

# Build and install in development mode
maturin develop

# Run tests
pip install pytest
pytest tests/

Building wheels

# Build wheel for current platform
maturin build --release

# Build manylinux wheel
maturin build --release --manylinux manylinux2014 -i python3.10


Download files

Download the file for your platform.

Source Distribution

rehashes-0.2.0.tar.gz (22.3 kB)

Uploaded Source

Built Distributions

rehashes-0.2.0-cp310-abi3-musllinux_1_2_x86_64.whl (575.2 kB)

Uploaded CPython 3.10+, musllinux: musl 1.2+, x86-64

rehashes-0.2.0-cp310-abi3-musllinux_1_2_aarch64.whl (540.4 kB)

Uploaded CPython 3.10+, musllinux: musl 1.2+, ARM64

rehashes-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (369.8 kB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+, x86-64

rehashes-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (365.1 kB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+, ARM64

File details

Details for the file rehashes-0.2.0.tar.gz.

File metadata

  • Download URL: rehashes-0.2.0.tar.gz
  • Upload date:
  • Size: 22.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for rehashes-0.2.0.tar.gz
Algorithm Hash digest
SHA256 86bac2298f8d8d8c62613e3d635e0e1d19fa41bfcc513ea87d72a6f6127fef1f
MD5 c85bad15ed9cdaeb8289797aeff1d515
BLAKE2b-256 67b75080c5ff56ba754b8f7ae33e5dd44ce82cd4c8574ba1b83b999b52743702

File details

Details for the file rehashes-0.2.0-cp310-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for rehashes-0.2.0-cp310-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 bed1d4681abc4875eb5a526cb07fca44bb5f0d44e7fb6086fc8b1425852bb698
MD5 3d922f3421b8ffeadb4dec98fcfa5e56
BLAKE2b-256 f0cec5d53121ecf086738794e6a6911c8a341892f8e798e5d7a0d66ce70065fe

File details

Details for the file rehashes-0.2.0-cp310-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for rehashes-0.2.0-cp310-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 4ef57b6cdf9a1393d95c1e503b342825669507c9135b56238eedc470d4cd6e6d
MD5 0096a56ae60a906a4530710d7240e765
BLAKE2b-256 e8e002739be2eeec7e75890c79495b7ab1285d0e5657df0fc03e99404e667fbe

File details

Details for the file rehashes-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for rehashes-0.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 34470ade94af917bcdaa109c5c2ae00bc8ae6e6c87ecd6da9e0da1f3aff3251b
MD5 6105424b6098d8b64f9250adbc90fda3
BLAKE2b-256 f323336bd7917bf6729e992d4d0038b0817454c4230181b6db757a70a24b88de

File details

Details for the file rehashes-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for rehashes-0.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 56580480f9ad20733e1c9cf6cb8187c3e2b32af88873d6627309846ac177124d
MD5 bb3604dd4a74f0c0bde533b99e40a449
BLAKE2b-256 682d9eb77cfd7312eb0e02f45105770579aba1cb02f3aac9cce1494cd4452c69
