Simple FastAPI-based hash server
Hashserver
A lightweight, content-addressed file server over HTTP.
Hashserver stores and serves opaque binary buffers keyed by their cryptographic checksum. You PUT a buffer with its checksum in the URL; you GET it back by the same checksum. There are no filenames, no directories, no metadata — just content and its hash.
The hash algorithm is configurable: SHA-256 (default) or SHA3-256.
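As a minimal sketch of what "keyed by checksum" means in practice, the snippet below computes the hex digest a client would place in the URL. The helper name `compute_checksum` and the algorithm labels in `_HASHERS` are illustrative choices, not part of hashserver's API.

```python
import hashlib

# Illustrative mapping from an algorithm label to a hashlib constructor.
# The labels are an assumption made for this sketch.
_HASHERS = {"sha-256": hashlib.sha256, "sha3-256": hashlib.sha3_256}


def compute_checksum(data: bytes, algorithm: str = "sha-256") -> str:
    """Return the lowercase hex digest that identifies `data`."""
    try:
        hasher = _HASHERS[algorithm]
    except KeyError:
        raise ValueError(f"unsupported algorithm: {algorithm!r}") from None
    return hasher(data).hexdigest()


if __name__ == "__main__":
    payload = b"hello"
    print(compute_checksum(payload))              # SHA-256 digest
    print(compute_checksum(payload, "sha3-256"))  # SHA3-256 digest
```

Identical buffers always produce the same digest, which is what makes deduplication automatic: uploading the same content twice addresses the same key.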
Why content-addressed storage?
Content-addressed storage (CAS) is a well-established pattern used by Git, IPFS, Docker registries, and many other systems. Identifying data by its cryptographic hash gives you automatic deduplication, trivially verifiable integrity, and strong reproducibility guarantees.
Hashserver brings these benefits to any project that needs a simple HTTP-based buffer store. It is intentionally minimal: a single ASGI application backed by a directory of files, designed to be easy to deploy, easy to integrate, and easy to reason about.
Relationship to Seamless
Hashserver was originally developed as the buffer-serving component of Seamless, a framework for reproducible, reactive computational workflows. In Seamless, all data — inputs, source code, and results — is represented as a tree of checksums, and hashserver provides the storage layer that maps those checksums back to actual data.
However, hashserver has no dependency on Seamless and no knowledge of it. It is a generic content-addressed file server that is useful in any context where you need to store and retrieve buffers by hash — caching layers, artifact stores, reproducible pipelines, or your own CAS-backed application. It is published as an independent PyPI package for exactly this reason.
Features
- Content-addressed: buffers are stored and retrieved by their cryptographic checksum.
- Configurable hash algorithm: SHA-256 (default) or SHA3-256, selected at startup.
- Integrity-verified reads: every buffer is re-checksummed on GET to detect corruption.
- Prefix directory layout: by default, buffers are stored under a two-character prefix subdirectory (e.g. ab/ab3f7c...) to avoid filesystem performance problems with large flat directories. A flat layout is also supported.
- Extra read-only directories: additional buffer directories can be mounted as fallback read sources.
- Promises: a client can announce that a buffer will be uploaded soon via PUT /promise/{checksum}. Other clients reading that checksum will wait for the upload rather than getting a 404.
- Concurrent-safe: in-flight PUT requests are tracked so concurrent GETs and batch queries return consistent results. Lock files are respected for external writers.
- Multiple instances: several hashserver processes can safely share the same buffer directory.
- Lightweight: built on FastAPI/Starlette — no database, no external services.
- Flexible deployment: run as a CLI tool, under any ASGI server, or via Docker Compose.
Installation
pip install hashserver
Or with conda:
mamba env create --file environment.yml
conda activate hashserver
Quick start
Serve buffers from a local directory:
hashserver ./my-buffers
This starts the server under uvicorn on port 8000. Run hashserver -h for all options.
Storing and retrieving a buffer
# Start a writable server (it runs in the foreground; use a second terminal for the commands below)
hashserver ./my-buffers --writable
# Compute the SHA-256 checksum and upload
CHECKSUM=$(python3 -c "
import hashlib, sys
print(hashlib.sha256(open(sys.argv[1],'rb').read()).hexdigest())
" myfile.bin)
curl -X PUT --data-binary @myfile.bin http://localhost:8000/$CHECKSUM
# Download
curl -O http://localhost:8000/$CHECKSUM
To use SHA3-256 instead, start the server with --hash-algorithm sha3-256 and hash your files with hashlib.sha3_256.
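The same round trip can be sketched from Python using only the standard library. This assumes a writable server on http://localhost:8000 with the default SHA-256 algorithm; the function names and the `application/octet-stream` content type are illustrative choices, not something hashserver is known to require.

```python
import hashlib
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: the quick-start server


def build_put_request(data: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) a PUT /{checksum} request for `data`."""
    checksum = hashlib.sha256(data).hexdigest()
    return urllib.request.Request(
        f"{base_url}/{checksum}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},  # assumed, not required
    )


def upload(data: bytes, base_url: str = BASE_URL) -> str:
    """Upload `data` and return its checksum. Raises HTTPError on 4xx/5xx."""
    request = build_put_request(data, base_url)
    with urllib.request.urlopen(request) as response:
        response.read()
    return request.full_url.rsplit("/", 1)[-1]


def download(checksum: str, base_url: str = BASE_URL) -> bytes:
    """Fetch the buffer stored under `checksum`."""
    with urllib.request.urlopen(f"{base_url}/{checksum}") as response:
        return response.read()

# Usage, with a server running:
#   checksum = upload(b"some bytes")
#   assert download(checksum) == b"some bytes"
```

For SHA3-256, swap `hashlib.sha256` for `hashlib.sha3_256` and start the server with the matching option.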
API
Retrieving buffers
GET /{checksum} — Retrieve a buffer by its hex checksum. The server verifies the checksum before sending the response. Returns the raw buffer (200), or 404 if not found.
Storing buffers
Requires --writable.
PUT /{checksum} — Upload a buffer. The request body is the raw data; the server verifies that its checksum matches the URL. Returns 200 on success, 201 if the buffer already existed, or 400 on checksum mismatch.
PUT /promise/{checksum} — Announce that a buffer will be uploaded soon. Returns 202 with the promise TTL. While a promise is active, GET requests for that checksum will wait rather than returning 404, and /has queries will report the checksum as present.
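A producer that knows a checksum before the data is ready might combine the two PUT endpoints roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the promise request is sent without a body, and the content of the 202 response is not inspected.

```python
import urllib.request


def build_promise_request(checksum: str, base_url: str) -> urllib.request.Request:
    """Build PUT /promise/{checksum}; sent here without a request body."""
    return urllib.request.Request(f"{base_url}/promise/{checksum}", method="PUT")


def announce_then_upload(data: bytes, checksum: str, base_url: str) -> tuple[int, int]:
    """Announce a promise, then upload the buffer. Returns both status codes."""
    with urllib.request.urlopen(build_promise_request(checksum, base_url)) as response:
        promise_status = response.status  # 202 is expected per the description above

    # In a real producer, the (slow) computation of `data` would happen here,
    # while readers of `checksum` wait instead of receiving a 404.
    upload = urllib.request.Request(f"{base_url}/{checksum}", data=data, method="PUT")
    with urllib.request.urlopen(upload) as response:
        upload_status = response.status
    return promise_status, upload_status
```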
Querying availability
GET /has — Batch existence check. Send a JSON list of checksums in the request body. Returns a JSON list of booleans. Includes both on-disk buffers and active promises.
GET /has-now — Same as /has, but excludes promises — only reports buffers that are already on disk.
GET /buffer-length — Batch size query. Send a JSON list of checksums in the request body. Returns a JSON list of integers: the buffer size in bytes, or 0 if not present. Promised checksums are reported as true.
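The batch endpoints take a JSON body on a GET request, which some HTTP clients do not send by default. The sketch below builds such a request with `urllib` and shows one way to use the result. It assumes the returned list is in the same order as the submitted checksums; the helper names and the `Content-Type` header are illustrative.

```python
import json
import urllib.request


def build_batch_request(endpoint: str, checksums: list[str], base_url: str) -> urllib.request.Request:
    """Build a GET request for 'has', 'has-now' or 'buffer-length' with a JSON body."""
    body = json.dumps(checksums).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{endpoint}",
        data=body,
        method="GET",  # explicit: urllib would otherwise switch to POST when data is set
        headers={"Content-Type": "application/json"},  # assumed header
    )


def missing_checksums(checksums: list[str], flags: list[bool]) -> list[str]:
    """Return the checksums whose flag from /has or /has-now is false."""
    if len(checksums) != len(flags):
        raise ValueError("response length does not match request length")
    return [c for c, present in zip(checksums, flags) if not present]

# Usage, with a server running:
#   request = build_batch_request("has-now", wanted, "http://localhost:8000")
#   with urllib.request.urlopen(request) as response:
#       to_upload = missing_checksums(wanted, json.load(response))
```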
Health
GET /healthcheck — Returns "OK". Useful for load balancer probes.
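Besides load balancer probes, the health endpoint is handy for waiting until a freshly started server accepts requests. A minimal polling sketch follows; the function names, retry count, and delay are arbitrary illustrative values, and only the HTTP status is checked.

```python
import time
import urllib.request


def probe(base_url: str) -> bool:
    """Return True if GET /healthcheck answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthcheck", timeout=2) as response:
            return response.status == 200
    except OSError:  # covers URLError, connection refused, and timeouts
        return False


def wait_until_healthy(base_url: str, attempts: int = 20, delay: float = 0.5, check=probe) -> bool:
    """Poll `check` up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if check(base_url):
            return True
        time.sleep(delay)
    return False
```

The `check` parameter exists so the loop can be exercised without a running server.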
Configuration
CLI flags
| Flag | Description | Default |
|---|---|---|
| directory | Buffer storage directory (positional, required) | none |
| --writable | Enable PUT endpoints | off |
| --hash-algorithm | Hash algorithm: sha3-256 or sha-256 | sha-256 |
| --layout | Directory layout: prefix or flat | prefix |
| --extra-dirs | Semicolon-separated list of extra read-only buffer directories | none |
| --host | Listen address | 127.0.0.1 |
| --port | Listen port | 8000 |
| --port-range START END | Pick a random free port in range (mutually exclusive with --port) | none |
| --status-file | JSON file for reporting server status | none |
| --timeout | Shut down after this many seconds of inactivity | none |
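When launching hashserver from another program, the flags above can be assembled into an argument list. The helper below is a hypothetical convenience, not something the package provides. It covers only a subset of the flags and joins extra directories with semicolons as the table describes.

```python
from typing import Optional, Sequence


def build_command(
    directory: str,
    *,
    writable: bool = False,
    hash_algorithm: Optional[str] = None,
    layout: Optional[str] = None,
    extra_dirs: Sequence[str] = (),
    host: Optional[str] = None,
    port: Optional[int] = None,
) -> list[str]:
    """Assemble an argv list for the hashserver CLI; unset options are omitted."""
    cmd = ["hashserver", directory]
    if writable:
        cmd.append("--writable")
    if hash_algorithm is not None:
        cmd += ["--hash-algorithm", hash_algorithm]
    if layout is not None:
        cmd += ["--layout", layout]
    if extra_dirs:
        cmd += ["--extra-dirs", ";".join(extra_dirs)]
    if host is not None:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd

# Usage: subprocess.Popen(build_command("./my-buffers", writable=True, port=9000))
```

Passing the list directly to `subprocess` avoids shell quoting problems with the semicolon separator.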
Environment variables
When running under an external ASGI server (e.g. uvicorn hashserver:app), configure via environment variables instead:
| Variable | Equivalent flag |
|---|---|
| HASHSERVER_DIRECTORY | directory |
| HASHSERVER_WRITABLE | --writable (set to 1 or true) |
| HASHSERVER_HASH_ALGORITHM | --hash-algorithm |
| HASHSERVER_LAYOUT | --layout |
| HASHSERVER_EXTRA_DIRS | --extra-dirs |
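For the external ASGI server case, a launcher script might prepare the environment like this. The helper `build_environment` is hypothetical. It assumes HASHSERVER_EXTRA_DIRS uses the same semicolon-separated format as the --extra-dirs flag, which the table implies but does not state.

```python
import os
from typing import Mapping, Optional, Sequence


def build_environment(
    directory: str,
    *,
    writable: bool = False,
    hash_algorithm: Optional[str] = None,
    layout: Optional[str] = None,
    extra_dirs: Sequence[str] = (),
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return a copy of the environment with hashserver variables added."""
    env = dict(os.environ if base_env is None else base_env)
    env["HASHSERVER_DIRECTORY"] = directory
    if writable:
        env["HASHSERVER_WRITABLE"] = "1"
    if hash_algorithm is not None:
        env["HASHSERVER_HASH_ALGORITHM"] = hash_algorithm
    if layout is not None:
        env["HASHSERVER_LAYOUT"] = layout
    if extra_dirs:
        # Assumption: same separator as the --extra-dirs flag.
        env["HASHSERVER_EXTRA_DIRS"] = ";".join(extra_dirs)
    return env

# Usage:
#   subprocess.Popen(["uvicorn", "hashserver:app"],
#                    env=build_environment("./buffers", writable=True))
```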
Docker Compose
export HASHSERVER_PORT=8000
export HASHSERVER_HOST=0.0.0.0
export HASHSERVER_DIRECTORY=./buffers
export HASHSERVER_WRITABLE=1
docker compose up -d
Container user/group ID can be set with HASHSERVER_USER_ID and HASHSERVER_GROUP_ID (both default to 0).
Directory layouts
In prefix layout (the default), a buffer with checksum ab3f7c... is stored as <directory>/ab/ab3f7c.... A sentinel file .HASHSERVER_PREFIX is written to the directory. This avoids performance issues when storing large numbers of buffers.
In flat layout, the same buffer is stored as <directory>/ab3f7c....
Extra directories auto-detect their layout by checking for the .HASHSERVER_PREFIX sentinel.
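The path rules above can be expressed in a few lines. This is an illustrative sketch of the mapping as described, not the server's actual code; the function names are assumptions.

```python
from pathlib import Path

SENTINEL = ".HASHSERVER_PREFIX"  # sentinel file name from the text above


def detect_layout(directory: Path) -> str:
    """Return 'prefix' if the sentinel file is present, else 'flat'."""
    return "prefix" if (directory / SENTINEL).exists() else "flat"


def buffer_path(directory: Path, checksum: str, layout: str) -> Path:
    """Map a hex checksum to its on-disk location for the given layout."""
    if layout == "prefix":
        # The first two hex characters name the subdirectory.
        return directory / checksum[:2] / checksum
    if layout == "flat":
        return directory / checksum
    raise ValueError(f"unknown layout: {layout!r}")
```

With two hex characters as the prefix, buffers are spread over at most 256 subdirectories.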
Running tests
pip install requests
pytest tests/
License
See LICENSE.txt.