Cryptographic Input State Verification for Scientific Computing
ReproHash - Cryptographic Input State Verification
What ReproHash Does
Creates cryptographic snapshots of computational input states and enables verification without re-execution.
Verifies
- Input file integrity (SHA-256 hashes)
- Snapshot manifest consistency (mechanically enforced scope)
- Run record tamper-evidence (sealed provenance)
- Bundle coherence (component binding)
Does NOT Verify
- Numerical reproducibility (requires re-execution)
- Environment equivalence (minimal capture only)
- Execution correctness (not validated)
- Author authenticity (provides integrity, not authentication)
See docs/WHAT_THIS_IS_NOT.md for complete boundaries.
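The input file integrity check listed above amounts to comparing recorded SHA-256 digests against freshly computed ones. The following is a minimal sketch of that idea using only the standard library. The function names (`sha256_file`, `build_manifest`, `changed_files`) and the manifest layout are assumptions made for illustration, not the ReproHash snapshot format.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: dict) -> list:
    """List paths that are missing, newly added, or whose digest differs."""
    current = build_manifest(root)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

An empty result from `changed_files` means every recorded file is present with the same bytes and no extra files appeared. It says nothing about whether the computation that used those files was correct.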
Quick Start
Installation
pip install reprohash-core
Basic Usage
# Create snapshot of input data
reprohash snapshot data/ -o snapshot.json
# Verify snapshot against data
reprohash verify snapshot.json -d data/
# Output: PASS_INPUT_INTEGRITY, FAIL, or INCONCLUSIVE
Complete Workflow
# 1. Snapshot inputs
reprohash snapshot input_data/ -o input_snapshot.json
# 2. Run your computation
python analysis.py
# 3. Create RunRecord (programmatically in your script)
# See examples/ for details
# 4. Snapshot outputs
reprohash snapshot output_data/ -o output_snapshot.json
# 5. Create complete bundle
reprohash create-bundle \
--input-snapshot input_snapshot.json \
--runrecord runrecord.json \
--output-snapshot output_snapshot.json \
-o bundle/
# 6. Verify complete bundle
reprohash verify-bundle bundle/ -d input_data/
Design Principles
Fail-Fast with Epistemic Precision
- PASS_INPUT_INTEGRITY: All checks passed
- FAIL: Integrity violated
- INCONCLUSIVE: Could not complete verification
Keeping FAIL and INCONCLUSIVE as separate outcomes distinguishes "the claim is false" from "the claim was not evaluated."
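A three-valued result can be sketched as an enum plus a check that returns INCONCLUSIVE whenever the evidence could not be gathered. The `Outcome` enum and `check_file` function below are hypothetical names used for illustration. How ReproHash itself classifies an unreadable file is not stated here, so treating it as INCONCLUSIVE is an assumption of this sketch.

```python
import hashlib
from enum import Enum
from pathlib import Path


class Outcome(Enum):
    PASS_INPUT_INTEGRITY = "PASS_INPUT_INTEGRITY"
    FAIL = "FAIL"
    INCONCLUSIVE = "INCONCLUSIVE"


def check_file(path: Path, expected_sha256: str) -> Outcome:
    """Compare one file against its recorded digest."""
    try:
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
    except OSError:
        # Assumption: the file could not be read, so the claim was
        # never evaluated. That is different from a mismatch.
        return Outcome.INCONCLUSIVE
    if actual != expected_sha256:
        # The claim was evaluated and found to be false.
        return Outcome.FAIL
    return Outcome.PASS_INPUT_INTEGRITY
```

A caller can then treat FAIL as evidence of modification and INCONCLUSIVE as a request to retry with the data available, rather than collapsing both into a single error.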
Mechanical Enforcement
Correctness is enforced by code structure within the reference implementation:
- Hash scope via HashableManifest dataclass
- Finalization via RuntimeError barriers
- Sealing via cryptographic binding
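The three mechanisms above can be illustrated with a small standard-library sketch. The classes `HashScope` and `SealedRecord` are hypothetical stand-ins written for this example. They are not the `HashableManifest` or `RunRecord` classes from the reference implementation, and the JSON serialization shown is an assumed canonical form.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HashScope:
    """Hypothetical: only the fields declared here can enter the hash."""

    files: tuple  # sorted (relative_path, sha256) pairs
    hash_algorithm: str = "sha256"

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SealedRecord:
    """Hypothetical record that refuses mutation once seal() has run."""

    def __init__(self, command: str) -> None:
        self._fields = {"command": command}
        self._seal = None

    def set(self, key: str, value) -> None:
        if self._seal is not None:
            raise RuntimeError("record is sealed; fields can no longer change")
        self._fields[key] = value

    def seal(self) -> str:
        if self._seal is not None:
            raise RuntimeError("record is already sealed")
        payload = json.dumps(self._fields, sort_keys=True, separators=(",", ":"))
        self._seal = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return self._seal
```

Because the dataclass is frozen, a field outside the declared scope cannot be attached to it after construction, and because `set` raises after sealing, a sealed record cannot drift away from its hash through the class's own interface.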
Honest Boundaries
ReproHash states explicitly what it does NOT do:
- Not a workflow manager
- Not a reproducibility guarantee
- Not an authentication system
- Not a defense against fraud
Governance
- Perpetual free verification guarantee (OSS)
- Verification profile stability
- Documented versioning and compatibility
See docs/PHILOSOPHY.md and docs/GOVERNANCE.md
Guarantees and Constraints
What We Guarantee (Within Declared Constraints)
✅ Offline verification forever - No internet required
✅ Open source - Apache 2.0, zero dependencies
✅ Verification profile stability - Same profile = same semantics
✅ Service-independent - Verification never requires paid service
What We Cannot Guarantee
⚠️ Language-independent implementation - Python-specific semantics documented
⚠️ Semantic enforcement via profile parsing - Declarative profile, not executable
⚠️ Future-proof against all changes - Breaking changes require new profile
See docs/GOVERNANCE.md for complete versioning commitments.
For Reviewers
You don't need to pay anything.
ReproHash is fully functional as open source. The CLI has no restrictions. Verification works offline, forever.
Quick Verification (< 5 minutes)
# Install once
pip install reprohash-core
# Download bundle from paper's Zenodo link
wget https://zenodo.org/record/XXX/bundle.zip
unzip bundle.zip
# Verify everything
reprohash verify-bundle bundle/ -d input_data/
# Output: PASS_INPUT_INTEGRITY, FAIL, or INCONCLUSIVE
See docs/FOR_REVIEWERS.md for detailed guidance.
Optional Paid Service
A paid service exists for convenience (Drive sync, team features).
The service is strictly optional and never required for verification.
Papers verified with ReproHash remain verifiable forever using only this free CLI, regardless of whether the service exists.
Architecture
Three Verification Layers
- Component seals - Individual content hashes
  - Snapshots: content_hash over file manifest
  - RunRecords: runrecord_hash over execution details
- File integrity - Component file checksums
  - Bundle manifest lists all component files
  - SHA-256 hash for each file
- Bundle binding - Complete artifact coherence
  - bundle_hash cryptographically binds components
  - Includes verification_profile for semantic stability
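The bundle binding layer can be pictured as one hash computed over the component file hashes together with the verification profile identifier. The sketch below assumes a canonical JSON form (sorted keys, fixed separators) and a manifest layout invented for illustration. The function names, the key names `components` and `verification_profile`, and the profile value used in the example are hypothetical, and the actual canonicalization rules live in the project's specifications.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators (assumed canonical form)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def bundle_hash(component_file_hashes: dict, verification_profile: str) -> str:
    """Bind every component hash and the profile identifier into one digest."""
    manifest = {
        "components": component_file_hashes,
        "verification_profile": verification_profile,
    }
    return hashlib.sha256(canonical_bytes(manifest)).hexdigest()


# Example with placeholder digests; real values would be SHA-256 hex strings.
example = bundle_hash(
    {"input_snapshot.json": "aa", "runrecord.json": "bb", "output_snapshot.json": "cc"},
    "example-profile",
)
```

Swapping any component file, or reinterpreting the bundle under a different profile, changes the resulting digest, which is what makes the bundle verifiable as a whole rather than file by file.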
Verification Workflow
verify_bundle()
├── Bundle seal integrity (manifest not modified)
├── Component file integrity (JSON files match hashes)
├── Snapshot seal verification (content_hash valid)
├── RunRecord seal verification (runrecord_hash valid)
├── Provenance chain consistency (input → run → output)
└── Optional: Data verification (files match snapshot)
Complete semantic verification, not just coherence.
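The provenance chain step (input → run → output) can be sketched as two equality checks between the hashes a run record stores and the content hashes of the snapshots it claims to connect. The dictionary keys `input_hash` and `output_hash` are assumed names for illustration, and only `content_hash` appears in this document. The real record layout may differ.

```python
def _matches(recorded, actual) -> bool:
    """A link holds only if a hash was recorded and it equals the actual one."""
    return recorded is not None and recorded == actual


def chain_errors(input_snapshot: dict, runrecord: dict, output_snapshot: dict) -> list:
    """Return a list of broken links in the input -> run -> output chain."""
    errors = []
    if not _matches(runrecord.get("input_hash"), input_snapshot.get("content_hash")):
        errors.append("run record does not reference the input snapshot")
    if not _matches(runrecord.get("output_hash"), output_snapshot.get("content_hash")):
        errors.append("run record does not reference the output snapshot")
    return errors
```

An empty list means the three components refer to each other consistently. Each component's own seal still has to be verified separately, since a consistent chain of tampered components would pass this check alone.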
Documentation
Core Documentation
- PHILOSOPHY.md - Design principles and patterns
- GOVERNANCE.md - Versioning and compatibility
- LIMITATIONS.md - Honest limitations
- FOR_REVIEWERS.md - Reviewer guide
Specifications
- hash-scope-v1.yaml - Hash scope specification
- canonical-json-v1.yaml - Canonicalization rules
Boundaries
- WHAT_THIS_IS_NOT.md - Explicit non-goals
- PROVENANCE_SPEC.md - Linear chains only
Examples
Python API
import time
from reprohash import (
    create_snapshot,
    RunRecord,
    ZenodoBundle,
    ReproducibilityClass,
)
# Create input snapshot
input_snapshot = create_snapshot("data/")
print(f"Input hash: {input_snapshot.content_hash}")
# Create run record
runrecord = RunRecord(
    input_snapshot.content_hash,
    "python train.py --epochs 100",
    ReproducibilityClass.DETERMINISTIC,
)
runrecord.started = time.time()
# ... run computation ...
runrecord.ended = time.time()
runrecord.exit_code = 0
# Snapshot outputs
output_snapshot = create_snapshot("results/")
runrecord.bind_output(output_snapshot.content_hash)
# Seal runrecord (REQUIRED before archival)
runrecord.seal()
# Create bundle for publication
bundle = ZenodoBundle(input_snapshot, runrecord, output_snapshot)
bundle_hash = bundle.create_bundle("bundle/")
print(f"Bundle hash: {bundle_hash}")
Verification
from reprohash.bundle import verify_bundle
# Verify complete bundle
result = verify_bundle("bundle/", data_dir="data/")
print(f"Outcome: {result.outcome.value}")
# Outputs: PASS_INPUT_INTEGRITY, FAIL, or INCONCLUSIVE
if result.errors:
    for err in result.errors:
        print(f"Error: {err}")
if result.outcome.value == "PASS_INPUT_INTEGRITY":
    print("✓ All integrity checks passed")
Development Setup
# Clone repository
git clone https://github.com/reprohash/reprohash-core.git
cd reprohash-core
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest tests/ -v --cov=reprohash
# Run conformance tests
pytest tests/test_vectors/ -v
Contributing
- Read PHILOSOPHY.md and GOVERNANCE.md
- Check LIMITATIONS.md for scope
- Add tests for new features
- Maintain 95%+ coverage
- Follow existing code style
- Update documentation
Running Conformance Tests
# Verify implementation conforms to profile
python -m reprohash.conformance tests/test_vectors/v2.1/
Testing
Run the complete test suite:
# Install dev dependencies
pip install -e ".[dev]"
# Run tests with coverage
pytest tests/ -v --cov=reprohash --cov-report=term-missing
# Expected: 27/28 tests passing, 95%+ coverage
Requirements
Runtime
- Python 3.8+
- Zero dependencies (stdlib only)
Development
- pytest (testing)
- pytest-cov (coverage)
- black (formatting)
- mypy (type checking)
Citation
@software{reprohash2025,
title = {ReproHash: Cryptographic Input State Verification},
author = {ReproHash Contributors},
year = {2025},
version = {2.1.8},
url = {https://github.com/reprohash/reprohash-core},
license = {Apache-2.0}
}
Paper submission to xxx (pending).
Contributing
We welcome contributions! Please:
- Read PHILOSOPHY.md for design principles
- Check GOVERNANCE.md for compatibility rules
- Add tests for new features
- Maintain 95%+ coverage
- Follow existing code style
Areas for Contribution
- Additional language implementations (following spec)
- Improved documentation
- Additional test cases
- Bug reports with minimal reproducible examples
Support
Questions
- Technical: opensource@reproledger.com
- Governance: governance@reproledger.com
- Reviewers: reviewers@reproledger.com
Resources
- Documentation: https://docs.reproledger.com
- Issues: https://github.com/reprohash/reprohash-core/issues
- Discussions: https://github.com/reprohash/reprohash-core/discussions
Reviewer support: <24h response time guarantee
License
Apache License 2.0
Copyright 2025 ReproHash Contributors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Status
Version: 2.1.8
Status: Submission-ready within declared constraints
Tests: 27/28 passing (96%)
Coverage: 95%+
Dependencies: Zero (stdlib only)
Known Issues
None currently open.
Roadmap
- Paper submission
- Public service beta
- Additional language implementations (Rust, Go)
- Formal specification (TLA+/Coq) - future work
Acknowledgments
ReproHash achieves something rare: satisfying both scientific integrity requirements AND commercial sustainability without compromise.
The key insight: The thing that journals rely on must never depend on your company existing.
Because this principle is encoded structurally, ReproHash will still work, and papers will remain verifiable, in 2037.
File details
Details for the file reprohash_core-2.1.8.tar.gz.
File metadata
- Download URL: reprohash_core-2.1.8.tar.gz
- Upload date:
- Size: 2.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cad20baee8d2e022280a9d4c3faccf262434c4d163a9d8f28bd1fcc5aaace2eb |
| MD5 | f729eeef3b96db0c6136448b4603a211 |
| BLAKE2b-256 | 6c1d534b31a3f8b8274c8158c95373de10a2b39cfebe8a01eedb4b28c2500e23 |
File details
Details for the file reprohash_core-2.1.8-py3-none-any.whl.
File metadata
- Download URL: reprohash_core-2.1.8-py3-none-any.whl
- Upload date:
- Size: 26.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 378da7889a5eb45fb101271af5e45bc303459198c590274eed7ce2951f91873f |
| MD5 | 53079e953c53e2fd4bca7a924da2d81f |
| BLAKE2b-256 | 3adaa947b7ef7af6a5ad5b5cbafb56d5766f8e38827cc4c51ceb9f88166ecf0d |