High-performance ISCC Data-Code and Instance-Code hashing
iscc-sum
A blazing-fast ISCC Data-Code and Instance-Code hashing tool built in Rust with Python bindings. It runs 50-130x faster than reference implementations and processes data at roughly 1 GB/s.
Originally created to handle massive microscopic imaging datasets where existing tools were too slow.
Project Status
Version 0.1.0 — Initial release for Data-Code and Instance-Code generation.
Note: By default, this tool creates ISCC-CODEs of SubType WIDE, introduced for large-scale secure checksum support with data similarity matching capabilities. This SubType is not yet part of the ISO 24138:2024 standard but is supported by the latest version of the Iscc-Core reference implementation. For ISO 24138:2024 conformant ISCC-CODEs, use the --narrow flag in the CLI tool.
Performance
- 950-1050 MB/s processing speed (vs 7-8 MB/s reference)
- 50-130x faster than existing implementations
- Consistent performance on multi-GB files
Ideal for large-scale data processing: microscopic imaging, video files, scientific datasets.
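To put the throughput figures above in perspective, here is a back-of-the-envelope sketch that estimates wall-clock hashing time. The 10 GB file size and the helper name `hashing_seconds` are illustrative assumptions; the throughput values are rounded from the figures quoted above.

```python
def hashing_seconds(size_mb: float, throughput_mb_s: float) -> float:
    """Estimate the time to hash `size_mb` megabytes at a sustained throughput."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mb / throughput_mb_s


# Hypothetical 10 GB (10,000 MB) dataset; throughputs rounded from the list above.
size_mb = 10_000
fast = hashing_seconds(size_mb, 1000)  # roughly the iscc-sum figure
slow = hashing_seconds(size_mb, 7.5)   # roughly the reference figure
print(f"iscc-sum: ~{fast:.0f} s, reference: ~{slow / 60:.0f} min")
```

Actual timings depend on storage speed and hardware, so treat the result as an order-of-magnitude estimate only.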
Installation
Python Package
The recommended way to install the iscc-sum CLI tool is using uv:
uv tool install iscc-sum
Note: To install uv, run: curl -LsSf https://astral.sh/uv/install.sh | sh (or see other installation methods)
Alternatively, install from PyPI:
pip install iscc-sum
Rust CLI Tool
Install from crates.io:
cargo install iscc-sum
Or download pre-built binaries from the releases page.
Usage
Command Line Interface
The iscc-sum command provides checksum generation and verification functionality similar to standard tools
like md5sum or sha256sum, but using ISCC (International Standard Content Code) checksums.
Basic Usage
# Generate checksum for a file
iscc-sum document.pdf
# Output: ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY *document.pdf
# Generate checksums for multiple files
iscc-sum *.txt
# Read from standard input
echo "Hello, World!" | iscc-sum
cat document.txt | iscc-sum
Checksum Verification
# Create a checksum file
iscc-sum *.txt > checksums.txt
# Verify checksums
iscc-sum -c checksums.txt
# Output:
# file1.txt: OK
# file2.txt: OK
# Verify with quiet mode (only show failures)
iscc-sum -c -q checksums.txt
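The verification step can be pictured as a simple compare loop. The sketch below is not the tool's implementation: `verify` and its `compute` callback are hypothetical names, the FAILED wording is assumed by analogy with md5sum, and the checksum strings are placeholders rather than real ISCC codes.

```python
from typing import Callable, Iterable


def verify(entries: Iterable[tuple[str, str]],
           compute: Callable[[str], str],
           quiet: bool = False) -> tuple[list[str], bool]:
    """Compare expected checksums with freshly computed ones.

    `entries` holds (expected_checksum, filename) pairs; `compute` is a
    hypothetical callable returning the checksum string for a filename.
    Returns the report lines and an overall success flag.
    """
    lines, ok = [], True
    for expected, name in entries:
        if compute(name) == expected:
            if not quiet:  # quiet mode suppresses the OK lines
                lines.append(f"{name}: OK")
        else:
            ok = False
            lines.append(f"{name}: FAILED")  # wording assumed, md5sum-style
    return lines, ok


# Toy data: a dict lookup stands in for real hashing (placeholder codes).
fake = {"file1.txt": "ISCC:AAA", "file2.txt": "ISCC:BBB"}
report, success = verify(
    [("ISCC:AAA", "file1.txt"), ("ISCC:XXX", "file2.txt")],
    fake.__getitem__,
    quiet=True,
)
print(report, success)
```

Returning the success flag separately from the report lines mirrors how a caller could derive an exit code without printing anything.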
Output Formats
# Default format (GNU style)
iscc-sum file.txt
# ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY *file.txt
# BSD-style format
iscc-sum --tag file.txt
# ISCC (file.txt) = ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY
# Narrow format (128-bit)
iscc-sum --narrow file.txt
# ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HU *file.txt
# Show component codes
iscc-sum --units file.txt
# ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY *file.txt
# ISCC:EAAW4BQTJSTJSHAI27AJSAGMGHNUKSKRTK3E6OZ5CXUS57SWQZXJQ
# ISCC:IABXF3ZHYL6O6PM5P2HGV677CS3RBHINZSXEJCITE3WNOTQ2CYXRA
# Process entire directory as single unit
iscc-sum --tree /path/to/project
# ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY */path/to/project/
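If you need to consume these lines from a script, a small parser is enough. The sketch below infers both patterns from the example output above rather than from a formal grammar, so `parse_line` and the two regular expressions are assumptions; lines without a filename, such as the component codes printed by --units, are deliberately rejected.

```python
import re

# Patterns inferred from the example output above, not from a formal grammar.
GNU_LINE = re.compile(r"^(ISCC:[A-Z2-7]+) \*(.+)$")
BSD_LINE = re.compile(r"^ISCC \((.+)\) = (ISCC:[A-Z2-7]+)$")


def parse_line(line: str) -> tuple[str, str]:
    """Return (checksum, filename) for a GNU- or BSD-style line."""
    line = line.rstrip("\n")
    if m := GNU_LINE.match(line):
        return m.group(1), m.group(2)
    if m := BSD_LINE.match(line):
        return m.group(2), m.group(1)
    raise ValueError(f"unrecognized checksum line: {line!r}")


code = "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY"
print(parse_line(f"{code} *file.txt"))
print(parse_line(f"ISCC (file.txt) = {code}"))
```

Both calls yield the same (checksum, filename) pair, so downstream code does not need to know which format produced the line.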
Similarity Matching
Find files with similar content:
# Find similar files (default threshold: 12 bits)
iscc-sum --similar *.jpg
# Output:
# photo1.jpg
# ~8 photo2.jpg
# ~12 photo3.jpg
# Adjust similarity threshold
iscc-sum --similar --threshold 6 *.pdf
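The threshold is a Hamming distance, that is, the number of differing bits between two codes. The sketch below shows that comparison on raw bytes. It does not decode ISCC strings, and the 128-bit digests are made-up values chosen to reproduce the distances in the sample output; `hamming_distance` and `similar` are hypothetical helper names.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def similar(reference: bytes, candidates: dict[str, bytes],
            threshold: int = 12) -> list[tuple[int, str]]:
    """Return (distance, name) pairs within `threshold` bits, closest first."""
    hits = [(hamming_distance(reference, d), n) for n, d in candidates.items()]
    return sorted(h for h in hits if h[0] <= threshold)


# Made-up 128-bit digests, not real Data-Code bodies.
photo1 = bytes(16)
others = {
    "photo2.jpg": bytes([0xFF]) + bytes(15),        # 8 bits differ
    "photo3.jpg": bytes([0xFF, 0x0F]) + bytes(14),  # 12 bits differ
    "other.jpg": bytes([0xFF] * 4) + bytes(12),     # 32 bits differ
}
matches = similar(photo1, others)
print(matches)  # [(8, 'photo2.jpg'), (12, 'photo3.jpg')]
```

A lower threshold, as in the --threshold 6 example, would keep only near-identical files.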
Complete Options
iscc-sum --help # Show all available options
Options:
  -c, --check      Read checksums from files and check them
      --narrow     Generate shorter 128-bit checksums
      --tag        Create a BSD-style checksum
      --units      Show Data-Code and Instance-Code components
  -z, --zero       End each output line with NUL
      --similar    Find files with similar Data-Codes
      --threshold  Hamming distance threshold for similarity (default: 12)
  -t, --tree       Process directory as single unit with combined checksum
  -q, --quiet      Don't print OK for each verified file
      --status     Don't output anything, exit code shows success
  -w, --warn       Warn about improperly formatted lines
      --strict     Exit non-zero for improperly formatted lines
Examples
See the examples directory for practical scripts demonstrating:
- Backup verification workflows
- Duplicate file detection
- File integrity monitoring
- Download verification
Rust CLI Tool
A standalone Rust binary is also available:
# Install from crates.io
cargo install iscc-sum
# Run the Rust CLI
isum
Python API
Quick Start
Generate ISCC-SUM codes for files:
>>> from iscc_sum import code_iscc_sum
>>>
>>> # Generate extended ISCC-SUM for a file
>>> result = code_iscc_sum("LICENSE", wide=True)
>>> result.iscc
'ISCC:K4AA2G6UMXGFJAO6ZOMIFZIYO6LYMOBT7Q6JDI3Z75IJWQY5WH372QA'
>>> result.datahash
'1e203833fc3c91a379ff509b431db1f7fd40dea69a6614249f420ec62398957087b1'
>>> result.filesize
11357
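The leading 1e20 of the datahash value looks like a multihash prefix: in the multihash table, 0x1e identifies BLAKE3 and 0x20 is a digest length of 32 bytes. Assuming that interpretation holds, the value can be split as sketched below. `split_datahash` is a hypothetical helper, and it handles only one-byte code and length fields.

```python
def split_datahash(datahash: str) -> tuple[int, int, bytes]:
    """Split a hex datahash into (function code, digest length, digest).

    Assumes a multihash-style layout with one-byte code and length fields,
    which holds only for values below 0x80.
    """
    raw = bytes.fromhex(datahash)
    code, length, digest = raw[0], raw[1], raw[2:]
    if len(digest) != length:
        raise ValueError("declared length does not match digest size")
    return code, length, digest


# Value copied from the Quick Start output above.
code, length, digest = split_datahash(
    "1e203833fc3c91a379ff509b431db1f7fd40dea69a6614249f420ec62398957087b1"
)
print(hex(code), length, digest.hex())
```

The length check catches truncated or concatenated values before the digest is compared with anything else.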
Streaming API
For large files or streaming data, use the processor classes:
from iscc_sum import IsccSumProcessor

processor = IsccSumProcessor()
with open("large_file.bin", "rb") as f:
    while chunk := f.read(1024 * 1024):  # Read in 1 MB chunks
        processor.update(chunk)

result = processor.result(wide=False, add_units=True)
print(f"ISCC: {result.iscc}")
print(f"Units: {result.units}")  # Individual Data-Code and Instance-Code
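The same update-in-chunks pattern can be tried without the package installed. In the sketch below, hashlib.sha256 is a stand-in that shares the update() style of interface; it is not the ISCC algorithm, and `iter_chunks` is a hypothetical helper. The example only illustrates the read loop and shows that, for an ordinary hash, chunked updates match a one-shot digest.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks until the stream is exhausted."""
    while chunk := stream.read(chunk_size):
        yield chunk


data = b"example payload " * 100_000  # about 1.6 MB of in-memory test data
hasher = hashlib.sha256()             # stand-in, NOT the ISCC algorithm
for chunk in iter_chunks(io.BytesIO(data), chunk_size=64 * 1024):
    hasher.update(chunk)

# Chunked updates give the same digest as hashing everything in one call.
assert hasher.hexdigest() == hashlib.sha256(data).hexdigest()
```

Wrapping the read loop in a generator keeps memory use bounded by the chunk size, whatever the file size.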
Development
Prerequisites
- Rust (latest stable) - Install from rustup.rs
- Python 3.10+
- uv (for Python dependency management) - Install from astral.sh/uv
Quick Setup
# Clone the repository
git clone https://github.com/bio-codes/iscc-sum.git
cd iscc-sum
# Install Python dependencies
uv sync --all-extras
# Setup Rust development components
uv run poe setup
# Build Python extension and run all checks
uv run poe all
Development Commands
All development tasks are managed through poethepoet:
# One-time setup (installs Rust components)
uv run poe setup
# Pre-commit checks (format, lint, test everything)
uv run poe all
# Individual commands
uv run poe format # Format all code (Rust + Python)
uv run poe test # Run all tests (Rust + Python)
uv run poe typecheck # Run Python type checking
uv run poe rust-build # Build Rust binary
uv run poe build-ext # Build Python extension
# Check if Rust toolchain is properly installed
uv run poe check-rust
Manual Setup (if needed)
# Install all dependencies including dev dependencies
uv sync --all-extras
# Install Rust components manually
rustup component add rustfmt clippy
# Build Rust extension for Python
uv run maturin develop
# Run tests manually
cargo test # Rust tests
uv run pytest # Python tests
Building
# Build Rust binary (creates isum executable)
cargo build --release
# Build Python wheels
maturin build --release
Funding
This project has received funding from the European Commission's Horizon Europe Research and Innovation programme under grant agreement No. 101129751 as part of the BIO-CODES project (Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers).
License
This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.