
High-performance ISCC similarity search engine


iscc-search


Warning: BETA. This project is under active development. The API is not yet stable and may change without notice. Use at your own risk.

High-performance similarity search engine for ISCC (International Standard Content Code). Ships as a Python package, a CLI, and a FastAPI REST server, with pluggable backends for in-memory, LMDB, and HNSW-accelerated indexes.

Note: iscc-usearch is a separate project - a patched fork of the usearch vector search library that provides the NPHD metric and low-level vector indexes. iscc-search uses it internally as one of its backends. Most users only need to install iscc-search.

Features

  • REST API server (FastAPI) for indexing and searching ISCC assets
  • CLI (iscc-search) for managing multiple local or remote indexes and ingesting assets
  • Protocol-based backend abstraction with three implementations:
    • memory:// — in-memory, no persistence (tests and demos)
    • lmdb:///path — LMDB-backed persistent storage with bidirectional prefix search
    • usearch:///path — HNSW + LMDB for high-performance approximate nearest neighbor search
  • Variable-length ISCC-UNIT indexing using the NPHD metric (via iscc-usearch)
  • Granular ISCC-SIMPRINT search for fine-grained content matching
  • Aggregator mode for ISCC Declaration Protocol (IDP) transparency-log ingestion
  • Built-in web frontend for ISCC lookup/search and aggregator monitoring
  • Cross-platform (Linux, macOS, Windows)
  • Python 3.11–3.14

What is ISCC?

The International Standard Content Code (ISCC) is a similarity-preserving content identifier for digital media. ISCC codes are variable-length binary vectors that enable efficient similarity search across different media types. This project provides the indexing and search engine for those codes.

Installation

pip install iscc-search

For development:

git clone https://github.com/iscc/iscc-search.git
cd iscc-search
uv sync

Quick Start

Run the server

# Start the REST API server (development mode with auto-reload)
iscc-search serve --dev

# Or production mode
iscc-search serve --host 0.0.0.0 --port 8000

Interactive API docs are available at http://localhost:8000/docs.

Use the CLI

# Register an index configuration (local or remote)
iscc-search index add my-index --uri usearch:///path/to/data
iscc-search index use my-index

# Add assets, search, retrieve
iscc-search add asset.json
iscc-search search asset.json
iscc-search get ISCC:KACYPXW557...

Configure the server

The server reads its configuration from environment variables prefixed with ISCC_SEARCH_ (or a .env file):

Variable                  Default         Description
ISCC_SEARCH_INDEX_URI     usearch:///...  Backend URI (memory://, lmdb:///path, usearch:///path)
ISCC_SEARCH_HOST          0.0.0.0         Server bind host
ISCC_SEARCH_PORT          8000            Server bind port
ISCC_SEARCH_API_SECRET    (unset)         Optional API key; when unset the API is public
ISCC_SEARCH_CORS_ORIGINS  *               Comma-separated CORS origins
ISCC_SEARCH_LOG_LEVEL     info            Loguru log level

Additional knobs control HNSW parameters, shard sizes, match thresholds, and scoring — see iscc_search/options.py or the deployment guide for the full list.
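To make the prefix convention concrete, here is a minimal sketch of collecting ISCC_SEARCH_ variables into a settings dict. It is not how iscc-search loads its configuration: `load_settings` and the derived `API_PUBLIC` key are hypothetical, .env files are ignored, and the default for ISCC_SEARCH_INDEX_URI is omitted because the table does not spell it out in full.

```python
import os
from collections.abc import Mapping

PREFIX = "ISCC_SEARCH_"

# Defaults copied from the table above; INDEX_URI is left out on purpose
# because its default is elided in the table.
DEFAULTS = {
    "HOST": "0.0.0.0",
    "PORT": "8000",
    "CORS_ORIGINS": "*",
    "LOG_LEVEL": "info",
}


def load_settings(environ: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Collect ISCC_SEARCH_* variables into a plain dict (illustrative only)."""
    settings: dict[str, object] = dict(DEFAULTS)
    for key, value in environ.items():
        if key.startswith(PREFIX):
            settings[key[len(PREFIX):]] = value
    settings["PORT"] = int(str(settings["PORT"]))
    origins = str(settings["CORS_ORIGINS"]).split(",")
    settings["CORS_ORIGINS"] = [o.strip() for o in origins if o.strip()]
    # Per the table, an unset API secret means the API is public.
    settings["API_PUBLIC"] = not settings.get("API_SECRET")
    return settings


example = load_settings({"ISCC_SEARCH_PORT": "9000", "ISCC_SEARCH_INDEX_URI": "memory://"})
print(example["PORT"], example["INDEX_URI"], example["API_PUBLIC"])
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests without touching the real environment.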

Architecture

iscc-search uses a protocol-based design so the CLI, REST API, and library users all talk to the same IsccIndexProtocol interface regardless of backend:

  CLI / REST API / Remote client
              │
              ▼
     IsccIndexProtocol
              │
    ┌─────────┼─────────┐
    ▼         ▼         ▼
  memory    lmdb      usearch
            (LMDB)    (HNSW + LMDB)

See docs/explanation/architecture.md for the full picture.
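The protocol-based idea can be sketched with `typing.Protocol`. The names below (`ToyIndexProtocol`, `ToyMemoryIndex`, `ingest`) and the `add`/`get` methods are invented for illustration; the real IsccIndexProtocol interface is not reproduced here and may look quite different.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ToyIndexProtocol(Protocol):
    """Hypothetical stand-in for IsccIndexProtocol; method names are assumptions."""

    def add(self, key: str, code: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class ToyMemoryIndex:
    """In-memory backend that satisfies the protocol structurally, without inheritance."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def add(self, key: str, code: bytes) -> None:
        self._items[key] = code

    def get(self, key: str) -> bytes:
        return self._items[key]


def ingest(index: ToyIndexProtocol, entries: dict[str, bytes]) -> int:
    """Caller code depends only on the protocol, not on a concrete backend."""
    for key, code in entries.items():
        index.add(key, code)
    return len(entries)


index = ToyMemoryIndex()
print(ingest(index, {"a": b"\x01\x02"}), isinstance(index, ToyIndexProtocol))
```

Because the check is structural, a persistent backend could be swapped in for `ToyMemoryIndex` without changing `ingest`, which is the property the diagram above describes.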

Development

This project uses uv for package management and poethepoet for task automation.

Prerequisites

  • Python 3.11 or higher (3.11–3.14 are supported)
  • uv package manager

Common tasks

uv run poe build            # Rebuild schema.py + openapi.json and validate
uv run poe format           # Format code and markdown
uv run poe test             # Run tests with coverage (must stay at 100%)
uv run poe check-complexity # Radon complexity report
uv run poe precommit        # Run pre-commit hooks
uv run poe all              # Run build, format, test, and complexity check

Running tests

# Run full test suite in parallel with coverage
uv run poe test

# Run a single test
uv run pytest tests/test_indexes_usearch_index.py::test_foo

Technical Notes

NPHD Metric

The Normalized Prefix Hamming Distance (NPHD) is a distance metric designed for variable-length, prefix-compatible codes such as ISCC. Unlike standard Hamming distance, NPHD:

  • Correctly handles variable-length comparisons
  • Normalizes over the common prefix length
  • Satisfies all metric axioms (non-negativity, identity, symmetry, triangle inequality)

The implementation lives in the external iscc-usearch package, which iscc-search depends on for its HNSW backend.
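The normalization over the common prefix can be illustrated with a few lines of Python. This is a simplified sketch, not the iscc-usearch implementation: the function name `nphd_sketch` is hypothetical, and this naive form gives distance 0 between a code and any of its own prefixes, so no claim is made that it preserves the metric axioms listed above.

```python
def nphd_sketch(a: bytes, b: bytes) -> float:
    """Hamming distance over the common prefix, divided by its length in bits."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both codes must be non-empty")
    # Count differing bits byte by byte, ignoring the tail of the longer code.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a[:n], b[:n]))
    return differing / (n * 8)


long_code = bytes([0b11110000, 0b00001111])
short_code = bytes([0b11110001])
# Only the first byte is compared; one of its eight bits differs.
print(nphd_sketch(long_code, short_code))
```

Dividing by the compared length keeps the result in the range 0.0 to 1.0 regardless of how long the two codes are.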

Storage

  • LMDB is used for durable key-value storage: ISCC entries, metadata, and the inverted prefix-search index.
  • usearch (HNSW) is used for approximate nearest-neighbor search over ISCC-UNITs and ISCC-SIMPRINTS.
  • Multi-worker deployments are not supported with the usearch backend — see docs/howto/deployment.md for details.
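One way to picture prefix search over an ordered key-value store such as LMDB, which keeps keys sorted, is the sketch below. It uses a sorted Python list in place of a database and assumes that "bidirectional" means matching stored keys that extend the query as well as stored keys that are prefixes of it. That reading, and the helper name `prefix_matches`, are assumptions rather than a description of the project's index layout.

```python
from bisect import bisect_left


def prefix_matches(sorted_keys: list[bytes], query: bytes) -> list[bytes]:
    """Return stored keys that extend the query or are proper prefixes of it."""
    found: list[bytes] = []
    # Forward: keys starting with the query form one contiguous run.
    i = bisect_left(sorted_keys, query)
    while i < len(sorted_keys) and sorted_keys[i].startswith(query):
        found.append(sorted_keys[i])
        i += 1
    # Reverse: probe each proper prefix of the query with an exact lookup.
    for n in range(1, len(query)):
        prefix = query[:n]
        j = bisect_left(sorted_keys, prefix)
        if j < len(sorted_keys) and sorted_keys[j] == prefix:
            found.append(prefix)
    return found


keys = sorted([b"\x01", b"\x01\x02", b"\x01\x02\x03", b"\x09"])
print(prefix_matches(keys, b"\x01\x02"))
```

The forward pass is a range scan from the first key not smaller than the query, which is the kind of access pattern a sorted store handles cheaply.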

License

Apache License 2.0. See the LICENSE file for details.

Contributing

Contributions are welcome! Please ensure:

  • All tests pass (uv run poe test)
  • Code is formatted (uv run poe format)
  • Coverage remains at 100%
  • Changes are documented

See CONTRIBUTING.md for details.


Repository initiated with fpgmaas/cookiecutter-uv.

