ISCC - Codec & Algorithms

Create similarity-preserving identifiers for digital content

iscc-core is the reference implementation of the core algorithms of ISO 24138, the International Standard Content Code (ISCC).

Key Features

  • Similarity-Preserving: Detect similar content even after modifications
  • Multi-Level Identification: Identify content at metadata, perceptual, and data levels
  • Self-Describing: Each component contains its own type and version information
  • ISO Standardized: Implements the official ISO 24138:2024 specification
  • Highly Tested: 100% test coverage with conformance test vectors

What is the ISCC

The ISCC is a similarity-preserving fingerprint and identifier for digital media assets.

ISCCs are generated algorithmically from digital content, just like cryptographic hashes. However, instead of using a single cryptographic hash function to identify data only, the ISCC uses various algorithms to create a composite identifier that exhibits similarity-preserving properties (soft hash).

The component-based structure of the ISCC identifies content at multiple levels of abstraction. Each component is self-describing, modular, and can be used separately or with others to aid in various content identification tasks. The algorithmic design supports content deduplication, database synchronization, indexing, integrity verification, timestamping, versioning, data provenance, similarity clustering, anomaly detection, usage tracking, allocation of royalties, fact-checking and general digital asset management use-cases.
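To make the idea of a soft hash more concrete, here is a toy sketch that contrasts a cryptographic hash with a simple SimHash over character n-grams. It is an illustration only and is not the algorithm used by any ISCC unit; the function names `simhash64`, `bit_distance` and `sha256_int` are made up for this example.

```python
import hashlib


def simhash64(text: str, n: int = 3) -> int:
    """Toy 64-bit SimHash over character n-grams (not the ISCC algorithm)."""
    counts = [0] * 64
    # Collect the distinct n-grams of the text as features.
    grams = {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}
    for gram in grams:
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Each feature votes +1 or -1 on every bit position.
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A result bit is set when the majority of features voted for it.
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def bit_distance(a: int, b: int) -> int:
    """Number of differing bits (Hamming distance) between two integers."""
    return bin(a ^ b).count("1")


def sha256_int(text: str) -> int:
    """First 64 bits of the SHA-256 digest, as an integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


original = "ISCC Test Document!"
modified = "ISCC Test Document?"

print("soft hash distance:", bit_distance(simhash64(original), simhash64(modified)))
print("sha256 distance:   ", bit_distance(sha256_int(original), sha256_int(modified)))
```

For a small edit like this, the soft hash distance is usually much smaller than the distance between the two cryptographic digests, which tends to be close to half of the bits.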

What is iscc-core

iscc-core is the Python-based reference implementation of the ISCC core algorithms as defined by ISO 24138. It is also a good starting point for porting ISCC to other programming languages.

Tip: This is a low-level reference implementation that does not include features like mediatype detection, metadata extraction, or file-format-specific content extraction. Please have a look at iscc-sdk, which adds those higher-level features on top of the iscc-core library.

Implementors Guide

Reproducible Environment

For a reproducible installation of the reference implementation, the repository includes a poetry.lock file with pinned dependencies. Install them with Python Poetry by running poetry install in the root folder.

Repository structure

iscc-core
├── docs       # Markdown and other assets for mkdocs documentation
├── examples   # Example scripts using the reference code
├── iscc_core  # Actual source code of the reference implementation
├── tests      # Tests for the reference implementation
└── tools      # Development tools

Testing & Conformance

The reference implementation comes with 100% test coverage. To run the conformance selftest from the repository root use poetry run python -m iscc_core. To run the complete test suite use poetry run pytest.

To build a conformant implementation, work through the following top-level entrypoint functions:

gen_meta_code_v0
gen_text_code_v0
gen_image_code_v0
gen_audio_code_v0
gen_video_code_v0
gen_mixed_code_v0
gen_data_code_v0
gen_instance_code_v0
gen_iscc_code_v0

The corresponding test vectors can be found in iscc_core/data.json.
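When porting, a small harness that replays test vectors against your own functions can be handy. The sketch below assumes the vectors are organized as a mapping of function name to named test cases, each with an `inputs` list and an `outputs` object; check this against the actual layout of data.json before relying on it. `run_vectors` and `gen_demo_code_v0` are hypothetical names used only for this illustration.

```python
from typing import Any, Callable, Dict, List


def run_vectors(
    vectors: Dict[str, Dict[str, Dict[str, Any]]],
    functions: Dict[str, Callable[..., Dict[str, Any]]],
) -> List[str]:
    """Replay test vectors and return the names of all failing cases."""
    failures = []
    for func_name, cases in vectors.items():
        func = functions.get(func_name)
        if func is None:
            continue  # Function not implemented yet, skip its vectors.
        for case_name, case in cases.items():
            result = func(*case["inputs"])
            if result != case["outputs"]:
                failures.append(f"{func_name}.{case_name}")
    return failures


# Hypothetical stand-in for a function under test (not part of iscc-core).
def gen_demo_code_v0(text: str, bits: int) -> Dict[str, Any]:
    return {"iscc": f"DEMO:{text.upper()}:{bits}"}


# Inline sample in the assumed layout; real vectors would be loaded with json.load().
sample_vectors = {
    "gen_demo_code_v0": {
        "test_0000_upper": {"inputs": ["abc", 64], "outputs": {"iscc": "DEMO:ABC:64"}},
        "test_0001_mismatch": {"inputs": ["abc", 64], "outputs": {"iscc": "DEMO:WRONG:64"}},
    }
}

failures = run_vectors(sample_vectors, {"gen_demo_code_v0": gen_demo_code_v0})
print(failures)  # → ['gen_demo_code_v0.test_0001_mismatch']
```

Real test vectors may encode some inputs, such as byte streams, in a form that needs decoding before the call, so a production harness would likely need an extra conversion step.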

ISCC Architecture

ISCC MainTypes

| Idx | Slug     | Bits | Purpose                                               |
|-----|----------|------|-------------------------------------------------------|
| 0   | META     | 0000 | Match on metadata similarity                          |
| 1   | SEMANTIC | 0001 | Match on semantic content similarity                  |
| 2   | CONTENT  | 0010 | Match on perceptual content similarity                |
| 3   | DATA     | 0011 | Match on data similarity                              |
| 4   | INSTANCE | 0100 | Match on data identity                                |
| 5   | ISCC     | 0101 | Composite of two or more components with common header |
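The MainType bits in the table are the first four bits of a decoded ISCC. As a minimal sketch using only the standard library, the snippet below base32-decodes an ISCC string and reads the header. It assumes a 2-byte header in which MainType, SubType, version and length each occupy a single 4-bit nibble, which holds for the values in the table above and for the codes shown in this README, but is not a full header parser. `decode_unit` is a hypothetical helper, not part of the iscc-core API.

```python
import base64

# MainType index to slug, taken from the table above.
MAINTYPES = {0: "META", 1: "SEMANTIC", 2: "CONTENT", 3: "DATA", 4: "INSTANCE", 5: "ISCC"}


def decode_unit(iscc: str):
    """Split an ISCC string into (main type, sub type, version, length, digest hex)."""
    body = iscc[5:] if iscc.startswith("ISCC:") else iscc
    # Base32 decoding needs the input padded to a multiple of 8 characters.
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    header, digest = raw[:2], raw[2:]
    main_type = header[0] >> 4    # first nibble
    sub_type = header[0] & 0x0F   # second nibble
    version = header[1] >> 4      # third nibble
    length = header[1] & 0x0F     # fourth nibble (encoded length, not a bit count)
    return MAINTYPES[main_type], sub_type, version, length, digest.hex()


print(decode_unit("ISCC:AAAT4EBWK27737D2"))
# → ('META', 0, 0, 1, '3e103656bffdfc7a')
```

For regular use, prefer the library's own `iscc_explain` function or `Code` object, which handle the complete header format.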

Installation

Use the package manager pip to install iscc-core as a library.

pip install iscc-core

Quick Start

import json
import iscc_core as ic

meta_code = ic.gen_meta_code(name="ISCC Test Document!")

print(f"Meta-Code:     {meta_code['iscc']}")
print(f"Structure:     {ic.iscc_explain(meta_code['iscc'])}\n")

# Extract text from file
with open("demo.txt", "rt", encoding="utf-8") as stream:
    text = stream.read()
    text_code = ic.gen_text_code_v0(text)
    print(f"Text-Code:     {text_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(text_code['iscc'])}\n")

# Process raw bytes of textfile
with open("demo.txt", "rb") as stream:
    data_code = ic.gen_data_code(stream)
    print(f"Data-Code:     {data_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(data_code['iscc'])}\n")

    stream.seek(0)
    instance_code = ic.gen_instance_code(stream)
    print(f"Instance-Code: {instance_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(instance_code['iscc'])}\n")

# Combine ISCC-UNITs into ISCC-CODE
iscc_code = ic.gen_iscc_code(
    (meta_code["iscc"], text_code["iscc"], data_code["iscc"], instance_code["iscc"])
)

# Create convenience `Code` object from ISCC string
iscc_obj = ic.Code(iscc_code["iscc"])
print(f"ISCC-CODE:     {ic.iscc_normalize(iscc_obj.code)}")
print(f"Structure:     {iscc_obj.explain}")
print(f"Multiformat:   {iscc_obj.mf_base32}\n")

# Compare with changed ISCC-CODE:
new_dc, new_ic = ic.Code.rnd(mt=ic.MT.DATA), ic.Code.rnd(mt=ic.MT.INSTANCE)
new_iscc = ic.gen_iscc_code((meta_code["iscc"], text_code["iscc"], new_dc.uri, new_ic.uri))
print(f"Compare ISCC-CODES:\n{iscc_obj.uri}\n{new_iscc['iscc']}")
print(json.dumps(ic.iscc_compare(iscc_obj.code, new_iscc["iscc"]), indent=2))

The output of this example is as follows:

Meta-Code:     ISCC:AAAT4EBWK27737D2
Structure:     META-NONE-V0-64-3e103656bffdfc7a

Text-Code:     ISCC:EAAQMBEYQF6457DP
Structure:     CONTENT-TEXT-V0-64-060498817dcefc6f

Data-Code:     ISCC:GAA7UJMLDXHPPENG
Structure:     DATA-NONE-V0-64-fa258b1dcef791a6

Instance-Code: ISCC:IAA3Y7HR2FEZCU4N
Structure:     INSTANCE-NONE-V0-64-bc7cf1d14991538d

ISCC-CODE:     ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
Structure:     ISCC-TEXT-V0-MCDI-3e103656bffdfc7a060498817dcefc6ffa258b1dcef791a6bc7cf1d14991538d
Multiformat:   bzqavabj6ca3fnp757r5ambeyqf6457dp7isywhoo66i2npd46hiutektru

Compare ISCC-CODES:
ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
ISCC:KACT4EBWK27737D2AYCJRAL5Z36G7Y7HA2BMECKMVRBEQXR2BJOS6NA
{
  "meta_dist": 0,
  "content_dist": 0,
  "data_dist": 33,
  "instance_match": false
}
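The distance values above can be reproduced by hand as bitwise Hamming distances between the corresponding unit digests. The following sketch uses only the standard library and assumes the layout of this particular example: a 2-byte header followed by four 64-bit units in the order Meta, Content, Data, Instance. It is not a general replacement for `iscc_compare`; `decode_body`, `hamming` and `compare_mcdi` are hypothetical helper names.

```python
import base64
import json


def decode_body(iscc: str) -> bytes:
    """Base32-decode an ISCC string and drop the assumed 2-byte header."""
    body = iscc[5:] if iscc.startswith("ISCC:") else iscc
    return base64.b32decode(body + "=" * (-len(body) % 8))[2:]


def hamming(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")


def compare_mcdi(iscc_a: str, iscc_b: str) -> dict:
    """Compare two ISCC-CODEs made of four 64-bit units (assumed layout)."""
    raw_a, raw_b = decode_body(iscc_a), decode_body(iscc_b)
    # Slice the 32-byte body into four 8-byte unit digests.
    units_a = [raw_a[i:i + 8] for i in range(0, 32, 8)]
    units_b = [raw_b[i:i + 8] for i in range(0, 32, 8)]
    return {
        "meta_dist": hamming(units_a[0], units_b[0]),
        "content_dist": hamming(units_a[1], units_b[1]),
        "data_dist": hamming(units_a[2], units_b[2]),
        "instance_match": units_a[3] == units_b[3],
    }


result = compare_mcdi(
    "ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI",
    "ISCC:KACT4EBWK27737D2AYCJRAL5Z36G7Y7HA2BMECKMVRBEQXR2BJOS6NA",
)
print(json.dumps(result, indent=2))
```

With the two ISCC-CODEs from the output above, this prints the same four values: the Meta and Content units are identical, the Data units differ in 33 of 64 bits, and the Instance units do not match.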

Documentation

Documentation is published at https://core.iscc.codes

Development

Requirements

  • Python 3.9 or higher for code generation and static site building.
  • Poetry for installation and dependency management.

Development Setup

git clone https://github.com/iscc/iscc-core.git
cd iscc-core
poetry install

Development Tasks

Tests, coverage, code formatting and other tasks can be run with the poe command:

poe

Poe the Poet - A task runner that works well with poetry.
version 0.18.1

Result: No task specified.

USAGE
  poe [-h] [-v | -q] [--root PATH] [--ansi | --no-ansi] task [task arguments]

GLOBAL OPTIONS
  -h, --help     Show this help page and exit
  --version      Print the version and exit
  -v, --verbose  Increase command output (repeatable)
  -q, --quiet    Decrease command output (repeatable)
  -d, --dry-run  Print the task contents but don't actually run it
  --root PATH    Specify where to find the pyproject.toml
  --ansi         Force enable ANSI output
  --no-ansi      Force disable ANSI output
CONFIGURED TASKS
  gentests       Generate conformance test data
  format         Code style formating with black
  docs           Copy README.md to /docs
  format-md      Markdown formating with mdformat
  lf             Convert line endings to lf
  test           Run tests with coverage
  sec            Security check with bandit
  all

Use poe all to run all tasks before committing any changes.

Maintainers

@titusz

Contributing

Pull requests are welcome. For significant changes, please open an issue first to discuss your plans. Please make sure to update tests as appropriate.

You may also want to join our developer chat on Telegram at https://t.me/iscc_dev.
