ISCC - Codec & Algorithms

Create similarity-preserving identifiers for digital content

iscc-core is the reference implementation of the core algorithms of ISO 24138, the ISCC (International Standard Content Code).

Key Features

  • Similarity-Preserving: Detect similar content even after modifications
  • Multi-Level Identification: Identify content at metadata, perceptual, and data levels
  • Self-Describing: Each component contains its own type and version information
  • ISO Standardized: Implements the official ISO 24138:2024 specification
  • Highly Tested: 100% test coverage with conformance test vectors

What is the ISCC

The ISCC is a similarity-preserving fingerprint and identifier for digital media assets.

ISCCs are generated algorithmically from digital content, just like cryptographic hashes. However, instead of using a single cryptographic hash function to identify data only, the ISCC uses various algorithms to create a composite identifier that exhibits similarity-preserving properties (soft hash).
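
To make the contrast between a cryptographic hash and a soft hash concrete, here is a minimal sketch. It uses a generic SimHash over character n-grams, which is an assumption chosen for illustration and is not one of the ISCC algorithms; the function names `simhash64` and `hamming` are hypothetical. Small edits to the input typically flip only a few bits of the soft hash, while a SHA-256 digest changes unpredictably.

```python
import hashlib


def simhash64(text: str, ngram: int = 3) -> int:
    """Toy 64-bit similarity hash over character n-grams (NOT an ISCC algorithm)."""
    counts = [0] * 64
    grams = [text[i:i + ngram] for i in range(max(1, len(text) - ngram + 1))]
    for gram in grams:
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        for bit in range(64):
            counts[bit] += 1 if (value >> bit) & 1 else -1
    # A bit is set when the majority of n-gram hashes had it set.
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


original = "The quick brown fox jumps over the lazy dog near the river bank."
modified = "The quick brown fox jumps over the lazy cat near the river bank."

soft_dist = hamming(simhash64(original), simhash64(modified))
sha_a = hashlib.sha256(original.encode("utf-8")).digest()
sha_b = hashlib.sha256(modified.encode("utf-8")).digest()
hard_dist = hamming(int.from_bytes(sha_a, "big"), int.from_bytes(sha_b, "big"))

print(f"soft hash distance: {soft_dist} of 64 bits")
print(f"SHA-256 distance:   {hard_dist} of 256 bits")
```

The exact distances depend on the input, so none are quoted here. The point is only that the soft hash distance is meaningful as a similarity measure, whereas the SHA-256 distance is not.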

The component-based structure of the ISCC identifies content at multiple levels of abstraction. Each component is self-describing, modular, and can be used separately or with others to aid in various content identification tasks. The algorithmic design supports content deduplication, database synchronization, indexing, integrity verification, timestamping, versioning, data provenance, similarity clustering, anomaly detection, usage tracking, allocation of royalties, fact-checking and general digital asset management use cases.

What is iscc-core

iscc-core is the Python-based reference implementation of the ISCC core algorithms as defined by ISO 24138. It is also a good reference for porting ISCC to other programming languages.

Tip: This is a low-level reference implementation that does not include features like media type detection, metadata extraction, or file-format-specific content extraction. Please have a look at iscc-sdk, which adds those higher-level features on top of the iscc-core library.

Implementors Guide

Reproducible Environment

For a reproducible installation of the reference implementation, the repository includes a poetry.lock file with pinned dependencies. Install them with Python Poetry by running poetry install in the repository root.

Repository structure

iscc-core
├── docs       # Markdown and other assets for mkdocs documentation
├── examples   # Example scripts using the reference code
├── iscc_core  # Actual source code of the reference implementation
├── tests      # Tests for the reference implementation
└── tools      # Development tools

Testing & Conformance

The reference implementation comes with 100% test coverage. To run the conformance selftest from the repository root use poetry run python -m iscc_core. To run the complete test suite use poetry run pytest.

To build a conformant implementation, work through the following top-level entry point functions:

gen_meta_code_v0
gen_text_code_v0
gen_image_code_v0
gen_audio_code_v0
gen_video_code_v0
gen_mixed_code_v0
gen_data_code_v0
gen_instance_code_v0
gen_iscc_code_v0

The corresponding test vectors can be found in iscc_core/data.json.
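
When porting, a small runner that feeds test vectors to your own functions can help track progress. The sketch below is only an outline under stated assumptions: it assumes the vectors are organized as function name, then test name, then an object with "inputs" and "outputs" keys. Verify this against the actual data.json, which may also encode byte streams in a way that needs decoding first. The names `run_vectors`, `gen_demo_code_v0`, and `demo_vectors` are hypothetical stand-ins.

```python
from typing import Callable, Dict, List


def run_vectors(
    vectors: Dict[str, Dict[str, dict]],
    impls: Dict[str, Callable[..., dict]],
) -> List[str]:
    """Run every test vector against the matching implementation.

    Returns a list of labels for missing functions and failing cases.
    """
    failures = []
    for func_name, cases in vectors.items():
        func = impls.get(func_name)
        if func is None:
            failures.append(f"{func_name}: not implemented")
            continue
        for case_name, case in cases.items():
            result = func(*case["inputs"])
            if result != case["outputs"]:
                failures.append(f"{func_name}.{case_name}")
    return failures


# Stand-in implementation and vectors, for illustration only.
def gen_demo_code_v0(text: str, bits: int) -> dict:
    return {"demo": text.upper(), "bits": bits}


demo_vectors = {
    "gen_demo_code_v0": {
        "test_0000_basic": {"inputs": ["abc", 64], "outputs": {"demo": "ABC", "bits": 64}},
        "test_0001_wrong": {"inputs": ["abc", 64], "outputs": {"demo": "abc", "bits": 64}},
    },
    "gen_other_code_v0": {},
}

failures = run_vectors(demo_vectors, {"gen_demo_code_v0": gen_demo_code_v0})
print(failures)
```

In a real port you would load the vectors with `json.load` and register one implementation per entry point function listed above.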

ISCC Architecture

(Figure: ISCC architecture diagram)

ISCC MainTypes

Idx  Slug      Bits  Purpose
0    META      0000  Match on metadata similarity
1    SEMANTIC  0001  Match on semantic content similarity
2    CONTENT   0010  Match on perceptual content similarity
3    DATA      0011  Match on data similarity
4    INSTANCE  0100  Match on data identity
5    ISCC      0101  Composite of two or more components with common header
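
The self-describing header can be inspected with the standard library alone. The following is a simplified sketch, not the library's decoder: it assumes a single ISCC-UNIT whose four header fields (main type, subtype, version, length) each fit into one 4-bit nibble, and that the length field encodes (n + 1) * 32 bits. The helper name `decode_unit` is hypothetical. The sample codes are taken from the Quick Start output further down.

```python
import base64

# Main types and their 4-bit codes, taken from the table above.
MAIN_TYPES = {0: "META", 1: "SEMANTIC", 2: "CONTENT", 3: "DATA", 4: "INSTANCE", 5: "ISCC"}


def decode_unit(iscc: str) -> dict:
    """Decode a single ISCC-UNIT (simplified: one nibble per header field)."""
    code = iscc[5:] if iscc.upper().startswith("ISCC:") else iscc
    # Base32 needs padding to a multiple of 8 characters.
    padded = code.upper() + "=" * (-len(code) % 8)
    raw = base64.b32decode(padded)
    main_type = raw[0] >> 4
    length_code = raw[1] & 0x0F
    return {
        "main_type": MAIN_TYPES[main_type],
        "bits": format(main_type, "04b"),
        "sub_type": raw[0] & 0x0F,
        "version": raw[1] >> 4,
        # Assumption: for ISCC-UNITs the length field encodes (n + 1) * 32 bits.
        "length": (length_code + 1) * 32,
        "body": raw[2:].hex(),
    }


meta = decode_unit("ISCC:AAAT4EBWK27737D2")
print(meta["main_type"], meta["bits"], meta["length"], meta["body"])

for sample in ("ISCC:EAAQMBEYQF6457DP", "ISCC:GAA7UJMLDXHPPENG", "ISCC:IAA3Y7HR2FEZCU4N"):
    print(sample, "->", decode_unit(sample)["main_type"])
```

For real work use `ic.iscc_explain` as shown in the Quick Start; this sketch does not handle composite ISCC-CODEs or header values that need more than one nibble.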

Installation

Use the package manager pip to install iscc-core as a library.

pip install iscc-core

Quick Start

import json
import iscc_core as ic

meta_code = ic.gen_meta_code(name="ISCC Test Document!")

print(f"Meta-Code:     {meta_code['iscc']}")
print(f"Structure:     {ic.iscc_explain(meta_code['iscc'])}\n")

# Extract text from file
with open("demo.txt", "rt", encoding="utf-8") as stream:
    text = stream.read()
    text_code = ic.gen_text_code_v0(text)
    print(f"Text-Code:     {text_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(text_code['iscc'])}\n")

# Process raw bytes of textfile
with open("demo.txt", "rb") as stream:
    data_code = ic.gen_data_code(stream)
    print(f"Data-Code:     {data_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(data_code['iscc'])}\n")

    stream.seek(0)
    instance_code = ic.gen_instance_code(stream)
    print(f"Instance-Code: {instance_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(instance_code['iscc'])}\n")

# Combine ISCC-UNITs into ISCC-CODE
iscc_code = ic.gen_iscc_code(
    (meta_code["iscc"], text_code["iscc"], data_code["iscc"], instance_code["iscc"])
)

# Create convenience `Code` object from ISCC string
iscc_obj = ic.Code(iscc_code["iscc"])
print(f"ISCC-CODE:     {ic.iscc_normalize(iscc_obj.code)}")
print(f"Structure:     {iscc_obj.explain}")
print(f"Multiformat:   {iscc_obj.mf_base32}\n")

# Compare with changed ISCC-CODE:
new_dc, new_ic = ic.Code.rnd(mt=ic.MT.DATA), ic.Code.rnd(mt=ic.MT.INSTANCE)
new_iscc = ic.gen_iscc_code((meta_code["iscc"], text_code["iscc"], new_dc.uri, new_ic.uri))
print(f"Compare ISCC-CODES:\n{iscc_obj.uri}\n{new_iscc['iscc']}")
print(json.dumps(ic.iscc_compare(iscc_obj.code, new_iscc["iscc"]), indent=2))

The output of this example is as follows:

Meta-Code:     ISCC:AAAT4EBWK27737D2
Structure:     META-NONE-V0-64-3e103656bffdfc7a

Text-Code:     ISCC:EAAQMBEYQF6457DP
Structure:     CONTENT-TEXT-V0-64-060498817dcefc6f

Data-Code:     ISCC:GAA7UJMLDXHPPENG
Structure:     DATA-NONE-V0-64-fa258b1dcef791a6

Instance-Code: ISCC:IAA3Y7HR2FEZCU4N
Structure:     INSTANCE-NONE-V0-64-bc7cf1d14991538d

ISCC-CODE:     ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
Structure:     ISCC-TEXT-V0-MCDI-3e103656bffdfc7a060498817dcefc6ffa258b1dcef791a6bc7cf1d14991538d
Multiformat:   bzqavabj6ca3fnp757r5ambeyqf6457dp7isywhoo66i2npd46hiutektru

Compare ISCC-CODES:
ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
ISCC:KACT4EBWK27737D2AYCJRAL5Z36G7Y7HA2BMECKMVRBEQXR2BJOS6NA
{
  "meta_dist": 0,
  "content_dist": 0,
  "data_dist": 33,
  "instance_match": false
}
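
The comparison result can be read as per-unit bit distances. As a rough illustration, the sketch below splits the ISCC-CODE body from the output above into its four 64-bit units and computes a Hamming distance between hex digests. It assumes the reported `*_dist` values are bitwise Hamming distances over the unit bodies; the helper names `split_units` and `hamming_hex` are hypothetical and not part of iscc-core.

```python
def split_units(body_hex: str, unit_bits: int = 64) -> list:
    """Split a concatenated hex body into fixed-size unit digests."""
    step = unit_bits // 4  # hex digits per unit
    return [body_hex[i:i + step] for i in range(0, len(body_hex), step)]


def hamming_hex(a: str, b: str) -> int:
    """Count differing bits between two equal-length hex digests."""
    if len(a) != len(b):
        raise ValueError("digests must have equal length")
    return bin(int(a, 16) ^ int(b, 16)).count("1")


# Body of the ISCC-CODE from the example output (Meta, Content, Data, Instance).
body = "3e103656bffdfc7a060498817dcefc6ffa258b1dcef791a6bc7cf1d14991538d"
meta, content, data, instance = split_units(body)

print("meta unit:    ", meta)
print("content unit: ", content)
print("data unit:    ", data)
print("instance unit:", instance)
print("distance to itself:", hamming_hex(data, data))
print("toy distance:      ", hamming_hex("ff00", "0f00"))
```

The four digests match the bodies of the individual units printed earlier, which is why an unchanged Meta-Code and Text-Code yield a distance of 0 in the comparison.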

Documentation

Documentation is published at https://core.iscc.codes

Development

Requirements

  • Python 3.9 or higher for code generation and static site building.
  • Poetry for installation and dependency management.

Development Setup

git clone https://github.com/iscc/iscc-core.git
cd iscc-core
poetry install

Development Tasks

Tests, coverage, code formatting and other tasks can be run with the poe command:

poe

Poe the Poet - A task runner that works well with poetry.
version 0.18.1

Result: No task specified.

USAGE
  poe [-h] [-v | -q] [--root PATH] [--ansi | --no-ansi] task [task arguments]

GLOBAL OPTIONS
  -h, --help     Show this help page and exit
  --version      Print the version and exit
  -v, --verbose  Increase command output (repeatable)
  -q, --quiet    Decrease command output (repeatable)
  -d, --dry-run  Print the task contents but don't actually run it
  --root PATH    Specify where to find the pyproject.toml
  --ansi         Force enable ANSI output
  --no-ansi      Force disable ANSI output
CONFIGURED TASKS
  gentests       Generate conformance test data
  format         Code style formating with black
  docs           Copy README.md to /docs
  format-md      Markdown formating with mdformat
  lf             Convert line endings to lf
  test           Run tests with coverage
  sec            Security check with bandit
  all

Use poe all to run all tasks before committing any changes.

Maintainers

@titusz

Contributing

Pull requests are welcome. For significant changes, please open an issue first to discuss your plans. Please make sure to update tests as appropriate.

You may also want to join our developer chat on Telegram at https://t.me/iscc_dev.
