ISCC - Codec & Algorithms

iscc-core is the reference implementation of the core algorithms of the ISCC (International Standard Content Code).

What is the ISCC

The ISCC is a similarity-preserving fingerprint and identifier for digital media assets.

ISCCs are generated algorithmically from digital content, just like cryptographic hashes. However, instead of using a single cryptographic hash function that only identifies exact data, the ISCC uses several algorithms to create a composite identifier that exhibits similarity-preserving properties (soft hash).
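To make the difference between a cryptographic hash and a soft hash concrete, here is a minimal sketch of a generic SimHash-style fingerprint. It is not one of the ISCC algorithms; the function names `simhash64` and `hamming` and the word-level tokenization are assumptions chosen for illustration only.

```python
import hashlib


def simhash64(text: str) -> int:
    """Toy 64-bit SimHash over lowercase word tokens (NOT an ISCC algorithm)."""
    counts = [0] * 64
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        # Every token votes +1 or -1 on each of the 64 bit positions
        for bit in range(64):
            counts[bit] += 1 if (value >> bit) & 1 else -1
    # A bit is set when more tokens voted 1 than 0 at that position
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


original = "the quick brown fox jumps over the lazy dog near the river bank"
edited = "the quick brown fox jumps over the lazy cat near the river bank"
unrelated = "quarterly revenue figures exceeded analyst expectations this year"

print(hamming(simhash64(original), simhash64(original)))  # identical input: 0
print(hamming(simhash64(original), simhash64(edited)))
print(hamming(simhash64(original), simhash64(unrelated)))

# A cryptographic hash only tells us that the two texts are not identical
sha_a = hashlib.sha256(original.encode("utf-8")).hexdigest()
sha_b = hashlib.sha256(edited.encode("utf-8")).hexdigest()
print(sha_a == sha_b)  # False
```

With a fingerprint of this kind, a one-word edit usually changes only a few bits, while unrelated text tends to differ in roughly half of them. The cryptographic digests of the original and edited text share no such relationship.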

The component-based structure of the ISCC identifies content at multiple levels of abstraction. Each component is self-describing, modular, and can be used separately or with others to aid in various content identification tasks. The algorithmic design supports content deduplication, database synchronization, indexing, integrity verification, timestamping, versioning, data provenance, similarity clustering, anomaly detection, usage tracking, allocation of royalties, fact-checking and general digital asset management use-cases.

What is iscc-core

iscc-core is the Python-based reference implementation of the ISCC core algorithms as defined by ISO 24138. It is also a good reference for porting ISCC to other programming languages.

!!! tip This is a low-level reference implementation that does not include features like mediatype detection, metadata extraction or file format specific content extraction. Please have a look at iscc-sdk, which adds those higher-level features on top of the iscc-core library.

Implementors Guide

Reproducible Environment

For reproducible installation of the reference implementation we included a poetry.lock file with pinned dependencies. Install them using Python Poetry with the command poetry install in the root folder.

Repository structure

iscc-core
├── docs       # Markdown and other assets for mkdocs documentation
├── examples   # Example scripts using the reference code
├── iscc_core  # Actual source code of the reference implementation
├── tests      # Tests for the reference implementation
└── tools      # Development tools

Testing & Conformance

The reference implementation comes with 100% test coverage. To run the conformance selftest from the repository root use poetry run python -m iscc_core. To run the complete test suite use poetry run pytest.

To build a conformant implementation, work through the following top-level entrypoint functions:

gen_meta_code_v0
gen_text_code_v0
gen_image_code_v0
gen_audio_code_v0
gen_video_code_v0
gen_mixed_code_v0
gen_data_code_v0
gen_instance_code_v0
gen_iscc_code_v0

The corresponding test vectors can be found in iscc_core/data.json.
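When porting, a small driver that feeds test vectors into your own functions can be handy. The sketch below assumes a layout in which each top-level key is a function name that maps test names to an object with `inputs` (positional arguments) and `outputs` (expected result). Check this against the actual file before relying on it. `gen_toy_code_v0`, `SAMPLE_VECTORS` and `run_vectors` are hypothetical names used only for this illustration.

```python
def gen_toy_code_v0(text):
    """Hypothetical stand-in for a ported entrypoint function."""
    return {"iscc": "TOY:" + text.upper()}


# In-memory sample that mimics the ASSUMED layout of the test vector file
SAMPLE_VECTORS = {
    "gen_toy_code_v0": {
        "test_0000_match": {"inputs": ["abc"], "outputs": {"iscc": "TOY:ABC"}},
        "test_0001_mismatch": {"inputs": ["abc"], "outputs": {"iscc": "TOY:XYZ"}},
    },
    "gen_unported_code_v0": {
        "test_0000_skipped": {"inputs": ["abc"], "outputs": {"iscc": "?"}},
    },
}


def run_vectors(vectors, registry):
    """Call each registered function with its inputs and collect mismatches."""
    failures = []
    for func_name, cases in vectors.items():
        func = registry.get(func_name)
        if func is None:
            continue  # function not ported yet, skip its vectors
        for case_name, case in cases.items():
            actual = func(*case["inputs"])
            if actual != case["outputs"]:
                failures.append(f"{func_name}/{case_name}")
    return failures


registry = {"gen_toy_code_v0": gen_toy_code_v0}
print(run_vectors(SAMPLE_VECTORS, registry))
# ['gen_toy_code_v0/test_0001_mismatch']
```

To run this against real vectors you would load the JSON file with `json.load` and register your own implementations. Any input encoding conventions used in the real file are not handled by this sketch.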

ISCC Architecture

ISCC MainTypes

Idx  Slug      Bits  Purpose
0    META      0000  Match on metadata similarity
1    SEMANTIC  0001  Match on semantic content similarity
2    CONTENT   0010  Match on perceptual content similarity
3    DATA      0011  Match on data similarity
4    INSTANCE  0100  Match on data identity
5    ISCC      0101  Composite of two or more components with common header
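The MainType bits are the leading bits of an encoded ISCC. The following sketch reads them back from the sample ISCC-UNITs used in the Quick Start, using only the standard library. It assumes a base32 string whose first two bytes form the header (each header field fitting into 4 bits), which holds for these sample codes but is not a general decoder. The `MainType` enum and `split_unit` are illustrative names, not the iscc-core API.

```python
import base64
from enum import IntEnum


class MainType(IntEnum):
    """MainType indexes from the table above."""
    META = 0
    SEMANTIC = 1
    CONTENT = 2
    DATA = 3
    INSTANCE = 4
    ISCC = 5


def split_unit(iscc: str):
    """Split an ISCC-UNIT string into (MainType, body length in bits, body hex).

    ASSUMPTION: 2-byte header, true for the sample codes shown in this README.
    """
    code = iscc.split(":", 1)[-1]  # drop the "ISCC:" prefix if present
    raw = base64.b32decode(code + "=" * (-len(code) % 8))  # restore padding
    main_type = MainType(raw[0] >> 4)  # upper 4 bits of the first byte
    body = raw[2:]
    return main_type, len(body) * 8, body.hex()


for unit in (
    "ISCC:AAAT4EBWK27737D2",
    "ISCC:EAAQMBEYQF6457DP",
    "ISCC:GAA7UJMLDXHPPENG",
    "ISCC:IAA3Y7HR2FEZCU4N",
):
    mt, bits, body = split_unit(unit)
    print(f"{mt.value:04b} {mt.name:<8} {bits} {body}")
```

For the first unit this yields MainType META with the 64-bit body 3e103656bffdfc7a, matching the Structure line printed by the Quick Start example.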

Installation

Use the package manager pip to install iscc-core as a library.

pip install iscc-core

Quick Start

import json
import iscc_core as ic

meta_code = ic.gen_meta_code(name="ISCC Test Document!")

print(f"Meta-Code:     {meta_code['iscc']}")
print(f"Structure:     {ic.iscc_explain(meta_code['iscc'])}\n")

# Extract text from file
with open("demo.txt", "rt", encoding="utf-8") as stream:
    text = stream.read()
    text_code = ic.gen_text_code_v0(text)
    print(f"Text-Code:     {text_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(text_code['iscc'])}\n")

# Process raw bytes of textfile
with open("demo.txt", "rb") as stream:
    data_code = ic.gen_data_code(stream)
    print(f"Data-Code:     {data_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(data_code['iscc'])}\n")

    stream.seek(0)
    instance_code = ic.gen_instance_code(stream)
    print(f"Instance-Code: {instance_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(instance_code['iscc'])}\n")

# Combine ISCC-UNITs into ISCC-CODE
iscc_code = ic.gen_iscc_code(
    (meta_code["iscc"], text_code["iscc"], data_code["iscc"], instance_code["iscc"])
)

# Create convenience `Code` object from ISCC string
iscc_obj = ic.Code(iscc_code["iscc"])
print(f"ISCC-CODE:     {ic.iscc_normalize(iscc_obj.code)}")
print(f"Structure:     {iscc_obj.explain}")
print(f"Multiformat:   {iscc_obj.mf_base32}\n")

# Compare with changed ISCC-CODE:
new_dc, new_ic = ic.Code.rnd(mt=ic.MT.DATA), ic.Code.rnd(mt=ic.MT.INSTANCE)
new_iscc = ic.gen_iscc_code((meta_code["iscc"], text_code["iscc"], new_dc.uri, new_ic.uri))
print(f"Compare ISCC-CODES:\n{iscc_obj.uri}\n{new_iscc['iscc']}")
print(json.dumps(ic.iscc_compare(iscc_obj.code, new_iscc["iscc"]), indent=2))

The output of this example is as follows:

Meta-Code:     ISCC:AAAT4EBWK27737D2
Structure:     META-NONE-V0-64-3e103656bffdfc7a

Text-Code:     ISCC:EAAQMBEYQF6457DP
Structure:     CONTENT-TEXT-V0-64-060498817dcefc6f

Data-Code:     ISCC:GAA7UJMLDXHPPENG
Structure:     DATA-NONE-V0-64-fa258b1dcef791a6

Instance-Code: ISCC:IAA3Y7HR2FEZCU4N
Structure:     INSTANCE-NONE-V0-64-bc7cf1d14991538d

ISCC-CODE:     ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
Structure:     ISCC-TEXT-V0-MCDI-3e103656bffdfc7a060498817dcefc6ffa258b1dcef791a6bc7cf1d14991538d
Multiformat:   bzqavabj6ca3fnp757r5ambeyqf6457dp7isywhoo66i2npd46hiutektru

Compare ISCC-CODES:
ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
ISCC:KACT4EBWK27737D2AYCJRAL5Z36G7Y7HA2BMECKMVRBEQXR2BJOS6NA
{
  "meta_dist": 0,
  "content_dist": 0,
  "data_dist": 33,
  "instance_match": false
}
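The distances in the comparison output can be reproduced by hand. The sketch below decodes the two ISCC-CODEs printed above with the standard library and counts differing bits per unit. It assumes the layout of this particular example (a 2-byte header followed by four 64-bit units in the order Meta, Content, Data, Instance) and is not a replacement for `ic.iscc_compare`. The helper names `decode`, `hamming` and `compare_mcdi` are illustrative.

```python
import base64


def decode(iscc: str) -> bytes:
    """Base32-decode an ISCC string, restoring the stripped padding."""
    code = iscc.split(":", 1)[-1]
    return base64.b32decode(code + "=" * (-len(code) % 8))


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def compare_mcdi(iscc_a: str, iscc_b: str) -> dict:
    """Compare two ISCC-CODEs, ASSUMING a 2-byte header and four 64-bit units."""
    body_a, body_b = decode(iscc_a)[2:], decode(iscc_b)[2:]
    units_a = [body_a[i:i + 8] for i in range(0, 32, 8)]
    units_b = [body_b[i:i + 8] for i in range(0, 32, 8)]
    return {
        "meta_dist": hamming(units_a[0], units_b[0]),
        "content_dist": hamming(units_a[1], units_b[1]),
        "data_dist": hamming(units_a[2], units_b[2]),
        "instance_match": units_a[3] == units_b[3],
    }


a = "ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI"
b = "ISCC:KACT4EBWK27737D2AYCJRAL5Z36G7Y7HA2BMECKMVRBEQXR2BJOS6NA"
print(compare_mcdi(a, b))
# {'meta_dist': 0, 'content_dist': 0, 'data_dist': 33, 'instance_match': False}
```

The Meta and Content units are identical in both codes, so their distance is 0. The randomly generated Data unit differs in 33 of 64 bits, close to the 32 bits expected for unrelated data.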

Documentation

Documentation is published at https://core.iscc.codes

Development

Requirements

  • Python 3.7.2 or higher for code generation and static site building.
  • Poetry for installation and dependency management.

Development Setup

git clone https://github.com/iscc/iscc-core.git
cd iscc-core
poetry install

Development Tasks

Tests, coverage, code formatting and other tasks can be run with the poe command:

poe

Poe the Poet - A task runner that works well with poetry.
version 0.18.1

Result: No task specified.

USAGE
  poe [-h] [-v | -q] [--root PATH] [--ansi | --no-ansi] task [task arguments]

GLOBAL OPTIONS
  -h, --help     Show this help page and exit
  --version      Print the version and exit
  -v, --verbose  Increase command output (repeatable)
  -q, --quiet    Decrease command output (repeatable)
  -d, --dry-run  Print the task contents but don't actually run it
  --root PATH    Specify where to find the pyproject.toml
  --ansi         Force enable ANSI output
  --no-ansi      Force disable ANSI output

CONFIGURED TASKS
  gentests       Generate conformance test data
  format         Code style formating with black
  docs           Copy README.md to /docs
  format-md      Markdown formating with mdformat
  lf             Convert line endings to lf
  test           Run tests with coverage
  sec            Security check with bandit
  all

Use poe all to run all tasks before committing any changes.

Maintainers

@titusz

Contributing

Pull requests are welcome. For significant changes, please open an issue first to discuss your plans. Please make sure to update tests as appropriate.

You may also want to join our developer chat on Telegram at https://t.me/iscc_dev.
