ISCC - Core Algorithms

ISCC - Codec & Algorithms

iscc-core is the reference implementation of the core algorithms of the ISCC (International Standard Content Code).

What is the ISCC

The ISCC is a similarity-preserving fingerprint and identifier for digital media assets.

ISCCs are generated algorithmically from digital content, just like cryptographic hashes. However, instead of using a single cryptographic hash function to identify data only, the ISCC uses various algorithms to create a composite identifier that exhibits similarity-preserving properties (soft hash).
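
To illustrate the contrast drawn here: with a cryptographic hash, a one-character edit typically flips roughly half of the digest bits, so the distance between two digests says nothing about how similar the inputs are. The following is a minimal sketch using only the Python standard library. The sample strings and the `bit_distance` helper are illustrative and not part of iscc-core.

```python
import hashlib


def bit_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


# Two inputs that differ in a single character (illustrative sample data).
original = b"ISCC Test Document!"
edited = b"ISCC Test Document?"

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(edited).digest()

# A cryptographic digest changes unpredictably on any edit, so this
# number is expected to be far from zero despite the near-identical inputs.
print(bit_distance(digest_a, digest_b), "of 256 bits differ")
```

A similarity-preserving code is designed so that similar inputs yield a small bit distance instead, which is what the `*_dist` values in the Quick Start output measure.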

The component-based structure of the ISCC identifies content at multiple levels of abstraction. Each component is self-describing, modular, and can be used separately or with others to aid in various content identification tasks. The algorithmic design supports content deduplication, database synchronization, indexing, integrity verification, timestamping, versioning, data provenance, similarity clustering, anomaly detection, usage tracking, allocation of royalties, fact-checking, and general digital asset management use cases.

What is iscc-core

iscc-core is a Python-based reference library of the core algorithms for creating standard-compliant ISCC codes. It is also a good reference for porting ISCC to other programming languages.

Tip: This is a low-level reference implementation that does not include features like media type detection, metadata extraction, or file-format-specific content extraction. Have a look at iscc-sdk, which adds those higher-level features on top of the iscc-core library.

Project Status

The ISCC is under development as ISO/CD 24138 - International Standard Content Code within ISO/TC 46/SC 9/WG 18.

ISCC Architecture

ISCC MainTypes

Idx  Slug      Bits  Purpose
0    META      0000  Match on metadata similarity
1    SEMANTIC  0001  Match on semantic content similarity
2    CONTENT   0010  Match on perceptual content similarity
3    DATA      0011  Match on data similarity
4    INSTANCE  0100  Match on data identity
5    ISCC      0101  Composite of two or more components with common header
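
Because each code is self-describing, the MainType can be read back from the code itself. The sketch below uses only the standard library and assumes the MainType occupies the first 4 bits of the decoded header, which is consistent with the sample codes in the Quick Start output. The `MainType` enum and `main_type` helper are illustrative stand-ins; the library exposes its own `ic.MT` enum, as used in the Quick Start.

```python
import base64
from enum import IntEnum


class MainType(IntEnum):
    """MainType indices as listed in the table above (illustrative stand-in)."""

    META = 0
    SEMANTIC = 1
    CONTENT = 2
    DATA = 3
    INSTANCE = 4
    ISCC = 5


def main_type(iscc: str) -> MainType:
    """Read the MainType from the leading 4 bits of a canonical ISCC string.

    Assumption: the MainType is stored in the first nibble of the header.
    """
    body = iscc.split(":", 1)[-1].upper()  # drop the "ISCC:" prefix
    body += "=" * (-len(body) % 8)  # restore the base32 padding
    first_byte = base64.b32decode(body)[0]
    return MainType(first_byte >> 4)


# Sample codes taken from the Quick Start output.
for code in (
    "ISCC:AAAT4EBWK27737D2",
    "ISCC:EAAQMBEYQF6457DP",
    "ISCC:GAA7UJMLDXHPPENG",
    "ISCC:IAA3Y7HR2FEZCU4N",
):
    print(code, main_type(code).name)
```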

Installation

Use the package manager pip to install iscc-core.

pip install iscc-core

Quick Start

import json
import iscc_core as ic

meta_code = ic.gen_meta_code(name="ISCC Test Document!")

print(f"Meta-Code:     {meta_code['iscc']}")
print(f"Structure:     {ic.iscc_explain(meta_code['iscc'])}\n")

# Extract text from file
with open("demo.txt", "rt", encoding="utf-8") as stream:
    text = stream.read()
    text_code = ic.gen_text_code_v0(text)
    print(f"Text-Code:     {text_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(text_code['iscc'])}\n")

# Process raw bytes of textfile
with open("demo.txt", "rb") as stream:
    data_code = ic.gen_data_code(stream)
    print(f"Data-Code:     {data_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(data_code['iscc'])}\n")

    stream.seek(0)
    instance_code = ic.gen_instance_code(stream)
    print(f"Instance-Code: {instance_code['iscc']}")
    print(f"Structure:     {ic.iscc_explain(instance_code['iscc'])}\n")

# Combine ISCC-UNITs into ISCC-CODE
iscc_code = ic.gen_iscc_code(
    (meta_code["iscc"], text_code["iscc"], data_code["iscc"], instance_code["iscc"])
)

# Create convenience `Code` object from ISCC string
iscc_obj = ic.Code(iscc_code["iscc"])
print(f"ISCC-CODE:     {ic.iscc_normalize(iscc_obj.code)}")
print(f"Structure:     {iscc_obj.explain}")
print(f"Multiformat:   {iscc_obj.mf_base32}\n")

# Compare with changed ISCC-CODE:
new_dc, new_ic = ic.Code.rnd(mt=ic.MT.DATA), ic.Code.rnd(mt=ic.MT.INSTANCE)
new_iscc = ic.gen_iscc_code((meta_code["iscc"], text_code["iscc"], new_dc.uri, new_ic.uri))
print(f"Compare ISCC-CODES:\n{iscc_obj.uri}\n{new_iscc['iscc']}")
print(json.dumps(ic.iscc_compare(iscc_obj.code, new_iscc["iscc"]), indent=2))

The output of this example is as follows:

Meta-Code:     ISCC:AAAT4EBWK27737D2
Structure:     META-NONE-V0-64-3e103656bffdfc7a

Text-Code:     ISCC:EAAQMBEYQF6457DP
Structure:     CONTENT-TEXT-V0-64-060498817dcefc6f

Data-Code:     ISCC:GAA7UJMLDXHPPENG
Structure:     DATA-NONE-V0-64-fa258b1dcef791a6

Instance-Code: ISCC:IAA3Y7HR2FEZCU4N
Structure:     INSTANCE-NONE-V0-64-bc7cf1d14991538d

ISCC-CODE:     ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
Structure:     ISCC-TEXT-V0-MCDI-3e103656bffdfc7a060498817dcefc6ffa258b1dcef791a6bc7cf1d14991538d
Multiformat:   bzqavabj6ca3fnp757r5ambeyqf6457dp7isywhoo66i2npd46hiutektru

Compare ISCC-CODES:
ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
ISCC:KACT4EBWK27737D2AYCJRAL5Z36G7Y7HA2BMECKMVRBEQXR2BJOS6NA
{
  "meta_dist": 0,
  "content_dist": 0,
  "data_dist": 33,
  "instance_match": false
}
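
The sample output can be cross-checked without iscc-core. The sketch below is not the library's implementation; it assumes what the output itself suggests: a 2-byte header followed by four 64-bit units, a multiformat string made of the multibase character `b` plus two prefix bytes (`cc 01` in this sample), and a bitwise Hamming distance between units. The helper names are hypothetical.

```python
import base64

# Values copied from the sample output above.
ISCC_A = "ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI"
ISCC_B = "ISCC:KACT4EBWK27737D2AYCJRAL5Z36G7Y7HA2BMECKMVRBEQXR2BJOS6NA"
MULTIFORMAT_A = "bzqavabj6ca3fnp757r5ambeyqf6457dp7isywhoo66i2npd46hiutektru"


def b32_to_bytes(text: str) -> bytes:
    """Decode unpadded base32 text (upper or lower case) into raw bytes."""
    text = text.upper()
    return base64.b32decode(text + "=" * (-len(text) % 8))


def multiformat_to_iscc(mf: str) -> str:
    """Convert the multiformat base32 string back to the canonical form."""
    raw = b32_to_bytes(mf[1:])  # drop the leading multibase character "b"
    payload = raw[2:]  # drop the two prefix bytes (assumed fixed length)
    return "ISCC:" + base64.b32encode(payload).decode("ascii").rstrip("=")


def hamming(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def compare_units(code_a: str, code_b: str) -> dict:
    """Compare two ISCC-CODEs unit by unit (assumes four 64-bit units)."""
    raw_a = b32_to_bytes(code_a.split(":", 1)[-1])[2:]  # skip 2-byte header
    raw_b = b32_to_bytes(code_b.split(":", 1)[-1])[2:]
    units_a = [raw_a[i:i + 8] for i in range(0, 32, 8)]
    units_b = [raw_b[i:i + 8] for i in range(0, 32, 8)]
    return {
        "meta_dist": hamming(units_a[0], units_b[0]),
        "content_dist": hamming(units_a[1], units_b[1]),
        "data_dist": hamming(units_a[2], units_b[2]),
        "instance_match": units_a[3] == units_b[3],
    }


print(multiformat_to_iscc(MULTIFORMAT_A))
print(compare_units(ISCC_A, ISCC_B))
```

Run against the two codes above, this reproduces the printed comparison: the Meta and Content units are identical, the Data units differ in 33 of 64 bits, and the Instance units do not match.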

Documentation

Documentation is published at https://core.iscc.codes

Development

Requirements

  • Python 3.7.2 or higher for code generation and static site building.
  • Poetry for installation and dependency management.

Development Setup

git clone https://github.com/iscc/iscc-core.git
cd iscc-core
poetry install

Development Tasks

Tests, coverage, code formatting and other tasks can be run with the poe command:

poe

Poe the Poet - A task runner that works well with poetry.
version 0.18.1

Result: No task specified.

USAGE
  poe [-h] [-v | -q] [--root PATH] [--ansi | --no-ansi] task [task arguments]

GLOBAL OPTIONS
  -h, --help     Show this help page and exit
  --version      Print the version and exit
  -v, --verbose  Increase command output (repeatable)
  -q, --quiet    Decrease command output (repeatable)
  -d, --dry-run  Print the task contents but don't actually run it
  --root PATH    Specify where to find the pyproject.toml
  --ansi         Force enable ANSI output
  --no-ansi      Force disable ANSI output
CONFIGURED TASKS
  gentests       Generate conformance test data
  format         Code style formating with black
  docs           Copy README.md to /docs
  format-md      Markdown formating with mdformat
  lf             Convert line endings to lf
  test           Run tests with coverage
  sec            Security check with bandit
  all

Use poe all to run all tasks before committing any changes.

Maintainers

@titusz

Contributing

Pull requests are welcome. For significant changes, please open an issue first to discuss your plans. Please make sure to update tests as appropriate.

You may also want to join our developer chat on Telegram at https://t.me/iscc_dev.
