Software Heritage indexer

Project description

Tools to compute multiple indexes on SWH’s raw contents:

  • content:

    • mimetype

    • fossology-license

    • metadata

  • origin:

    • metadata (intrinsic, using the content indexer; and extrinsic)

An indexer is in charge of:

  • looking up objects

  • extracting information from those objects

  • storing that information in the swh-indexer db

There are multiple indexers working on different object types:

  • content indexer: works with content sha1 hashes

  • revision indexer: works with revision sha1 hashes

  • origin indexer: works with origin identifiers
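As an illustration of the identifiers the content indexer works with, here is a minimal sketch that computes the SHA-1 of a blob's raw bytes with Python's standard `hashlib`. The helper name `content_sha1` is hypothetical, and whether the indexer expects hex strings or raw digest bytes is an assumption not confirmed by this description. It covers content only; how revision identifiers are derived is not shown here.

```python
import hashlib


def content_sha1(data: bytes) -> str:
    """Return the hex-encoded SHA-1 digest of a blob's raw bytes.

    Hypothetical helper for illustration; not part of swh-indexer.
    """
    return hashlib.sha1(data).hexdigest()


# SHA-1 of the empty byte string, a well-known test vector.
empty_id = content_sha1(b"")
print(empty_id)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```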

Indexation procedure:

  • receive a batch of ids

  • retrieve the associated data, depending on the object type

  • compute an index for each object

  • store the results in swh’s storage
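The procedure above can be sketched as a small generic loop. This is an illustration under stated assumptions, not the swh-indexer implementation: `index_batch` and its `fetch`, `compute`, and `store` callables are hypothetical names, and the in-memory dictionary stands in for the real object storage.

```python
from typing import Callable, Dict, Iterable, List, Optional


def index_batch(
    ids: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    compute: Callable[[bytes], Dict[str, object]],
    store: Callable[[List[Dict[str, object]]], None],
) -> List[Dict[str, object]]:
    """Run the four steps: receive ids, retrieve data, compute, store."""
    results: List[Dict[str, object]] = []
    for obj_id in ids:
        data = fetch(obj_id)          # retrieve the associated data
        if data is None:              # skip ids with no known object
            continue
        result: Dict[str, object] = {"id": obj_id}
        result.update(compute(data))  # compute some index for the object
        results.append(result)
    store(results)                    # store the whole batch at once
    return results


# Usage with in-memory stand-ins (all hypothetical).
objects = {"a1": b"hello", "b2": b"hi"}
saved: List[Dict[str, object]] = []
indexed = index_batch(
    ["a1", "missing", "b2"],
    fetch=objects.get,
    compute=lambda data: {"length": len(data)},
    store=saved.extend,
)
```

Passing the three steps as callables keeps the loop independent of the object type, which mirrors the idea that content, revision, and origin indexers share one procedure but differ in how they retrieve and process data.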

Current content indexers:

  • mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype

  • fossology-license (queue swh_indexer_fossology_license): compute the license

  • metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta vocabulary)
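To make the metadata translation concrete, here is a minimal sketch that turns an npm-style `package.json` into a CodeMeta JSON-LD document. The choice of npm as the example ecosystem, the function name `package_json_to_codemeta`, and the small `FIELD_MAP` are all assumptions made for illustration; the mapping actually used by the indexer is not described here and is certainly richer.

```python
import json

# Public CodeMeta 2.0 JSON-LD context.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"

# Simplified, hypothetical mapping from package.json keys to CodeMeta terms.
FIELD_MAP = {
    "name": "name",
    "version": "version",
    "description": "description",
    "keywords": "keywords",
}


def package_json_to_codemeta(raw: str) -> dict:
    """Translate the text of a package.json file into a JSON-LD dict."""
    package = json.loads(raw)
    doc = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    for source_key, codemeta_key in FIELD_MAP.items():
        if source_key in package:  # copy only fields that are present
            doc[codemeta_key] = package[source_key]
    return doc


example = '{"name": "demo", "version": "1.0.0", "private": true}'
translated = package_json_to_codemeta(example)
print(json.dumps(translated, indent=2))
```

Fields without an entry in the mapping (such as `private` above) are dropped rather than guessed at.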

Current origin indexers:

  • metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta and ForgeFed vocabularies)

This version

4.3.1

Download files

Source Distribution

swh_indexer-4.3.1.tar.gz (188.0 kB view details)

Uploaded Source

Built Distribution

swh_indexer-4.3.1-py3-none-any.whl (232.3 kB view details)

Uploaded Python 3

File details

Details for the file swh_indexer-4.3.1.tar.gz.

File metadata

  • Download URL: swh_indexer-4.3.1.tar.gz
  • Upload date:
  • Size: 188.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.12

File hashes

Hashes for swh_indexer-4.3.1.tar.gz

  • SHA256: 7c1b3ed9db132bc4c3847304dc9278b7ebc9c3c5bdd7945a917431378b3724cd
  • MD5: f2203f025ce0f6cca8f57881df7aa663
  • BLAKE2b-256: 8f71c017bd3f9a79621f900fdaa90277b9f1b16a89e448e7c015638a373619eb

File details

Details for the file swh_indexer-4.3.1-py3-none-any.whl.

File metadata

  • Download URL: swh_indexer-4.3.1-py3-none-any.whl
  • Upload date:
  • Size: 232.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.12

File hashes

Hashes for swh_indexer-4.3.1-py3-none-any.whl

  • SHA256: d4f305b9c41101f2aecec68b76a6a5e44eecfe62d0053d4dabb57286b48d9ed2
  • MD5: 4f4c4671d5367c7f273c785ad5ad4298
  • BLAKE2b-256: a1940d9b79bdede4cb8764cdae8abc247bfdbca7c718a7d4f3c951da0ef2258d
