Software Heritage indexer

Project description

Tools to compute multiple indexes on Software Heritage (SWH) raw contents:

  • content:

    • mimetype

    • fossology-license

    • metadata

  • origin:

    • metadata (intrinsic, using the content indexer; and extrinsic)

An indexer is in charge of:

  • looking up objects

  • extracting information from those objects

  • storing that information in the swh-indexer db

There are multiple indexers working on different object types:

  • content indexer: works with content sha1 hashes

  • revision indexer: works with revision sha1 hashes

  • origin indexer: works with origin identifiers

Indexation procedure:

  • receive a batch of ids

  • retrieve the associated data, depending on the object type

  • compute an index for each object

  • store the result in SWH's storage
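
The procedure above can be sketched as a small loop. This is a minimal illustration only: `index_batch`, `length_index`, and the dictionary-based stores are hypothetical stand-ins, not the classes or storage backends that swh-indexer actually uses.

```python
import hashlib
from typing import Callable, Dict, Iterable, List

# Hypothetical in-memory stand-ins for the object storage and the index
# storage; the real project talks to dedicated services instead.
ObjectStore = Dict[str, bytes]
IndexStore = Dict[str, dict]


def sha1_hex(data: bytes) -> str:
    """Return the hex sha1 of some raw content (used here as its id)."""
    return hashlib.sha1(data).hexdigest()


def index_batch(
    ids: Iterable[str],
    objects: ObjectStore,
    compute: Callable[[bytes], dict],
    results: IndexStore,
) -> List[str]:
    """Run one indexation pass and return the ids that were indexed."""
    indexed = []
    for obj_id in ids:                     # 1. receive a batch of ids
        data = objects.get(obj_id)         # 2. retrieve the associated data
        if data is None:
            continue                       # unknown id: nothing to index
        results[obj_id] = compute(data)    # 3. compute the index, 4. store it
        indexed.append(obj_id)
    return indexed


def length_index(data: bytes) -> dict:
    """Toy index used for illustration: the content length in bytes."""
    return {"length": len(data)}
```

Passing the index function as a parameter keeps the loop identical for every kind of index; only the `compute` callable changes.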

Current content indexers:

  • mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype

  • fossology-license (queue swh_indexer_fossology_license): compute the license

  • metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta vocabulary)
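
As a rough idea of what a metadata translation involves, the sketch below maps a few fields of an npm `package.json` file to a JSON-LD document using CodeMeta terms. The function name and the three-field mapping are assumptions made for this example; the mappings shipped with swh-indexer cover many more fields and formats.

```python
import json

# Context URL of the CodeMeta 2.0 vocabulary.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"

# Illustrative subset only: npm key -> CodeMeta/schema.org term.
NPM_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "description": "description",
}


def translate_package_json(raw: bytes) -> dict:
    """Translate raw package.json bytes into a small JSON-LD document."""
    package = json.loads(raw)
    document = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    for npm_key, term in NPM_TO_CODEMETA.items():
        value = package.get(npm_key)
        if isinstance(value, str):  # skip missing or non-string values
            document[term] = value
    return document
```

Keys that have no entry in the mapping table are dropped, so the output only contains terms the vocabulary defines.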

Current origin indexers:

  • metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta and ForgeFed vocabularies)

This version

3.6.0

Download files

Source Distribution

swh_indexer-3.6.0.tar.gz (153.2 kB view details)

Uploaded Source

Built Distribution

swh.indexer-3.6.0-py3-none-any.whl (195.8 kB view details)

Uploaded Python 3

File details

Details for the file swh_indexer-3.6.0.tar.gz.

File metadata

  • Download URL: swh_indexer-3.6.0.tar.gz
  • Size: 153.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for swh_indexer-3.6.0.tar.gz
Algorithm Hash digest
SHA256 1afb2a5af58b432de5583e6d5f75aba70082116d5be4d472c6a759a16b08e749
MD5 5c7aafa3d3003105bafb6629b656b6a6
BLAKE2b-256 f0d1a147d162d08bbc8111cfa12f779c0a0f9a675c780d4fdd479e6a187d0a40

File details

Details for the file swh.indexer-3.6.0-py3-none-any.whl.

File metadata

  • Download URL: swh.indexer-3.6.0-py3-none-any.whl
  • Size: 195.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for swh.indexer-3.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 38dbe5203f3362a58ef79d00e229cc0ddc42d562ba2f87fc333c115ec76f2335
MD5 74e42d6ac5f5cf9a7875b1a57c427a25
BLAKE2b-256 66777c9f62bee7745e11e0bbc2036256b18f9c86b36a38cffcca1a0caf617bf2
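
A downloaded file can be checked against the SHA256 digests listed above with the Python standard library. This is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the path passed in is wherever the file was saved locally.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For instance, `matches_published_hash("swh_indexer-3.6.0.tar.gz", "1afb2a5af58b432de5583e6d5f75aba70082116d5be4d472c6a759a16b08e749")` should return `True` for an intact copy of the source distribution.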
