Software Heritage Content Indexer

swh-indexer

Tools to compute multiple indexes on SWH's raw contents:

  • content:
    • mimetype
    • fossology-license
    • metadata
  • origin:
    • metadata (intrinsic, using the content indexer; and extrinsic)

An indexer is in charge of:

  • looking up objects
  • extracting information from those objects
  • storing that information in the swh-indexer db

There are multiple indexers working on different object types:

  • content indexer: works with content sha1 hashes
  • revision indexer: works with revision sha1 hashes
  • origin indexer: works with origin identifiers

Indexation procedure:

  • receive a batch of ids
  • retrieve the associated data, depending on the object type
  • compute an index for each object
  • store the result in swh's storage
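
The four steps above can be sketched as a small batch loop. This is a minimal illustration, not the swh-indexer API: the names index_batch, fetch, compute and store are hypothetical, and plain Python containers stand in for the object storage and the index database.

```python
from typing import Callable, Dict, Iterable, List, Optional


def index_batch(
    ids: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    compute: Callable[[str, bytes], dict],
    store: Callable[[List[dict]], None],
) -> List[dict]:
    """Run one indexing pass over a batch of object ids (hypothetical helper)."""
    results = []
    for obj_id in ids:
        data = fetch(obj_id)  # retrieve the data associated with the id
        if data is None:
            continue  # unknown object: nothing to index
        results.append(compute(obj_id, data))  # compute the index for this object
    store(results)  # store all results of the batch in one call
    return results


# Toy stand-ins for the object storage and the index database (assumptions).
objects: Dict[str, bytes] = {"id-1": b"hello", "id-2": b"hello world"}
index_db: List[dict] = []

rows = index_batch(
    ["id-1", "id-2", "missing"],
    fetch=objects.get,
    compute=lambda obj_id, data: {"id": obj_id, "length": len(data)},
    store=index_db.extend,
)
```

Passing the fetch, compute and store steps as callables keeps the loop independent of the object type, which mirrors the way content, revision and origin indexers share one procedure.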

Current content indexers:

  • mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype

  • fossology-license (queue swh_indexer_fossology_license): compute the license

  • metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta vocabulary)
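
To give an idea of what such a metadata translation involves, the sketch below maps a few fields of an npm package.json file to a JSON-LD document. It is a simplified example under stated assumptions: the function name translate_package_json, the field mapping and the choice of context URL are illustrative and are not taken from swh-indexer's code.

```python
import json

# Assumed JSON-LD context: the URL under which CodeMeta 2.0 publishes its context.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"

# Hypothetical mapping from package.json keys to schema.org/CodeMeta terms.
FIELD_MAP = {
    "name": "name",
    "version": "version",
    "description": "description",
    "homepage": "url",
}


def translate_package_json(raw: bytes) -> dict:
    """Translate a package.json payload into a small JSON-LD document."""
    package = json.loads(raw)
    doc = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    for source_key, target_key in FIELD_MAP.items():
        if source_key in package:
            # Copy only the fields that have a known target term.
            doc[target_key] = package[source_key]
    return doc


raw = b'{"name": "left-pad", "version": "1.3.0", "private": false}'
translated = translate_package_json(raw)
print(json.dumps(translated, indent=2))
```

Fields without an entry in the mapping (here, "private") are dropped rather than guessed, so the output contains only terms from the target vocabulary.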

Current origin indexers:

  • metadata: translate files from ecosystem-specific formats to JSON-LD (using the schema.org/CodeMeta and ForgeFed vocabularies)

Download files

Source Distribution

swh.indexer-1.9.4.tar.gz (150.2 kB)

Uploaded Source

Built Distribution

swh.indexer-1.9.4-py3-none-any.whl (197.8 kB)

Uploaded Python 3

File details

Details for the file swh.indexer-1.9.4.tar.gz.

File metadata

  • Download URL: swh.indexer-1.9.4.tar.gz
  • Size: 150.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.3

File hashes

Hashes for swh.indexer-1.9.4.tar.gz
Algorithm Hash digest
SHA256 75a63c9016320c449fdc18cf1fb76ab1ca63afb6dd4575c33cf15ccb76850b8f
MD5 180b7f781d42c53bba660713802e7b4a
BLAKE2b-256 9ad7012eac22d6f608cc6ff055263cbcc45e3ba41af451acb8aa76ba138a710e

File details

Details for the file swh.indexer-1.9.4-py3-none-any.whl.

File metadata

  • Download URL: swh.indexer-1.9.4-py3-none-any.whl
  • Size: 197.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.3

File hashes

Hashes for swh.indexer-1.9.4-py3-none-any.whl
Algorithm Hash digest
SHA256 f379a49c14fa7d5f7d800b939206df1ac3bc5c4aa24e141a620df2f7b3115117
MD5 8d338e9e6d6b66abf95b1921ddf04531
BLAKE2b-256 9d3557c8f63c0749926eeb2d642493cd8700fbdf81d625acda0ace69628885e8
