Software Heritage Content Indexer

swh-indexer

Tools to compute multiple indexes on SWH's raw contents:

  • content:
    • mimetype
    • ctags
    • language
    • fossology-license
    • metadata
  • revision:
    • metadata

An indexer is in charge of:

  • looking up objects
  • extracting information from those objects
  • storing that information in the swh-indexer db
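
The three responsibilities above can be sketched as a small base class. This is only an illustration: `SketchIndexer`, `LengthIndexer`, and the in-memory dictionaries standing in for the object source and the swh-indexer db are hypothetical names, not part of the swh.indexer API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class SketchIndexer(ABC):
    """Hypothetical base class; not the swh.indexer API."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects                # stand-in for the object source
        self.index_db: Dict[str, dict] = {}   # stand-in for the swh-indexer db

    def lookup(self, object_id: str) -> Optional[bytes]:
        """Look up an object; return None when it is unknown."""
        return self.objects.get(object_id)

    @abstractmethod
    def extract(self, object_id: str, data: bytes) -> dict:
        """Extract information from the object's data."""

    def store(self, object_id: str, info: dict) -> None:
        """Store the extracted information."""
        self.index_db[object_id] = info

    def run(self, object_id: str) -> bool:
        """Run lookup, extract, store; return False if the object is missing."""
        data = self.lookup(object_id)
        if data is None:
            return False
        self.store(object_id, self.extract(object_id, data))
        return True


class LengthIndexer(SketchIndexer):
    """Toy indexer that records the size of each object in bytes."""

    def extract(self, object_id: str, data: bytes) -> dict:
        return {"length": len(data)}
```

A concrete indexer only has to supply `extract`; lookup and storage stay in the shared base class.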

There are multiple indexers working on different object types:

  • content indexer: works with content sha1 hashes
  • revision indexer: works with revision sha1 hashes
  • origin indexer: works with origin identifiers
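
As a minimal sketch of what a content sha1 identifier looks like, the helpers below assume the id is a plain SHA-1 digest over the raw content bytes. `content_sha1` and `parse_sha1` are hypothetical helper names; how revision and origin identifiers are derived is not covered here.

```python
import hashlib

SHA1_LENGTH = 20  # a SHA-1 digest is 20 bytes (40 hex characters)


def content_sha1(raw: bytes) -> bytes:
    """Return the SHA-1 digest of raw content bytes (assumed id scheme)."""
    return hashlib.sha1(raw).digest()


def parse_sha1(hex_id: str) -> bytes:
    """Convert a hex id to bytes, rejecting anything that is not 20 bytes long."""
    digest = bytes.fromhex(hex_id)  # raises ValueError on non-hex input
    if len(digest) != SHA1_LENGTH:
        raise ValueError(f"expected {SHA1_LENGTH} bytes, got {len(digest)}")
    return digest
```

Validating ids up front lets a batch be rejected before any object is looked up.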

Indexation procedure:

  • receive a batch of ids
  • retrieve the associated data, depending on the object type
  • compute an index for each object
  • store the results in swh's storage
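
The procedure above might look like the following when applied to a whole batch. This is a sketch under stated assumptions: `index_batch`, the `retrievers` mapping keyed by object type, and the dictionary used as storage are all hypothetical, and skipping ids whose data cannot be retrieved is a design choice made for the example.

```python
from typing import Callable, Dict, Iterable, List, Optional

Retriever = Callable[[str], Optional[bytes]]


def index_batch(
    object_type: str,
    ids: Iterable[str],
    retrievers: Dict[str, Retriever],
    compute: Callable[[bytes], dict],
    storage: Dict[str, dict],
) -> List[str]:
    """Index a batch of ids and return the ids that could not be retrieved."""
    retrieve = retrievers[object_type]  # retrieval depends on the object type
    results: Dict[str, dict] = {}
    skipped: List[str] = []
    for object_id in ids:
        data = retrieve(object_id)
        if data is None:
            skipped.append(object_id)
            continue
        results[object_id] = compute(data)
    storage.update(results)  # one write for the whole batch
    return skipped
```

Collecting results first and writing them once keeps the storage step to a single call per batch.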

Current content indexers:

  • mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype

  • language (queue swh_indexer_content_language): detect the programming language

  • ctags (queue swh_indexer_content_ctags): compute tags information

  • fossology-license (queue swh_indexer_fossology_license): compute the license

  • metadata: translate a file into a translated_metadata dict
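
To illustrate how the queue names listed above could be used, the sketch below maps each content indexer to its queue and appends one batch of ids per requested indexer. The queue names come from the list; `enqueue` and the in-memory `queues` dictionary are hypothetical stand-ins for a real task queue, and the metadata indexer is mapped to `None` because no queue is named for it.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Optional

CONTENT_INDEXER_QUEUES: Dict[str, Optional[str]] = {
    "mimetype": "swh_indexer_content_mimetype",
    "language": "swh_indexer_content_language",
    "ctags": "swh_indexer_content_ctags",
    "fossology-license": "swh_indexer_fossology_license",
    "metadata": None,  # no queue is named for this indexer
}


def enqueue(
    indexers: Iterable[str],
    ids: Iterable[str],
    queues: DefaultDict[str, List[List[str]]],
) -> None:
    """Append one batch of ids to the queue of each requested indexer."""
    batch = list(ids)
    for name in indexers:
        queue = CONTENT_INDEXER_QUEUES[name]  # KeyError for an unknown indexer
        if queue is None:
            raise ValueError(f"no queue is defined for indexer {name!r}")
        queues[queue].append(batch)


# Usage: the same batch goes to the mimetype and language queues.
queues: DefaultDict[str, List[List[str]]] = defaultdict(list)
enqueue(["mimetype", "language"], ["id1", "id2"], queues)
```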

Current revision indexers:

  • metadata: detects files containing metadata, then either retrieves their translated_metadata from the content_metadata table in storage or runs the content indexer to translate them.
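
The "retrieve or translate" behaviour of the revision metadata indexer can be sketched as a cache lookup with a fallback. Everything here is an assumption made for illustration: the `METADATA_FILENAMES` set, the function names, the dictionary standing in for the content_metadata table, and the `translate` callback standing in for the content metadata indexer.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: file names treated as metadata files in this sketch.
METADATA_FILENAMES = {"package.json", "codemeta.json"}


def detect_metadata_files(entries: List[Tuple[str, str]]) -> List[str]:
    """Return content ids of (file name, content id) entries that look like metadata."""
    return [cid for name, cid in entries if name.lower() in METADATA_FILENAMES]


def revision_metadata(
    entries: List[Tuple[str, str]],
    content_metadata: Dict[str, dict],
    translate: Callable[[str], dict],
) -> Dict[str, dict]:
    """Collect translated_metadata for a revision's metadata files.

    Reuses rows already present in content_metadata and calls translate
    only for files that have not been translated yet.
    """
    result: Dict[str, dict] = {}
    for cid in detect_metadata_files(entries):
        if cid not in content_metadata:
            content_metadata[cid] = translate(cid)  # fall back to translation
        result[cid] = content_metadata[cid]
    return result
```

With this shape, a file shared by many revisions is translated once and then served from the table.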

Download files

  • Source distribution: swh.indexer-0.8.1.tar.gz (130.7 kB)
  • Built distribution: swh.indexer-0.8.1-py3-none-any.whl (147.8 kB, Python 3)
