Software Heritage Content Indexer
swh-indexer
Tools to compute multiple indexes on SWH's raw contents:
- content:
  - mimetype
  - ctags
  - language
  - fossology-license
  - metadata
- revision:
  - metadata
An indexer is in charge of:
- looking up objects
- extracting information from those objects
- storing that information in the swh-indexer db
There are multiple indexers working on different object types:
- content indexer: works with content sha1 hashes
- revision indexer: works with revision sha1 hashes
- origin indexer: works with origin identifiers
Indexation procedure:
- receive a batch of ids
- retrieve the associated data, depending on the object type
- compute an index for each object
- store the result in swh's storage
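The procedure above can be sketched in a few lines of Python. This is only an illustration, not the swh-indexer API: `OBJECTS`, `INDEX_DB`, `index_batch` and `size_index` are hypothetical names, and plain dictionaries stand in for the object storage and the indexer db.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical in-memory stand-ins for the object storage and the indexer db.
OBJECTS: Dict[str, bytes] = {
    "id-1": b"#!/bin/sh\necho hello\n",
    "id-2": b"plain text",
}
INDEX_DB: Dict[str, dict] = {}


def index_batch(ids: Iterable[str], compute: Callable[[bytes], dict]) -> List[str]:
    """Run the procedure over one batch of ids and return the ids indexed."""
    indexed = []
    for obj_id in ids:                    # receive a batch of ids
        data = OBJECTS.get(obj_id)        # retrieve the associated data
        if data is None:
            continue                      # unknown id: nothing to index
        INDEX_DB[obj_id] = compute(data)  # compute the index and store it
        indexed.append(obj_id)
    return indexed


def size_index(data: bytes) -> dict:
    """Toy index (an assumption for the example): length of the raw content."""
    return {"length": len(data)}


done = index_batch(["id-1", "id-2", "missing"], size_index)
```

Passing the computation in as a function keeps the batch loop identical for every kind of index; only `compute` changes.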
Current content indexers:
- mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype
- language (queue swh_indexer_content_language): detect the programming language
- ctags (queue swh_indexer_content_ctags): compute tags information
- fossology-license (queue swh_indexer_fossology_license): compute the license
- metadata: translate file into translated_metadata dict
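For reference, the indexer-to-queue pairs listed above can be written as a lookup table. This is a sketch for illustration only: `CONTENT_INDEXER_QUEUES` and `queue_for` are hypothetical names, not part of swh-indexer, and the metadata entry is left empty because no queue name is given for it.

```python
from typing import Dict, Optional

# Queue names copied from the list of content indexers; None means the
# list gives no queue name for that indexer.
CONTENT_INDEXER_QUEUES: Dict[str, Optional[str]] = {
    "mimetype": "swh_indexer_content_mimetype",
    "language": "swh_indexer_content_language",
    "ctags": "swh_indexer_content_ctags",
    "fossology-license": "swh_indexer_fossology_license",
    "metadata": None,
}


def queue_for(indexer: str) -> str:
    """Return the queue name for a content indexer, or raise ValueError."""
    if indexer not in CONTENT_INDEXER_QUEUES:
        raise ValueError(f"unknown content indexer: {indexer!r}")
    queue = CONTENT_INDEXER_QUEUES[indexer]
    if queue is None:
        raise ValueError(f"no queue documented for indexer: {indexer!r}")
    return queue
```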
Current revision indexers:
- metadata: detects files containing metadata, then either retrieves their translated_metadata from the content_metadata table in storage or runs the content indexer to translate the files.
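The "retrieve or translate" behaviour of the revision metadata indexer can be sketched as a lookup with a fallback. The names here are assumptions made for the example: `revision_metadata` is hypothetical, a dictionary stands in for the content_metadata table, and `translate` stands in for a run of the content metadata indexer.

```python
from typing import Callable, Dict, Iterable


def revision_metadata(
    file_ids: Iterable[str],
    content_metadata: Dict[str, dict],
    translate: Callable[[str], dict],
) -> Dict[str, dict]:
    """Collect translated_metadata for the metadata files of one revision."""
    result = {}
    for file_id in file_ids:
        if file_id not in content_metadata:
            # Not translated yet: run the (stand-in) content indexer
            # and record the result for later lookups.
            content_metadata[file_id] = translate(file_id)
        result[file_id] = content_metadata[file_id]
    return result


# Usage: one file already has an entry, the other must be translated.
calls = []


def fake_translate(file_id: str) -> dict:
    calls.append(file_id)
    return {"name": f"translated-{file_id}"}


table = {"file-a": {"name": "cached"}}
out = revision_metadata(["file-a", "file-b"], table, fake_translate)
```

Only `file-b` triggers a translation; `file-a` is served from the existing entry.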