Software Heritage provenance

Software Heritage - Provenance

This package provides a provenance query service for the Software Heritage Archive. Provenance is the ability to ask, for a given object stored in the Archive: “where does it come from?”

This question generally does not have a simple and unambiguous answer. It can mean, among other things:

  • what is the oldest revision in which this object has been found?

  • what is the “best” origin in which this object can be found?

Answering this kind of question requires running complex queries on the Merkle DAG on which the Software Heritage Archive is built, mostly from the bottom to the top (i.e. from Content to Origin objects).

The idea is to use both the compressed graph representation of the Archive (swh-graph) and a preprocessed provenance index to speed up some of the provenance queries.

Description

The core feature of this tool is a service that returns a reference to an object within the Software Heritage Archive in which the queried object can be found.

There are mainly two kinds of provenance queries:

  • search for the best provenance answer for a given object;

  • search for all the possible provenance answers for a given object.

For each input object, the definition of “best provenance answer” is simple and unambiguous for now: the best answer is an origin containing the oldest revision (i.e. the revision with the oldest commit date) in which this object has been found.
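The selection rule above can be illustrated with a minimal sketch. The `Occurrence` record and the `best_answer` helper are hypothetical names introduced here for illustration; they are not part of the swh-provenance API, and the example origins and revision identifiers are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical record: one place where the queried object was found."""

    origin: str            # origin URL
    revision: str          # identifier of the revision containing the object
    commit_date: datetime  # commit date of that revision


def best_answer(occurrences: Iterable[Occurrence]) -> Optional[Occurrence]:
    """Return an occurrence whose revision has the oldest commit date.

    Returns None when the object was not found anywhere.
    """
    return min(occurrences, key=lambda occ: occ.commit_date, default=None)


# Placeholder data: two origins containing the same object.
found = [
    Occurrence("https://example.org/fork", "rev-placeholder-1",
               datetime(2015, 1, 1, tzinfo=timezone.utc)),
    Occurrence("https://example.org/upstream", "rev-placeholder-2",
               datetime(2010, 6, 1, tzinfo=timezone.utc)),
]
best = best_answer(found)
```

If several revisions share the oldest commit date, `min` keeps the first one encountered, which matches the "an origin" wording: any such origin is an acceptable answer.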

Provenance can be looked up for the following object types:

  • Content

  • Directory

  • Revision

  • Release

For each object:

Input: SWHID (core SWHID of an artifact found in the user code base)

Output: SWHID or origin URI where input SWHID was found + context information
    Context information, a subset of:
        snapshot (snp SWHID)
        release (rel)
        revision (rev)
        path (filesystem-style path)

Non-functional requirements:
  - the returned object should be as high as possible; i.e. prefer an
    Origin (if any), then a Snapshot, then a Release, then a Revision,
  - the returned object should be the best possible answer, if possible;
    the definition of "best answer" being something like:

      *an* origin in which the oldest revision (in the sense of the
      revision with the oldest commit date) in which this object has been
      found.
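
The "as high as possible" preference can be sketched as a simple ranking. The `ANCHOR_RANK` table and `highest_anchor` function below are hypothetical names used only for illustration, with each candidate modelled as a `(kind, reference)` pair; the actual service may represent results differently.

```python
from typing import Iterable, Optional, Tuple

# Lower rank means higher in the graph, following the stated preference:
# Origin, then Snapshot, then Release, then Revision.
ANCHOR_RANK = {"origin": 0, "snapshot": 1, "release": 2, "revision": 3}

Anchor = Tuple[str, str]  # (kind, reference), e.g. ("origin", "https://...")


def highest_anchor(anchors: Iterable[Anchor]) -> Optional[Anchor]:
    """Return the candidate that sits highest in the graph, or None if empty."""
    return min(anchors, key=lambda anchor: ANCHOR_RANK[anchor[0]], default=None)


candidates = [
    ("revision", "rev-placeholder"),
    ("origin", "https://example.org/repo"),
    ("release", "rel-placeholder"),
]
chosen = highest_anchor(candidates)
```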
