
A verification framework to map PURLs to verified SWHIDs


SWHID Verification Tool

License: MIT | Python 3.9+ | Software Heritage

A verification framework that maps Package URLs (PURLs) to verified Software Heritage Identifiers (SWHIDs). It establishes cryptographic and structural provenance by creating a verifiable link between software distributions and their canonical source code archived by Software Heritage (SWH).

The Semantic Gap

In modern software development, we refer to dependencies by package-level identifiers (e.g., lodash@4.17.21 or requests@2.31.0). These are names, not content hashes: they do not by themselves prove which source code a release was built from, and they are vulnerable to supply chain tampering.

To guarantee reproducibility and security, we need cryptographic, content-addressed identifiers such as Software Heritage Identifiers (SWHIDs). There is currently a semantic gap between package managers, which identify software by name and version, and the archive, which identifies it by content hash. This tool bridges that gap by automatically resolving package releases to verified SWHIDs across five major registries: PyPI, npm, Cargo, Go Modules, and Maven Central.
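Every lookup starts from a PURL, whose general form is pkg:type/namespace/name@version?qualifiers#subpath. As a minimal sketch of that first step (it assumes nothing about the tool's own PURL Parser, and the `parse_purl` and `PurlParts` names are hypothetical), the function below splits a PURL in the order the purl specification describes. For real use, the packageurl-python library's `PackageURL.from_string` is a maintained implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple
from urllib.parse import unquote


@dataclass
class PurlParts:
    """Hypothetical container for the components of a Package URL."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str] = field(default_factory=dict)
    subpath: Optional[str] = None


def _split_right(text: str, sep: str) -> Tuple[str, Optional[str]]:
    """Split once from the right; return (text, None) when sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, None)


def parse_purl(purl: str) -> PurlParts:
    """Parse pkg:type/namespace/name@version?qualifiers#subpath (simplified)."""
    remainder, subpath = _split_right(purl, "#")
    remainder, query = _split_right(remainder, "?")

    scheme, sep, remainder = remainder.partition(":")
    if scheme.lower() != "pkg" or not sep:
        raise ValueError(f"not a Package URL: {purl!r}")

    purl_type, sep, remainder = remainder.strip("/").partition("/")
    if not sep or not remainder:
        raise ValueError(f"missing type or name: {purl!r}")

    # The version follows the last "@"; a literal "@" elsewhere must be percent-encoded.
    remainder, version = _split_right(remainder, "@")
    namespace, _, name = remainder.strip("/").rpartition("/")

    qualifiers: Dict[str, str] = {}
    for pair in (query or "").split("&"):
        key, _, value = pair.partition("=")
        if key and value:
            qualifiers[key.lower()] = unquote(value)

    return PurlParts(
        type=purl_type.lower(),
        namespace="/".join(unquote(s) for s in namespace.split("/") if s) or None,
        name=unquote(name),
        version=unquote(version) if version else None,
        qualifiers=qualifiers,
        subpath=unquote(subpath.strip("/")) if subpath else None,
    )
```

For example, parsing pkg:maven/org.apache.commons/commons-lang3@3.14.0?type=jar gives namespace "org.apache.commons", name "commons-lang3", version "3.14.0", and qualifiers {"type": "jar"}.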

📊 Showcase Dataset

We have generated a showcase dataset covering 25 of the most popular packages across all five ecosystems. The resulting SPDX 3.0 JSON-LD manifest is available at dataset/showcase_manifest.jsonld.

Verification Statistics

| Metric | Count | Percentage |
|---|---|---|
| Total Packages | 25 | 100% |
| Inferred (Medium Confidence) | 18 | 72.0% |
| Verified (High Confidence) | 1 | 4.0% |
| Partial (Low Confidence) | 1 | 4.0% |
| Errors/Failed | 5 | 20.0% |

Note: The "Inferred" status means that the repository was matched and verified in the Software Heritage archive, but the specific version tag was not found in its latest snapshot. Running the tool with a Software Heritage API token raises the API rate limit and avoids the rate-limiting errors (HTTP 429 Too Many Requests) that can occur when triggering "Save Code Now" requests.
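Each percentage in the table is a status count divided by the 25 packages (for example, 18 / 25 = 72.0%). A small sketch of that bookkeeping follows; the `Status` enum and `summarize` function are hypothetical, with labels copied from the table, and this is not the tool's own reporting code.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    """Hypothetical status labels mirroring the statistics table."""
    VERIFIED = "Verified (High Confidence)"
    INFERRED = "Inferred (Medium Confidence)"
    PARTIAL = "Partial (Low Confidence)"
    ERROR = "Errors/Failed"


def summarize(statuses: List[Status]) -> Dict[Status, Tuple[int, float]]:
    """Return (count, percentage of total) for every status."""
    if not statuses:
        raise ValueError("no results to summarize")
    counts = Counter(statuses)
    total = len(statuses)
    return {s: (counts[s], round(100 * counts[s] / total, 1)) for s in Status}


# Reproduce the showcase numbers: 18 + 1 + 1 + 5 = 25 packages.
showcase = [Status.INFERRED] * 18 + [Status.VERIFIED, Status.PARTIAL] + [Status.ERROR] * 5
for status, (count, pct) in summarize(showcase).items():
    print(f"{status.value:<30}{count:>4}{pct:>7.1f}%")
```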

Key Features

  • Multi-Ecosystem Support: Resolution across PyPI, npm, Cargo, Go Modules, and Maven Central, with specialized verification strategies for PyPI, Crates.io (Cargo), and Maven Central.
  • High-Confidence Provenance:
    • PyPI: Extraction of commit SHAs from Sigstore/PEP 740 attestations via Fulcio certificates.
    • Cargo: Deterministic normalization and restoration of original project state for byte-for-byte matching.
    • Maven: SCM metadata resolution and verification of cleaned source artifacts.
  • SPDX 3.0 Compliance: Generation of RDF-compatible JSON-LD manifests using official SPDX models.
  • Automated Archival Integration: Proactive use of the Software Heritage "Save Code Now" API.
  • Installation Verification: Local filesystem scanner to audit installed packages against verified SWHID ground truth.
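For Cargo, `cargo package` rewrites Cargo.toml into a normalized form, keeps the author's manifest as Cargo.toml.orig, and, when packaging from a git checkout, records the commit in .cargo_vcs_info.json. The sketch below illustrates the kind of restoration step the Cargo strategy describes; the function name is hypothetical and it handles only these two files, so any other generated files would still need separate treatment. After it runs, the unpacked crate can be compared against the directory at `path_in_vcs` in the recorded commit.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def restore_crate_source(crate_dir: Path) -> Tuple[Optional[str], str]:
    """Undo two `cargo package` changes in an unpacked .crate archive.

    Returns (commit_sha, path_in_vcs) read from .cargo_vcs_info.json; the
    commit is None when the crate was not packaged from a git checkout.
    """
    commit: Optional[str] = None
    path_in_vcs = ""

    vcs_info = crate_dir / ".cargo_vcs_info.json"
    if vcs_info.exists():
        data = json.loads(vcs_info.read_text(encoding="utf-8"))
        commit = data.get("git", {}).get("sha1")
        path_in_vcs = data.get("path_in_vcs", "")
        vcs_info.unlink()  # generated by cargo, absent from the repository

    original = crate_dir / "Cargo.toml.orig"
    if original.exists():
        # Cargo.toml in the archive is a normalized rewrite; put the original back.
        original.replace(crate_dir / "Cargo.toml")

    return commit, path_in_vcs
```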

Installation

Prerequisites

  • Python 3.9+
  • [Optional] A Software Heritage API Token for higher rate limits.

Setup

git clone https://github.com/OdysseasKalaitsidis/SWHID_POC
cd SWHID_POC
python -m venv venv
source venv/bin/activate  # Use .\venv\Scripts\activate on Windows
pip install -r requirements.txt

Configuration

The tool can be configured via environment variables or a .env file:

| Variable | Description | Default |
|---|---|---|
| SWH_TOKEN | Software Heritage API authentication token | None |
| CACHE_DIR | Directory for caching resolution results | ./cache |
| LOG_LEVEL | Logging verbosity (DEBUG, INFO, ERROR) | INFO |
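A minimal sketch of how these settings and the token might be used follows. It assumes a hypothetical `Settings` class and `save_code_now_request` helper (the tool's actual loader may differ), and it leaves out reading the .env file, which could be done with python-dotenv's `load_dotenv()`. The request targets Software Heritage's Save Code Now endpoint, `POST /api/1/origin/save/<visit_type>/url/<origin_url>/`, and sends the token as a Bearer header. It only builds the request; passing it to `urllib.request.urlopen` would send it.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional
from urllib.request import Request

SWH_API = "https://archive.softwareheritage.org/api/1"


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the configuration table."""
    swh_token: Optional[str]
    cache_dir: Path
    log_level: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        env = os.environ if env is None else env
        return cls(
            swh_token=env.get("SWH_TOKEN") or None,  # empty string counts as unset
            cache_dir=Path(env.get("CACHE_DIR", "./cache")),
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
        )


def save_code_now_request(origin_url: str, settings: Settings, visit_type: str = "git") -> Request:
    """Build (but do not send) a Save Code Now request for a repository URL."""
    url = f"{SWH_API}/origin/save/{visit_type}/url/{origin_url}/"
    headers = {"Accept": "application/json"}
    if settings.swh_token:
        # Authenticated calls get higher rate limits than anonymous ones.
        headers["Authorization"] = f"Bearer {settings.swh_token}"
    return Request(url, headers=headers, method="POST")
```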

Usage

Quick Start

Map a single PURL to a verified SWHID:

python -m swhid_tool.cli swhid-map pkg:pypi/six@1.17.0

Batch Processing

Generate an SPDX 3.0 dataset for multiple PURLs:

python -m swhid_tool.cli batch-process input_purls.txt output_report.jsonld

Integrity Auditing

Verify a local directory against a verified manifest:

python -m swhid_tool.cli verify-path /path/to/installed/library manifest.jsonld
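Content SWHIDs are computed like Git blob hashes and directory SWHIDs like Git tree hashes, which is what makes a local audit possible without downloading anything. The stdlib-only sketch below illustrates that computation; the function names are hypothetical, it is not necessarily how verify-path is implemented, and Software Heritage's swh.model package is the reference implementation. Comparing `directory_swhid(path)` with the swh:1:dir: value recorded in a manifest is the essence of such a check.

```python
import hashlib
import os
import stat
from pathlib import Path
from typing import List, Tuple, Union


def _git_hash(kind: bytes, payload: bytes) -> bytes:
    """SHA-1 over a Git object header ("blob"/"tree", size, NUL) plus payload."""
    header = kind + b" " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).digest()


def content_swhid(data: bytes) -> str:
    return "swh:1:cnt:" + _git_hash(b"blob", data).hex()


def _tree_hash(path: Path) -> bytes:
    entries: List[Tuple[bytes, bytes]] = []
    for child in path.iterdir():
        name = os.fsencode(child.name)
        if child.is_symlink():  # check first: is_dir() would follow the link
            mode, digest = b"120000", _git_hash(b"blob", os.fsencode(os.readlink(child)))
        elif child.is_dir():
            mode, digest = b"40000", _tree_hash(child)
        else:
            executable = child.stat().st_mode & stat.S_IXUSR
            mode = b"100755" if executable else b"100644"
            digest = _git_hash(b"blob", child.read_bytes())
        # Git orders entries as if directory names had a trailing "/".
        sort_key = name + b"/" if mode == b"40000" else name
        entries.append((sort_key, mode + b" " + name + b"\0" + digest))
    entries.sort()
    return _git_hash(b"tree", b"".join(entry for _, entry in entries))


def directory_swhid(path: Union[str, Path]) -> str:
    return "swh:1:dir:" + _tree_hash(Path(path)).hex()
```

For example, `content_swhid(b"")` yields swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same identifier Git assigns to an empty file.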

REST API

Deploy as a service using FastAPI:

python -m uvicorn swhid_tool.api:app --host 0.0.0.0 --port 8000

Architecture

The system uses the Strategy pattern to decouple ecosystem-specific logic from the core resolution engine: a strategy router dispatches each parsed PURL to the strategy for its ecosystem.

graph TD
    CLI[CLI / API] --> Manager[SWHID Manager]
    Manager --> PURL[PURL Parser]
    Manager --> StrategyRouter{Strategy Router}
    StrategyRouter --> PyPI[PyPI Strategy]
    StrategyRouter --> Cargo[Cargo Strategy]
    StrategyRouter --> Maven[Maven Strategy]
    PyPI --> SWH[SWH API / Archive]
    Cargo --> SWH
    Maven --> SWH
    Manager --> Exporter[SPDX 3.0 Exporter]
    Exporter --> JSONLD[JSON-LD Manifest]
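The routing step in the diagram can be sketched with a `typing.Protocol` and a dictionary keyed by PURL type. All names here (`Strategy`, `StrategyRouter`, `Resolution`, `EchoStrategy`, `resolve`) are hypothetical stand-ins for the tool's internals; the point of the design is that adding an ecosystem means registering one more strategy without touching the router.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Resolution:
    purl: str
    swhid: Optional[str]
    confidence: str  # e.g. "high", "medium", "low"


class Strategy(Protocol):
    def resolve(self, purl: str) -> Resolution: ...


class StrategyRouter:
    """Dispatch each PURL to the strategy registered for its type."""

    def __init__(self) -> None:
        self._strategies: Dict[str, Strategy] = {}

    def register(self, purl_type: str, strategy: Strategy) -> None:
        self._strategies[purl_type.lower()] = strategy

    def resolve(self, purl: str) -> Resolution:
        scheme, sep, rest = purl.partition(":")
        if scheme != "pkg" or not sep:
            raise ValueError(f"not a Package URL: {purl!r}")
        purl_type = rest.lstrip("/").split("/", 1)[0].lower()
        strategy = self._strategies.get(purl_type)
        if strategy is None:
            raise LookupError(f"no strategy registered for type {purl_type!r}")
        return strategy.resolve(purl)


class EchoStrategy:
    """Stand-in strategy used only to demonstrate routing."""

    def resolve(self, purl: str) -> Resolution:
        return Resolution(purl=purl, swhid=None, confidence="low")


router = StrategyRouter()
router.register("pypi", EchoStrategy())
print(router.resolve("pkg:pypi/six@1.17.0"))
```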

Validation and Standards

Verification findings are exported as SPDX 3.0 documents. The generated RDF is checked for conformance through SHACL shape validation in the integrated test_validation.py suite.
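As a rough illustration of what one exported entry might look like, the sketch below builds a software_Package element in SPDX 3.0.1 JSON-LD form and attaches a SWHID through an ExternalIdentifier of type swhid, one of the places the SPDX 3.0 model allows one. This is an assumption about the manifest's shape, not a copy of the tool's exporter (`package_element` and `to_jsonld` are hypothetical), and the fragment alone would not pass SHACL validation: a complete document also needs the CreationInfo node it references, agents, and an SpdxDocument element.

```python
import json
from typing import Any, Dict, List, Optional

SPDX_CONTEXT = "https://spdx.org/rdf/3.0.1/spdx-context.jsonld"


def package_element(spdx_id: str, name: str, version: str, purl: str,
                    swhid: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: one SPDX 3.0.1 software_Package element."""
    element: Dict[str, Any] = {
        "type": "software_Package",
        "spdxId": spdx_id,
        "creationInfo": "_:creationinfo",  # blank node defined elsewhere in @graph
        "name": name,
        "software_packageVersion": version,
        "software_packageUrl": purl,
    }
    if swhid is not None:
        element["externalIdentifier"] = [{
            "type": "ExternalIdentifier",
            "externalIdentifierType": "swhid",
            "identifier": swhid,
        }]
    return element


def to_jsonld(elements: List[Dict[str, Any]]) -> str:
    """Wrap elements in a JSON-LD document using the SPDX 3.0.1 context."""
    return json.dumps({"@context": SPDX_CONTEXT, "@graph": elements}, indent=2)
```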

Documentation

Detailed guides for different stakeholders:

  • User Guide: CLI reference, API specifications, and troubleshooting.
  • Developer Guide: Extending the tool to new ecosystems and core internals.
  • Maintainer Guide: Best practices for enabling high-confidence verifiability.

Contributing

Contributions are welcome! Please see the Developer Guide for setup instructions and coding standards.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This project was developed as part of the Google Summer of Code (GSoC) 2026 program, under the mentorship of Software Heritage.

Download files


Source Distribution

swhid_verification_tool-0.1.7.tar.gz (35.4 kB)

Built Distribution


swhid_verification_tool-0.1.7-py3-none-any.whl (36.0 kB)

File details

Details for the file swhid_verification_tool-0.1.7.tar.gz.

File metadata

  • Download URL: swhid_verification_tool-0.1.7.tar.gz
  • Size: 35.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for swhid_verification_tool-0.1.7.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5da36c703a28b35030d85c7ebbf73ed6da9e8466acbb413e37e1117953c11f2d |
| MD5 | e777515e38efa38a3bb53921a3d1c5d0 |
| BLAKE2b-256 | 88ae18eed93e0a8905ce87a1e6fa8146769f78e90fb33b6a65337208f2412cef |


Provenance

The following attestation bundles were made for swhid_verification_tool-0.1.7.tar.gz:

Publisher: publish.yml on OdysseasKalaitsidis/swhid-verification-tool


File details

Details for the file swhid_verification_tool-0.1.7-py3-none-any.whl.

File hashes

Hashes for swhid_verification_tool-0.1.7-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e914d04474d6316a03d366bc9a3c39f83d72a7731d61f710dfb57e9f9cbc016b |
| MD5 | cc1e882e5943c11f4de79b5428f15160 |
| BLAKE2b-256 | 1e0dce20dfbb4d437ca3f9e086c9dc2b7a207e6f9e80a3fc67eb1c182252bc24 |


Provenance

The following attestation bundles were made for swhid_verification_tool-0.1.7-py3-none-any.whl:

Publisher: publish.yml on OdysseasKalaitsidis/swhid-verification-tool

