Software Heritage code scanner

Source code scanner to analyze code bases and compare them with source code artifacts archived by Software Heritage.

Getting Started

Installation

To install the Software Heritage scanner, run:

pip install swh-scanner

Note that it will install swh-scanner and its dependencies in the current virtualenv (if any). If you just want to install the scanner as a standalone tool, you may want to use an installation tool like pipx or uv:

$ uv tool install --with swh-scanner swh-core

or

$ pipx install --include-deps swh-scanner

Registering to the Software Heritage Archive

To efficiently query the Software Heritage Archive, you need to create an account. This is not strictly necessary, but the rate limit imposed on anonymous users will likely result in very slow operation.

To create one, visit https://archive.softwareheritage.org/oidc/login/ and click Register to create a new user.

Configuring your scan

The setup command guides you through the initial configuration, including setting up your authentication token:

swh scanner setup

Running a Scan

To scan the local files in PROJECT_PATH, use:

swh scanner scan PROJECT_PATH

This will find your local files, query the archive, and open a graphical user interface in which you can browse the results.

Note that the scan command has a --provenance flag that retrieves information about where the files known to the archive might come from. This option is experimental, and you need to get in touch with the Software Heritage team to be granted access to the necessary APIs. Alternatively, a button in the dashboard queries the provenance of a selected file or directory. This is also experimental and restricted to privileged users.

Further Configuration

The scanner combines configuration options from three places, in decreasing order of precedence:

  • The command line

  • The project config file

  • The global config file
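The precedence rule above can be illustrated with a minimal Python sketch. This is not the scanner's own code: the function name effective_options and the sample values are hypothetical, and the sketch assumes that a higher-precedence source simply replaces a lower one key by key. Whether list-valued options are replaced or concatenated across sources is not stated here, so treat that part as an assumption.

```python
from collections import ChainMap


def effective_options(cli, project, global_):
    """Combine three option mappings; earlier arguments take precedence.

    Hypothetical helper for illustration only, not part of swh-scanner.
    """
    # ChainMap looks each key up in its mappings from left to right,
    # so a value from the command line hides the same key further right.
    return dict(ChainMap(cli, project, global_))


# Made-up sample values for the three sources.
global_cfg = {"disable_vcs_patterns": False, "exclude": ["*.log"]}
project_cfg = {"exclude": ["ignored*"]}
cli_opts = {"disable_vcs_patterns": True}

options = effective_options(cli_opts, project_cfg, global_cfg)
print(options)
```

In this sketch the command line value of disable_vcs_patterns wins over the global one, and the project file's exclude list hides the global list.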

You can view the command line options by invoking swh scanner scan --help.

The scanner will look for a swh.scanner.project.yml file inside the directory being scanned, or at the path given to --project-config-file.

The global configuration resides in the swh > scanner section of the shared YAML configuration file used by all Software Heritage tools, located by default at ~/.config/swh/global.yml.

The configuration file location follows the XDG Base Directory specification and can be explicitly overridden on the command line with the -C/--config-file flag.
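As a rough illustration of how an XDG-style default location is usually resolved, the sketch below computes the path of the global file. It assumes the file lives at swh/global.yml under the XDG configuration directory, which matches the documented default of ~/.config/swh/global.yml; the function name default_global_config_path is hypothetical, and the scanner's actual lookup may differ in details.

```python
import os
from pathlib import Path


def default_global_config_path(environ=None):
    """Return the assumed XDG-style location of the shared config file.

    Hypothetical helper for illustration only, not part of swh-scanner.
    """
    env = os.environ if environ is None else environ
    # The XDG Base Directory specification falls back to ~/.config
    # when XDG_CONFIG_HOME is unset or empty.
    base = env.get("XDG_CONFIG_HOME")
    if not base:
        home = env.get("HOME") or str(Path.home())
        base = os.path.join(home, ".config")
    return Path(base) / "swh" / "global.yml"


print(default_global_config_path())
```

Passing a dictionary instead of relying on os.environ makes it easy to check what the path would be under a different XDG_CONFIG_HOME.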

The following sub-sections and fields can be used within the swh > scanner stanza:

  • disable_global_patterns (default: false): whether to disable the global exclusion patterns, which cover very common kinds of files to exclude from the scan. Only use this if files that you want to scan are being ignored, which is very unlikely.

  • disable_vcs_patterns (default: false): whether to stop using the ignore mechanisms from version control systems (.gitignore, .hgignore, .svnignore). Note that this ignore mechanism only works in the first place if the VCS is available in your PATH (Git, Mercurial or SVN).

  • exclude (default: []): a list of glob patterns of paths to exclude from the scan, to use on top of all other exclusion patterns.

  • exclude_templates (default: []): a list of names of exclusion templates (as listed in the scanner’s help) to use on top of all other exclusion patterns. This is useful if you want to exclude all common Python cache files, for example.
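Since the VCS ignore mechanisms only work when the corresponding tool is on your PATH, it can help to check which ones are visible before scanning. The sketch below is a standalone check, not something the scanner provides: the function name available_vcs is hypothetical, and the executable names git, hg and svn are the usual ones for Git, Mercurial and SVN, assumed here to be what matters.

```python
import shutil

# Usual executable names for each VCS (an assumption for this sketch).
VCS_EXECUTABLES = {"Git": "git", "Mercurial": "hg", "SVN": "svn"}


def available_vcs(which=shutil.which):
    """Return the names of the VCS tools whose executable is found on PATH.

    Hypothetical helper for illustration only, not part of swh-scanner.
    """
    # shutil.which returns None when the executable cannot be found.
    return [name for name, exe in VCS_EXECUTABLES.items() if which(exe) is not None]


print("VCS tools found on PATH:", available_vcs())
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is enough.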

Here is an example:

scanner:
  disable_global_patterns: false
  disable_vcs_patterns: false
  exclude: ["ignored*", "someotherpattern"]
  exclude_templates: ["Python", "Go", "Rust", "Node"]
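To get a feel for what the glob patterns in the exclude list above might match, here is a small Python sketch built on the standard fnmatch module. It is only an approximation: the scanner's exact matching rules (for instance, whether a pattern applies to full paths or to individual path components) are not described here, so the sketch checks both, and is_excluded is a hypothetical name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(path, patterns):
    """Return True if the path, or any single component of it, matches a pattern.

    Hypothetical helper for illustration only; swh-scanner's own matching
    semantics may differ.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        # Try the whole relative path first, then each directory or file name.
        if fnmatchcase(path, pattern):
            return True
        if any(fnmatchcase(part, pattern) for part in parts):
            return True
    return False


patterns = ["ignored*", "someotherpattern"]
for candidate in ["ignored_dir/file.py", "src/ignored.txt", "src/main.py"]:
    print(candidate, is_excluded(candidate, patterns))
```

Under these assumptions the first two candidate paths are excluded and src/main.py is kept.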
