Automatically detect software supply chain smells and issues

dirty-waters

Dirty-waters automatically finds software supply chain issues in software projects by analyzing the available metadata of all dependencies, transitively.

Reference: Dirty-Waters: Detecting Software Supply Chain Smells, Technical report 2410.16049, arXiv, 2024.

By using dirty-waters, you identify the shady areas of your supply chain, which are natural targets for attackers to exploit.

dirty-waters's static analyses report the following smells:

  • Dependencies with no/invalid link to source code repositories (high severity)
  • Dependencies with no tag/commit SHA for release, impossible to have reproducible builds (medium severity)
  • Deprecated Dependencies (medium severity)
  • Depends on a fork (medium severity)
  • Dependencies without/with invalid code signature (medium severity)
  • Dependencies with no build attestation (low severity)

As for its differential analyses, dirty-waters reports the following smells:

  • Dependencies with code signature changes (high severity)
  • Downgraded dependencies (medium severity)
  • Dependencies with commits made by both new authors and reviewers (medium severity)
  • Dependencies with commits approved by new reviewers (medium severity)
  • Dependencies with new contributors (low severity)
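
To make the "downgraded dependencies" smell concrete, here is a minimal sketch of how a downgrade between two releases could be spotted. It is not the dirty-waters implementation: the function names and the name-to-version mappings are hypothetical, and it assumes plain dotted numeric versions (no pre-release suffixes).

```python
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.2.3' or 'v1.2.3' into (1, 2, 3). Assumes numeric parts only."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def downgraded(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return dependencies present in both releases whose version went down."""
    result = {}
    for name in old.keys() & new.keys():
        if parse_version(new[name]) < parse_version(old[name]):
            result[name] = (old[name], new[name])
    return result
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.10.0" sorts before "1.9.0", which would produce a false downgrade.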

Additionally, dirty-waters gives a supplier view of the dependency tree, showing who owns each dependency.

dirty-waters is developed as part of the Chains research project.

Installation

To set up dirty-waters, follow these steps:

  1. Clone the repository:
git clone https://github.com/chains-project/dirty-waters.git
cd dirty-waters
  2. Set up a virtual environment and install dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd tool

As an alternative to a virtual environment, you can use the Nix flake provided in this repository.

  3. Set up the GitHub API token (ideally, in a .env file):
export GITHUB_API_TOKEN=<your_token>
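
Before launching a long analysis, it can help to confirm that the token is actually visible to the process. The following is a small sketch, not part of dirty-waters: the helper name `get_github_token` is hypothetical, and only the variable name `GITHUB_API_TOKEN` comes from the step above.

```python
import os
from typing import Mapping


def get_github_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the GitHub token from the environment, or fail with a clear message."""
    token = env.get("GITHUB_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_API_TOKEN is not set; export it or add it to a .env file"
        )
    return token
```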

Usage

Command line

Run the tool using the following command structure:

usage: dirty-waters [-h] -p PROJECT_REPO_NAME -v RELEASE_VERSION_OLD
               [-vn RELEASE_VERSION_NEW] -s [-d] [-n] -pm
               {yarn-classic,yarn-berry,pnpm,npm,maven}
               [--pnpm-scope PNPM_SCOPE] [--debug] [--no-gradual-report]
               [--check-source-code] [--check-release-tags]
               [--check-deprecated] [--check-forks] [--check-provenance]
               [--check-code-signature]

options:
  -h, --help            show this help message and exit
  -p PROJECT_REPO_NAME, --project-repo-name PROJECT_REPO_NAME
                        Specify the project repository name. Example:
                        MetaMask/metamask-extension
  -v RELEASE_VERSION_OLD, --release-version-old RELEASE_VERSION_OLD
                        The old release tag of the project repository.
                        Example: v10.0.0
  -vn RELEASE_VERSION_NEW, --release-version-new RELEASE_VERSION_NEW
                        The new release version of the project repository.
  -s, --static-analysis
                        Run static analysis and generate a markdown report of
                        the project
  -d, --differential-analysis
                        Run differential analysis and generate a markdown
                        report of the project
  -n, --name-match      Compare the package names with the name in the
                        package.json file. This option will slow down the
                        execution time due to the API rate limit of code
                        search.
  -pm {yarn-classic,yarn-berry,pnpm,npm,maven}, --package-manager {yarn-classic,yarn-berry,pnpm,npm,maven}
                        The package manager used in the project.
  --pnpm-scope PNPM_SCOPE
                        Extract dependencies from pnpm with a specific scope
                        using 'pnpm list --filter <scope> --depth Infinity'
                        command. Configure the scope in tool_config.py file.
  --debug               Enable debug mode.
  --no-gradual-report   Disable gradual report generation -- instead of one
                        smell type per report, gradually descending by
                        severity, report all.

smell checks:
  --check-source-code   Check for dependencies with no link to source code
                        repositories
  --check-release-tags  Check for dependencies with no tag/commit sha for
                        release
  --check-deprecated    Check for deprecated dependencies
  --check-forks         Check for dependencies that are forks
  --check-provenance    Check for dependencies with no build attestation
  --check-code-signature
                        Check for dependencies with missing/invalid code
                        signature

Reports are gradual by default: only the highest-severity smell type with issues found in the project is reported. To disable this behavior and get a full report, pass the --no-gradual-report flag. Note that requesting specific checks also disables gradual reporting.
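
The gradual behavior can be pictured with the following sketch. It is an illustration under stated assumptions, not the tool's code: the smell labels are made-up keys, the severities are the ones listed for static analysis earlier in this README, and ties between smells of equal severity are broken by listing order, which is a guess.

```python
from typing import Dict, List

SEVERITY_RANK = {"high": 3, "medium": 2, "low": 1}

# Severities as listed for static analysis; the keys are illustrative labels only.
STATIC_SMELL_SEVERITY = {
    "no_source_code": "high",
    "no_release_tag": "medium",
    "deprecated": "medium",
    "fork": "medium",
    "no_code_signature": "medium",
    "no_build_attestation": "low",
}


def gradual_report(findings: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the highest-severity smell type that has at least one finding."""
    # sorted() is stable, so smells of equal severity keep their listing order.
    by_severity = sorted(
        STATIC_SMELL_SEVERITY,
        key=lambda smell: -SEVERITY_RANK[STATIC_SMELL_SEVERITY[smell]],
    )
    for smell in by_severity:
        if findings.get(smell):
            return {smell: findings[smell]}
    return {}
```

With findings for both a deprecated dependency and a missing build attestation, only the deprecated one (medium severity) would appear in the report; a full report would list both.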

  1. Static analysis:
# If manually cloned
python3 main.py -p MetaMask/metamask-extension -v v11.11.0 -s -pm yarn-berry
# If installed via pip
dirty-waters -p MetaMask/metamask-extension -v v11.11.0 -s -pm yarn-berry
  2. Differential analysis:
# If manually cloned
python3 main.py -p MetaMask/metamask-extension -v v11.11.0 -vn v11.12.0 -s -d -pm yarn-berry
# If installed via pip
dirty-waters -p MetaMask/metamask-extension -v v11.11.0 -vn v11.12.0 -s -d -pm yarn-berry

Notes:

  • -v should be the tag of the GitHub release exactly as it appears in the repository, e.g. v11.11.0, not Version 11.11.0 or 11.11.0.
  • The -s flag is required for all analyses.
  • When using -d for differential analysis, both -v and -vn must be specified.
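
When scripting several runs, the rules in these notes can be enforced before the tool is invoked. The sketch below assembles the argument list for the pip-installed `dirty-waters` command; the helper name `build_command` and its parameters are hypothetical, while the flags and package-manager values are taken from the usage text above.

```python
from typing import List, Optional

PACKAGE_MANAGERS = {"yarn-classic", "yarn-berry", "pnpm", "npm", "maven"}


def build_command(
    repo: str,
    old_version: str,
    package_manager: str,
    new_version: Optional[str] = None,
    differential: bool = False,
) -> List[str]:
    """Build a dirty-waters argument list that respects the notes above."""
    if package_manager not in PACKAGE_MANAGERS:
        raise ValueError(f"unsupported package manager: {package_manager}")
    if differential and new_version is None:
        raise ValueError("-d requires both -v and -vn")
    # -s is always included, since it is required for all analyses.
    cmd = ["dirty-waters", "-p", repo, "-v", old_version, "-s", "-pm", package_manager]
    if new_version is not None:
        cmd += ["-vn", new_version]
    if differential:
        cmd.append("-d")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.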

Continuous integration

See the GitHub Action at https://github.com/chains-project/dirty-waters-action

Software Supply Chain Smell Support

dirty-waters currently supports package managers within the JavaScript and Java ecosystems. However, due to some constraints associated with the nature of the package managers, the tool may not be able to detect all the smells in the project. The following table shows the supported package managers and their associated smells, for static analysis:

| Package Manager | No Source Code Repository | Invalid Source Code Repository URL | No Release Tag | Deprecated Dependency | Depends on a Fork | No Build Attestation | No/Invalid Code Signature |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Yarn Classic | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Yarn Berry | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Pnpm | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Npm | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Maven | Yes | Yes | Yes | No | Yes | No | Yes |

All package managers support every smell in the differential analysis scenario.
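
For scripting purposes, the static-analysis table above can be encoded as data. The following sketch maps each package manager to the check flags it supports; the structure and the function name `supported_checks` are assumptions made for illustration, while the flag names and the two Maven gaps come from this README.

```python
from typing import List

ALL_CHECKS = [
    "--check-source-code",
    "--check-release-tags",
    "--check-deprecated",
    "--check-forks",
    "--check-provenance",
    "--check-code-signature",
]

# Per the table: Maven lacks the deprecated-dependency and build-attestation checks.
UNSUPPORTED = {
    "maven": {"--check-deprecated", "--check-provenance"},
}


def supported_checks(package_manager: str) -> List[str]:
    """Return the static-analysis check flags available for a package manager."""
    missing = UNSUPPORTED.get(package_manager, set())
    return [flag for flag in ALL_CHECKS if flag not in missing]
```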

Smell Check Options

By default, all supported checks for the given package manager are performed in static analysis. You can specify individual checks using the following flags (note that if at least one flag is passed, instead of all checks being performed, only the flagged ones will be):

  • --check-source-code: Check for dependencies with no link to source code repositories
  • --check-release-tags: Check for dependencies with no tag/commit sha for release
  • --check-deprecated: Check for deprecated dependencies
  • --check-forks: Check for dependencies that are forks
  • --check-provenance: Check for dependencies with no build attestation
  • --check-code-signature: Check for dependencies with no/invalid code signature

Note: The --check-release-tags and --check-forks flags require --check-source-code to be enabled, because release tags and fork status can only be checked once the source code repository has been verified.

As an example of running specific checks:

dirty-waters -p MetaMask/metamask-extension -v v11.11.0 -s -pm yarn-berry --check-source-code --check-release-tags

This run will only check for dependencies with no link to source code repositories and dependencies with no tag/commit sha for release.
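
The dependency between check flags described in the note above can be validated up front. This is a sketch under assumptions: `validate_checks` is a hypothetical helper, and raising an error is just one possible way to handle a missing prerequisite; how dirty-waters itself reacts is not specified here.

```python
from typing import Iterable, List

# Flags that only make sense once the source code repository has been verified.
REQUIRES_SOURCE_CODE = {"--check-release-tags", "--check-forks"}


def validate_checks(flags: Iterable[str]) -> List[str]:
    """Reject flag sets that request dependent checks without --check-source-code."""
    selected = set(flags)
    dependent = selected & REQUIRES_SOURCE_CODE
    if dependent and "--check-source-code" not in selected:
        raise ValueError(
            f"{', '.join(sorted(dependent))} require --check-source-code"
        )
    return sorted(selected)
```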

For differential analysis, it is currently not possible to specify individual checks -- all checks will be performed.

Notes

Inaccessible Tags

Sometimes, the release version specified in a lockfile, pom, or similar file is not the same as the tag used in the repository. This can happen for a variety of reasons. We have compiled several tag formats that are reasonable to look up when the exact tag specified in the lockfile, pom, or similar file is not found. They come from a combination of AROMA's work and our own research on this subject. These formats are the following:

Tag formats
  • <tag>
  • v<tag>
  • r-<tag>
  • release-<tag>
  • parent-<tag>
  • <package_name>@<tag>
  • <package_name>-v<tag>
  • <package_name>_v<tag>
  • <package_name>-<tag>
  • <package_name>_<tag>
  • <repo_name>@<tag>
  • <repo_name>-v<tag>
  • <repo_name>_v<tag>
  • <repo_name>-<tag>
  • <repo_name>_<tag>
  • <project_name>@<tag>
  • <project_name>-v<tag>
  • <project_name>_v<tag>
  • <project_name>-<tag>
  • <project_name>_<tag>
  • release/<tag>
  • <tag>-release
  • v.<tag>
  • p1-p2-p3<tag>

As examples of what package_name, repo_name, and project_name could be, maven-surefire is an interesting dependency:

  • maven-surefire-common is the package name
  • maven-surefire is the repo name (we remove the owner prefix)
  • surefire is the project name

In particular, there are many maven-* dependencies whose tags follow these last conventions.

Note that if dirty-waters does not find a tag, this does not prove the tag is missing: either it doesn't exist, or its format is not one of the above.
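
The tag formats listed above can be expanded into concrete candidates as in the sketch below. It is not the dirty-waters implementation: the function name `candidate_tags` is hypothetical, the lookup order is assumed to follow the list, and the `p1-p2-p3<tag>` format is left out because its expansion is not spelled out here.

```python
from typing import List


def candidate_tags(tag: str, package_name: str, repo_name: str, project_name: str) -> List[str]:
    """Expand a version into the tag formats listed above, in listing order."""
    candidates = [tag, f"v{tag}", f"r-{tag}", f"release-{tag}", f"parent-{tag}"]
    for name in (package_name, repo_name, project_name):
        candidates += [
            f"{name}@{tag}",
            f"{name}-v{tag}",
            f"{name}_v{tag}",
            f"{name}-{tag}",
            f"{name}_{tag}",
        ]
    candidates += [f"release/{tag}", f"{tag}-release", f"v.{tag}"]
    # Drop duplicates (e.g. when two names coincide) while keeping the order.
    return list(dict.fromkeys(candidates))
```

For the maven-surefire example, `candidate_tags("3.2.5", "maven-surefire-common", "maven-surefire", "surefire")` would include `surefire-3.2.5` among its candidates; the version 3.2.5 is a placeholder chosen for illustration.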

This list may be expanded in the future. If you feel that a relevant format is missing, please open an issue and/or a pull request!

Academic Work

Other issues not handled by dirty-waters

  • Missing dependencies: simply run mvn/pip/... install :)
  • Bloated dependencies: we recommend DepClean for Java, depcheck for NPM
  • Version constraint inconsistencies: we recommend pipdeptree for Python

License

MIT License.
