Automatically detect software supply chain smells and issues
dirty-waters
Dirty-waters automatically finds software supply chain issues in software projects by analyzing the available metadata of all dependencies, transitively.
Reference: Dirty-Waters: Detecting Software Supply Chain Smells, Technical report 2410.16049, arXiv, 2024.
By using dirty-waters, you identify the shady areas of your supply chain, which are natural targets for attackers to exploit.
dirty-waters's static analyses report the following smells:
- Dependencies with no/invalid* link to source code repositories (high severity)
- Dependencies with no tag/commit SHA for release, impossible to have reproducible builds (medium severity)
- Deprecated Dependencies (medium severity)
- Depends on a fork (low severity), disabled by default
- Dependencies without/with invalid code signature (medium severity)
- Dependencies with no build attestation (low severity)
* We consider a link invalid if it does not return a 200 status code. Furthermore, if a dependency is not hosted on GitHub, some checks (e.g., code signature) cannot be performed.
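The validity rule in this footnote can be sketched in Python as follows. This is a minimal illustration, not dirty-waters's actual implementation: the names `fetch_status` and `is_valid_link` are hypothetical, and the choice of a plain GET request that follows redirects is an assumption.

```python
from urllib import error, request


def fetch_status(url, timeout=10.0):
    """Return the final HTTP status code for url, or None if no response was obtained.

    Hypothetical helper: urlopen follows redirects, so a redirecting URL
    reports the status of the page it ends up on.
    """
    try:
        with request.urlopen(url, timeout=timeout) as response:
            return response.status
    except error.HTTPError as exc:
        # 4xx/5xx responses are raised as exceptions but still carry a status code.
        return exc.code
    except (error.URLError, ValueError, OSError):
        # DNS failure, refused connection, malformed URL, timeout, ...
        return None


def is_valid_link(status_code):
    """Apply the rule from the text: only a 200 response counts as valid."""
    return status_code == 200
```

With these helpers, `is_valid_link(fetch_status(repo_url))` would flag any repository link that fails to resolve or answers with a non-200 status.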
As for its differential analyses, dirty-waters reports the following smells:
- Dependencies with code signature changes (high severity)
- Downgraded dependencies (medium severity)
- Dependencies with commits made by both new authors and reviewers (medium severity)
- Dependencies with commits approved by new reviewers (medium severity)
- Dependencies with new contributors (low severity)
Additionally, dirty-waters gives a supplier view of the dependency tree (who owns the different dependencies?).
dirty-waters is developed as part of the Chains research project.
Installation
Installation via pip
You can install dirty-waters via pip:
pip install dirty-waters
# or
pipx install dirty-waters
Set up the GitHub API token, either as an environment variable or in a .env file:
export GITHUB_API_TOKEN=<your_token>
Usage
Command line
Run the tool using the following command structure:
# analyzing the software supply chain of Maven project INRIA/spoon
$ dirty-waters -p INRIA/spoon -pm maven
All configuration options
usage: main.py [-h] -p PROJECT_REPO_NAME [-v RELEASE_VERSION_OLD]
[-vn RELEASE_VERSION_NEW] [-d] [-n] -pm
{yarn-classic,yarn-berry,pnpm,npm,maven}
[--pnpm-scope PNPM_SCOPE] [--debug] [--config CONFIG]
[--gradual-report GRADUAL_REPORT | --no-gradual-report]
[--check-source-code] [--check-source-code-sha]
[--check-deprecated] [--check-forks] [--check-provenance]
[--check-code-signature] [--check-aliased-packages]
options:
-h, --help show this help message and exit
-p PROJECT_REPO_NAME, --project-repo-name PROJECT_REPO_NAME
Specify the project repository name. Example:
MetaMask/metamask-extension
-v RELEASE_VERSION_OLD, --release-version-old RELEASE_VERSION_OLD
The old release tag of the project repository.
Defaults to HEAD. Example: v10.0.0
-vn RELEASE_VERSION_NEW, --release-version-new RELEASE_VERSION_NEW
The new release version of the project repository.
-d, --differential-analysis
Run differential analysis and generate a markdown
report of the project
-n, --name-match Compare the package names with the name in the in the
package.json file. This option will slow down the
execution time due to the API rate limit of code
search.
-pm {yarn-classic,yarn-berry,pnpm,npm,maven}, --package-manager {yarn-classic,yarn-berry,pnpm,npm,maven}
The package manager used in the project.
--pnpm-scope PNPM_SCOPE
Extract dependencies from pnpm with a specific scope
using 'pnpm list --filter <scope> --depth Infinity'
command. Configure the scope in tool_config.py file.
--debug Enable debug mode.
--config CONFIG Path to configuration file (JSON)
--gradual-report GRADUAL_REPORT
Enable/disable gradual reporting (default: true)
--no-gradual-report Disable gradual reporting (deprecated, use --gradual-
report=false instead)
smell checks:
--check-source-code Check for dependencies with no link to source code
repositories
--check-source-code-sha
Check for dependencies with no commit sha/tag for
release
--check-deprecated Check for deprecated dependencies
--check-forks Check for dependencies that are forks
--check-provenance Check for dependencies with no build attestation
--check-code-signature
Check for dependencies with missing/invalid code
signature
--check-aliased-packages
Check for aliased packages
Reports are gradual by default: that is, only the highest severity smell type with issues found within this project is reported. You can disable this feature, and get a full report, by setting the --gradual-report flag to false.
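To make the gradual reporting rule concrete, here is a minimal Python sketch. It is not the tool's code: the function name, the shape of the `findings` dictionary, and the smell labels are all hypothetical, and it assumes that every smell type sharing the highest affected severity level is reported together.

```python
# Severity levels, from most to least severe, as used in the smell lists above.
SEVERITY_ORDER = ["high", "medium", "low"]


def gradual_report(findings):
    """Keep only the smells at the highest severity level that has issues.

    findings maps a smell name to (severity, number of affected dependencies).
    Returns (severity, {smell: count}) or (None, {}) when nothing was found.
    """
    for severity in SEVERITY_ORDER:
        at_level = {
            smell: count
            for smell, (level, count) in findings.items()
            if level == severity and count > 0
        }
        if at_level:
            return severity, at_level
    return None, {}
```

In this sketch, a project with no high severity findings but several medium severity ones would get a report limited to the medium severity smells; low severity smells would only surface once the medium ones are resolved or when the full report is requested.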
- Static analysis:
# If manually cloned
python3 main.py -p MetaMask/metamask-extension -pm yarn-berry
# If installed via pip
dirty-waters -p MetaMask/metamask-extension -pm yarn-berry
- Example output: Static Analysis Report Example
- Differential analysis:
# If manually cloned
python3 main.py -p MetaMask/metamask-extension -v v11.11.0 -vn v11.12.0 -d -pm yarn-berry
# If installed via pip
dirty-waters -p MetaMask/metamask-extension -v v11.11.0 -vn v11.12.0 -d -pm yarn-berry
- Example output: Differential Analysis Report Example
Notes:
- -v should be the version of the GitHub release, e.g. for this release, the value should be v11.11.0, not Version 11.11.0 or 11.11.0.
- When using -d for differential analysis, -vn must be specified.
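If you drive dirty-waters from a script, the rule that differential analysis needs a new version can be checked before the tool is launched. The helper below is a hypothetical wrapper, not part of dirty-waters; it only assembles the flags documented above.

```python
def build_command(project, package_manager, old_version=None,
                  new_version=None, differential=False):
    """Assemble a dirty-waters command line as a list of arguments."""
    if differential and not new_version:
        # Mirrors the note above: -d is only meaningful together with -vn.
        raise ValueError("-d requires -vn to be specified")
    command = ["dirty-waters", "-p", project, "-pm", package_manager]
    if old_version:
        command += ["-v", old_version]
    if new_version:
        command += ["-vn", new_version]
    if differential:
        command.append("-d")
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. The version strings are forwarded unchanged, so they still need to match the GitHub release tags exactly.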
Development
To set up dirty-waters, follow these steps:
- Clone the repository:
git clone https://github.com/chains-project/dirty-waters.git
cd dirty-waters
- Set up a virtual environment and install dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd tool
As an alternative to virtual environments, you may also use the Nix flake present in this repository.
- Set up the GitHub API token (ideally, in a .env file):
export GITHUB_API_TOKEN=<your_token>
Configuration
You can set the tool's configuration through a JSON file, which can then be passed to the tool using the --config flag.
At the moment, we have configuration support to ignore smells for specific dependencies.
The dependencies can be set either as an exact match or as a regex pattern.
You can either set "all" to ignore every check for the dependency or specify the checks you want to ignore.
An example configuration file:
{
"ignore": {
"shescape@2.1.0": "all",
"@types*": ["forks"]
}
}
Note that for cases where a package is aliased, we check for the original package name, not the aliased one:
i.e., if we alias the package string-width to string-width-cjs, we will check for string-width@versionx.y.z, not string-width-cjs@versionx.y.z.
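As a rough illustration of how such an ignore list could be interpreted, the Python sketch below looks up a dependency first by exact key and then by treating each key as a regular expression. The function names and the matching order are assumptions made for this example; the tool's real matching logic may differ, for instance in how it anchors patterns.

```python
import json
import re

# The example configuration from above, parsed from JSON.
EXAMPLE_CONFIG = json.loads("""
{
  "ignore": {
    "shescape@2.1.0": "all",
    "@types*": ["forks"]
  }
}
""")


def ignored_checks(dependency, config):
    """Return "all", a list of ignored checks, or None for a dependency."""
    ignore = config.get("ignore", {})
    if dependency in ignore:  # exact match takes precedence (assumption)
        return ignore[dependency]
    for pattern, checks in ignore.items():
        # re.match anchors the pattern at the start of the dependency string.
        if re.match(pattern, dependency):
            return checks
    return None


def is_ignored(dependency, check, config):
    """True if the given check should be skipped for the dependency."""
    checks = ignored_checks(dependency, config)
    return checks == "all" or (isinstance(checks, list) and check in checks)
```

Under these assumptions, every check is skipped for shescape@2.1.0, while packages whose name starts with @types only skip the fork check.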
Continuous integration
See the GitHub Action at https://github.com/chains-project/dirty-waters-action
Software Supply Chain Smell Support
dirty-waters currently supports package managers within the JavaScript and Java ecosystems. However, due to some constraints associated with the nature of the package managers, the tool may not be able to detect all the smells in the project. The following table shows the supported package managers and their associated smells, for static analysis:
| Package Manager | No Source Code Repository | Invalid Source Code Repository URL | No SHA/Release Tag | Deprecated Dependency | Depends on a Fork | No Build Attestation | No/Invalid Code Signature | Aliased Packages |
|---|---|---|---|---|---|---|---|---|
| Yarn Classic | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Yarn Berry | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Pnpm | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
| Npm | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Maven | Yes | Yes | Yes | No | Yes | No | Yes | No |
All package managers support every smell in the differential analysis scenario.
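For scripting purposes, the static analysis table can be encoded as a small lookup. The sketch below is only a transcription of that table: the smell identifiers are hypothetical labels chosen for this example (not names used by the tool), while the package manager keys follow the values accepted by -pm.

```python
# Column order of the static analysis support table.
SMELLS = [
    "no-source-code",
    "invalid-source-code-url",
    "no-sha-or-release-tag",
    "deprecated",
    "fork",
    "no-build-attestation",
    "no-or-invalid-code-signature",
    "aliased-packages",
]

# Only the "No" cells of the table are recorded; everything else is supported.
UNSUPPORTED = {
    "yarn-classic": set(),
    "yarn-berry": set(),
    "pnpm": {"aliased-packages"},
    "npm": set(),
    "maven": {"deprecated", "no-build-attestation", "aliased-packages"},
}


def supported_smells(package_manager):
    """List the smells the table marks as supported for a package manager."""
    return [s for s in SMELLS if s not in UNSUPPORTED[package_manager]]
```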
Smell Check Options
By default, all supported checks for the given package manager are performed in static analysis. You can restrict the run to individual checks using the following flags; if at least one flag is passed, only the flagged checks are performed:
- --check-source-code: Check for dependencies with no link to source code repositories
- --check-source-code-sha: Check for dependencies with no tag/commit sha for release
- --check-deprecated: Check for deprecated dependencies
- --check-forks: Check for dependencies that are forks
- --check-provenance: Check for dependencies with no build attestation
- --check-code-signature: Check for dependencies with no/invalid code signature
Note: The --check-source-code-sha and --check-forks flags require --check-source-code to be enabled, as release tags can only be checked if we can first verify the source code repository.
As an example of running specific checks:
dirty-waters -p MetaMask/metamask-extension -v v11.11.0 -pm yarn-berry --check-source-code --check-source-code-sha
This run will only check for dependencies with no link to source code repositories and dependencies with no tag/commit sha for release.
For differential analysis, it is currently not possible to specify individual checks -- all checks will be performed.
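The selection rules for static analysis can be summarized in a short Python sketch. It is illustrative only: `effective_checks` and `missing_prerequisites` are hypothetical helpers, and how dirty-waters itself reacts to a missing prerequisite (error, warning, or enabling it automatically) is not specified here.

```python
# Flags that only make sense when another flag is also enabled.
REQUIRES = {
    "--check-source-code-sha": "--check-source-code",
    "--check-forks": "--check-source-code",
}


def effective_checks(selected, available):
    """No flags selected means all available checks; otherwise only the selected ones."""
    if not selected:
        return list(available)
    return [flag for flag in available if flag in selected]


def missing_prerequisites(selected):
    """Return the flags that the selection depends on but does not include."""
    return sorted(
        {REQUIRES[flag] for flag in selected
         if flag in REQUIRES and REQUIRES[flag] not in selected}
    )
```

A wrapper script could call `missing_prerequisites` on the flags it is about to pass and stop early if the returned list is not empty.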
Notes
Inaccessible Tags
The release version specified in a lockfile/pom/similar is not always the same as the tag used in the repository. This can happen for a variety of reasons. If the exact tag specified in the lockfile/pom/similar is not found, dirty-waters looks up several alternative tag formats that we deemed reasonable. They come from a combination of AROMA's work and our own research on this subject. These formats are the following:
Tag formats
- <tag>
- v<tag>
- r-<tag>
- release-<tag>
- parent-<tag>
- <package_name>@<tag>
- <package_name>-v<tag>
- <package_name>_v<tag>
- <package_name>-<tag>
- <package_name>_<tag>
- <repo_name>@<tag>
- <repo_name>-v<tag>
- <repo_name>_v<tag>
- <repo_name>-<tag>
- <repo_name>_<tag>
- <project_name>@<tag>
- <project_name>-v<tag>
- <project_name>_v<tag>
- <project_name>-<tag>
- <project_name>_<tag>
- release/<tag>
- <tag>-release
- v.<tag>
- p1-p2-p3<tag>
As examples of what package_name, repo_name, and project_name could be, maven-surefire is an interesting dependency:
- maven-surefire-common is the package name
- maven-surefire is the repo name (we remove the owner prefix)
- surefire is the project name
In particular, there are many maven-* dependencies whose tags follow these last conventions.
Note that if dirty-waters does not find a tag, this does not mean the tag doesn't exist: it either doesn't exist, or its format is not one of the above.
This list may be expanded in the future. If you feel that a relevant format is missing, please open an issue and/or a pull request!
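The fallback lookup can be pictured as generating a list of candidate tag names and trying each in turn. The sketch below is a hypothetical reconstruction based on the formats listed above, not the tool's code; it leaves out the p1-p2-p3<tag> format because the meaning of p1, p2, and p3 is not defined in this document.

```python
def candidate_tags(tag, package_name, repo_name, project_name):
    """Return candidate repository tags for a release version, in lookup order."""
    candidates = [
        tag,
        f"v{tag}",
        f"r-{tag}",
        f"release-{tag}",
        f"parent-{tag}",
    ]
    # The same five patterns are applied to each kind of name.
    for name in (package_name, repo_name, project_name):
        candidates += [
            f"{name}@{tag}",
            f"{name}-v{tag}",
            f"{name}_v{tag}",
            f"{name}-{tag}",
            f"{name}_{tag}",
        ]
    candidates += [f"release/{tag}", f"{tag}-release", f"v.{tag}"]
    # Remove duplicates (e.g. when two names coincide) while keeping the order.
    return list(dict.fromkeys(candidates))
```

For the maven-surefire example, and with an illustrative version of 3.2.5, the candidates would include surefire-3.2.5, maven-surefire-3.2.5, and maven-surefire-common@3.2.5, among others.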
Academic Work
Other issues not handled by dirty-waters
- Missing dependencies: simply run mvn/pip/... install :)
- Bloated dependencies: we recommend DepClean for Java, depcheck for NPM
- Version constraint inconsistencies: we recommend pipdeptree for Python
License
MIT License.