
Choppr is a plugin that reduces the size of a software product's Software Bill of Materials (SBOM).


Choppr

Choppr is a CLI tool to filter unused components out of an SBOM using strace results.

Choppr refines the components in a Software Bill of Materials (SBOM). It does not replace SBOM generation tools. Choppr analyzes a build or runtime to determine which components are actually used, then removes the unused components from the SBOM. It starts from file accesses and works backwards, the reverse of how an SBOM generation tool typically works. For example, an SBOM generator uses the yum database to determine which packages yum installed, whereas Choppr looks at all the files accessed and queries sources like yum to determine the originating package.

Other intended results include:

  • Reducing installed components. This optimizes size and reduces the number of vulnerabilities. The fewer tools available to an attacker, the better.
  • Creating a runtime container from the build container
  • Detecting files without corresponding SBOM components
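
The reverse lookup described above, from accessed files back to the packages that own them, can be sketched as follows. This is only an illustration of the idea, not Choppr's implementation: the function name `used_packages` and the in-memory `package_files` index are hypothetical stand-ins for the queries Choppr makes against sources like yum.

```python
def used_packages(accessed_files, package_files):
    """Map accessed files back to the packages that provide them.

    package_files is a hypothetical index: package name -> files it provides.
    Returns the set of used packages and the accessed files no package owns.
    """
    # Invert the index so each file points at its originating package.
    owner = {path: pkg for pkg, files in package_files.items() for path in files}
    used = {owner[path] for path in accessed_files if path in owner}
    unowned = sorted(path for path in accessed_files if path not in owner)
    return used, unowned


index = {
    "zlib": ["/usr/lib64/libz.so.1"],
    "cmake": ["/usr/bin/cmake"],
}
used, unowned = used_packages(["/usr/lib64/libz.so.1", "/opt/vendor/blob.so"], index)
print(used)     # packages with at least one accessed file
print(unowned)  # accessed files without a corresponding component
```

Packages in the index that never appear in `used` are the candidates for removal, and `unowned` corresponds to files without a matching SBOM component.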

Approaches

How to use Choppr depends on your project and needs. Consider the following use cases and their recommended approaches.

Build an SBOM of a software product

The user provides the required content. Choppr determines which components were used during the build. The exclude list tells Choppr to remove components like CMake when the user is certain no CMake software was built into their product. Choppr generates a list of unused packages that can be used to automate their removal. Building again after removing these components verifies that no required components were lost.

Create a runtime image and runtime SBOM from a build image

Choppr uses a multistage build to ADD the files used. Optionally, metadata such as the yum database can be kept. The additional include list can be used to specify dynamically linked libraries, necessary services, or any other necessary components that were not exercised during the build. These additions are also reflected in the SBOM components.

Create a runtime SBOM from a runtime image

Similar to analyzing a build, Choppr can analyze a runtime.

If this is used to describe a delivery, it should be merged with the Build SBOM.



Installation

pip install choppr

Usage

Usage: choppr [OPTIONS] OPERATING_MODE:{run|cache}

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    operating_mode      OPERATING_MODE:{run|cache}  The operating mode to use [required]                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --config          -f      FILE  The configuration file to use [default: choppr.yml]                                  │
│ --input-sbom      -i      FILE  The SBOM to process and filter the components of                                     │
│ --strace-results  -s      FILE  The output file created when running strace on your build or runtime executable      │
│ --output-sbom     -o      FILE  The file to write the chopped SBOM to                                                │
│ --log             -l      FILE  The log file to write to [default: choppr.log]                                       │
│ --verbose         -v            Enable debug logging                                                                 │
│ --version                                                                                                            │
│ --help                          Show this message and exit.                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Configuration

By default, Choppr looks for the configuration file choppr.yml in the current working directory.

Example Configuration
---

input_files:
  sbom: ffmpeg.cdx.json
  strace_results: ffmpeg-strace.txt

repositories:
  rpm:
    - url: https://rocky-linux-us-west4.production.gcp.mirrors.ctrliq.cloud/pub/rocky/8.10/AppStream/x86_64/os/
    - url: http://mirror.siena.edu/rocky/8.10/BaseOS/x86_64/os/
    - url: https://mirrors.iu13.net/rocky/8.10/extras/x86_64/os/

options:
  strace_regex_excludes:
    - ^.*ffmpeg.*$
    - ^.*\.(c|cpp|cxx|h|hpp|o|py|s)$
    - ^/usr/share/pkgconfig$
    - ^/tmp$
    - ^bin$
    - ^.*\.git.*$
    - ^.*(\.\.)+.*$
    - ^.*(CMakeFiles.*|\.cmake)$

Variables

input_files:
  sbom: Path
  strace_results: Path | None
  cache_archive: Path | None

repositories:  # dict[PurlType, list[Repository]]
  _purl_type_:  # PurlType
    - url: HttpUrl | None
      urls:  # list[HttpUrl] | None
        - ...
      credentials:  # Credentials | None
        username: str | None
        user_env: str | None
        pass_env: str | None
      certificate: Path | None
    ...
  deb: # Debian repositories have extra configuration beyond the standard configuration above
    - url: ...
      credentials: ...
      certificate: ...
      distributions:
        - name: str
          components:  # list[str]
            - main
            - security

output_files:  # OutputFiles | None
  cache_archive: Path | None
  excluded_components: # dict[PurlType, ExcludedPackageFile] | None
    _purl_type_: # PurlType
      file: Path
      component_format: str | None
    ...
  sbom: Path | None

options:
  allow_partial_filename_match: bool
  allow_version_mismatch: bool
  allowlist:  # dict[PurlType, list[PackagePattern]]
    _purl_type_:  # PurlType
      - name: regex
        version: regex
  archive_cache: bool
  cache_dir: Path
  cache_timeout: timedelta | bool
  clear_cache: bool
  delete_excluded: bool
  denylist:  # dict[PurlType, list[PackagePattern]]
    _purl_type_:  # PurlType
      - name: regex
        version: regex
  http_limits:
    retries: PositiveInt
    retry_interval: PositiveFloat
    timeout: PositiveFloat
  keep_essential_os_components: bool
  recursion_limit: PositiveInt
  sort_sbom: bool
  strace_regex_excludes: list[regex]

Common Types

PurlType

The purl type, as defined in the package URL specification.

The list of available options can be found here.

Type: str

Input Files

sbom

The SBOM whose components Choppr will process and filter.

This file is expected to be a JSON file in the CycloneDX format.

Type: Path

Example Usage:

input_files:
  sbom: my-awesome-sbom.cdx.json
strace_results

The path to the output file created when running strace on your build or runtime executable.

This must be provided when operating_mode is set to run.

This file can be created using the following command to wrap your build script or runtime executable. The strace tool must be installed on your system separately from Choppr.

strace -f -e trace=file -o "strace_output.txt" <build script/runtime executable>

Type: Path | None

Default: None

Example Usage:

input_files:
  strace_results: strace_output.txt
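
To give a feel for what the strace output contains, here is a minimal sketch that pulls file paths out of `strace -e trace=file` lines. It is not Choppr's parser. The function name `accessed_paths` is hypothetical, and skipping failed calls (those returning -1) is an assumption about what is useful, not documented Choppr behavior.

```python
import re

# First double-quoted argument on the line; for file syscalls this is the path.
_PATH_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')
# A return value of -1 marks a failed call (for example ENOENT).
_FAILED_RE = re.compile(r"=\s+-1\s")


def accessed_paths(strace_lines):
    """Collect the unique paths from successful file syscalls, sorted."""
    paths = set()
    for line in strace_lines:
        if _FAILED_RE.search(line):
            continue  # assumption: files that could not be opened are ignored
        match = _PATH_RE.search(line)
        if match:
            paths.add(match.group(1))
    return sorted(paths)


sample = [
    '1234 openat(AT_FDCWD, "/usr/lib64/libz.so.1", O_RDONLY|O_CLOEXEC) = 3',
    '1234 stat("/usr/include/zlib.h", {st_mode=S_IFREG|0644, ...}) = 0',
    '1235 openat(AT_FDCWD, "/usr/lib64/missing.so", O_RDONLY) = -1 ENOENT (No such file or directory)',
]
print(accessed_paths(sample))
```
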
cache_archive

The path to a cache archive to load. Loading an archive avoids downloading cache data again and allows Choppr to run offline.

Type: Path | None

Default: None

Example Usage:

input_files:
  cache_archive: /backup/choppr-cache.tar.gz

Repositories

Type: dict[PurlType, list[Repository]]

To obtain the list of repositories on your system, use one of the following commands:

# For RHEL 8 and later
dnf repolist --verbose

# For RHEL 7 and earlier
yum repolist --verbose

# For Debian
cat /etc/apt/sources.list /etc/apt/sources.list.d/*

With the output from one of these commands, you should be able to find the URLs to the repositories used on your system.

Repository

The URL for a repository, or multiple repository URLs, paired with optional credentials and/or a certificate.

Debian repositories have an extra distributions keyword.

Type:

# You must provide a url or a list of urls using one or both of the following keys
url: HttpUrl | None
urls: list[HttpUrl] | None
credentials: Credentials | None
certificate: Path | None
# Debian ONLY
distributions: list[DebianDistribution]

Example Usage:

repositories:
  rpm:
    - url: http://public.repo.com
    - url: http://private.repo.com
      credentials:
        username: repouser
        pass_env: PRIVATE_REPO_PASSWORD
      certificate: /my/private/repo/cert.pem
    - urls:
        - http://private.repo.com/base
        - http://private.repo.com/updates
        - http://private.repo.com/security
      credentials:
        username: repouser
        pass_env: PRIVATE_REPO_PASSWORD
    ...
  deb:
    - url: http://archive.ubuntu.com/ubuntu
      distributions:
        - name: jammy
          components:
            - main
            - security
            ...
        ...
    ...
  ...
Credentials

The credentials to use when accessing the repository.

If you provide user_env, it will override the value of username. You only need to provide one or the other.

Type:

username: str
user_env: str
pass_env: str
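
The precedence rule above (user_env overrides username) can be sketched like this. The helper `resolve_credentials` is hypothetical, and raising KeyError when a named environment variable is unset is an assumption, not documented Choppr behavior.

```python
import os


def resolve_credentials(credentials, environ=os.environ):
    """Return (username, password) for one repository's credentials mapping."""
    username = credentials.get("username")
    user_env = credentials.get("user_env")
    if user_env:
        # user_env takes precedence over a literal username.
        username = environ[user_env]
    pass_env = credentials.get("pass_env")
    # The password is only ever read from the environment, never from the file.
    password = environ[pass_env] if pass_env else None
    return username, password


fake_env = {"REPO_USER": "ci-bot", "PRIVATE_REPO_PASSWORD": "s3cret"}
creds = {"username": "repouser", "user_env": "REPO_USER", "pass_env": "PRIVATE_REPO_PASSWORD"}
print(resolve_credentials(creds, fake_env))
```
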
DebianDistribution

Distribution information for a Debian repository.

Type:

name: str
components: list[str]

Default:

name:  # This is required, and has no default
components:
  - main
  - restricted
  - universe
  - multiverse

Output Files

cache_archive

The path to write the cache archive to. The archive can be used later as an input.

Type: Path | None

Default: None

Example Usage:

output_files:
  cache_archive: /backup/choppr-cache.tar.gz
excluded_components

The files to write excluded components to, keyed by purl type. Each component is written using the optionally provided format.

Type: dict[PurlType, ExcludedComponentsFile]

Default:

_purl_type_:
  file: "choppr-excluded-components-<purl_type>.txt"
  component_format: "<excluded_component_format>"
...

For excluded_component_format, the default value is {name}={version}, except for NPM and RPM. Those defaults are as follows:

NPM: "{name}@{version}"
RPM: "{name}-{version}"
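
As a rough illustration of how these format strings expand, the sketch below applies them with Python's `str.format`. That Choppr uses `str.format` internally is an assumption, and `format_component` is a hypothetical helper.

```python
# Defaults taken from the text above; every other purl type falls back to name=version.
DEFAULT_FORMATS = {"npm": "{name}@{version}", "rpm": "{name}-{version}"}
FALLBACK_FORMAT = "{name}={version}"


def format_component(purl_type, name, version, component_format=None):
    """Render one excluded component as a line of the output file."""
    template = component_format or DEFAULT_FORMATS.get(purl_type, FALLBACK_FORMAT)
    return template.format(name=name, version=version)


print(format_component("deb", "cmake", "3.22"))
print(format_component("npm", "left-pad", "1.3.0"))
print(format_component("deb", "cmake", "3.22", "{name},{version}"))
```
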

Example Usage:

output_files:
  excluded_components:
    deb:
      file: excluded_deb_components.csv
      component_format: "{name},{version}"
sbom

The path to write the chopped SBOM to.

By default, the SBOM is written to the same folder as the input SBOM, using the same filename with chopped- prepended.

Type: Path

Default: chopped-<input_sbom>

Example Usage:

output_files:
  sbom: chopped-sbom.cdx.json
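
The default output path can be derived from the input path as in this small sketch. The helper name `default_output_sbom` is hypothetical; it only mirrors the chopped-<input_sbom> rule stated above.

```python
from pathlib import Path


def default_output_sbom(input_sbom):
    """Same folder as the input SBOM, same filename, with 'chopped-' prepended."""
    input_sbom = Path(input_sbom)
    return input_sbom.with_name(f"chopped-{input_sbom.name}")


print(default_output_sbom("sboms/ffmpeg.cdx.json"))
```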

Options

allow_partial_filename_match

Allow partial matching for filenames when comparing strace files to files provided by remote repository packages.

This may be useful when symlinks are used for libraries. This is currently only implemented for RPMs.

Type: bool

Default: false

Example Usage:

options:
  allow_partial_filename_match: true
allow_version_mismatch

Allow version numbers to be mismatched when comparing SBOM packages to remote repository packages.

Type: bool

Default: false

Example Usage:

options:
  allow_version_mismatch: true
allowlist

A dictionary with packages to always keep in the SBOM.

The keys are purl types, and the values are lists of packages. A package has two members, name and version. Both are regex patterns.

Type:

allowlist: # dict[PurlType, list[PackagePattern]]
  _purl_type_: # str (deb, npm, rpm, ...)
    - name: regex
      version: regex
    ...
  ...

Default: {}

Example Usage:

options:
  allowlist:
    deb:
      - name: ".*"
        version: ".*"
    generic:
      - name: "^python$"
        version: "^3.10"
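
The sketch below shows one way name and version patterns like these could be evaluated against a component. It assumes `re.search` semantics, so unanchored patterns match anywhere in the string; whether Choppr searches or requires a full match is not stated here. The helper `matches_any` is hypothetical.

```python
import re


def matches_any(patterns, name, version):
    """True if any pattern entry matches both the name and the version."""
    return any(
        re.search(entry["name"], name) and re.search(entry["version"], version)
        for entry in patterns
    )


allowlist = {"generic": [{"name": "^python$", "version": "^3.10"}]}
print(matches_any(allowlist["generic"], "python", "3.10.12"))
print(matches_any(allowlist["generic"], "python3", "3.10.12"))
```

Note that an unescaped dot in a pattern such as ^3.10 matches any character, so escaping it (^3\.10) is stricter.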
archive_cache

Enable archive_cache to archive the cache directory when Choppr finishes running in run mode.

This has no effect in cache mode, as the archive will always be created in that mode.

Type: bool

Default: false

Example Usage:

options:
  archive_cache: true
cache_dir

The path for the cache directory where Choppr will output temporary and downloaded files.

Type: Path

Default: ./.cache/choppr

Example Usage:

options:
  cache_dir: /tmp/choppr
cache_timeout

The timeout for local cache files, such as DEB packages, that aren't tied to a checksum the way RPM packages are.

Expects a number followed by a unit (d = days, h = hours, m = minutes, s = seconds).

Type: str | bool

Default: 7d

Example Usage:

options:
  cache_timeout: 24h
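
A value in this number-plus-unit form can be turned into a duration as sketched below. The function `parse_timeout` is hypothetical, and it handles only the string form; how a boolean value is interpreted is not covered.

```python
import re
from datetime import timedelta

# Unit suffixes listed above, mapped to timedelta keyword arguments.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_timeout(value):
    """Parse strings such as '7d' or '24h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", value.strip())
    if match is None:
        raise ValueError(f"invalid timeout: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_timeout("7d"))
print(parse_timeout("24h") == timedelta(days=1))
```
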
clear_cache

Enable clear_cache to delete the cache directory when Choppr finishes running.

Type: bool

Default: false

Example Usage:

options:
  clear_cache: true
delete_excluded

Disable delete_excluded to keep components that are discovered to be unnecessary and marked as excluded.

Type: bool

Default: true

Example Usage:

options:
  delete_excluded: false
denylist

A dictionary with packages to always remove from the SBOM.

The keys are purl types, and the values are lists of packages. A package has two members, name and version. Both are regex patterns.

Type:

denylist: # dict[PurlType, list[PackagePattern]]
  _purl_type_: # str (deb, npm, rpm, ...)
    - name: regex
      version: regex
    ...
  ...

Default: {}

Example Usage:

options:
  denylist:
    deb:
      - name: "cmake"
        version: "3.22"
    npm:
      - name: ".*"
        version: ".*"
http_limits

Limits to enforce when performing HTTP requests within Choppr.

  • retries - The number of times to retry the request if it fails
  • retry_interval - The number of seconds to wait before retrying the request
  • timeout - The number of seconds to wait for a request to complete before timing out

Type:

http_limits:  # HttpLimits
  retries: PositiveInt
  retry_interval: PositiveFloat
  timeout: PositiveFloat

Default:

http_limits:
  retries: 3
  retry_interval: 5
  timeout: 60

Example Usage:

options:
  http_limits:
    retries: 10
    retry_interval: 30
    timeout: 300
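
To illustrate how retries and retry_interval interact, here is a generic retry loop. It is a sketch under assumptions, not Choppr's HTTP code: `with_retries` is a hypothetical helper, failures are modeled as OSError, retries is read as the number of additional attempts after the first, and the per-request timeout is left to whatever `request` does internally.

```python
import time


def with_retries(request, retries=3, retry_interval=5.0, sleep=time.sleep):
    """Call request(); after a failure, wait retry_interval seconds and retry."""
    attempt = 0
    while True:
        try:
            return request()
        except OSError:
            if attempt >= retries:
                raise  # retries exhausted, surface the last error
            attempt += 1
            sleep(retry_interval)


calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise OSError("temporary failure")
    return "ok"


print(with_retries(flaky, retries=3, retry_interval=0))
```
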
keep_essential_os_components

Keep components that are essential to the operating system, including the operating system component itself.

Type: bool

Default: false

Example Usage:

options:
  keep_essential_os_components: true
recursion_limit

A positive integer that will limit the number of recursive calls to use when checking for nested package dependencies.

Type: PositiveInt

Default: 10

Example Usage:

options:
  recursion_limit: 20
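
The effect of a recursion limit on a nested dependency walk can be sketched as follows. The function `collect_dependencies` and the `depends_on` mapping are hypothetical, and treating the limit as a maximum depth below the starting package is an assumption about the option's exact semantics.

```python
def collect_dependencies(package, depends_on, recursion_limit=10):
    """Collect nested dependencies of package, descending at most recursion_limit levels."""
    seen = set()

    def visit(name, depth):
        # Stop when the limit is exceeded or the package was already handled.
        if depth > recursion_limit or name in seen:
            return
        seen.add(name)
        for dependency in depends_on.get(name, []):
            visit(dependency, depth + 1)

    visit(package, 0)
    seen.discard(package)  # report only the dependencies, not the root
    return seen


graph = {"app": ["libfoo"], "libfoo": ["libbar"], "libbar": ["libbaz"]}
print(collect_dependencies("app", graph, recursion_limit=1))
print(collect_dependencies("app", graph))
```
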
sort_sbom

Sort the output SBOM so that the elements are in the order defined in the schema.

Type: bool

Default: false

Example Usage:

options:
  sort_sbom: true
strace_regex_excludes

An array of regex strings, used to filter the strace input. The example below shows some of the recommended regular expressions.

Type: list[str]

Default: []

Example Usage:

options:
  strace_regex_excludes:
    - '^.*project-name.*$'              # Ignore all files containing the project name to exclude source files
    - '^.*\.(c|cpp|cxx|h|hpp|o|py|s)$'  # Ignore source, header, object, and script files
    - '^/usr/share/pkgconfig$'          # Ignore pkgconfig, which is included/modified by several RPMs
    - '^/tmp$'                          # Ignore the tmp directory
    - '^bin$'                           # Ignore overly simple files, that will be matched by most RPMs
    - '^.*\.git.*$'                     # Ignore all hidden git directories and files
    - '^.*(\.\.)+.*$'                   # Ignore all relative paths containing '..'
    - '^.*(CMakeFiles.*|\.cmake)$'      # Ignore all CMake files
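
The following sketch shows how exclude patterns like these behave when applied to a list of traced paths. It assumes each pattern is tested against the whole path with `re.search`; `filter_paths` is a hypothetical helper, not part of Choppr.

```python
import re


def filter_paths(paths, excludes):
    """Drop every path that matches at least one exclude pattern."""
    compiled = [re.compile(pattern) for pattern in excludes]
    return [path for path in paths if not any(rx.search(path) for rx in compiled)]


excludes = [
    r"^.*\.(c|cpp|cxx|h|hpp|o|py|s)$",
    r"^/tmp$",
    r"^.*\.git.*$",
    r"^.*(\.\.)+.*$",
]
traced = [
    "/usr/lib64/libz.so.1",
    "/build/src/main.c",
    "/tmp",
    "/src/.git/config",
    "/usr/lib64/../lib64/libm.so.6",
]
print(filter_paths(traced, excludes))
```

Running a candidate pattern through a check like this before adding it to the configuration helps confirm it does not exclude library paths by accident.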

Specifications for developers

