
Modular framework to gather file information, analyze dependencies, and generate an SBOM



Surfactant

A modular framework to gather file information for SBOM generation and dependency analysis.



Description

Surfactant can be used to gather information from a set of files to generate an SBOM, as well as to manipulate SBOMs and analyze the information in them. It pulls information from recognized file types (such as PE, ELF, or MSI files) contained within a directory structure corresponding to an extracted software package. By default, the information is "surface-level" metadata contained in the files, which can be collected without running the files or decompiling them.
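The sketch below illustrates what "surface-level" means. It classifies a file from its leading magic bytes alone, without executing or decompiling it. The signatures are the standard ones for ELF, PE (the DOS "MZ" header), and OLE compound files, which is the container format MSI uses. The helper names are hypothetical, and this is not Surfactant's actual detection code.

```python
from pathlib import Path
from typing import Optional

# Standard leading signatures (magic numbers). A full PE check would also
# follow the DOS header to the "PE\0\0" signature; this sketch stops at "MZ".
SIGNATURES = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE compound file (e.g. MSI)"),
    (b"\x7fELF", "ELF"),
    (b"MZ", "PE (MZ header)"),
]


def identify(header: bytes) -> Optional[str]:
    """Return a format label for the given leading bytes, or None."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def identify_file(path: Path) -> Optional[str]:
    """Read only the first 8 bytes of a file; nothing is executed."""
    with open(path, "rb") as f:
        return identify(f.read(8))
```

Usage: `identify_file(Path("helics/bin/helics_binary"))` would return `"ELF"` for a Linux executable.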

Installation

For Users:

For ease of use, we recommend using pipx since it transparently handles creating and using Python virtual environments, which helps avoid dependency conflicts with other installed Python apps. Install pipx by following their installation instructions.

  1. Install Surfactant using pipx install (with Python >= 3.8):
pipx install surfactant
  2. Install plugins using pipx inject surfactant. As an example, this is how the fuzzy hashing plugin could be installed from a git repository (PyPI package names, local source directories, or wheel files can also be used):
pipx inject surfactant git+https://github.com/LLNL/Surfactant#subdirectory=plugins/fuzzyhashes

If you prefer to manage virtual environments manually, use the following steps instead:

  1. Create a virtual environment with Python >= 3.8 and activate it [Optional, but highly recommended over a global install]:
python -m venv cytrics_venv
source cytrics_venv/bin/activate
  2. Install Surfactant with pip install:
pip install surfactant
  3. Install plugins using pip install. As an example, this is how the fuzzy hashing plugin could be installed from a git repository (PyPI package names, local source directories, or wheel files can also be used):
pip install git+https://github.com/LLNL/Surfactant#subdirectory=plugins/fuzzyhashes

For Developers:

  1. Create a virtual environment with Python >= 3.8 and activate it [Optional, but recommended]:
python -m venv cytrics_venv
source cytrics_venv/bin/activate
  2. Clone the Surfactant repository and change into it:
git clone git@github.com:LLNL/Surfactant.git
cd Surfactant
  3. Create an editable Surfactant install (changes to the code take effect immediately):
pip install -e .

To install optional dependencies required for running pytest and pre-commit:

pip install -e ".[test,dev]"

pip install with the -e or --editable option can also be used to install Surfactant plugins for development.

pip install -e plugins/fuzzyhashes

Usage

Identify sample file

To try out Surfactant, you will need a sample file or folder. If you don't have one on hand, you can download and use the portable .zip file from https://github.com/ShareX/ShareX/releases or the Linux .tar.gz file from https://github.com/GMLC-TDC/HELICS/releases. Alternatively, you can pick a sample from https://lc.llnl.gov/gitlab/cir-software-assurance/unpacker-to-sbom-test-files

Build configuration file

A configuration file describes the sample that Surfactant should gather information from. Example JSON configuration files can be found in the examples folder of this repository.

extractPaths: (required) a list of absolute paths, or paths relative to the current working directory that Surfactant is run from, pointing to the sample folders. Each entry must be a directory, not a file. (Note that even on Windows, Unix-style / directory separators should be used in paths.)
archive: (optional) the full path, including file name, of the zip file, .exe installer, or other archive file that the folders in extractPaths were extracted from. It is used to collect metadata about the overall sample: Surfactant adds a software entry for the archive with a "Contains" relationship to every software entry found in the extractPaths.
installPrefix: (optional) where the files in extractPaths would be located if correctly installed on an actual system, e.g. "C:/", "C:/Program Files/", etc. (Note that even on Windows, Unix-style / directory separators should be used in the path.) If not given, the extractPaths are used as the install paths.
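The rules above can be checked before running Surfactant. The sketch below covers only them: extractPaths must be a non-empty list, paths should use / separators, and paths can optionally be checked to be existing directories. It assumes the top-level JSON list format used by the examples in this README. validate_config is a hypothetical helper, and Surfactant's own validation may differ.

```python
import os
from typing import Any, List


def validate_config(entries: Any, check_fs: bool = False) -> List[str]:
    """Return human-readable problems found in a Surfactant-style config."""
    if not isinstance(entries, list):
        return ["top level must be a JSON list of entries"]
    problems: List[str] = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be a JSON object")
            continue
        paths = entry.get("extractPaths")
        if not isinstance(paths, list) or not paths:
            problems.append(f"entry {i}: extractPaths must be a non-empty list")
            paths = []
        # installPrefix is optional; only check it when present
        prefix = entry.get("installPrefix")
        candidates = paths + ([prefix] if prefix is not None else [])
        for p in candidates:
            if not isinstance(p, str):
                problems.append(f"entry {i}: path {p!r} is not a string")
            elif "\\" in p:
                problems.append(f"entry {i}: use '/' separators in {p!r}")
        # Optionally confirm each extractPaths entry is an existing directory
        if check_fs:
            for p in paths:
                if isinstance(p, str) and not os.path.isdir(p):
                    problems.append(f"entry {i}: {p!r} is not a directory")
    return problems
```

Usage: pass in the result of `json.load()` on a config file. An empty list means no problems were found.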

Create config command

A basic configuration file can be easily built using the create-config command. It takes a path as a command-line argument and saves a JSON configuration file named after the final directory in that path. For example, /home/user/Desktop/myfolder creates myfolder.json.

$  surfactant create-config [INPUT_PATH]

The --output option sets the name of the configuration file to write. The --install-prefix option sets the install prefix (default: '/').

$  surfactant create-config [INPUT_PATH] --output new_output.json --install-prefix 'C:/'
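The behavior described for create-config can be sketched as follows. The default file name comes from the final directory component, and the install prefix defaults to "/". The exact fields Surfactant writes are an assumption here: this sketch emits only extractPaths (as an absolute path) and installPrefix. make_config and write_config are hypothetical helpers, not Surfactant's code.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple


def make_config(input_path: str, install_prefix: str = "/") -> Tuple[str, List[Dict[str, Any]]]:
    """Return (default file name, config) for a single sample directory."""
    abs_path = Path(input_path).resolve()
    config = [{
        # as_posix() gives '/' separators, which the config format expects
        "extractPaths": [abs_path.as_posix()],
        "installPrefix": install_prefix,
    }]
    # The default output name comes from the final directory component
    return abs_path.name + ".json", config


def write_config(input_path: str, output: Optional[str] = None, install_prefix: str = "/") -> str:
    """Write the config to `output`, or to the default name if not given."""
    default_name, config = make_config(input_path, install_prefix)
    out = output or default_name
    with open(out, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return out
```

A trailing slash on the input path does not change the default name, because `Path` normalizes it away.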

Example configuration file

Let's say you have a .tar.gz file that you want to run Surfactant on. For this example, we will use the HELICS release .tar.gz file. In this scenario, the absolute path for this file is /home/samples/helics.tar.gz. Extracting this file produces a helics folder with four subfolders: bin, include, lib64, and share.

Example 1: Simple Configuration File

If we want to include only the folders that contain binary files to analyze, our most basic configuration would be:

[
  {
    "extractPaths": ["/home/samples/helics/bin", "/home/samples/helics/lib64"]
  }
]

The resulting SBOM would be structured like this:

{
  "software": [
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/home/samples/helics/bin/helics_binary"],
      "containerPath": null
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/home/samples/helics/lib64/lib1.so"],
      "containerPath": null
    }
  ],
  "relationships": [
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    }
  ]
}
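Relationships refer to software entries by UUID. The sketch below resolves those UUIDs to file names, which makes an SBOM like the one above easier to read. It uses only the fields shown in the example, trimmed here to UUID and fileName. describe_relationships is a hypothetical helper, not part of Surfactant.

```python
from typing import Any, Dict, List

# Example 1's SBOM from above, trimmed to the fields used here
EXAMPLE_1 = {
    "software": [
        {"UUID": "abc1", "fileName": ["helics_binary"]},
        {"UUID": "abc2", "fileName": ["lib1.so"]},
    ],
    "relationships": [
        {"xUUID": "abc1", "yUUID": "abc2", "relationship": "Uses"},
    ],
}


def describe_relationships(sbom: Dict[str, Any]) -> List[str]:
    """Render each relationship as 'x <type> y' using file names."""
    # Fall back to the UUID when an entry has no file name
    names = {
        sw["UUID"]: (sw.get("fileName") or [sw["UUID"]])[0]
        for sw in sbom.get("software", [])
    }
    return [
        f"{names.get(r['xUUID'], r['xUUID'])} {r['relationship']} {names.get(r['yUUID'], r['yUUID'])}"
        for r in sbom.get("relationships", [])
    ]
```

For a real SBOM file, load it with `json.load()` first and pass the resulting dict in.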

Example 2: Detailed Configuration File

A more detailed configuration file might look like the example below. The resulting SBOM would have a software entry for helics.tar.gz with a "Contains" relationship to all binaries found in the extractPaths. Providing an installPrefix of / and an extractPaths entry of /home/samples/helics allows Surfactant to correctly assign install paths under /bin and /lib64 to the binaries in those subfolders.

[
  {
    "archive": "/home/samples/helics.tar.gz",
    "extractPaths": ["/home/samples/helics"],
    "installPrefix": "/"
  }
]

The resulting SBOM would be structured like this:

{
  "software": [
    {
      "UUID": "abc0",
      "fileName": ["helics.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/bin/helics_binary"],
      "containerPath": ["abc0/bin/helics_binary"]
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/lib64/lib1.so"],
      "containerPath": ["abc0/lib64/lib1.so"]
    }
  ],
  "relationships": [
    {
      "xUUID": "abc0",
      "yUUID": "abc1",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc2",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    }
  ]
}
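The install and container paths in Example 2 follow a simple rule. Take each file's path relative to its extractPaths entry, then join it onto installPrefix to get installPath, or onto the archive's UUID to get containerPath. The minimal sketch below implements that mapping. It is inferred from the example output rather than taken from Surfactant's code, and the helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def install_path(file_path: str, extract_path: str, install_prefix: Optional[str]) -> str:
    """Map an extracted file to where it would live once installed."""
    if install_prefix is None:
        # Without installPrefix, the extracted location doubles as the install path
        return file_path
    rel = PurePosixPath(file_path).relative_to(extract_path)
    return str(PurePosixPath(install_prefix) / rel)


def container_path(file_path: str, extract_path: str, archive_uuid: str) -> str:
    """Path of the file inside the archive, prefixed by the archive's UUID."""
    rel = PurePosixPath(file_path).relative_to(extract_path)
    return f"{archive_uuid}/{rel}"
```

Using `PurePosixPath` keeps the / separators the configuration format requires, even when the sketch runs on Windows.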

Example 3: Adding Related Binaries

If our sample helics tar.gz file came with a related tar.gz file to install a plugin extension module (extracted into a helics_plugin folder that contains bin and lib64 subfolders), we could add that into the configuration file as well:

[
  {
    "archive": "/home/samples/helics.tar.gz",
    "extractPaths": ["/home/samples/helics"],
    "installPrefix": "/"
  },
  {
    "archive": "/home/samples/helics_plugin.tar.gz",
    "extractPaths": ["/home/samples/helics_plugin"],
    "installPrefix": "/"
  }
]

The resulting SBOM would be structured like this:

{
  "software": [
    {
      "UUID": "abc0",
      "fileName": ["helics.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc1",
      "fileName": ["helics_binary"],
      "installPath": ["/bin/helics_binary"],
      "containerPath": ["abc0/bin/helics_binary"]
    },
    {
      "UUID": "abc2",
      "fileName": ["lib1.so"],
      "installPath": ["/lib64/lib1.so"],
      "containerPath": ["abc0/lib64/lib1.so"]
    },
    {
      "UUID": "abc3",
      "fileName": ["helics_plugin.tar.gz"],
      "installPath": null,
      "containerPath": null
    },
    {
      "UUID": "abc4",
      "fileName": ["helics_plugin"],
      "installPath": ["/bin/helics_plugin"],
      "containerPath": ["abc3/bin/helics_plugin"]
    },
    {
      "UUID": "abc5",
      "fileName": ["lib_plugin.so"],
      "installPath": ["/lib64/lib_plugin.so"],
      "containerPath": ["abc3/lib64/lib_plugin.so"]
    }
  ],
  "relationships": [
    {
      "xUUID": "abc1",
      "yUUID": "abc2",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc4",
      "yUUID": "abc5",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc5",
      "yUUID": "abc2",
      "relationship": "Uses"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc1",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc0",
      "yUUID": "abc2",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc3",
      "yUUID": "abc4",
      "relationship": "Contains"
    },
    {
      "xUUID": "abc3",
      "yUUID": "abc5",
      "relationship": "Contains"
    }
  ]
}
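In Example 3, abc5 (lib_plugin.so) uses abc2 (lib1.so), so the plugin binary depends indirectly on a library shipped in the other archive. The sketch below follows "Uses" edges transitively, with a breadth-first search, to surface such indirect dependencies. It relies only on the relationship fields shown in the example. transitive_uses is a hypothetical helper.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

# Example 3's relationships from above
EXAMPLE_3_RELATIONSHIPS = [
    {"xUUID": x, "yUUID": y, "relationship": kind}
    for x, y, kind in [
        ("abc1", "abc2", "Uses"),
        ("abc4", "abc5", "Uses"),
        ("abc5", "abc2", "Uses"),
        ("abc0", "abc1", "Contains"),
        ("abc0", "abc2", "Contains"),
        ("abc3", "abc4", "Contains"),
        ("abc3", "abc5", "Contains"),
    ]
]


def transitive_uses(relationships: List[Dict[str, str]], start: str) -> Set[str]:
    """Breadth-first search over 'Uses' edges starting from one UUID."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for rel in relationships:
        if rel["relationship"] == "Uses":
            graph[rel["xUUID"]].append(rel["yUUID"])
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph[node]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(start)  # ignore a cycle back to the start node
    return seen
```

Here `transitive_uses(EXAMPLE_3_RELATIONSHIPS, "abc4")` includes abc2, a dependency that no single relationship links directly to the plugin binary.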

NOTE: These examples have been simplified to show differences in output based on configuration.

Run surfactant

$  surfactant generate [OPTIONS] CONFIG_FILE SBOM_OUTFILE [INPUT_SBOM]

CONFIG_FILE: (required) the config file created earlier that contains the information on the sample
SBOM_OUTFILE: (required) the desired name of the output file
INPUT_SBOM: (optional) a base SBOM to build on; use with care, since relationships may be incorrect if the files are installed on different systems
--skip_gather: (optional) skips gathering information on files and adding software entries
--skip_relationships: (optional) skips adding relationships based on metadata
--skip_install_path: (optional) skips including an install path for the files discovered. This may also prevent "Uses" relationships from being generated
--recorded_institution: (optional) the name of the institution collecting the SBOM data (default: LLNL)
--output_format: (optional) changes the output format for the SBOM (given as the full module name of a Surfactant plugin implementing the write_sbom hook)
--input_format: (optional) specifies the format of the input SBOM, if one is used (default: cytrics) (given as the full module name of a Surfactant plugin implementing the read_sbom hook)
--help: (optional) show the help message and exit
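When Surfactant is driven from a script, the command line can be assembled from the options listed above. The sketch below only builds the argument list. It assumes the --skip_* options are bare flags and that each of the other options takes a single value. generate_command is a hypothetical helper.

```python
from typing import List, Optional


def generate_command(
    config_file: str,
    sbom_outfile: str,
    input_sbom: Optional[str] = None,
    *,
    skip_gather: bool = False,
    skip_relationships: bool = False,
    skip_install_path: bool = False,
    recorded_institution: Optional[str] = None,
    output_format: Optional[str] = None,
    input_format: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `surfactant generate`."""
    cmd = ["surfactant", "generate"]
    # Boolean switches (assumed to be bare flags)
    for flag, enabled in (
        ("--skip_gather", skip_gather),
        ("--skip_relationships", skip_relationships),
        ("--skip_install_path", skip_install_path),
    ):
        if enabled:
            cmd.append(flag)
    # Options that take a value; omitted ones fall back to Surfactant's defaults
    for flag, value in (
        ("--recorded_institution", recorded_institution),
        ("--output_format", output_format),
        ("--input_format", input_format),
    ):
        if value is not None:
            cmd += [flag, value]
    # Positional arguments, in the documented order
    cmd += [config_file, sbom_outfile]
    if input_sbom is not None:
        cmd.append(input_sbom)
    return cmd
```

The resulting list can be passed to the standard library's `subprocess.run(cmd, check=True)`.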

Understanding the SBOM Output

Software

This section contains a list of entries, one for each piece of software found in the sample. Metadata such as file size, vendor, and version is included in this section, along with a UUID that uniquely identifies the software entry.

Relationships

This section contains information on how each of the software entries in the previous section are linked.

Uses: software x uses software y, i.e., y is a helper module to x
Contains: software x contains software y (often x is an installer or archive such as a zip file)
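Every relationship points at software entries by UUID. A quick consistency check can therefore catch dangling references in an SBOM that was edited by hand or merged. The sketch below assumes that only the two relationship types described above are expected, and it uses the field names from the earlier examples. SBOMs from other tools or versions may legitimately contain other types or reference non-software entries, which the extra_uuids parameter allows for. check_relationships is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List

KNOWN_TYPES = {"Uses", "Contains"}


def check_relationships(sbom: Dict[str, Any], extra_uuids: Iterable[str] = ()) -> List[str]:
    """List relationships with unknown UUIDs or unexpected types."""
    # UUIDs a relationship may legitimately reference
    uuids = {sw["UUID"] for sw in sbom.get("software", [])} | set(extra_uuids)
    problems: List[str] = []
    for i, rel in enumerate(sbom.get("relationships", [])):
        for key in ("xUUID", "yUUID"):
            if rel.get(key) not in uuids:
                problems.append(f"relationship {i}: {key} {rel.get(key)!r} not found")
        if rel.get("relationship") not in KNOWN_TYPES:
            problems.append(f"relationship {i}: unexpected type {rel.get('relationship')!r}")
    return problems
```

Usage: an empty result means every relationship references a known entry and uses a known type.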

Observations

This section contains notable observations about individual software components, such as vulnerabilities or observed features.

Merging SBOMs

A folder containing multiple separate SBOM JSON files can be combined using the surfactant merge command. The example below uses ls to get a list of files and xargs to pass that list to surfactant merge as arguments.

ls -d ~/Folder_With_SBOMs/Surfactant-* | xargs -d '\n' surfactant merge --config_file=merge_config.json --sbom_outfile combined_sbom.json

If the config file option is given, a top-level system entry is created that all other software entries are tied to (directly, or indirectly through other relationships). If the UUID specified in the config file is empty, a random UUID is generated for the new system entry; otherwise, the provided UUID is used.
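The sketch below is a deliberately naive merge that makes combining SBOMs concrete. It deduplicates software entries by UUID and drops duplicate relationships. The system UUID falls back to a random one when the configured value is empty, mirroring the behavior described above. This is not the surfactant merge implementation, which may match and combine entries differently and also builds the system entry itself. merge_sboms and system_uuid are hypothetical names.

```python
import uuid
from typing import Any, Dict, List, Set, Tuple


def merge_sboms(sboms: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Naively combine SBOMs: first entry wins per UUID, relationships deduplicated."""
    software: Dict[str, Dict[str, Any]] = {}
    relationships: List[Dict[str, Any]] = []
    seen: Set[Tuple[str, str, str]] = set()
    for sbom in sboms:
        for sw in sbom.get("software", []):
            software.setdefault(sw["UUID"], sw)
        for rel in sbom.get("relationships", []):
            key = (rel["xUUID"], rel["yUUID"], rel["relationship"])
            if key not in seen:
                seen.add(key)
                relationships.append(rel)
    return {"software": list(software.values()), "relationships": relationships}


def system_uuid(configured: str) -> str:
    """Use the configured UUID, or generate a random one if it is empty."""
    return configured or str(uuid.uuid4())
```

Deduplicating by UUID only works when the same file received the same UUID in each input SBOM. A real merge generally needs a content-based match, such as a file hash.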

Details on the merge command can be found in the Surfactant documentation.

Plugins

Surfactant supports plugins that add features. For users, installing and enabling a plugin usually just involves running pipx inject surfactant with the plugin when using pipx, or pip install of the plugin when manually managing virtual environments.

Detailed information on configuration options for the plugin system and on developing new plugins can be found in the Surfactant documentation.
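Installed Python plugins are commonly discovered through package entry points. The sketch below lists installed plugins using only the standard library, which is one way to confirm that a pipx inject or pip install worked. It assumes that Surfactant plugins register under an entry-point group named "surfactant"; check the plugin documentation to confirm this. Run it with the same Python interpreter Surfactant uses (for pipx, the one inside Surfactant's environment). list_plugins is a hypothetical helper.

```python
from importlib.metadata import entry_points
from typing import List


def list_plugins(group: str = "surfactant") -> List[str]:
    """Return the sorted names of installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry-point collections support select()
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a dict mapping group name -> entry points
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    print(list_plugins())
```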

Support

Full user guides for Surfactant are available online and in the docs directory.

For questions or support, please create a new discussion on GitHub Discussions, or open an issue for bug reports and feature requests.

Contributing

Contributions are welcome. Bug fixes or minor changes are preferred via a pull request to the Surfactant GitHub repository. For more information on contributing see the CONTRIBUTING file.

License

Surfactant is released under the MIT license. See the LICENSE and NOTICE files for details. All new contributions must be made under this license.

SPDX-License-Identifier: MIT

LLNL-CODE-850771



Download files


Source Distribution

surfactant-0.0.0rc7.tar.gz (581.4 kB)

Uploaded Source

Built Distribution

Surfactant-0.0.0rc7-py3-none-any.whl (100.7 kB)

Uploaded Python 3

File details

Details for the file surfactant-0.0.0rc7.tar.gz.

File metadata

  • Download URL: surfactant-0.0.0rc7.tar.gz
  • Size: 581.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for surfactant-0.0.0rc7.tar.gz
Algorithm Hash digest
SHA256 d3794d159c82490c267f38b392f6ec959b99bea0dc4efbccac9108a2e0ba2128
MD5 dc57f67c45dff86cc79a7c01c0305fe1
BLAKE2b-256 24bbf39561c65e3c97f22f778cbf64fa740b4907f1b580a4070dad5c87d410f7


File details

Details for the file Surfactant-0.0.0rc7-py3-none-any.whl.

File hashes

Hashes for Surfactant-0.0.0rc7-py3-none-any.whl
Algorithm Hash digest
SHA256 1d67bf58e82179c1ed6220c5365522e7f6e6987ec659387c5e1d687380255eb2
MD5 1d9788570edf62c19502bbd352391a71
BLAKE2b-256 9fc92210b24510b28c3419f099fd48d066c750f7fa612f4dc51fc349d06f0c66

