Packj flags "risky" open-source packages in your software supply chain

  Packj flags malicious/risky open-source packages

Packj (pronounced "package") is a tool to mitigate software supply chain attacks. It can detect malicious, vulnerable, abandoned, typo-squatting, and other "risky" packages from popular open-source package registries, such as NPM, RubyGems, and PyPI. It can be easily customized to minimize noise. Packj started as a PhD research project and is currently being developed under various government grants.

License: AGPL v3

Contents

  • Get started - available as Docker image, GitHub Action, Python PyPI package
  • Functionality - deep static/dynamic code analysis and sandboxing
  • Our story - started as a PhD research project and is backed by govt grants
  • Why Packj - existing CVE scanners ASSUME code is BENIGN and do not analyze its behavior
  • Customization - turn off alerts as per your threat model to reduce noise
  • Malware found - reported over 70 malicious PyPI and RubyGems packages
  • Talks and videos - presentations from PyCon, OpenSourceSummit, BlackHAT
  • Project roadmap - view or suggest new features; join our discord channel
  • Team and collaboration - expert Cybersecurity researchers from academia/industry
  • FAQ - supported package managers, commonly asked questions on techniques, and more

Get started

We support multiple deployment models:

1. GitHub runner

Use Packj to audit dependencies in pull requests.

- name: Packj Security Audit
  uses: ossillate-inc/packj-github-action@0.0.4-beta
  with:
    # TODO: replace with your dependency files in the repo
    DEPENDENCY_FILES: pypi:requirements.txt,npm:package.json,rubygems:Gemfile
    REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}

View on GitHub marketplace. Example PR run.

2. PyPI package

The quickest way to try/test Packj is using the PyPI package.

Warning: Packj only works on Linux.

pip3 install packj

Auditing RubyGems packages requires additional dependencies:

bundle install

3. Docker image (recommended)

Use Docker or Podman for containerized (isolated) runs.

docker run -v /tmp:/tmp/packj -it ossillate/packj:latest --help

4. Source repo

Clone this repo:

git clone https://github.com/ossillate-inc/packj.git && cd packj

Install dependencies:

bundle install && pip3 install -r requirements.txt

Start with help:

python3 main.py --help 

Functionality

Packj offers the following tools:

  • Audit - to vet a package for "risky" attributes.
  • Sandbox - for safe installation of a package.

Auditing a package

Packj audits open-source software packages for "risky" attributes that make them vulnerable to supply chain attacks. For instance, packages with expired email domains (lacking 2FA), large release time gap, sensitive APIs or access permissions, etc. are flagged as risky.
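As a rough illustration of the metadata side of such checks, here is a minimal Python sketch that flags an old latest version and a large gap between the last two releases. It is not Packj's implementation: the function names and both thresholds are hypothetical and chosen only for this example.

```python
from datetime import datetime, timezone

# Hypothetical thresholds, chosen only for illustration (not Packj's defaults).
MAX_VERSION_AGE_DAYS = 365
MAX_RELEASE_GAP_DAYS = 180


def days_between(earlier: datetime, later: datetime) -> int:
    """Whole days elapsed from `earlier` to `later`."""
    return (later - earlier).days


def metadata_risks(release_dates: list, now: datetime) -> list:
    """Return human-readable risks derived from a package's release dates."""
    if not release_dates:
        return ["no releases found"]
    ordered = sorted(release_dates)
    risks = []
    # Age of the most recent release.
    age = days_between(ordered[-1], now)
    if age > MAX_VERSION_AGE_DAYS:
        risks.append(f"old package: {age} days old")
    # Gap between the two most recent releases.
    if len(ordered) >= 2:
        gap = days_between(ordered[-2], ordered[-1])
        if gap > MAX_RELEASE_GAP_DAYS:
            risks.append(f"large release time gap: {gap} days")
    return risks


if __name__ == "__main__":
    releases = [
        datetime(2020, 1, 1, tzinfo=timezone.utc),
        datetime(2020, 12, 29, tzinfo=timezone.utc),
    ]
    for risk in metadata_risks(releases, datetime(2022, 12, 1, tzinfo=timezone.utc)):
        print(risk)
```

In a real audit the release dates would come from the registry's metadata; here they are hard-coded so the sketch stays self-contained.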

Auditing the following is supported:

  • multiple packages: python3 main.py audit -p pypi:requests rubygems:overcommit
  • dependency files: python3 main.py audit -f npm:package.json pypi:requirements.txt

By default, audit only performs static code analysis to detect risky code. You can pass the -t or --trace flag to also perform dynamic code analysis, which installs all requested packages under strace and monitors their install-time behavior. Please see the example output below.

Example run/output:

$ docker run -v /tmp:/tmp/packj -it ossillate/packj:latest audit --trace -p npm:browserify

[+] Fetching 'browserify' from npm..........PASS [ver 17.0.0]
[+]    Checking package description.........PASS [browser-side require() the node way]
[+]    Checking release history.............PASS [484 version(s)]
[+] Checking version........................RISK [702 days old]
[+]    Checking release time gap............PASS [68 days since last release]
[+] Checking author.........................PASS [mail@substack.net]
[+]    Checking email/domain validity.......RISK [expired author email domain]
[+] Checking readme.........................PASS [26838 bytes]
[+] Checking homepage.......................PASS [https://github.com/browserify/browserify#readme]
[+] Checking downloads......................PASS [2M weekly]
[+] Checking repo URL.......................PASS [https://github.com/browserify/browserify]
[+]    Checking repo data...................PASS [stars: 14189, forks: 1244]
[+]    Checking if repo is a forked copy....PASS [original, not forked]
[+]    Checking repo description............PASS [browser-side require() the node.js way]
[+]    Checking repo activity...............PASS [commits: 2290, contributors: 207, tags: 413]
[+] Checking for CVEs.......................PASS [none found]
[+] Checking dependencies...................RISK [48 found]
[+] Downloading package from npm............PASS [163.83 KB]
[+] Analyzing code..........................RISK [needs 3 perm(s): decode,codegen,file]
[+] Checking files/funcs....................PASS [429 files (383 .js), 744 funcs, LoC: 9.7K]
[+] Installing package and tracing code.....PASS [found 5 process,1130 files,22 network syscalls]
=============================================
[+] 5 risk(s) found, package is undesirable!
=> Complete report: /tmp/packj_54rbjhgm/report_npm-browserify-17.0.0_hlr1rhcz.json
{
    "undesirable": [
        "old package: 702 days old",
        "invalid or no author email: expired author email domain",
        "generates new code at runtime",
        "reads files and dirs",
        "forks or exits OS processes",
    ]
}

WARNING: since packages could execute malicious code during installation, it is recommended to ONLY use -t or --trace when running inside a Docker container or a Virtual Machine.

Audit can also be performed in Docker/Podman containers. Please find details on risky attributes and how to use at Audit README.
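If you want to gate a CI job on the audit result, a small script can read the JSON report and fail when any risk is listed. The sketch below assumes the report file has a top-level "undesirable" list of strings, like the summary printed in the example output above; the full report may contain more fields or a different layout, so treat this as a starting point. The function names are hypothetical.

```python
import json
import sys
from pathlib import Path


def load_risks(report_path: str) -> list:
    """Return the entries under "undesirable" in a Packj-style JSON report."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    # Assumption: the report mirrors the summary printed by `audit`,
    # i.e. a top-level "undesirable" list of strings.
    return list(report.get("undesirable", []))


def main(argv: list) -> int:
    """Print each risk and return a shell-style exit code."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} REPORT.json", file=sys.stderr)
        return 2
    risks = load_risks(argv[1])
    for risk in risks:
        print(f"RISK: {risk}")
    # A non-zero exit code lets a CI step fail when risks are present.
    return 1 if risks else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved under a name of your choice, the script takes the report path printed after "=> Complete report:" as its only argument.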

Sandboxed package installation

Packj offers lightweight sandboxing for safe installation of a package. Specifically, it prevents malicious packages from exfiltrating sensitive data, accessing sensitive files (e.g., SSH keys), and persisting malware.

It sandboxes install-time scripts, including any native compilation. It uses strace (i.e., NO VM/Container required).

Please find details on the sandboxing mechanism and how to use at Sandbox README.

Example run/output:

$ python3 main.py sandbox gem install overcommit

Fetching: overcommit-0.59.1.gem (100%)
Install hooks by running `overcommit --install` in your Git repository
Successfully installed overcommit-0.59.1
Parsing documentation for overcommit-0.59.1
Installing ri documentation for overcommit-0.59.1

#############################
# Review summarized activity
#############################

[+] Network connections
    [+] DNS (1 IPv4 addresses) at port 53 [rule: ALLOW]
    [+] rubygems.org (4 IPv6 addresses) at port 443 [rule: IPv6 rules not supported]
    [+] rubygems.org (4 IPv4 addresses) at port 443 [rule: ALLOW]
[+] Filesystem changes
/
└── home
    └── ubuntu
        └── .ruby
            ├── gems
            │   ├── iniparse-1.5.0 [new: DIR, 15 files, 46.6K bytes]
            │   ├── rexml-3.2.5 [new: DIR, 77 files, 455.6K bytes]
            │   ├── overcommit-0.59.1 [new: DIR, 252 files, 432.7K bytes]
            │   └── childprocess-4.1.0 [new: DIR, 57 files, 141.2K bytes]
            ├── cache
            │   ├── iniparse-1.5.0.gem [new: FILE, 16.4K bytes]
            │   ├── rexml-3.2.5.gem [new: FILE, 93.2K bytes]
            │   ├── childprocess-4.1.0.gem [new: FILE, 34.3K bytes]
            │   └── overcommit-0.59.1.gem [new: FILE, 84K bytes]
            ├── specifications
            │   ├── rexml-3.2.5.gemspec [new: FILE, 2.7K bytes]
            │   ├── overcommit-0.59.1.gemspec [new: FILE, 1.7K bytes]
            │   ├── childprocess-4.1.0.gemspec [new: FILE, 1.8K bytes]
            │   └── iniparse-1.5.0.gemspec [new: FILE, 1.3K bytes]
            ├── bin
            │   └── overcommit [new: FILE, 622 bytes]
            └── doc
                ├── iniparse-1.5.0
                │   └── ri [new: DIR, 119 files, 131.7K bytes]
                ├── rexml-3.2.5
                │   └── ri [new: DIR, 836 files, 841K bytes]
                ├── overcommit-0.59.1
                │   └── ri [new: DIR, 1046 files, 1.5M bytes]
                └── childprocess-4.1.0
                    └── ri [new: DIR, 272 files, 297.8K bytes]

[C]ommit all changes, [Q|q]uit & discard changes, [L|l]ist details:

Our story

TL;DR Packj started as a PhD research project. It is backed by various government grants.

Packj started as an academic research project. Specifically, the static code analysis techniques used by Packj are based on cutting-edge Cybersecurity research: MalOSS project by our research group at Georgia Tech.

Packj is backed by generous grants from NSF, GRA, and ALInnovate.

Why Packj

TL;DR State-of-the-art open-source vulnerability scanners assume that code is TRUSTED, so they ONLY scan for CVEs. Packj not only scans for CVEs but also carries out deep code analysis to flag hidden malware and "risky" code behavior, such as spawning a shell, using SSH keys, and mismatches between GitHub code and packaged code (provenance). Such risky behaviors and attributes do not qualify as vulnerabilities (CVEs), which is why none of the existing tools can flag them.

Security vulnerabilities (a.k.a. CVEs) are the result of accidental programming bugs (e.g., Log4J, HeartBleed). A typical example is a missing bounds check on user input, which makes the program vulnerable to buffer overflow attacks. Attackers need to develop an exploit to trigger such vulnerabilities (e.g., a crafted TCP/IP packet in the case of HeartBleed, or a numerically high input to cause a buffer overflow). Such CVEs can be fixed by patching or upgrading to a newer version of the library (e.g., a newer version of Log4J fixes the CVE).

In contrast, malware is purposefully bad. Moreover, malware is itself the exploit and cannot be patched or fixed by upgrading to a newer version. For example, the dependency confusion attack was intentionally malicious; it did not exploit any accidental programming bug in the code. Similarly, the author of a popular package sabotaging their own code to protest against the war is very much intentional and does not exploit any CVEs. Typo-squatting is another attack vector that bad actors use to propagate malware in popular open-source package registries: it exploits typos and the inexperience of developers, not accidental programming bugs or CVEs in the code.
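To make the typo-squatting case concrete, here is a minimal Python sketch that flags a package name sitting within one edit of a well-known name. It illustrates the general idea only and is not how Packj detects typo-squatting: the POPULAR list, the distance threshold, and the function names are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical list of popular package names, for illustration only.
POPULAR = ("requests", "numpy", "django")


def possible_typosquats(name: str, popular=POPULAR, max_distance: int = 1) -> list:
    """Return popular names that `name` is suspiciously close to."""
    name = name.lower()
    return [p for p in popular
            if p != name and edit_distance(name, p) <= max_distance]


if __name__ == "__main__":
    for candidate in ("requets", "requests", "flask"):
        print(candidate, possible_typosquats(candidate))
```

A name-similarity check like this produces false positives on its own, which is one reason to combine it with other signals such as download counts and repo data.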

Existing scanners DO NOT detect malware or intentionally bad code because they assume that the third-party open-source code is benign. As such, these tools simply scan the source code for open-source dependencies, compile a list of all dependencies being used, and look each <dependency-NAME, dependency-VERSION> up in a database (e.g., NVD) to report if the source code uses any vulnerable package versions (e.g., vulnerable version of Log4J, LibSSL version affected by HeartBleed).
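The lookup described above can be sketched in a few lines of Python. The ADVISORIES table below is a toy stand-in for a database such as NVD, and the function name is hypothetical. The point of the sketch is the blind spot: a package that has no CVE entry comes back clean, whether or not its code is malicious.

```python
# Toy advisory table keyed by (package name, version); a stand-in for a
# real vulnerability database, for illustration only.
ADVISORIES = {
    ("log4j-core", "2.14.1"): ["CVE-2021-44228"],
    ("openssl", "1.0.1f"): ["CVE-2014-0160"],
}


def scan(dependencies: list) -> dict:
    """Map each (name, version) pair with known CVEs to its CVE list."""
    findings = {}
    for name, version in dependencies:
        cves = ADVISORIES.get((name.lower(), version))
        if cves:
            findings[(name, version)] = cves
    return findings


if __name__ == "__main__":
    deps = [("log4j-core", "2.14.1"), ("krisqian", "0.0.7")]
    # Only the package with a database entry is reported; the second
    # package is never inspected beyond its name and version.
    print(scan(deps))
```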

Packj uses static code analysis, runtime tracing (dynamic analysis), and metadata checks to audit the programmatic behavior of a package. Please read more at Audit README.

Customization

Packj can be easily customized to your threat model to minimize noise. Simply add a .packj.yaml file in the top directory of your repo/project and reduce alert fatigue by commenting out unwanted attributes.

Malware found

Using this tool, we found over 40 malicious packages on PyPI and over 20 on RubyGems. A number of them have been taken down. Refer to the example below:

Example malware:

$ python3 main.py audit pypi:krisqian

[+] Fetching 'krisqian' from pypi...OK [ver 0.0.7]
[+] Checking version...OK [256 days old]
[+] Checking release history...OK [7 version(s)]
[+] Checking release time gap...OK [1 days since last release]
[+] Checking author...OK [KrisWuQian@baidu.com]
    [+] Checking email/domain validity...OK [KrisWuQian@baidu.com]
[+] Checking readme...ALERT [no readme]
[+] Checking homepage...OK [https://www.bilibili.com/bangumi/media/md140632]
[+] Checking downloads...OK [13 weekly]
[+] Checking repo_url URL...OK [None]
[+] Checking for CVEs...OK [none found]
[+] Checking dependencies...OK [none found]
[+] Downloading package 'KrisQian' (ver 0.0.7) from pypi...OK [1.94 KB]
[+] Analyzing code...ALERT [needs 3 perms: process,network,file]
[+] Checking files/funcs...OK [9 files (2 .py), 6 funcs, LoC: 184]
=============================================
[+] 6 risk(s) found, package is undesirable!
{
    "undesirable": [
        "no readme",
        "only 45 weekly downloads",
        "no source repo found",
        "generates new code at runtime",
        "fetches data over the network: ['KrisQian-0.0.7/setup.py:40', 'KrisQian-0.0.7/setup.py:50']",
        "reads files and dirs: ['KrisQian-0.0.7/setup.py:59', 'KrisQian-0.0.7/setup.py:70']"
    ]
}
=> Complete report: pypi-KrisQian-0.0.7.json
=> View pre-vetted package report at https://packj.dev/package/PyPi/KrisQian/0.0.7

Packj flagged KrisQian (v0.0.7) as suspicious due to absence of source repo and use of sensitive APIs (network, code generation) during package installation time (in setup.py). We decided to take a deeper look, and found the package malicious. Please find our detailed analysis at https://packj.dev/malware/krisqian.

More examples of malware we found are listed at https://packj.dev/malware. Please reach out to us at oss@ossillate.com for the full list.

Resources

To learn more about Packj or open-source software supply chain attacks, refer to our PyConUS'22 video and OSSEU'22 video.

Feature roadmap

  • Add support for other language ecosystems. Rust is a work in progress, and will be available in December '22.
  • Add functionality to detect additional "risky" code and metadata attributes.

Have a feature or support request? Please visit our GitHub discussion page or join our discord community for discussion and requests.

Team

Packj has been developed by Cybersecurity researchers at Ossillate Inc. and external collaborators to help developers mitigate risks of supply chain attacks when sourcing untrusted third-party open-source software dependencies. We thank our developers and collaborators.

We welcome code contributions with open arms. See CONTRIBUTING.md guidelines. Found a bug? Please open an issue. Refer to our SECURITY.md guidelines to report a security issue.

FAQ

What Package Managers (Registries) are supported?

Packj can currently vet NPM, PyPI, and RubyGems packages for "risky" attributes. We are adding support for Rust.

What techniques does Packj employ to detect risky/malicious packages?

Packj uses static code analysis, dynamic tracing, and metadata analysis for comprehensive auditing. Static analysis alone is not sufficient to flag sophisticated malware that hides itself using code obfuscation. Dynamic analysis is performed by installing the package under strace and monitoring its runtime behavior. Please read more at Audit README.
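To give a feel for what monitoring under strace involves, here is a minimal Python sketch that buckets syscalls from strace-style text output into process, file, and network activity. It is an illustration under stated assumptions, not Packj's tracer: the CATEGORIES table is a small hand-picked subset, the regular expression only handles simple complete lines, and the sample trace lines are made up for the example.

```python
import re
from collections import Counter

# Matches an optional "[pid N] " or "N " prefix followed by "syscall(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<name>[a-z_0-9]+)\(")

# Hand-picked subset of syscalls per category, for illustration only.
CATEGORIES = {
    "process": {"execve", "fork", "vfork", "clone"},
    "files": {"open", "openat", "unlink", "unlinkat", "rename"},
    "network": {"connect", "bind", "sendto", "recvfrom"},
}


def summarize(trace_lines) -> Counter:
    """Count traced syscalls per category; unrecognized lines are skipped."""
    counts = Counter()
    for line in trace_lines:
        match = SYSCALL_RE.match(line)
        if not match:
            continue  # signal or exit markers, unfinished/resumed lines
        for category, names in CATEGORIES.items():
            if match.group("name") in names:
                counts[category] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines in strace's text format.
    sample = [
        'execve("/bin/sh", ["sh", "-c", "id"], 0x7ffc0 /* 20 vars */) = 0',
        'openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3',
        '[pid  4242] connect(3, {sa_family=AF_INET, sin_port=htons(443), '
        'sin_addr=inet_addr("192.0.2.1")}, 16) = 0',
        "+++ exited with 0 +++",
    ]
    print(dict(summarize(sample)))
```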

Does it work on obfuscated calls? For example, a base64-encoded string that gets decoded and then passed to a shell?

This is a very common malicious behavior. Packj detects code obfuscation as well as spawning of shell commands (exec system call). For example, Packj can flag use of getattr() and eval() API as they indicate "runtime code generation"; a developer can go and take a deeper look then. See main.py for details.
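As an illustration of flagging such APIs statically, the following Python sketch walks a module's syntax tree with the standard ast module and reports direct calls to a few builtins associated with runtime code generation. It is a simplified stand-in, not Packj's analyzer: the RISKY_CALLS set and the function name are assumptions, and calls made through aliases or attributes are deliberately not handled. The sample source is only parsed, never executed.

```python
import ast

# Builtins treated as "runtime code generation" in this sketch (assumed set).
RISKY_CALLS = {"eval", "exec", "compile", "getattr", "__import__"}


def find_risky_calls(source: str, filename: str = "<string>") -> list:
    """Return (function name, line number) for each direct risky call."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        # Only direct calls by name, e.g. eval(...); aliases are not tracked.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in RISKY_CALLS):
            hits.append((node.func.id, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])


# Sample input: decodes a base64 string and routes it to a shell call.
SAMPLE = '''import base64, os
cmd = base64.b64decode("aWQ=").decode()
run = getattr(os, "system")
eval("run(cmd)")
'''

if __name__ == "__main__":
    for name, lineno in find_risky_calls(SAMPLE):
        print(f"line {lineno}: call to {name}()")
```

A hit from a check like this is a prompt for manual review, not proof of malice, since many benign packages use getattr() and similar calls.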
