Dependency Observatory Scanner: a scanner for software packages and dependencies

find-package-rugaru

find-package-rugaru finds the open source packages that a git repository depends on, then tests them and flags suspicious ones (like the legendary rugaru).

NB: this project is in an alpha state and its APIs are not stable.

Scanner

Installation

Requirements
  • docker >=18.06.3
  • python 3.8+ and pip
  • jq
  • system packages to build psycopg2 (e.g. build-essential and libpq-dev on Debian buster)

After installing the above requirements:

$ git clone https://github.com/mozilla-services/find-package-rugaru.git
$ cd find-package-rugaru
$ make install install-dev
$ docker pull mozilla/dependencyscan
$ docker pull postgres:12
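
Before running the install commands, it can help to confirm that the command-line prerequisites are on PATH. The following is a minimal sketch, not part of the project: the helper names are hypothetical, git and make are included only because the commands above invoke them, and it does not check the docker version against the 18.06.3 minimum.

```python
import shutil
import sys

# docker and jq come from the requirements list; git and make are assumed
# because the installation commands call them.
REQUIRED_TOOLS = ("docker", "jq", "git", "make")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_new_enough(version_info=sys.version_info, minimum=(3, 8)):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("python 3.8+:", python_is_new_enough())
```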

Example Usage

  1. Start a local postgres database:
$ make start-db
  2. Run one or more of the analysis scripts:
  • ./bin/analyze_package.sh <package_name> [<package_version>]
  • ./bin/analyze_repo.sh <repository_url>
$ ./bin/analyze_package.sh lodash 4.17.15
analyzing lodash@4.17.15 saving intermediate results to /tmp/dep-obs.g7mNNBaLyVjR
...

2020-02-27 17:31:31,900 - fpr - INFO - pipeline finished
null
2020-02-27 17:31:32,403 - fpr - INFO - pipeline finished
$

Or, if you have more time, scan all git tags of the lodash repo:

$ ./bin/analyze_repo.sh https://github.com/lodash/lodash.git
analyzing tags of https://github.com/lodash/lodash.git saving intermediate results to /tmp/dep-obs.5pvSrfbn6Nox
...

Check the source of the scripts to find additional configuration via environment variables.

  3. Inspect the results in the local database:
$ make db-shell
PGPASSWORD=postgres psql -U postgres -h localhost -p 5432 dependency_observatory
psql (12.2 (Ubuntu 12.2-2.pgdg18.04+1), server 12.1 (Debian 12.1-1.pgdg100+1))
Type "help" for help.

dependency_observatory=# \x on
Expanded display is on.
dependency_observatory=# SELECT * FROM package_versions WHERE name = 'lodash' ORDER BY inserted_at DESC;
-[ RECORD 1 ]-------------------------------------------------------
id          | 102
name        | lodash
version     | 4.17.15
language    | node
url         | https://registry.npmjs.org/lodash/-/lodash-4.17.15.tgz
repo_url    |
repo_commit |
inserted_at | 2020-02-26 17:12:47.373348
updated_at  |
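
The same query can be run non-interactively. The sketch below wraps the psql invocation shown above in Python; the function names are hypothetical, and it assumes the connection settings printed by make db-shell (user postgres, password postgres, localhost:5432) and a psql client on PATH. The -x flag requests the expanded display and -c passes the query.

```python
import os
import subprocess

QUERY = (
    "SELECT * FROM package_versions "
    "WHERE name = 'lodash' ORDER BY inserted_at DESC;"
)


def psql_invocation(query, password="postgres"):
    """Build the argv and environment for a one-off psql query."""
    argv = [
        "psql",
        "-U", "postgres",
        "-h", "localhost",
        "-p", "5432",
        "-x",          # expanded display, like \x on
        "-c", query,   # run a single command and exit
        "dependency_observatory",
    ]
    env = dict(os.environ, PGPASSWORD=password)
    return argv, env


def run_query(query=QUERY):
    """Run the query against the local database and return psql's stdout."""
    argv, env = psql_invocation(query)
    result = subprocess.run(
        argv, env=env, check=True, capture_output=True, text=True
    )
    return result.stdout
```

Calling run_query() requires the database started by make start-db to be running.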

Pipelines

The scripts are composed of components called pipelines (for lack of a better term). For example, analyze_package.sh will:

  1. fetch information about the package from the npm registry
  2. filter for git refs to clone and, if specified, the matching version
  3. find dependency manifests or lockfiles (e.g. package.json) for each ref in a debian:buster-slim docker image
  4. run npm install --save=true, npm list --json, and npm audit --json in the project root for each ref in a node:10-buster-slim docker image
  5. postprocess and save the results to the local postgres database
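
The idea of chaining steps like these can be sketched as stages that each consume and yield records. This is only an illustration of the composition pattern: the stage names and record fields are hypothetical, and the real scripts chain pipelines through shell and docker rather than in-process Python.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def filter_version(wanted):
    """Hypothetical stage: keep only records matching `wanted`, if given."""
    def stage(records):
        for record in records:
            if wanted is None or record.get("version") == wanted:
                yield record
    return stage


def tag_with(key, value):
    """Hypothetical stage: add a constant field to every record."""
    def stage(records):
        for record in records:
            yield {**record, key: value}
    return stage


def compose(*stages: Stage) -> Stage:
    """Feed the output of each stage into the next, in order."""
    def run(records):
        for stage in stages:
            records = stage(records)
        return records
    return run


pipeline = compose(filter_version("4.17.15"), tag_with("language", "node"))
results = list(pipeline([
    {"name": "lodash", "version": "4.17.14"},
    {"name": "lodash", "version": "4.17.15"},
]))
```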

Each pipeline can also be run on its own. For example, the find_git_refs pipeline used in analyze_repo.sh finds git tags for the mozilla-services/channelserver project:

$ echo '{"repo_url": "https://github.com/mozilla-services/channelserver"}' | docker run -i --rm -v /var/run/docker.sock:/var/run/docker.sock --name fpr-test mozilla/dependencyscan python fpr/run_pipeline.py -v find_git_refs
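
The same invocation can be driven from Python. This sketch rebuilds the docker command shown above and sends the JSON input on stdin; the helper names are hypothetical, and because the output format is not described here, the output is returned as raw text rather than parsed.

```python
import json
import subprocess

IMAGE = "mozilla/dependencyscan"


def pipeline_command(pipeline_name, container_name="fpr-test"):
    """Build the docker argv that runs one pipeline inside the scanner image."""
    return [
        "docker", "run", "-i", "--rm",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "--name", container_name,
        IMAGE,
        "python", "fpr/run_pipeline.py", "-v", pipeline_name,
    ]


def run_pipeline(pipeline_name, payload):
    """Send `payload` as one JSON line on stdin and return the raw stdout."""
    result = subprocess.run(
        pipeline_command(pipeline_name),
        input=json.dumps(payload) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_pipeline("find_git_refs", {"repo_url": "https://github.com/mozilla-services/channelserver"}) mirrors the shell command above and needs docker and the pulled image.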

Current Pipelines (from -h output)

    crate_graph         Parses the output of the cargo metadata pipeline and
                        writes a .dot file of the dependencies to outfile
    dep_graph           Parses the output of the cargo metadata pipeline and
                        writes a .dot file of the dependencies to outfile
    fetch_package_data  Fetches additional data about a dependency.
    find_dep_files      Given a repo_url, clones the repo, lists git refs for
                        each tag
    find_git_refs       Given a repo_url, clones the repo, lists git refs for
                        each tag TODO: every Nth commit, or commit every time
                        interval. TODO: since and until args TODO: find
                        branches
    github_metadata     Given an input file with repo urls metadata output
                        fetches dependency and vulnerability metadata from
                        GitHub and an optional GitHub PAT and outputs them to
                        jsonl and optionally saves them to a local SQLite3 DB.
    postprocess         Post processes tasks for various outputs e.g.
                        flattening deps, filtering and extracting fields, etc.
                        Does not spin up containers or hit the network.
    run_repo_tasks      Runs tasks on a checked out git ref with dep. files
    rust_changelog      Given ordered cargo metadata output for git refs from
                        the same repo: 1. builds a dict of manifest filename
                        to cargo meta 2. groups the output into pairs (i.e. 1,
                        2, 3 -> (1, 2), (2, 3) in the provided order 3.
                        compares each pair as follows: a. compare each
                        manifest filename: 1) count new and removed
                        dependencies 2) new and removed authors and repo urls
                        TODO: output a diff of the updated dep code (need to
                        update the cargo metadata pipeline to pull these)
                        TODO: take audit output to show new and fixed Rust
                        vulns TODO: detect dep version changes
    save_to_db          Saves JSON lines to a postgres DB

See bin/analyze_* scripts and Makefile for example usage.
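
The pairing and comparison steps in the rust_changelog description (1, 2, 3 -> (1, 2), (2, 3)) can be sketched as follows. This is an illustration under assumptions, not the pipeline's code: the function names are hypothetical and dependencies are modelled as plain lists of names.

```python
def consecutive_pairs(items):
    """Group items into overlapping pairs, keeping the provided order."""
    return list(zip(items, items[1:]))


def dependency_delta(old_deps, new_deps):
    """Report dependency names that appear in only one of the two refs."""
    old_set, new_set = set(old_deps), set(new_deps)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


# Hypothetical dependency lists for three refs of the same repo, in order.
refs = [["serde"], ["serde", "log"], ["log", "rand"]]
deltas = [dependency_delta(old, new) for old, new in consecutive_pairs(refs)]
```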

They feed into each other as follows (* indicates deprecated, removed or otherwise broken pipelines):

[Figure: pipelines graph]

Pipeline API

Note that this interface is subject to change.

Each pipeline:

See the design doc for why this interface was chosen.

Adding a pipeline
  1. copy an existing file from fpr/pipelines/
  2. give it a new name in its Pipeline model declaration
  3. add it to fpr/pipelines/__init__.py
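
The steps above amount to declaring a named pipeline and registering it. The sketch below shows that pattern with a stand-in Pipeline class; every name and field in it is hypothetical, and the actual Pipeline model and registration code in fpr/pipelines/ may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical stand-in for the project's Pipeline model."""
    name: str
    desc: str
    runner: Callable[[Iterable[dict]], Iterator[dict]]


def number_lines(records):
    """Example runner: annotate each input record with its position."""
    for index, record in enumerate(records, start=1):
        yield {**record, "line_number": index}


# Step 2: give the copied pipeline a new name in its declaration.
number_lines_pipeline = Pipeline(
    name="number_lines",
    desc="Adds a line_number field to each input record",
    runner=number_lines,
)

# Step 3: register it (assumed here to be a name -> pipeline mapping).
PIPELINES: Dict[str, Pipeline] = {
    pipeline.name: pipeline for pipeline in (number_lines_pipeline,)
}
```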

Source Distribution

dependency-observatory-scanner-2020.4.15.tar.gz (912.4 kB)

File metadata

  • Download URL: dependency-observatory-scanner-2020.4.15.tar.gz
  • Size: 912.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1

File hashes

Hashes for dependency-observatory-scanner-2020.4.15.tar.gz
Algorithm Hash digest
SHA256 fd2cbd6d084591610fa022881e67842208652e29dd17abd54d7e258322d73f9f
MD5 ae63f18728172be7e2eb195b537d97a6
BLAKE2b-256 801d3e02f0bb80ce2170700bbd212da263ad90f1736193b2a6fabf3fb987da1f
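
A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's hashlib; the function names are hypothetical and the file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for dependency-observatory-scanner-2020.4.15.tar.gz
EXPECTED_SHA256 = (
    "fd2cbd6d084591610fa022881e67842208652e29dd17abd54d7e258322d73f9f"
)


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, matches_expected("dependency-observatory-scanner-2020.4.15.tar.gz") should return True for an intact download.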
