
Repository Scanner - Version Control System - Scanner


Repository Scanner Version Control System Scanner (RESC-VCS-SCANNER)


Note: This component is part of Repository Scanner (RESC).

Table of contents

  1. About the component
  2. Getting started
  3. Testing

About the component

The RESC-VCS-Scanner component uses the Gitleaks binary to scan source code for secrets.

Getting started

These instructions will help you get a copy of the project up and running on your local machine for development and testing purposes.


Run locally from source


Prerequisites:

  • RabbitMQ and the RESC web service must be up and running locally.
    If you have already deployed RESC through Helm in Kubernetes, RabbitMQ and the RESC web service are already running.
  • Install Gitleaks v8.18.0 on your system.
  • Download the rule config TOML file to /tmp/temp_resc_rule.toml by running the command below from a Git Bash terminal:
    curl https://raw.githubusercontent.com/zricethezav/gitleaks/master/config/gitleaks.toml > /tmp/temp_resc_rule.toml
  • Send some repositories to the 'repositories' topic of the RabbitMQ server, as described in the README of the RESC-VCS-SCRAPER component.

Clone the repository, open a Git Bash terminal in the root folder of the cloned repository, and run the commands below.

1. Create virtual environment:

cd components/resc-vcs-scanner
pip install virtualenv
virtualenv venv
source venv/Scripts/activate   # Git Bash on Windows; on Linux/macOS use: source venv/bin/activate

2. Install the resc_vcs_scanner package:

pip install -e .

3. Set the following environment variables:

export RESC_RABBITMQ_SERVICE_HOST=127.0.0.1      # The hostname/IP address of the RabbitMQ server
export RESC_RABBITMQ_SERVICE_PORT_AMQP=30902     # The AMQP port of the RabbitMQ server
export RABBITMQ_DEFAULT_VHOST=resc-rabbitmq      # The virtual host name of the RabbitMQ server
export RABBITMQ_USERNAME=queue_user              # The username used to connect to the RabbitMQ projects and repositories topics
export RABBITMQ_PASSWORD=""                      # The password for RABBITMQ_USERNAME; use the value of the queues_password field in /deployment/kubernetes/example-values.yaml
export RABBITMQ_QUEUE=repositories               # The name of the queue from which the secret scanner reads repositories
export RESC_API_NO_AUTH_SERVICE_HOST=127.0.0.1   # The hostname/IP address where the RESC web service is running
export RESC_API_NO_AUTH_SERVICE_PORT=30900       # The port number where the RESC web service is running
export VCS_INSTANCES_FILE_PATH=""                # The absolute path to the vcs_instances_config.json file containing the VCS instance definitions
export GITHUB_PUBLIC_USERNAME=""                 # Your GitHub username
export GITHUB_PUBLIC_TOKEN=""                    # Your GitHub personal access token
export GITLEAKS_PATH=""                          # The absolute path to the Gitleaks binary executable

Replace the empty values of RABBITMQ_PASSWORD, VCS_INSTANCES_FILE_PATH, GITHUB_PUBLIC_USERNAME, GITHUB_PUBLIC_TOKEN and GITLEAKS_PATH with your own values.
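Because several of these variables are exported as empty strings, a forgotten value is easy to miss. The sketch below is one way to check the environment before starting the worker; the variable list is copied from the exports above, while the helper name missing_env_vars is hypothetical and not part of resc_vcs_scanner.

```python
import os
from collections.abc import Mapping

# Variables listed in the export block for running the scanner from source.
REQUIRED_VARS = (
    "RESC_RABBITMQ_SERVICE_HOST",
    "RESC_RABBITMQ_SERVICE_PORT_AMQP",
    "RABBITMQ_DEFAULT_VHOST",
    "RABBITMQ_USERNAME",
    "RABBITMQ_PASSWORD",
    "RABBITMQ_QUEUE",
    "RESC_API_NO_AUTH_SERVICE_HOST",
    "RESC_API_NO_AUTH_SERVICE_PORT",
    "VCS_INSTANCES_FILE_PATH",
    "GITHUB_PUBLIC_USERNAME",
    "GITHUB_PUBLIC_TOKEN",
    "GITLEAKS_PATH",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or still empty ("")."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling missing_env_vars() with no arguments checks the current process environment and returns the names still to fill in, in the order they appear above.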

Structure of vcs_instances_config.json

The vcs_instances_config.json file must have the following format. You can define multiple VCS instances in the same file.


Example:

{
  "vcs_instance_1": {
    "name": "GITHUB_PUBLIC",
    "scope": ["kubernetes"],
    "exceptions": [],
    "provider_type": "GITHUB_PUBLIC",
    "hostname": "github.com",
    "port": "443",
    "scheme": "https",
    "username": "GITHUB_PUBLIC_USERNAME",
    "token": "GITHUB_PUBLIC_TOKEN",
    "organization": ""
  }
}
  • scope: List of GitHub accounts you want to scan. For example, to scan all the repositories of the following GitHub accounts:
    https://github.com/kubernetes
    https://github.com/docker

    add those accounts to scope as ["kubernetes", "docker"]. All the repositories of those accounts will be scanned.

  • exceptions (optional): Accounts to exclude from the scan. Defaults to an empty list.
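To make the scope and exceptions semantics concrete, the sketch below reads a vcs_instances_config.json file and computes, per VCS instance, which accounts would remain after removing the exceptions. It only illustrates the rules described above; it is not the scanner's own config loader, and the function name accounts_to_scan is hypothetical.

```python
import json
from pathlib import Path


def accounts_to_scan(config_path: Path) -> dict[str, list[str]]:
    """Map each VCS instance name to its in-scope accounts minus exceptions.

    Hypothetical illustration of the scope/exceptions rules, not RESC code.
    """
    instances = json.loads(config_path.read_text())
    result: dict[str, list[str]] = {}
    for instance in instances.values():
        # "exceptions" is optional and defaults to an empty list.
        excluded = set(instance.get("exceptions", []))
        result[instance["name"]] = [
            account for account in instance["scope"] if account not in excluded
        ]
    return result
```

With "scope": ["kubernetes", "docker"] and "exceptions": ["docker"], only the kubernetes account would be left to scan.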

The output messages of the collect_projects command have the following format:

{
  "project_key": "kubernetes",
  "vcs_instance_name": "GITHUB_PUBLIC"
}
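If you produce or inspect such messages yourself, a strict JSON parser rejects malformed payloads such as a trailing comma after the last field. The sketch below parses one message into a small typed record; ProjectMessage and parse_project_message are hypothetical names used for illustration only, not types exported by resc_vcs_scanner.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectMessage:
    """Hypothetical model of one collect_projects output message."""

    project_key: str
    vcs_instance_name: str


def parse_project_message(raw: str) -> ProjectMessage:
    """Parse a message; raises json.JSONDecodeError or KeyError if malformed."""
    data = json.loads(raw)  # strict JSON: trailing commas are rejected
    return ProjectMessage(
        project_key=data["project_key"],
        vcs_instance_name=data["vcs_instance_name"],
    )
```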

4. Run the secret scan task:

This task reads repositories from the RabbitMQ queue named 'repositories', scans them with Gitleaks, and saves the findings' metadata to the database.

This can be done via the following command:

celery -A vcs_scanner.secret_scanners.celery_worker worker --loglevel=INFO -E -Q repositories --concurrency=1 --prefetch-multiplier=1
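The sketch below shows how the RabbitMQ variables could combine into a standard AMQP broker URL (amqp://user:password@host:port/vhost, the format Celery accepts) and spells out the worker flags used in the command above. How vcs_scanner itself builds its broker connection is an assumption here; broker_url and worker_argv are hypothetical helpers.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote


def broker_url(env: Mapping[str, str] = os.environ) -> str:
    """Assemble an AMQP URL of the form amqp://user:password@host:port/vhost.

    Assumption: illustrative only; the component may build it differently.
    """
    user = quote(env["RABBITMQ_USERNAME"], safe="")
    password = quote(env["RABBITMQ_PASSWORD"], safe="")  # escapes '@', ':', '/'
    host = env["RESC_RABBITMQ_SERVICE_HOST"]
    port = env["RESC_RABBITMQ_SERVICE_PORT_AMQP"]
    vhost = quote(env["RABBITMQ_DEFAULT_VHOST"], safe="")
    return f"amqp://{user}:{password}@{host}:{port}/{vhost}"


def worker_argv(queue: str = "repositories") -> list[str]:
    """Argument list matching the celery worker command above."""
    return [
        "celery", "-A", "vcs_scanner.secret_scanners.celery_worker", "worker",
        "--loglevel=INFO",
        "-E",                       # send task events for monitoring tools
        "-Q", queue,                # consume only from this queue
        "--concurrency=1",          # a single worker process
        "--prefetch-multiplier=1",  # reserve one message per process at a time
    ]
```

With --concurrency=1 and --prefetch-multiplier=1, the worker holds at most one unacknowledged repository message, so a long Gitleaks scan does not keep other messages reserved.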

Run locally using docker

Run the RESC VCS Scanner docker image locally with the following commands:
  • Pull the docker image from the registry:
docker pull rescabnamro/resc-vcs-scanner:latest
  • Alternatively, build the docker image locally:
docker build -t rescabnamro/resc-vcs-scanner:latest .
  • Run the vcs-scanner with the command below:
docker run -v <path to vcs_instances_config.json in your local system>:/tmp/vcs_instances_config.json \
  -e RESC_RABBITMQ_SERVICE_HOST="host.docker.internal" \
  -e RESC_RABBITMQ_SERVICE_PORT_AMQP=30902 \
  -e RABBITMQ_DEFAULT_VHOST=resc-rabbitmq \
  -e RABBITMQ_USERNAME=queue_user \
  -e RABBITMQ_PASSWORD="<the password of queue_user>" \
  -e RABBITMQ_QUEUE="repositories" \
  -e RESC_API_NO_AUTH_SERVICE_HOST="host.docker.internal" \
  -e RESC_API_NO_AUTH_SERVICE_PORT=30900 \
  -e VCS_INSTANCES_FILE_PATH="/tmp/vcs_instances_config.json" \
  -e GITHUB_PUBLIC_USERNAME="<your github username>" \
  -e GITHUB_PUBLIC_TOKEN="<your github personal access token>" \
  -e GITLEAKS_PATH="/vcs_scanner/gitleaks_config/seco-gitleaks-linux-amd64" \
  --name resc-vcs-scanner rescabnamro/resc-vcs-scanner:latest \
  celery -A vcs_scanner.secret_scanners.celery_worker worker --loglevel=INFO -E -Q repositories --concurrency=1 --prefetch-multiplier=1
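Because the docker run command is long and carries secrets, it can be less error-prone to generate it. The sketch below builds the same argument list from a mapping of environment variables; docker_run_argv and as_shell_command are hypothetical helpers, and the image name, mount target and celery arguments are copied from the command above.

```python
import shlex
from collections.abc import Mapping

IMAGE = "rescabnamro/resc-vcs-scanner:latest"
CONTAINER_CONFIG_PATH = "/tmp/vcs_instances_config.json"


def docker_run_argv(local_config: str, env: Mapping[str, str]) -> list[str]:
    """Build a `docker run` argument list equivalent to the command above."""
    argv = ["docker", "run", "-v", f"{local_config}:{CONTAINER_CONFIG_PATH}"]
    for name, value in env.items():
        argv += ["-e", f"{name}={value}"]  # one -e flag per variable
    argv += [
        "--name", "resc-vcs-scanner", IMAGE,
        "celery", "-A", "vcs_scanner.secret_scanners.celery_worker", "worker",
        "--loglevel=INFO", "-E", "-Q", env.get("RABBITMQ_QUEUE", "repositories"),
        "--concurrency=1", "--prefetch-multiplier=1",
    ]
    return argv


def as_shell_command(argv: list[str]) -> str:
    """Quote the arguments so the result can be pasted into a POSIX shell."""
    return shlex.join(argv)
```

Passing the list directly to subprocess.run(argv, check=True) avoids shell quoting problems altogether, for example with a password that contains spaces or a $ sign.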

To create the vcs_instances_config.json file, refer to the section Structure of vcs_instances_config.json above.

Run locally as a CLI tool (Still in development)


It is also possible to run the component as a CLI tool to scan VCS repositories.

1. Create virtual environment:

cd components/resc-vcs-scanner
pip install virtualenv
virtualenv venv
source venv/bin/activate

2. Install the resc_vcs_scanner package:

pip install -e .

3. Run CLI scanner:

The CLI has three modes of operation. Use the --help argument to see all the options for each mode:

  • Scanning a non-git directory:

    secret_scanner dir --help
    secret_scanner dir --gitleaks-rules-path=<path to gitleaks toml rule> --gitleaks-path=<path to gitleaks binary> --ignored-blocker-path=<path to resc-ignore.dsv file> --dir=<directory to scan>
    
  • Scanning an already cloned git repository:

    secret_scanner repo local --help
    secret_scanner repo local --gitleaks-rules-path=<path to gitleaks toml rule> --gitleaks-path=<path to gitleaks binary> --ignored-blocker-path=<path to resc-ignore.dsv file> --dir=<directory of repository to scan>
    
  • Scanning a remote git repository:

    secret_scanner repo remote --help
    secret_scanner repo remote --gitleaks-rules-path=<path to gitleaks toml rule> --gitleaks-path=<path to gitleaks binary> --ignored-blocker-path=<path to resc-ignore.dsv file> --repo-url=<url of repository to scan>
    

Most CLI arguments can also be provided by setting the corresponding environment variable. The --help output shows which arguments can be provided through environment variables and the expected variable names. These names are always prefixed with RESC_.

Example: the argument --gitleaks-path can be provided using the environment variable RESC_GITLEAKS_PATH
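The sketch below generalises the naming rule from the single documented example (--gitleaks-path maps to RESC_GITLEAKS_PATH) and wraps the remote-repository mode in a subprocess call, for instance for use in a CI job. The general rule is an assumption inferred from that one example, so confirm each name with --help; flag_to_env_var and scan_remote_repo are hypothetical helpers, while the secret_scanner arguments are the ones shown above.

```python
import subprocess


def flag_to_env_var(flag: str) -> str:
    """Map a CLI flag such as --gitleaks-path to a RESC_-prefixed name.

    Assumption: rule inferred from one documented example; check --help.
    """
    return "RESC_" + flag.lstrip("-").replace("-", "_").upper()


def scan_remote_repo(repo_url: str, rules: str, gitleaks: str, ignore_file: str) -> int:
    """Run `secret_scanner repo remote` and return its exit code."""
    argv = [
        "secret_scanner", "repo", "remote",
        f"--gitleaks-rules-path={rules}",
        f"--gitleaks-path={gitleaks}",
        f"--ignored-blocker-path={ignore_file}",
        f"--repo-url={repo_url}",
    ]
    # check=False so the caller can inspect the exit code itself.
    return subprocess.run(argv, check=False).returncode
```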

Ignoring findings


It is possible to ignore some blocker findings (e.g. false positives) by providing a resc-ignore.dsv file. The ignored blockers are downgraded to warning level and marked as ignored. The file has the following structure:

# This is a comment
finding_path|finding_rule|finding_line_number|expiration_date
finding_path_2|finding_rule_2|finding_line_number_2
  • finding_path contains the path to the file with the blocking finding.
  • finding_rule contains the name of the blocking rule.
  • finding_line_number contains the line number of the finding.
  • expiration_date is optional, contains the date in ISO 8601 format until which this ignore rule should be considered valid.

For example, to ignore the finding in file /etc/passwd for rule root_value_found on line 1 until April 1st, 2024 at 23:59, use the following line:

/etc/passwd|root_value_found|1|2024-04-01T23:59:00

To ignore this finding indefinitely, omit the expiration date:

/etc/passwd|root_value_found|1
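To check an ignore file before handing it to the scanner, the sketch below parses resc-ignore.dsv content using the four pipe-separated fields described above and decides whether a rule is still in force. It is an illustrative parser, not the scanner's own; the names IgnoreRule and parse_ignore_file are hypothetical, and treating the expiration date as inclusive with naive (timezone-less) timestamps is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class IgnoreRule:
    finding_path: str
    finding_rule: str
    finding_line_number: int
    expiration_date: Optional[datetime] = None  # None means "never expires"

    def is_active(self, now: datetime) -> bool:
        # Assumption: the rule is valid up to and including expiration_date.
        return self.expiration_date is None or now <= self.expiration_date


def parse_ignore_file(text: str) -> list[IgnoreRule]:
    """Parse resc-ignore.dsv content, skipping blank lines and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) not in (3, 4):
            raise ValueError(f"expected 3 or 4 '|'-separated fields: {line!r}")
        expires = datetime.fromisoformat(fields[3]) if len(fields) == 4 else None
        rules.append(IgnoreRule(fields[0], fields[1], int(fields[2]), expires))
    return rules
```

Applied to the two example lines above, the first rule stops applying after 2024-04-01T23:59:00, while the second never expires.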

Testing

Run the commands below to make sure that the unit tests pass and that the code meets the quality standards:

Note: To run these tests, you need to install tox. This works on Linux, and on Windows with Git Bash.

pip install tox      # install tox locally

tox -v -e sort       # Run this command to validate the import sorting
tox -v -e lint       # Run this command to lint the code according to this repository's standard
tox -v -e pytest     # Run this command to run the unit tests
tox -v               # Run this command to run all of the above tests



Download files


Source Distribution

resc_vcs_scanner-3.7.0.tar.gz (28.3 kB)

Uploaded Source

Built Distribution


resc_vcs_scanner-3.7.0-py3-none-any.whl (37.7 kB)

Uploaded Python 3

File details

Details for the file resc_vcs_scanner-3.7.0.tar.gz.

File metadata

  • Download URL: resc_vcs_scanner-3.7.0.tar.gz
  • Size: 28.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for resc_vcs_scanner-3.7.0.tar.gz
Algorithm Hash digest
SHA256 e1b4b67195c1ceafab4f30ed420c25a9c981a8c7b2c59c68963b81f4b91d1dcc
MD5 190d2cd2366e681bb6d508bf75c47597
BLAKE2b-256 ef7b4a44dca067a25699be4a40220caaa5faf20f04543e84d5f91b6decbe87ed
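To verify a downloaded archive against the published SHA256 digest, a minimal sketch using the standard hashlib module is shown below. The expected value is the SHA256 digest listed above for resc_vcs_scanner-3.7.0.tar.gz; sha256_of and verify are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for resc_vcs_scanner-3.7.0.tar.gz.
EXPECTED_SDIST_SHA256 = "e1b4b67195c1ceafab4f30ed420c25a9c981a8c7b2c59c68963b81f4b91d1dcc"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare case-insensitively, since hexdigest() returns lowercase."""
    return sha256_of(path) == expected.lower()
```

Alternatively, pip can enforce the digests itself in hash-checking mode (pip install --require-hashes -r requirements.txt with a --hash=sha256:... option per requirement); in that mode pip expects every dependency to be pinned with a hash as well.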


File details

Details for the file resc_vcs_scanner-3.7.0-py3-none-any.whl.


File hashes

Hashes for resc_vcs_scanner-3.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 187a5cb191c4a6631b44125954f61aeceb8374c7c5fad7d4f8423405590e465a
MD5 c1c4d17c8d539893184294b66693edd3
BLAKE2b-256 a4f905c823d8d5a6f47d4d5ab385b566b17c893a32eea0f4e1aa9a76de29e1e0

