Generate CPG for multiple languages for use with joern

CPG Generator

 ██████╗██████╗  ██████╗
██╔════╝██╔══██╗██╔════╝
██║     ██████╔╝██║  ███╗
██║     ██╔═══╝ ██║   ██║
╚██████╗██║     ╚██████╔╝
 ╚═════╝╚═╝      ╚═════╝

CPG Generator (cpggen) is a Python CLI tool that generates Code Property Graphs (CPGs) for multiple languages. The generated CPG can be imported directly into Joern or uploaded to Qwiet.AI for analysis.

Pre-requisites

  • JDK 11 or above
  • Python 3.10
  • Docker or podman (Windows, Linux or Mac) or
  • Joern natively installed (Linux only)

Installation

cpggen is available as a single executable binary, PyPI package or as a container image.

Single executable binaries

Download the executable binary for your operating system from the releases page. These binaries bundle the following:

  • Joern with all the CPG frontends
  • cpggen with Python 3.10
  • cdxgen with Node.js 18 (generates the SBoM)

On Linux,

curl -LO https://github.com/AppThreat/cpggen/releases/download/v1.0.4/cpggen-linux-amd64
chmod +x cpggen-linux-amd64
./cpggen-linux-amd64 --help

On Windows,

curl -LO https://github.com/appthreat/cpggen/releases/download/v1.0.4/cpggen.exe
.\cpggen.exe --help

NOTE: On Windows, antivirus and antimalware software could prevent this single executable from functioning properly. Depending on the system, administrative privileges might also be required. Use container-based execution as a fallback.

OCI Artifacts via ORAS cli

Use the ORAS CLI to download the cpggen binary on Linux and Windows.

VERSION="1.0.0"
curl -LO "https://github.com/oras-project/oras/releases/download/v${VERSION}/oras_${VERSION}_linux_amd64.tar.gz"
mkdir -p oras-install/
tar -zxf oras_${VERSION}_*.tar.gz -C oras-install/
sudo mv oras-install/oras /usr/local/bin/
rm -rf oras_${VERSION}_*.tar.gz oras-install/
oras pull ghcr.io/appthreat/cpggen-bin:v1
chmod +x cpggen-linux-amd64
./cpggen-linux-amd64 --help

On Windows

set VERSION=1.0.0
curl.exe -sLO "https://github.com/oras-project/oras/releases/download/v%VERSION%/oras_%VERSION%_windows_amd64.zip"
tar.exe -xvzf oras_%VERSION%_windows_amd64.zip
mkdir %USERPROFILE%\bin\
copy oras.exe %USERPROFILE%\bin\
set PATH=%USERPROFILE%\bin\;%PATH%

Alternatively, download and extract ORAS with PowerShell:

Invoke-WebRequest -Uri https://github.com/oras-project/oras/releases/download/v1.0.0/oras_1.0.0_windows_amd64.zip -UseBasicParsing -OutFile oras_1.0.0_windows_amd64.zip
Expand-Archive -Path oras_1.0.0_windows_amd64.zip -DestinationPath .

Then pull the cpggen binary:

oras.exe pull ghcr.io/appthreat/cpggen-windows-bin:v1

PyPI package

This installs only the Python CLI tool, without any CPG language frontends. Joern must be installed separately for the CLI to work.

pip install cpggen

Bundled container image

docker pull ghcr.io/appthreat/cpggen
# podman pull ghcr.io/appthreat/cpggen

AlmaLinux 9 requires a CPU that supports SSE4.2. On kvm64 VMs, use the AlmaLinux 8 version instead.

docker pull ghcr.io/appthreat/cpggen-alma8
# podman pull ghcr.io/appthreat/cpggen-alma8

Or use the nightly image to always get the latest Joern and tools.

docker pull ghcr.io/appthreat/cpggen:nightly
# podman pull ghcr.io/appthreat/cpggen:nightly

Usage

To auto-detect the language from the current directory and generate a CPG:

cpggen

To specify the input and output directories:

cpggen -i <src directory> -o <CPG directory or file name>

You can also pass a git URL as the source:

cpggen -i https://github.com/HooliCorp/vulnerable-aws-koa-app -o /tmp/cpg

To specify the language type:

cpggen -i <src directory> -o <CPG directory or file name> -l java

# Comma separated values are accepted for multiple languages
cpggen -i <src directory> -o <CPG directory or file name> -l java,js,python
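
When cpggen is driven from a script, the flags above can be assembled programmatically. The following is a minimal sketch, assuming cpggen is available on the PATH. The helper name build_cpggen_command is hypothetical and not part of cpggen; only the -i, -o and -l flags shown above are used.

```python
import shlex


def build_cpggen_command(src, out, languages=None):
    """Assemble a cpggen argument list (hypothetical helper, not part of cpggen)."""
    cmd = ["cpggen", "-i", str(src), "-o", str(out)]
    if languages:
        # Multiple languages are passed as a single comma-separated value
        cmd += ["-l", ",".join(languages)]
    return cmd


cmd = build_cpggen_command("/tmp/src", "/tmp/cpg", ["java", "js", "python"])
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to execute the command. Omitting the languages argument leaves out -l, so cpggen auto-detects the language.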

Container based invocation

docker run --rm -it -v /tmp:/tmp -v $(pwd):/app:rw --cpus=4 --memory=16g ghcr.io/appthreat/cpggen cpggen -i <src directory> -o <CPG directory or file name>
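
For scripted use, the same container invocation can be built as an argument list. This is a sketch under stated assumptions: the helper build_container_command is hypothetical, the source directory is mounted at /app and passed to cpggen as -i /app (the original command leaves the source path as a placeholder), and the interactive -it flags are dropped because no terminal is assumed.

```python
import os


def build_container_command(src_dir, out, cpus=4, memory="16g",
                            image="ghcr.io/appthreat/cpggen", runtime="docker"):
    """Build a docker/podman argument list mirroring the command above (hypothetical helper)."""
    host_dir = os.path.abspath(src_dir)
    return [
        runtime, "run", "--rm",
        "-v", "/tmp:/tmp",               # share /tmp so output written there is visible on the host
        "-v", f"{host_dir}:/app:rw",     # assumption: the source tree is mounted at /app
        f"--cpus={cpus}", f"--memory={memory}",
        image,
        "cpggen", "-i", "/app", "-o", out,
    ]


print(build_container_command(".", "/tmp/cpg_out"))
```

Passing runtime="podman" switches the first element only, since the flags used here are accepted by both tools.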

Export graphs

When --export is passed, cpggen exports the various graphs to a number of formats using joern-export.

To export cpg14 graphs in dot format:

cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --build --export --export-out-dir ~/work/sandbox/crAPI/cpg_export

To export pdg in neo4jcsv format

cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --build --export --export-out-dir ~/work/sandbox/crAPI/cpg_export --export-repr pdg --export-format neo4jcsv

Slicing graphs

Pass the --slice argument to extract intra-procedural slices from the CPG. By default, slices are based on Usages. Pass --slice-mode DataFlow to create a sliced CPG based on DataFlow instead.

cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --slice

Artifacts produced

Upon successful completion, cpggen produces the following artifacts in the directory specified by out_dir:

  • {name}-{lang}-cpg.bin.zip - Code Property Graph for the given language type
  • {name}-{lang}-cpg.bom.xml - SBoM in CycloneDX XML format
  • {name}-{lang}-cpg.bom.json - SBoM in CycloneDX json format
  • {name}-{lang}-cpg.manifest.json - A json file listing the generated artifacts and the invocation commands
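
The naming pattern above makes it straightforward to check that a run produced everything it should. The following is a minimal sketch; the helper names and the example values for name and lang are hypothetical, and only the four file name patterns listed above are taken from the document.

```python
from pathlib import Path

# Suffixes taken from the artifact list above
ARTIFACT_SUFFIXES = ("cpg.bin.zip", "cpg.bom.xml", "cpg.bom.json", "cpg.manifest.json")


def expected_artifacts(out_dir, name, lang):
    """Return the artifact paths documented for one project name and language."""
    return [Path(out_dir) / f"{name}-{lang}-{suffix}" for suffix in ARTIFACT_SUFFIXES]


def missing_artifacts(out_dir, name, lang):
    """List the expected artifacts that do not exist on disk."""
    return [p for p in expected_artifacts(out_dir, name, lang) if not p.exists()]


for path in expected_artifacts("/tmp/cpg_out", "app", "js"):
    print(path.name)
```

A CI step could fail the build when missing_artifacts returns a non-empty list.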

Server mode

cpggen can run in server mode.

cpggen --server

Invoke the /cpg endpoint to generate a CPG.

curl "http://127.0.0.1:7072/cpg?src=/Volumes/Work/sandbox/vulnerable-aws-koa-app&out_dir=/tmp/cpg_out&lang=js"
curl "http://127.0.0.1:7072/cpg?url=https://github.com/HooliCorp/vulnerable-aws-koa-app&out_dir=/tmp/cpg_out&lang=js"
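
The same requests can be issued from Python using only the standard library. This is a sketch, not a documented client: the function names are hypothetical, reading CPGGEN_HOST and CPGGEN_PORT on the client side is a convenience assumption, and the response format is not described in this document, so the raw bytes are returned. Unlike the curl examples, urlencode percent-encodes the path and URL values; this assumes the server decodes query strings in the standard way.

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen


def cpg_endpoint(out_dir, lang, src=None, url=None, host=None, port=None):
    """Build the /cpg request URL from the query parameters shown above."""
    if (src is None) == (url is None):
        raise ValueError("pass exactly one of src or url")
    # Fall back to the documented defaults when nothing is configured
    host = host or os.environ.get("CPGGEN_HOST", "127.0.0.1")
    port = port or os.environ.get("CPGGEN_PORT", "7072")
    params = {"src": src} if src is not None else {"url": url}
    params.update({"out_dir": out_dir, "lang": lang})
    return f"http://{host}:{port}/cpg?{urlencode(params)}"


def request_cpg(**kwargs):
    """Call a running cpggen server and return the raw response body."""
    with urlopen(cpg_endpoint(**kwargs)) as resp:
        return resp.read()


print(cpg_endpoint("/tmp/cpg_out", "js", src="/app", host="127.0.0.1", port=7072))
```

request_cpg requires a server started with cpggen --server; cpg_endpoint can be used on its own to inspect the URL first.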

Languages supported

Language       Requires build
C              No
C++            No
Java           No (*)
Scala          Yes
Jsp            Yes
Jar/War        No
JavaScript     No
TypeScript     No
Kotlin         No (*)
Php            No
Python         No
C# / dotnet    Yes
Go             Yes

(*) - Precision could be improved with dependencies
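
The table can be transcribed into a small lookup, for example to decide in a wrapper script whether a project must be built before CPG generation. This is a sketch: the keys are the display names from the table, not necessarily the values accepted by -l, and linking the result to the --build flag seen in the export examples is an assumption rather than documented behaviour.

```python
# Transcribed from the table above; True means a build is required
REQUIRES_BUILD = {
    "C": False, "C++": False, "Java": False, "Scala": True, "Jsp": True,
    "Jar/War": False, "JavaScript": False, "TypeScript": False, "Kotlin": False,
    "Php": False, "Python": False, "C# / dotnet": True, "Go": True,
}

# Languages marked (*): precision could be improved with dependencies
BENEFITS_FROM_DEPENDENCIES = {"Java", "Kotlin"}


def requires_build(language):
    """Look up whether the table lists a build as required for this language."""
    try:
        return REQUIRES_BUILD[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None


print(sorted(lang for lang, needed in REQUIRES_BUILD.items() if needed))
```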

Environment variables

Name                       Purpose
JOERN_HOME                 Joern installation directory
CPGGEN_HOST                cpggen server host. Default 127.0.0.1
CPGGEN_PORT                cpggen server port. Default 7072
CPGGEN_CONTAINER_CPU       CPU units to use in container execution mode. Default computed
CPGGEN_CONTAINER_MEMORY    Memory units to use in container execution mode. Default computed
CPGGEN_MEMORY              Heap memory to use for frontends. Default computed
AT_DEBUG_MODE              Set to debug to enable debug logging
CPG_EXPORT                 Set to true to export CPG graphs in dot format
CPG_EXPORT_REPR            Graph to export. Default all
CPG_EXPORT_FORMAT          Export format. Default dot
CPG_SLICE                  Set to true to slice CPG
CPG_SLICE_MODE             Slice mode. Default Usages
SHIFTLEFT_ACCESS_TOKEN     Set to automatically submit the CPG for analysis by Qwiet AI
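
These variables can be set per invocation instead of globally. The following is a minimal sketch that prepares an environment enabling export; the helper name export_environment is hypothetical, and the pdg and neo4jcsv values are borrowed from the export example earlier in this document on the assumption that the variables accept the same values as the corresponding flags.

```python
import os


def export_environment(repr_name="pdg", fmt="neo4jcsv", base=None):
    """Return a copy of the environment with the CPG_EXPORT* variables set."""
    env = dict(os.environ if base is None else base)
    env.update({
        "CPG_EXPORT": "true",
        "CPG_EXPORT_REPR": repr_name,
        "CPG_EXPORT_FORMAT": fmt,
    })
    return env


env = export_environment(base={"PATH": "/usr/bin"})
print(env["CPG_EXPORT"], env["CPG_EXPORT_REPR"], env["CPG_EXPORT_FORMAT"])
```

The returned mapping can be passed as the env argument of subprocess.run when launching cpggen, which leaves the calling process's own environment untouched.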

GitHub actions

Use the marketplace action to generate CPGs with GitHub Actions. Optionally, to upload the generated CPGs as build artifacts, use the step below.

- name: Upload cpg
  uses: actions/upload-artifact@v1.0.0
  with:
    name: cpg
    path: cpg_out

License

Apache-2.0

Developing / Contributing

git clone git@github.com:AppThreat/cpggen.git
cd cpggen

python -m pip install --upgrade pip
python -m pip install poetry
# Add poetry to the PATH environment variable
poetry install

poetry run cpggen -i <src directory>
