Generate CPG for multiple languages for code and threat analysis
CPG Generator
██████╗██████╗ ██████╗
██╔════╝██╔══██╗██╔════╝
██║ ██████╔╝██║ ███╗
██║ ██╔═══╝ ██║ ██║
╚██████╗██║ ╚██████╔╝
╚═════╝╚═╝ ╚═════╝
CPG Generator is a Python CLI tool that generates a Code Property Graph (CPG), a novel intermediate representation, for code and threat analysis. The generated CPG can be imported directly into Joern for analysis.
Pre-requisites
Installation
cpggen is available as a single executable binary, a PyPI package, or a container image.
Single executable binaries
Download the executable binary for your operating system from the releases page. These binaries bundle the following:
- Atom ⚛
- cpggen with Python 3.10
- cdxgen with Node.js 18 - Generates SBoM
curl -LO https://github.com/AppThreat/cpggen/releases/latest/download/cpggen-linux-amd64
chmod +x cpggen-linux-amd64
./cpggen-linux-amd64 --help
Atom based frontend.
curl -LO https://github.com/AppThreat/cpggen/releases/latest/download/atomgen
chmod +x atomgen
./atomgen --help
On Windows,
curl -LO https://github.com/appthreat/cpggen/releases/latest/download/cpggen.exe
.\cpggen.exe --help
NOTE: On Windows, antivirus and antimalware could prevent this single executable from functioning properly. Depending on the system, administrative privileges might also be required. Use container-based execution as a fallback.
OCI Artifacts via ORAS cli
Use ORAS cli to download the cpggen binary on Linux and Windows.
VERSION="1.0.0"
curl -LO "https://github.com/oras-project/oras/releases/download/v${VERSION}/oras_${VERSION}_linux_amd64.tar.gz"
mkdir -p oras-install/
tar -zxf oras_${VERSION}_*.tar.gz -C oras-install/
sudo mv oras-install/oras /usr/local/bin/
rm -rf oras_${VERSION}_*.tar.gz oras-install/
oras pull ghcr.io/appthreat/cpggen-bin:v1
chmod +x cpggen-linux-amd64
./cpggen-linux-amd64 --help
On Windows
set VERSION="1.0.0"
curl.exe -sLO "https://github.com/oras-project/oras/releases/download/v%VERSION%/oras_%VERSION%_windows_amd64.zip"
tar.exe -xvzf oras_%VERSION%_windows_amd64.zip
mkdir %USERPROFILE%\bin\
copy oras.exe %USERPROFILE%\bin\
set PATH=%USERPROFILE%\bin\;%PATH%
Invoke-WebRequest -Uri https://github.com/oras-project/oras/releases/download/v1.0.0/oras_1.0.0_windows_amd64.zip -UseBasicParsing -OutFile oras_1.0.0_windows_amd64.zip
Expand-Archive -Path oras_1.0.0_windows_amd64.zip -DestinationPath .
oras.exe pull ghcr.io/appthreat/cpggen-windows-bin:v1
PyPI package
This installs the Python CLI tool with the bundled atom distribution.
pip install cpggen
With atom, CPG can be generated for the following languages:
- C/C++
- Java
- Jars
- JavaScript/TypeScript
- Python
Install joern and set the JOERN_HOME environment variable if you would like support for additional languages and binaries.
Bundled container image
docker pull ghcr.io/appthreat/cpggen
# podman pull ghcr.io/appthreat/cpggen
Use the AWS Public ECR mirror on days when ghcr is unavailable.
docker pull public.ecr.aws/appthreat/cpggen:latest
# podman pull public.ecr.aws/appthreat/cpggen:latest
AlmaLinux 9 requires a CPU that supports SSE4.2. On kvm64 VMs, use the AlmaLinux 8 based image instead.
docker pull ghcr.io/appthreat/cpggen-alma8
# podman pull ghcr.io/appthreat/cpggen-alma8
Or use the nightly to always get the latest joern and tools.
docker pull ghcr.io/appthreat/cpggen:nightly
# podman pull ghcr.io/appthreat/cpggen:nightly
Finally, a slimmer image based on atom distribution.
docker pull ghcr.io/appthreat/atomgen
# podman pull ghcr.io/appthreat/atomgen
Usage
To auto detect the language from the current directory and generate CPG.
cpggen
To specify input and output directory.
cpggen -i <src directory> -o <CPG directory or file name>
You can also pass a git url, a package url, or a CVE or GHSA id as the source.
cpggen -i https://github.com/HooliCorp/vulnerable-aws-koa-app -o /tmp/cpg
cpggen -i "pkg:maven/org.apache.commons/commons-io@1.3.2" -o /tmp/cpg
export GITHUB_TOKEN=<token with read:packages scope>
cpggen -i CVE-2023-32681 -o /tmp/cpg
cpggen -i GHSA-j8r2-6x86-q33q -o /tmp/cpg
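The -i argument accepts several kinds of values, as the commands above show. The following is a minimal sketch of how a wrapper script might tell them apart before calling cpggen. The function name classify_source and the regular expressions are assumptions made for illustration; cpggen's own detection logic may differ.

```python
import re

# Hypothetical helper: not part of cpggen, and cpggen's own detection may differ.
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)
GHSA_RE = re.compile(r"^GHSA(-[0-9a-z]{4}){3}$", re.IGNORECASE)


def classify_source(src: str) -> str:
    """Return a coarse label for a value that would be passed to `cpggen -i`."""
    if src.startswith("pkg:"):
        return "purl"  # package url, e.g. pkg:maven/...
    if src.startswith(("http://", "https://", "git@")):
        return "git-url"
    if CVE_RE.match(src):
        return "cve"
    if GHSA_RE.match(src):
        return "ghsa"
    # Anything else is treated as a local path in this sketch.
    return "path"
```

A wrapper could use the "cve" and "ghsa" labels to check that GITHUB_TOKEN is set before invoking cpggen, since those sources need a token with the read:packages scope.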
To specify language type.
cpggen -i <src directory> -o <CPG directory or file name> -l java
# Comma separated values are accepted for multiple languages
cpggen -i <src directory> -o <CPG directory or file name> -l java,js,python
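When cpggen is driven from a script, the same command line can be assembled as an argument list. This is a sketch only: build_cpggen_command is a hypothetical helper, and it uses just the -i, -o and -l flags shown above.

```python
import shlex
from typing import Iterable, List, Optional


def build_cpggen_command(
    src: str, out_dir: str, langs: Optional[Iterable[str]] = None
) -> List[str]:
    """Assemble a cpggen argument list (hypothetical helper, not part of cpggen)."""
    cmd = ["cpggen", "-i", src, "-o", out_dir]
    if langs:
        # Multiple languages are passed as one comma separated value.
        cmd += ["-l", ",".join(langs)]
    return cmd


cmd = build_cpggen_command("./src", "/tmp/cpg", ["java", "js", "python"])
print(shlex.join(cmd))  # shell-safe rendering of the argument list
```

The resulting list can be handed to subprocess.run(cmd, check=True) without going through a shell, which avoids quoting problems with paths that contain spaces.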
Container-based invocation
docker run --rm -it -v /tmp:/tmp -v $(pwd):/app:rw --cpus=4 --memory=16g -t ghcr.io/appthreat/cpggen cpggen -i <src directory> -o <CPG directory or file name>
Export graphs
By passing --export, cpggen can export the various graphs to many formats using joern-export.
Example to export cpg14 graphs in dot format:
cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --build --export --export-out-dir ~/work/sandbox/crAPI/cpg_export
To export cpg in neo4jcsv format:
cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --build --export --export-out-dir ~/work/sandbox/crAPI/cpg_export --export-repr cpg --export-format neo4jcsv
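A script that generates export commands can check the representation and format against the choices listed under "Full list of options" before running cpggen. The sketch below assumes a hypothetical helper named export_args; only the flag names and choice values come from this README.

```python
from typing import List

# Choices copied from the `cpggen --help` output in this README.
EXPORT_REPRS = {"ast", "cfg", "cdg", "ddg", "pdg", "cpg", "cpg14", "all"}
EXPORT_FORMATS = {"neo4jcsv", "graphml", "graphson", "dot"}


def export_args(export_out_dir: str, export_repr: str, export_format: str) -> List[str]:
    """Return the export-related cpggen arguments (hypothetical helper)."""
    if export_repr not in EXPORT_REPRS:
        raise ValueError(f"unsupported --export-repr: {export_repr}")
    if export_format not in EXPORT_FORMATS:
        raise ValueError(f"unsupported --export-format: {export_format}")
    return [
        "--export",
        "--export-out-dir", export_out_dir,
        "--export-repr", export_repr,
        "--export-format", export_format,
    ]
```

Failing early on an unknown value is cheaper than discovering the error after the CPG has been built.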
Slicing graphs
Pass the --slice argument to extract intra-procedural slices from the CPG. By default, slices are based on Usages. Pass --slice-mode DataFlow to create a sliced CPG based on DataFlow.
cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --slice
Creating vectors
Pass the --vectors argument to extract vector representations of code from the CPG in json format.
cpggen -i ~/work/sandbox/crAPI -o ~/work/sandbox/crAPI/cpg_out --vectors
Artifacts produced
Upon successful completion, cpggen produces the following artifacts in the directory specified under out_dir:
- {name}-{lang}.⚛ - Atom representation for the given language. Requires the use of the atomgen container image or the cli argument --use-atom
- {name}-{lang}.cpg.bin - Code Property Graph for the given language type
- {name}-{lang}.bom.json - SBoM in CycloneDX json format. Requires the environment variable ENABLE_SBOM to be set to true
- {name}-{lang}.manifest.json - A json file listing the generated artifacts and the invocation commands
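The naming pattern above makes it straightforward to locate artifacts from a script. This sketch only derives the expected file names from {name} and {lang}. The helper names are hypothetical, and it does not read the manifest because its schema is not described here.

```python
from pathlib import Path
from typing import Dict

# Suffixes taken from the artifact list in this README.
ARTIFACT_SUFFIXES = {
    "atom": ".⚛",
    "cpg": ".cpg.bin",
    "sbom": ".bom.json",
    "manifest": ".manifest.json",
}


def expected_artifacts(out_dir: str, name: str, lang: str) -> Dict[str, Path]:
    """Map each artifact kind to the path cpggen is expected to write."""
    base = Path(out_dir)
    return {
        kind: base / f"{name}-{lang}{suffix}"
        for kind, suffix in ARTIFACT_SUFFIXES.items()
    }


def existing_artifacts(out_dir: str, name: str, lang: str) -> Dict[str, Path]:
    """Keep only the expected artifacts that are actually present on disk."""
    return {
        kind: path
        for kind, path in expected_artifacts(out_dir, name, lang).items()
        if path.exists()
    }
```

Not every artifact is produced on every run: the atom file depends on --use-atom or the atomgen image, and the SBoM depends on ENABLE_SBOM, so existing_artifacts may return a subset.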
Server mode
cpggen can run in server mode.
cpggen --server
You can invoke the endpoint /cpg to generate a CPG from a path, http or package url. Parameters can be passed using a GET or POST request.
curl "http://127.0.0.1:7072/cpg?src=/Volumes/Work/sandbox/vulnerable-aws-koa-app&out_dir=/tmp/cpg_out&lang=js"
curl "http://127.0.0.1:7072/cpg?url=https://github.com/HooliCorp/vulnerable-aws-koa-app&out_dir=/tmp/cpg_out&lang=js"
Package url with slicing.
curl "http://127.0.0.1:7072/cpg?url=pkg:maven/org.apache.commons/commons-io@1.3.2&out_dir=/tmp/cpg_out&slice=true"
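The query strings above embed urls and package urls that contain characters such as ":", "/" and "@". A client script can percent-encode them with the standard library. This is a sketch: cpg_request_url is a hypothetical helper, the parameter names (src, url, out_dir, lang, slice) are the ones used in the curl examples, and it assumes the server decodes percent-encoded values as HTTP servers normally do.

```python
from urllib.parse import urlencode


def cpg_request_url(host: str = "127.0.0.1", port: int = 7072, **params: str) -> str:
    """Build a GET url for the /cpg endpoint (hypothetical helper)."""
    # Drop unset parameters, then percent-encode the rest.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"http://{host}:{port}/cpg?{query}"


request_url = cpg_request_url(
    url="pkg:maven/org.apache.commons/commons-io@1.3.2",
    out_dir="/tmp/cpg_out",
    slice="true",
)
print(request_url)
```

The defaults for host and port match the documented defaults of CPGGEN_HOST and CPGGEN_PORT.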
Languages supported
Language | Requires build | Maturity |
---|---|---|
C | No | High |
C++ | No | High |
Java | No (*) | Medium |
Scala | Yes | High |
JavaScript | No | Medium |
TypeScript | No | Medium |
Kotlin | No (*) | Low |
PHP | No | Low |
Python | No | Low |
(*) - Precision could be improved with dependencies
EXPERIMENTAL: Use the provided protobuf bindings to build new language frontends.
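The "Requires build" column can guide whether a script passes --build. The following sketch transcribes that column into a lookup table. The function build_advice and its three return labels are illustrative assumptions, and the keys are the table's display names in lowercase, not values for the -l flag.

```python
# Transcribed from the "Languages supported" table; keys are lowercase display names.
REQUIRES_BUILD = {
    "c": False,
    "c++": False,
    "java": False,
    "scala": True,
    "javascript": False,
    "typescript": False,
    "kotlin": False,
    "php": False,
    "python": False,
}
# Languages marked (*) in the table: precision could be improved with dependencies.
BENEFITS_FROM_DEPENDENCIES = {"java", "kotlin"}


def build_advice(language: str) -> str:
    """Return 'required', 'optional' or 'not needed' for a language in the table."""
    key = language.lower()
    if key not in REQUIRES_BUILD:
        raise KeyError(f"language not in table: {language}")
    if REQUIRES_BUILD[key]:
        return "required"
    if key in BENEFITS_FROM_DEPENDENCIES:
        return "optional"
    return "not needed"
```

For example, a wrapper might add --build when the advice is "required" and leave the choice to the user when it is "optional".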
Full list of options
cpggen --help
usage: cpggen [-h] [-i SRC] [-o CPG_OUT_DIR] [-l LANGUAGE] [--use-container] [--build] [--joern-home JOERN_HOME] [--server] [--server-host SERVER_HOST] [--server-port SERVER_PORT] [--export]
[--export-repr {ast,cfg,cdg,ddg,pdg,cpg,cpg14,all}] [--export-format {neo4jcsv,graphml,graphson,dot}] [--export-out-dir EXPORT_OUT_DIR] [--verbose] [--skip-sbom] [--slice] [--slice-mode {Usages,DataFlow}] [--use-parse]
CPG Generator
optional arguments:
-h, --help show this help message and exit
-i SRC, --src SRC Source directory or url or CVE or GHSA id
-o CPG_OUT_DIR, --out-dir CPG_OUT_DIR
CPG output directory
-l LANGUAGE, --lang LANGUAGE
Optional. CPG language frontend to use. Auto-detects by default.
--use-container Use cpggen docker image
--build Attempt to build the project automatically
--joern-home JOERN_HOME
Joern installation directory
--server Run cpggen as a server
--server-host SERVER_HOST
cpggen server host
--server-port SERVER_PORT
cpggen server port
--export Export CPG as a graph
--export-repr {ast,cfg,cdg,ddg,pdg,cpg,cpg14,all}
Graph representation to export
--export-format {neo4jcsv,graphml,graphson,dot}
Export format
--export-out-dir EXPORT_OUT_DIR
Export output directory
--verbose Run cpggen in verbose mode
--skip-sbom Do not generate SBoM
--slice Extract intra-procedural slices from the CPG
--slice-mode {Usages,DataFlow}
Mode used for CPG slicing
--use-atom Use atom toolkit
--vectors Extract vector representations of code from CPG
Environment variables
Name | Purpose |
---|---|
JOERN_HOME | Optional when using atom. Joern installation directory |
CPGGEN_HOST | cpggen server host. Default 127.0.0.1 |
CPGGEN_PORT | cpggen server port. Default 7072 |
CPGGEN_CONTAINER_CPU | CPU units to use in container execution mode. Default computed |
CPGGEN_CONTAINER_MEMORY | Memory units to use in container execution mode. Default computed |
CPGGEN_MEMORY | Heap memory to use for frontends. Default computed |
AT_DEBUG_MODE | Set to debug to enable debug logging |
CPG_EXPORT | Set to true to export CPG graphs in dot format |
CPG_EXPORT_REPR | Graph to export. Default all |
CPG_EXPORT_FORMAT | Export format. Default dot |
CPG_SLICE | Set to true to slice CPG |
CPG_SLICE_MODE | Slice mode. Default Usages |
CPG_VECTORS | Set to true to generate vector representations of code from CPG |
CDXGEN_ARGS | Extra arguments to pass to cdxgen |
ENABLE_SBOM | Enable SBoM generation using cdxgen |
JIMPLE_ANDROID_JAR | Optional when using atom. Path to android.jar for use with jimple for .apk or .dex to CPG conversion |
GITHUB_TOKEN | Token with read:packages scope to analyze CVE or GitHub Advisory |
USE_ATOM | Use AppThreat atom instead of joern frontends. atomgen would default to this mode. |
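To see how the documented defaults combine with overrides, the sketch below resolves a few of these variables from an environment mapping. It illustrates the table and is not cpggen's own configuration code: resolve_settings is a hypothetical name, and treating any casing of "true" as enabled is an assumption.

```python
import os
from typing import Dict, Mapping, Union

# Defaults copied from the table above.
DEFAULTS = {
    "CPGGEN_HOST": "127.0.0.1",
    "CPGGEN_PORT": "7072",
    "CPG_EXPORT_REPR": "all",
    "CPG_EXPORT_FORMAT": "dot",
    "CPG_SLICE_MODE": "Usages",
}
# Variables the README describes as enabled when set to true.
BOOLEAN_FLAGS = ("CPG_EXPORT", "CPG_SLICE", "CPG_VECTORS", "ENABLE_SBOM")


def resolve_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, bool]]:
    """Merge documented defaults with values found in `env` (hypothetical helper)."""
    settings: Dict[str, Union[str, bool]] = {
        name: env.get(name, default) for name, default in DEFAULTS.items()
    }
    for flag in BOOLEAN_FLAGS:
        # Assumption: case-insensitive comparison against "true".
        settings[flag] = env.get(flag, "").lower() == "true"
    return settings
```

Passing a plain dict instead of os.environ, for example resolve_settings({"CPG_SLICE": "true"}), makes the behaviour easy to check without touching the real environment.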
GitHub actions
Use the marketplace action to generate CPGs using GitHub Actions. Optionally, to upload the generated CPGs as build artifacts, use the step below.
- name: Upload cpg
  uses: actions/upload-artifact@v1.0.0
  with:
    name: cpg
    path: cpg_out
License
Apache-2.0
Developing / Contributing
git clone git@github.com:AppThreat/cpggen.git
cd cpggen
python -m pip install --upgrade pip
python -m pip install poetry
# Add poetry to the PATH environment variable
poetry install
poetry run cpggen -i <src directory>