RedCodeGen

Automatic generation of benign prompts and language model rollouts in Python that exercise specific software vulnerabilities (CWEs) defined in the MITRE CWE database.

Developed by the Stanford Intelligent Systems Laboratory (SISL) as a part of astra-rl.

Features

Generation of realistic coding task prompts that exercise specific CWEs
Generation of code samples for specific CWEs or CWE Top 25
Automatic code evaluation and vulnerability detection via CodeQL static analysis
Programmable API for custom scenarios and configurations

Installation

CodeQL

First, you must install CodeQL and have it available in your PATH.

macOS Users: brew install codeql
Windows/Linux Users: follow the official CodeQL CLI installation instructions
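
To confirm that CodeQL is actually reachable before the first run, a small check along these lines may help. This is a sketch, not part of RedCodeGen: the helper name `find_codeql` is hypothetical, and it relies only on the standard library plus the `codeql version` subcommand of the CodeQL CLI.

```python
import shutil
import subprocess
from typing import Optional


def find_codeql(executable: str = "codeql") -> Optional[str]:
    """Return the absolute path of the CodeQL CLI if it is on PATH, else None.

    Hypothetical helper for illustration; RedCodeGen may locate CodeQL differently.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_codeql()
    if path is None:
        print("codeql not found on PATH; install it before running redcodegen")
    else:
        # `codeql version` prints the installed CLI version.
        result = subprocess.run([path, "version"], capture_output=True, text=True)
        print(result.stdout.strip())
```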

RedCodeGen

RedCodeGen is available via PyPI. Install it with pip:

pip install redcodegen

You will also want to create a .env file containing your API key in your working directory:

echo "OPENAI_API_KEY=your_openai_api_key" > .env

Quick Start

The most basic usage involves rolling out a language model to generate code samples for specific CWEs and evaluating them with CodeQL.

Suppose you want to roll out 5 samples each to exercise CWE-89 (SQL Injection) and CWE-79 (Cross-Site Scripting):

redcodegen generate -c 89 -c 79 -n 5 -o results.jsonl

You will get a results.jsonl file with the generated samples and their evaluations. Each CWE occupies one line. Let's take a peek!

head -n 1 results.jsonl | jq .

{
  "cwe_id": 89,
  "cwe_name": "SQL Injection",
  "cwe_description": "SQL Injection is a code injection technique that might destroy your database. It is one of the most common web hacking techniques.",
  "timestamp": "2024-06-01T12:00:00Z",
  "model_config": {"model": "openai/gpt-4o-mini"},
  "min_scenarios": 5,
  "samples": [
    {
      "scenario": "A web application that takes user input and constructs SQL queries with proper sanitization.",
      "code": "...generated code here...",
      "evaluation": [
        {
          "rule": "py/sql-injection",
          "message": "...",
          "line": ...
        }
      ]
    },
    ...
  ]
}
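
A file in this layout can be summarized with a few lines of Python. The sketch below assumes the field names shown in the sample record (`cwe_id`, `samples`, `evaluation`, `rule`) and assumes that `evaluation` is a list of finding objects; the function name `summarize` is hypothetical and not part of the RedCodeGen API.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def summarize(lines: Iterable[str]) -> Dict[int, Counter]:
    """Map each cwe_id to a Counter of CodeQL rule ids found across its samples."""
    summary: Dict[int, Counter] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        rules = summary.setdefault(record["cwe_id"], Counter())
        for sample in record.get("samples", []):
            # Assumption: each finding is an object with a "rule" key.
            for finding in sample.get("evaluation") or []:
                rules[finding["rule"]] += 1
    return summary


# Small in-memory example in place of a real results.jsonl file.
example = json.dumps({
    "cwe_id": 89,
    "samples": [
        {"scenario": "s", "code": "c", "evaluation": [{"rule": "py/sql-injection"}]},
        {"scenario": "s", "code": "c", "evaluation": []},
    ],
})
print(dict(summarize([example])[89]))
```

For a real run, pass an open file handle instead of the list, for example `summarize(open("results.jsonl"))`.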

Importantly, running the above command again with the same output file resumes from where you left off, skipping CWEs that are already recorded in that file.
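
One way such resume behavior could work is to read the CWE ids already present in the output file and skip them. The sketch below illustrates that idea only; it is an assumption about the mechanism, not RedCodeGen's actual implementation, and the names `processed_cwes` and `remaining` are hypothetical.

```python
import json
import os
from typing import Iterable, List, Set


def processed_cwes(path: str) -> Set[int]:
    """Collect the cwe_id of every complete JSON line already in the output file."""
    done: Set[int] = set()
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                done.add(json.loads(line)["cwe_id"])
            except (json.JSONDecodeError, KeyError, TypeError):
                continue  # skip blank, malformed, or partially written lines
    return done


def remaining(requested: Iterable[int], path: str) -> List[int]:
    """Return the requested CWE ids that are not yet recorded in the output file."""
    done = processed_cwes(path)
    return [cwe for cwe in requested if cwe not in done]
```

With a results.jsonl that already holds a record for CWE-89, `remaining([89, 79], "results.jsonl")` would leave only 79 to process.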

Usage Examples

redcodegen generate -c 89 -c 79 # manually specify CWEs
redcodegen generate -n 5 # specify the number of rollouts
redcodegen generate --use-top-25 # run the CWE Top 25
redcodegen generate --use-top-25 -o results.jsonl # resume an existing run
redcodegen generate --use-top-25 --model openai/gpt-4o # switch model

Also, you can run

redcodegen --help

to see all available options.
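
When driving these commands from a script, for example to compare several models, it can be convenient to assemble the argument list in Python. The helper below is a hypothetical sketch that uses only the flags shown above (`-c`, `-n`, `-o`, `--use-top-25`, `--model`); it builds the command but does not run it.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    cwes: Sequence[int] = (),
    use_top_25: bool = False,
    n: Optional[int] = None,
    output: Optional[str] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Assemble a `redcodegen generate` argument list from the documented flags."""
    cmd = ["redcodegen", "generate"]
    for cwe in cwes:
        cmd += ["-c", str(cwe)]  # one -c per CWE, as in the examples
    if use_top_25:
        cmd.append("--use-top-25")
    if n is not None:
        cmd += ["-n", str(n)]
    if output is not None:
        cmd += ["-o", output]
    if model is not None:
        cmd += ["--model", model]
    return cmd


print(shlex.join(build_command(cwes=[89, 79], n=5, output="results.jsonl")))
# prints: redcodegen generate -c 89 -c 79 -n 5 -o results.jsonl
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting issues.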

Method

RedCodeGen works in three main steps:

Prompt Generation: for each specified CWE, RedCodeGen generates a realistic coding task prompt that is likely to exercise the vulnerability. It first looks up the CWE description in the MITRE CWE database, then prompts your specified language model to write a coding task based on that description. The model is given few-shot examples drawn from existing human-written prompts from Pearce, 2021.
Code Generation: RedCodeGen then rolls out the specified language model on the generated prompt a few times with a sampling temperature of 0.8 to generate multiple code samples.
Code Evaluation: Finally, RedCodeGen evaluates each generated code sample using CodeQL static analysis to detect whether the intended vulnerability is present in the code.
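
The three steps above can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not RedCodeGen's internal code: the `Sample` class, `run_pipeline`, and the three callables (`make_prompt`, `generate_code`, `analyze`) are hypothetical stand-ins for the language model calls and the CodeQL analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    scenario: str
    code: str
    evaluation: List[dict] = field(default_factory=list)


def run_pipeline(
    cwe_description: str,
    make_prompt: Callable[[str], str],
    generate_code: Callable[[str, float], str],
    analyze: Callable[[str], List[dict]],
    n_rollouts: int = 5,
    temperature: float = 0.8,
) -> List[Sample]:
    """Run prompt generation once, then n_rollouts of generation and analysis."""
    # Step 1: turn the CWE description into a coding task prompt.
    scenario = make_prompt(cwe_description)
    samples: List[Sample] = []
    for _ in range(n_rollouts):
        # Step 2: sample one completion at the given temperature.
        code = generate_code(scenario, temperature)
        # Step 3: run static analysis on the completion and keep its findings.
        samples.append(Sample(scenario, code, analyze(code)))
    return samples


# Stub callables stand in for the model and for CodeQL in this example.
demo = run_pipeline(
    "Improper neutralization of SQL commands",
    make_prompt=lambda desc: "Write a login handler. Context: " + desc,
    generate_code=lambda prompt, temp: "# generated code placeholder",
    analyze=lambda code: [],
    n_rollouts=2,
)
print(len(demo))
```

Passing the model and analyzer in as callables keeps the outline testable without network access or a CodeQL installation.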

Acknowledgements

We thank the Schmidt Sciences Foundation's trustworthy AI agenda for supporting this work.

Project details

Release history

0.3.0: Mar 15, 2026
0.2.0: Nov 19, 2025
0.1.2: Nov 11, 2025
0.1.1: Nov 11, 2025
0.1.0: Nov 11, 2025
0.1.0b0 (pre-release, this version): Nov 11, 2025
0.0.5: Oct 28, 2025
0.0.4: Oct 22, 2025
0.0.3: Oct 22, 2025
0.0.2: Oct 22, 2025
0.0.1: Oct 21, 2025
