NYU CTF Dataset loader package

NYU CTF Bench

This repository hosts the NYU CTF Bench, a collection of CTF challenges from the CSAW CTF competitions, designed for evaluating LLM agents. The challenges are dockerized so that an LLM-based automation framework can deploy them, interact with them, and attempt a solution. The main benchmark dataset contains 200 challenges across 6 CTF categories: web, binary exploitation (pwn), forensics, reverse engineering (rev), cryptography (crypto), and miscellaneous (misc).

Benchmark structure

The test/ folder contains the main benchmark dataset of 200 challenges. A smaller development set of 55 challenges is present in the development/ folder. The development set can be treated as equivalent to a "train" split and used for building the agent, so that design decisions made to improve the agent do not bias the test scores.

The folder structure is as follows: <year>/<event>/<category>/<challenge>. <year> is the year of the competition, <event> is either "CSAW-Quals" or "CSAW-Finals", <category> is one of the 6 categories, and <challenge> is the challenge name. Note that the challenge name may contain spaces and single-quotes, so it is advisable to wrap it in double-quotes when using it in scripts.
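
The layout above can be split back into its four parts with pathlib. The following is a minimal sketch, not part of the nyuctf package: parse_challenge_path and ChallengeLocation are hypothetical helpers, and both example paths are made up for illustration. It also shows shlex.quote as one way to handle names with spaces or single-quotes when building a shell command from Python.

```python
import shlex
from pathlib import PurePosixPath
from typing import NamedTuple

# The two event names stated in the benchmark description.
EVENTS = {"CSAW-Quals", "CSAW-Finals"}


class ChallengeLocation(NamedTuple):
    """Hypothetical container for the four path components."""
    year: int
    event: str
    category: str
    challenge: str


def parse_challenge_path(relpath: str) -> ChallengeLocation:
    """Split '<year>/<event>/<category>/<challenge>' into its parts."""
    parts = PurePosixPath(relpath).parts
    if len(parts) != 4:
        raise ValueError(f"expected 4 path components, got {len(parts)}: {relpath!r}")
    year, event, category, challenge = parts
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    return ChallengeLocation(int(year), event, category, challenge)


# Illustrative path only; it may not match a real challenge folder.
loc = parse_challenge_path("2021/CSAW-Finals/rev/maze")
print(loc.year, loc.event, loc.category, loc.challenge)

# A made-up name with a space and a single-quote, quoted for a POSIX shell.
awkward = "2021/CSAW-Quals/misc/it's a demo"
quoted = shlex.quote(awkward)
print(quoted)
```

When calling subprocess with a list of arguments instead of a shell string, no quoting is needed at all, which is usually the simpler choice.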

Each challenge folder contains a challenge.json file with the challenge metadata, along with the corresponding challenge files. Challenges that require a server to host some challenge files are set up with a docker image and a docker-compose.yaml file. The docker image is loaded directly using docker compose up.
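
As a rough illustration of the paragraph above, the sketch below reads challenge.json from a challenge folder and checks whether a docker-compose.yaml sits next to it. describe_challenge is a hypothetical helper, not part of the nyuctf package; the "name" field in the demo metadata is an assumption, since the metadata schema is not described here; and the -d (detached) flag is an addition to the plain docker compose up mentioned above.

```python
import json
import tempfile
from pathlib import Path


def describe_challenge(chal_dir: Path) -> dict:
    """Load the metadata and work out whether the challenge needs a server."""
    metadata = json.loads((chal_dir / "challenge.json").read_text(encoding="utf-8"))
    compose = chal_dir / "docker-compose.yaml"
    needs_server = compose.is_file()
    # Argument list suitable for subprocess.run; only built when a compose file exists.
    start_command = (
        ["docker", "compose", "-f", str(compose), "up", "-d"] if needs_server else None
    )
    return {
        "metadata": metadata,
        "needs_server": needs_server,
        "start_command": start_command,
    }


# Demo on a throwaway folder so the sketch runs without the dataset or docker.
with tempfile.TemporaryDirectory() as tmp:
    chal_dir = Path(tmp) / "demo challenge"
    chal_dir.mkdir()
    # Assumed metadata content, for illustration only.
    (chal_dir / "challenge.json").write_text(json.dumps({"name": "demo"}), encoding="utf-8")
    (chal_dir / "docker-compose.yaml").write_text("services: {}\n", encoding="utf-8")
    info = describe_challenge(chal_dir)
    print(info["needs_server"], info["start_command"])
```

The command is returned rather than executed, so the caller decides when to start the container.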

Setup

Install the Python package:

pip install nyuctf

The repository is automatically cloned when the CTFDataset is first instantiated with the split argument. If needed, you can manually clone it by running:

python3 -m nyuctf.download

Usage

The following Python snippet shows how to load challenge details using the nyuctf module:

from nyuctf.dataset import CTFDataset
from nyuctf.challenge import CTFChallenge

# Clones the repository for the first time, which takes a while
ds = CTFDataset(split="test")
chal = CTFChallenge(ds.get("2021f-rev-maze"), ds.basedir)

print(chal.name)
print(chal.flag)
print(chal.files)

Tests

Run the tests on the challenges to check the docker setup and network connection. This requires the docker network to be set up.

cd python
python -m unittest -v test.test_challenges

Optionally filter the tests with the unittest -k flag.
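
To illustrate how -k selects tests, the sketch below applies the same kind of name pattern programmatically through unittest.TestLoader.testNamePatterns. DemoChallengeTests and its method names are hypothetical stand-ins, not the actual tests in test.test_challenges.

```python
import unittest


class DemoChallengeTests(unittest.TestCase):
    """Hypothetical stand-in for the real challenge tests."""

    def test_docker_setup(self):
        self.assertTrue(True)

    def test_network_connection(self):
        self.assertTrue(True)


loader = unittest.TestLoader()
# On the command line, "-k docker" is treated as the pattern "*docker*".
# Patterns are matched against the full test name (module.Class.method).
loader.testNamePatterns = ["*docker*"]
suite = loader.loadTestsFromTestCase(DemoChallengeTests)

selected = [test.id().rsplit(".", 1)[-1] for test in suite]
print(selected)
```

On the command line, the equivalent filter is passed as python -m unittest -v -k docker test.test_challenges, where docker is only an example pattern.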
