
NYU CTF Dataset loader package


NYU CTF Bench

This repository hosts the NYU CTF Bench, a collection of CTF challenges from the CSAW CTF competitions, designed for evaluating LLM agents. The challenges are dockerized and easy to deploy, so that an LLM-based automation framework can interact with each challenge and attempt a solution. The main benchmark dataset contains 200 challenges across 6 CTF categories: web, binary exploitation (pwn), forensics, reverse engineering (rev), cryptography (crypto), and miscellaneous (misc).

Benchmark structure

The test/ folder contains the main benchmark dataset of 200 challenges. A smaller development set of 55 challenges is in the development/ folder. The development set can be treated as equivalent to a "train" split and used while building the agent, so that design decisions made to improve the agent do not bias the test scores.
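To sanity-check a local copy of both splits, a small standard-library sketch can count the challenges per category. It assumes the <year>/<event>/<category>/<challenge> layout and the per-challenge challenge.json file described in this README, and that you point it at the test/ or development/ folder of your clone. `count_challenges` is a hypothetical helper, not part of the nyuctf package.

```python
from collections import Counter
from pathlib import Path


def count_challenges(split_dir):
    """Count challenges per category under one split folder (test/ or development/).

    Hypothetical helper. Assumes the <year>/<event>/<category>/<challenge>
    layout and counts a folder as a challenge only if it holds a challenge.json.
    """
    counts = Counter()
    # The four "*" components are <year>/<event>/<category>/<challenge>.
    for meta in Path(split_dir).glob("*/*/*/*/challenge.json"):
        category = meta.parent.parent.name  # the <challenge> folder's parent is <category>
        counts[category] += 1
    return counts
```

If the layout is as described, the counts for test/ should sum to 200 and those for development/ to 55.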

The folder structure is <year>/<event>/<category>/<challenge>, where <year> is the year of the competition, <event> is either "CSAW-Quals" or "CSAW-Finals", <category> is one of the 6 categories, and <challenge> is the challenge name. Note that challenge names may contain spaces and single quotes, so it is advisable to wrap them in double quotes when using them in scripts.
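From Python, the path components can be recovered directly, and quoting can be handled by the standard library instead of by hand. The sketch below assumes a path given relative to a split folder; `parse_challenge_path`, `ChallengePath` and `shell_safe` are hypothetical names chosen for illustration.

```python
import shlex
from pathlib import PurePosixPath
from typing import NamedTuple


class ChallengePath(NamedTuple):
    year: str
    event: str      # "CSAW-Quals" or "CSAW-Finals"
    category: str   # one of the 6 categories
    challenge: str  # may contain spaces and single quotes


def parse_challenge_path(rel_path):
    """Split a path relative to a split folder, e.g. "2021/CSAW-Finals/rev/maze"."""
    parts = PurePosixPath(rel_path).parts
    if len(parts) != 4:
        raise ValueError(f"expected <year>/<event>/<category>/<challenge>, got {rel_path!r}")
    return ChallengePath(*parts)


def shell_safe(path):
    """Quote a path so it survives as one argument in a POSIX shell command line."""
    return shlex.quote(str(path))
```

`shlex.quote` is used here rather than plain double quotes because double quotes still let the shell expand `$` and backticks, while `shlex.quote` also escapes embedded single quotes correctly. When launching commands from Python, passing an argument list to `subprocess.run` avoids shell quoting altogether.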

Each challenge folder contains a challenge.json file with the challenge metadata, together with the corresponding challenge files. Challenges that need a server to host some of their files are set up with a Docker image and a docker-compose.yaml file. The Docker image is loaded directly using docker compose up.
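A minimal sketch of reading one challenge folder: it loads challenge.json as a plain dict (no particular metadata keys are assumed) and treats the presence of docker-compose.yaml as the sign that the challenge needs a server. `load_challenge` and `compose_up_command` are hypothetical helpers; running compose in detached mode (`-d`) is a choice made for this example, not something this README prescribes.

```python
import json
from pathlib import Path


def load_challenge(chal_dir):
    """Return (metadata, needs_server) for one challenge folder.

    metadata is the parsed challenge.json; needs_server is True when the
    folder ships a docker-compose.yaml, i.e. part of the challenge is hosted.
    """
    chal_dir = Path(chal_dir)
    with open(chal_dir / "challenge.json", encoding="utf-8") as f:
        metadata = json.load(f)
    needs_server = (chal_dir / "docker-compose.yaml").is_file()
    return metadata, needs_server


def compose_up_command(chal_dir):
    """Build the argument list that starts the challenge server in the background.

    Returning a list (not a shell string) lets subprocess.run pass challenge
    names with spaces or single quotes through unchanged.
    """
    compose_file = Path(chal_dir) / "docker-compose.yaml"
    return ["docker", "compose", "-f", str(compose_file), "up", "-d"]
```

For a server-backed challenge, `subprocess.run(compose_up_command(chal_dir), check=True)` would then bring the server up.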

Setup

Install the Python package:

pip install nyuctf

The repository is cloned automatically the first time CTFDataset is instantiated with the split argument. If needed, you can clone it manually by running:

python3 -m nyuctf.download

Usage

The following Python snippet shows how to load challenge details using the nyuctf package:

from nyuctf.dataset import CTFDataset
from nyuctf.challenge import CTFChallenge

# Clones the repository for the first time, which takes a while
ds = CTFDataset(split="test")
chal = CTFChallenge(ds.get("2021f-rev-maze"), ds.basedir)

print(chal.name)
print(chal.flag)
print(chal.files)
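The challenge ID passed to `ds.get` ("2021f-rev-maze") appears to encode the year, the event, the category and the name. The helper below is a hypothetical sketch that splits such IDs, for example to select challenges by category. Its pattern, <year><q|f>-<category>-<name>, is an assumption inferred only from this one example: other IDs may follow a different scheme, and the category abbreviation in an ID may not match the folder name.

```python
import re
from typing import NamedTuple

# ASSUMED pattern, inferred from "2021f-rev-maze": 4-digit year,
# "q" (Quals) or "f" (Finals), then the category, then the challenge name.
_ID_RE = re.compile(r"^(\d{4})([qf])-([^-]+)-(.+)$")


class ChallengeId(NamedTuple):
    year: int
    event: str
    category: str
    name: str


def parse_challenge_id(chal_id):
    """Hypothetical helper: split an ID like "2021f-rev-maze" into its parts."""
    m = _ID_RE.match(chal_id)
    if m is None:
        raise ValueError(f"unrecognized challenge ID: {chal_id!r}")
    year, event_code, category, name = m.groups()
    event = "CSAW-Quals" if event_code == "q" else "CSAW-Finals"
    return ChallengeId(int(year), event, category, name)
```

With a list of IDs, `[i for i in ids if parse_challenge_id(i).category == "rev"]` would keep only the reverse-engineering challenges, provided the IDs follow the assumed pattern.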

Tests

Run the tests on the challenges to check the Docker setup and the network connection. The tests require the Docker network to be set up.

cd python
python -m unittest -v test.test_challenges
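Before running the tests, it can help to confirm that the required Docker network exists. The network's name is not given in this section, so the sketch below takes it as a parameter. `docker_network_exists` is a hypothetical helper built on the real `docker network inspect` command; a missing network can be created with `docker network create <name>`.

```python
import subprocess


def docker_network_exists(name, timeout=30):
    """Return True if a Docker network called `name` exists.

    `docker network inspect` exits non-zero for an unknown network (or when
    the Docker daemon is unreachable); a missing docker CLI or a timeout
    is also reported as False.
    """
    try:
        result = subprocess.run(
            ["docker", "network", "inspect", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers FileNotFoundError when docker is not on PATH.
        return False
    return result.returncode == 0
```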

Optionally filter the tests with the unittest -k flag.
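The same filtering is available from Python through `unittest.TestLoader.testNamePatterns` (Python 3.7+), which is what `-k` sets. The CLI wraps a pattern without `*` as `*pattern*` and matches it case-sensitively against each test's full dotted name. The sketch below mirrors that behavior; `load_filtered` is a hypothetical helper, and since the test names inside test.test_challenges are not listed here, any pattern you pass is only an example.

```python
import unittest


def load_filtered(module_name, patterns):
    """Load tests from `module_name`, keeping only those matching a pattern.

    Mirrors the command-line -k option: a pattern without "*" becomes a
    substring match, and names are matched with fnmatch (case-sensitive).
    """
    loader = unittest.TestLoader()
    loader.testNamePatterns = [p if "*" in p else f"*{p}*" for p in patterns]
    return loader.loadTestsFromName(module_name)
```

Running from the python/ folder, `unittest.TextTestRunner(verbosity=2).run(load_filtered("test.test_challenges", ["rev"]))` would run only the tests whose names contain "rev", if any exist.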

Download files


Source Distribution

nyuctf-1.1.tar.gz (5.9 kB)

Uploaded Source

Built Distribution


nyuctf-1.1-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file nyuctf-1.1.tar.gz.

File metadata

  • Download URL: nyuctf-1.1.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for nyuctf-1.1.tar.gz
Algorithm Hash digest
SHA256 37c9c54c503a16732816e9c7f4481d88712b865025082446ad57fd32c04fe553
MD5 8b5165b874d28df3c176ac638616e7e4
BLAKE2b-256 ab4d8bcea6bc03dff0399786353217221d5d602d88ee138972f077e093fa3150


File details

Details for the file nyuctf-1.1-py3-none-any.whl.

File metadata

  • Download URL: nyuctf-1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for nyuctf-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f66c4667fb7ac34b2b2c3703c75233e59f9bd68ea2fab8ce98f3e0d83499bc2d
MD5 286405844b4c8f14a72323ed64f98b1b
BLAKE2b-256 d2caaab57ec6e997e26d0fa626f682e13a9ab12cd380457020af6ebcdd75fd29

