
This repository contains code for the DiscoGen procedural generator for algorithm discovery tasks.

Project description

DiscoGen: Procedural Generation of Algorithm Discovery Tasks in Machine Learning


This repository contains code for the DiscoGen modular benchmark for automated algorithm discovery.

Quick Start

Install DiscoGen:

pip install discogen

List available domains:

discogen get-domains

Create a task (by default, the task includes a full implementation and no editable modules):

discogen create-task --task-domain OnPolicyRL

Create an example task for running an agent (with an incomplete set of modules):

discogen create-task --task-domain OnPolicyRL --example
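The create-task commands above can also be scripted across several domains. The following is a minimal Python sketch, not part of DiscoGen itself: it only wraps the documented `discogen create-task --task-domain <domain> [--example]` invocation with `subprocess`. The helper names (`create_task_cmd`, `create_tasks`), the `tasks/` root directory, the domain selection, and the choice to run each domain in its own working directory are all assumptions, since this page does not say where `create-task` writes its output.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical selection of domains, copied from the Task Domains table.
DOMAINS = ["OnPolicyRL", "OffPolicyRL", "ComputerVisionClassification"]


def create_task_cmd(domain: str, example: bool = False) -> list[str]:
    """Build the documented `discogen create-task` command for one domain."""
    cmd = ["discogen", "create-task", "--task-domain", domain]
    if example:
        # --example creates a task with an incomplete set of modules, for running an agent.
        cmd.append("--example")
    return cmd


def create_tasks(domains: list[str], root: str = "tasks", example: bool = False) -> None:
    """Run `discogen create-task` once per domain, each in its own directory under `root`.

    Assumption: a separate working directory per domain keeps their outputs apart;
    this page does not say where create-task writes its files.
    """
    if shutil.which("discogen") is None:
        raise RuntimeError("discogen CLI not found; install it with `pip install discogen`")
    for domain in domains:
        workdir = Path(root) / domain
        workdir.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if the command exits with a non-zero status.
        subprocess.run(create_task_cmd(domain, example), check=True, cwd=workdir)
```

For example, `create_tasks(DOMAINS, example=True)` would create one example task per listed domain. Each created task may still need its own domain requirements installed before it can run.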

See the full documentation for detailed usage. Note that each task domain has its own set of requirements, which may need to be installed separately and often conflict with the base DiscoGen requirements. Install them using the install.sh script provided in each task folder.

Each domain lists its references in discogen/domains/<task_domain>/utils/_reference.txt.

Task Domains

| Task Domain | Modules | Datasets | Description |
|---|---|---|---|
| BayesianOptimisation | acq_fn, acq_optimizer, sampler, next_queries, surrogate, surrogate_optimizer | Ackley1D, Ackley2D, Branin2d, Bukin2d, Cosine8d, DropWave2d, EggHolder2d, Griewank5d, Hartmann6d, HolderTable2d, Levy6d | Optimization of black-box functions using surrogate models to find global minima/maxima. |
| BrainSpeechDetection | loss, networks, optim | 7 LibriBrainSherlock tasks | Detecting speech features directly from brain activity data. |
| ComputerVisionClassification | loss, networks, optim, preprocess | CIFAR10, CIFAR10C, CIFAR10LT, CIFAR100, FashionMNIST, MNIST, OxfordFlowers, StanfordCars, TinyImageNet | Image classification on a range of datasets. |
| ContinualLearning | optim, regularizer, replay, sampler, scheduler | PermutedMNIST, SplitCIFAR100, TinyImageNetSplit | Training a model on continually changing data so that it adapts to new data without losing old capabilities. |
| GreenhouseGasPrediction | data_processing, model | 4 Mauna Loa time series (CO2, N2O, SF6, CH4) | Time-series forecasting of atmospheric greenhouse gas concentrations. |
| LanguageModelling | loss, networks, optim | OPCFineWebCode, OPCFineWebMath, LMFineWeb, TinyStories | Training transformer-based models on code, mathematics, and narrative text. |
| ModelUnlearning | loss | MUSE, TOFU, WMDP_Cyber | Fine-tuning pretrained models to remove specific knowledge or data points while retaining others. |
| NeuralCellularAutomata | loss, optimiser, perceive, train, update | GrowingButterfly, GrowingLizard, MatrixOperations, MNISTInpainting, SelfClassifyingMNIST | Evolving neural cellular automata for tasks based on reproduction and classification. |
| OfflineRL | actor_loss, critic_loss, optim, networks, train | 10 OGBench | Training RL policies from offline datasets. |
| OffPolicyRL | q_update, policy, networks, optim, rb, train, config | 4 MinAtar | Value-based RL for training an agent in MinAtar. |
| OnPolicyMARL | activation, loss, networks, optim, targets, train | 5 MABrax, MPE Spread, 11 SMAX | Training multiple on-policy RL agents in different multi-agent environments. |
| OnPolicyRL | loss, networks, optim, train, activation, targets | 4 MinAtar, 7 Brax, 2 Craftax | Training an RL agent in a range of different RL environments using PPO-style algorithms. |
| TrajectoryPrediction | loss, networks, optim, train | Argoverse2, nuScenes, Waymo | Training a model to predict trajectories in autonomous driving datasets. |
| UnsupervisedEnvironmentDesign | sample_levels, train_step, variable_config | 3 Kinetix sizes, Minigrid | Generating and curating training environments/levels to improve RL agent generalization. |

Development Setup

1. Set Up Your Development Environment

Install the environment and the pre-commit hooks with:

make install

This will also generate your uv.lock file.

Contributing

We welcome contributions! DiscoGen grows stronger with more tasks and domains.

See CONTRIBUTING.md for detailed development guidelines.

Citation

If you use DiscoGen in your research, please cite:

@article{goldie2026discogen,
  title={DiscoGen: Procedural Generation of Algorithm Discovery Tasks in Machine Learning},
  author={Alexander D. Goldie and Zilin Wang and Adrian Hayler and Deepak Nathani and Edan Toledo and Ken Thampiratwong and Aleksandra Kalisz and Michael Beukman and Alistair Letcher and Shashank Reddy and Clarisse Wibault and Theo Wolf and Charles O'Neill and Uljad Berdica and Nicholas Roberts and Saeed Rahmani and Hannah Erlebach and Roberta Raileanu and Shimon Whiteson and Jakob N. Foerster},
  year={2026}
}

License

DiscoGen is released under the MIT License.


Download files


Source Distribution

discogen-1.0.0.tar.gz (1.0 MB)

Uploaded Source

Built Distribution


discogen-1.0.0-py3-none-any.whl (844.9 kB)

Uploaded Python 3

File details

Details for the file discogen-1.0.0.tar.gz.

File metadata

  • Download URL: discogen-1.0.0.tar.gz
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for discogen-1.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | be2c300a21e9c566e9433f9f07989f6ad6d5deca440ca260e1a3205655764b51 |
| MD5 | 28ce59a0e9315f428720246c3d75047c |
| BLAKE2b-256 | bac615900325580dd33c42f785b3981e3d2521d9705ecf064c75f1061ff409e4 |


File details

Details for the file discogen-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: discogen-1.0.0-py3-none-any.whl
  • Size: 844.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for discogen-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f1eae9478afd4762609110aa2b6d82366a2914a738590e620079626817cb319 |
| MD5 | 2d5341a26616ea16225dac1f5503165f |
| BLAKE2b-256 | d6cb0687b9c9542e71d4790e55aad9418ddeaadc268ebcc5028869654d0bad02 |
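A downloaded release file can be checked against the published SHA256 digests listed for discogen 1.0.0. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `verify` and the `EXPECTED_SHA256` mapping are illustrative, not part of DiscoGen.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests published on this page for the discogen 1.0.0 release files.
EXPECTED_SHA256 = {
    "discogen-1.0.0.tar.gz": "be2c300a21e9c566e9433f9f07989f6ad6d5deca440ca260e1a3205655764b51",
    "discogen-1.0.0-py3-none-any.whl": "5f1eae9478afd4762609110aa2b6d82366a2914a738590e620079626817cb319",
}


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, `verify("discogen-1.0.0.tar.gz")` run in the download directory should return True for an intact copy of the source distribution.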

