Craftax LM

A wrapper around the Craftax agent benchmark, for evaluating digital agents.

Usage

First, install the package with pip install craftaxlm. Next, import the agent-computer interface (ACI) of your choice:

from craftaxlm import CraftaxACI, CraftaxClassicACI

This package is in early development. For implementation examples, please refer to the baseline ReAct implementation.
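For readers unfamiliar with ReAct, the general shape of the loop (observe, reason, act, repeat) can be sketched as follows. This is a minimal illustration and not the baseline implementation. ToyEnv, scripted_policy, and react_episode are hypothetical stand-ins invented for this example, and none of them reflect the actual craftaxlm interface. A real agent would replace scripted_policy with a language-model call.

```python
from typing import List, Tuple


class ToyEnv:
    """Hypothetical stand-in environment (NOT the craftaxlm API).

    Holds a counter; the episode ends when the counter reaches zero.
    """

    def __init__(self, start: int = 3):
        self.counter = start

    def observe(self) -> str:
        # Text observation, as a language-model agent would receive it.
        return f"counter={self.counter}"

    def step(self, action: str) -> Tuple[float, bool]:
        # Reward 1.0 for "decrement", 0.0 for anything else.
        if action == "decrement":
            self.counter -= 1
            return 1.0, self.counter == 0
        return 0.0, False


def scripted_policy(observation: str, history: List[str]) -> Tuple[str, str]:
    """Hypothetical stand-in for an LM call: returns (thought, action)."""
    thought = f"I see {observation}; I should decrement."
    return thought, "decrement"


def react_episode(env, policy, max_steps: int = 10) -> Tuple[float, List[str]]:
    """Run one observe -> think -> act loop and return (total reward, trace)."""
    history: List[str] = []
    total = 0.0
    for _ in range(max_steps):
        obs = env.observe()
        thought, action = policy(obs, history)
        reward, done = env.step(action)
        history.append(f"obs: {obs} | thought: {thought} | action: {action}")
        total += reward
        if done:
            break
    return total, history


total_reward, trace = react_episode(ToyEnv(start=3), scripted_policy)
print(total_reward)
for line in trace:
    print(line)
```

The trace keeps each observation, thought, and action together, which is the context a ReAct-style agent typically feeds back into the next prompt.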

Leaderboard

Craftax-Classic

| LM          | Algorithm | Reward (% max) | Code               |
|-------------|-----------|----------------|--------------------|
| gpt-4o-mini | ReAct     | 18.4           | CraftaxLM_Baselines |

Craftax-Full

| LM          | Algorithm | Reward (% max) | Code               |
|-------------|-----------|----------------|--------------------|
| gpt-4o-mini | ReAct     | 2.9            | CraftaxLM_Baselines |
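The README does not define how the "Reward (% max)" column is computed. One plausible reading, offered here only as an assumption, is the mean episode return divided by the maximum achievable return, expressed as a percentage. The helper percent_of_max and the numbers below are illustrative and are not taken from the package or the leaderboard.

```python
from typing import Sequence


def percent_of_max(episode_returns: Sequence[float], max_return: float) -> float:
    """Mean episode return as a percentage of the maximum return.

    Assumed definition for illustration; the package may compute it differently.
    """
    if not episode_returns:
        raise ValueError("episode_returns must not be empty")
    if max_return <= 0:
        raise ValueError("max_return must be positive")
    mean_return = sum(episode_returns) / len(episode_returns)
    return 100.0 * mean_return / max_return


# Illustrative numbers only: three episodes against a maximum return of 20.
score = percent_of_max([2.0, 4.0, 6.0], max_return=20.0)
print(f"{score:.1f}")
```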

Dev Instructions

pyenv virtualenv craftax_env
poetry install

When in doubt

from jax import debug
...
debug.breakpoint()

📚 Citation

To learn more about Craftax, check out the paper website here. To cite the underlying Craftax environment, see:

@inproceedings{matthews2024craftax,
    author={Michael Matthews and Michael Beukman and Benjamin Ellis and Mikayel Samvelyan and Matthew Jackson and Samuel Coward and Jakob Foerster},
    title = {Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning},
    booktitle = {International Conference on Machine Learning ({ICML})},
    year = {2024}
}

To cite the Crafter benchmark, see:

@article{hafner2021crafter,
  title={Benchmarking the Spectrum of Agent Capabilities},
  author={Danijar Hafner},
  year={2021},
  journal={arXiv preprint arXiv:2109.06780},
}

Contributing

uv venv craftaxlm-dev
source craftaxlm-dev/bin/activate
uv run ruff format .
