
JUX is a JAX-accelerated engine for the Lux AI Challenge Season 2.

Project description

JUX

JUX is a JAX-accelerated game core for the Lux AI Challenge Season 2, aimed at maximizing game-environment throughput for reinforcement learning (RL) training.

Installation

Install dependencies

One of the main dependencies is JAX, which in turn relies on NVCC, the CUDA Toolkit, and cuDNN. There are two ways to get them ready: via conda or via Docker (recommended).

For conda users, install them with the following commands:

$ conda install -c nvidia cuda-nvcc cuda-python
$ conda install cudnn

For Docker users, you can use the NVIDIA CUDA Docker image or the PyTorch Docker image, both of which ship with these dependencies preinstalled and mutually compatible.

Install JUX

First, you need to clone the repository.

$ git clone https://github.com/RoboEden/jux.git
$ cd jux

Then, upgrade your pip and install JUX.

$ pip install --upgrade pip
$ pip install . -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

For PyTorch users, you can install JUX together with its optional PyTorch dependencies.

$ pip install '.[torch]' -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Usage

See tutorial.ipynb for a quick start. JUX is guaranteed to implement the same game logic as luxai2022==1.1.4, provided players' input actions are valid. When input actions are invalid, JUX and LuxAI2022 may process them differently.

Performance

JUX maps all game logic to array operators in JAX, so that we can harness the computational power of modern GPUs and run thousands of environments in parallel. We benchmarked JUX on several different GPUs and observed throughput hundreds to thousands of times higher than the original single-thread Python implementation.
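The idea can be sketched with a toy example (the environment and step function below are invented for illustration, not JUX's actual API): game logic written as JAX array operations over fixed-shape state can be batched across environments with jax.vmap and compiled once with jax.jit.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for an environment step (NOT JUX's real API): each
# "unit" holds some power, and every acting unit pays a power cost.
def step(unit_power, actions):
    # unit_power: (MAX_N_UNITS,) float; actions: (MAX_N_UNITS,) 0/1
    return unit_power - actions * 5.0

# vmap turns the single-env step into a batched step over many
# environments; jit compiles the whole batch into fused device kernels.
batched_step = jax.jit(jax.vmap(step))

n_envs, max_n_units = 1024, 200
power = jnp.full((n_envs, max_n_units), 100.0)
actions = jnp.ones((n_envs, max_n_units))

power = batched_step(power, actions)
print(power.shape)         # (1024, 200)
print(float(power[0, 0]))  # 95.0
```

Because the batch dimension is just another array axis, scaling from one environment to twenty thousand is a change of shape, not of code.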

LuxAI2022 is a game with a dynamic number of units, which makes it hard to accelerate with JAX, because jax.jit() only supports arrays with static shapes. As a workaround, we allocate a large buffer of static length to store units. The buffer length (buf_cfg.MAX_N_UNITS) greatly affects performance. Theoretically, no player can build more than 1500 units under the current game configs, so MAX_N_UNITS=1500 is a safe choice. However, by watching game replays, we found that no player builds more than 200 units, so MAX_N_UNITS=200 is a practical choice.
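The padding technique can be illustrated with a minimal sketch (the field names below are invented; only MAX_N_UNITS corresponds to JUX's buffer config): units live in a fixed-length buffer paired with a validity mask, so every array keeps a static shape under jax.jit() even as units are created and destroyed.

```python
import jax
import jax.numpy as jnp

MAX_N_UNITS = 200  # static buffer length, as in buf_cfg.MAX_N_UNITS

@jax.jit
def total_power(unit_power, unit_mask):
    # Operate on the whole fixed-shape buffer, but count only live slots.
    return jnp.sum(jnp.where(unit_mask, unit_power, 0.0))

# Empty buffers: one slot per potential unit, all slots marked dead.
unit_power = jnp.zeros(MAX_N_UNITS)
unit_mask = jnp.zeros(MAX_N_UNITS, dtype=bool)

# "Spawn" 3 units by filling the first 3 slots and marking them live.
unit_power = unit_power.at[:3].set(100.0)
unit_mask = unit_mask.at[:3].set(True)

print(float(total_power(unit_power, unit_mask)))  # 300.0
```

The trade-off is visible here: every operation touches all MAX_N_UNITS slots whether they are live or not, which is why a smaller buffer (200 instead of 1500) yields materially higher throughput.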

Relative Throughput

Here, we report the relative throughput over the original Python implementation (luxai2022==1.1.3) on several different GPUs with different MAX_N_UNITS settings. The original single-thread Python implementation running on an 8255C CPU serves as the baseline. We can observe that throughput is roughly proportional to GPU memory bandwidth, because the game logic is mostly memory-bound, not compute-bound: byte-access operators make up a large portion of the game logic in the JUX implementation.

| GPU | GPU Mem. Bandwidth | Batch Size | UNITS=100 | UNITS=200 | UNITS=400 | UNITS=600 | UNITS=800 | UNITS=1000 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A100-SXM4-40GB | 1555 GB/s | 20k | 1166x | 985x | 748x | 598x | 508x | 437x |
| Tesla V100-SXM2-32GB | 900 GB/s | 20k | 783x | 647x | 480x | 375x | 317x | 269x |
| Tesla T4 | 320 GB/s | 10k | 263x | 217x | 160x | 125x | 105x | 89x |
| GTX 1660 Ti | 288 GB/s | 3k | 218x | 178x | 130x | 103x | 84x | 71x |

| CPU | Batch Size | Relative Throughput |
| --- | --- | --- |
| Intel® Core™ i7-12700 | 1 | 2.12x |
| Intel® Xeon® Platinum 8255C CPU @ 2.50GHz | 1 | 1.00x |
| Intel® Xeon® Gold 6133 @ 2.50GHz | 1 | 0.89x |
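As a rough sanity check of the memory-bound claim, dividing the UNITS=200 speedups by each GPU's memory bandwidth yields a roughly constant speedup per GB/s across all four GPUs:

```python
# Numbers copied from the relative-throughput table (UNITS=200 column).
mem_bandwidth_gbs = {"A100": 1555, "V100": 900, "T4": 320, "GTX 1660 Ti": 288}
speedup = {"A100": 985, "V100": 647, "T4": 217, "GTX 1660 Ti": 178}

for gpu, bw in mem_bandwidth_gbs.items():
    ratio = speedup[gpu] / bw
    print(f"{gpu}: {ratio:.2f}x per GB/s")
# All four GPUs land in a narrow 0.62-0.72 band, consistent with
# throughput scaling roughly linearly with memory bandwidth.
```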

Absolute Throughput

We also report absolute throughput in steps per second here.

| GPU | Batch Size | UNITS=100 | UNITS=200 | UNITS=400 | UNITS=600 | UNITS=800 | UNITS=1000 |
| --- | --- | --- | --- | --- | --- | --- |
| A100-SXM4-40GB | 20k | 363k | 307k | 233k | 186k | 158k | 136k |
| Tesla V100-SXM2-32GB | 20k | 244k | 202k | 149k | 117k | 99k | 84k |
| Tesla T4 | 10k | 82k | 68k | 50k | 39k | 33k | 28k |
| GTX 1660 Ti | 3k | 68k | 56k | 40k | 32k | 26k | 22k |

| CPU | Batch Size | Throughput (steps/s) |
| --- | --- | --- |
| Intel® Core™ i7-12700 | 1 | 0.66k |
| Intel® Xeon® Platinum 8255C CPU @ 2.50GHz | 1 | 0.31k |
| Intel® Xeon® Gold 6133 @ 2.50GHz | 1 | 0.28k |
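The absolute and relative results are mutually consistent: multiplying a relative speedup by the ~0.31k steps/s baseline reproduces the corresponding absolute number to within about 1%. For example, for the A100 at UNITS=100:

```python
baseline = 310            # steps/s: single-thread Python on the Xeon 8255C (~0.31k)
relative_speedup = 1166   # A100, UNITS=100, from the relative-throughput table
reported_absolute = 363_000  # steps/s: A100, UNITS=100, from the table above

estimate = relative_speedup * baseline
print(estimate)  # 361460, within ~0.5% of the reported 363k
```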

Contributing

If you find any bugs or have any suggestions, please feel free to open an issue or a pull request. See CONTRIBUTING.md for setting up a development environment.

Project details


Release history

This version

1.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

juxai2022-1.0.tar.gz (1.0 MB)

Uploaded Source

Built Distribution

juxai2022-1.0-py3-none-any.whl (57.3 kB)

Uploaded Python 3

File details

Details for the file juxai2022-1.0.tar.gz.

File metadata

  • Download URL: juxai2022-1.0.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.13

File hashes

Hashes for juxai2022-1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 03fd266a04d172071d7fa2103d90a26d86319c07cf018bcda48634105ed5cfac |
| MD5 | fcc372ccb57ad75644ba00f0ea3d3b36 |
| BLAKE2b-256 | 2d1fad1cf1e16f346b7fb894d91d44f16d87ac5d56df36e2215d1fd043ec1ab9 |


File details

Details for the file juxai2022-1.0-py3-none-any.whl.

File metadata

  • Download URL: juxai2022-1.0-py3-none-any.whl
  • Upload date:
  • Size: 57.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.13

File hashes

Hashes for juxai2022-1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 0e8d58251c04fe1560fa668a3506172fc892d8889ed8b373082ec08e732bfa72 |
| MD5 | e93ba23ade0363f1ef8a15774f281d13 |
| BLAKE2b-256 | cb9d70e27e6320775d6568ba539e05597a4c51572290063204a65814b9b10bbd |

