JUX is a JAX-accelerated engine for Lux-2022.
Project description
JUX
JUX is a JAX-accelerated game core for Lux AI Challenge Season 2, aimed at maximizing game environment throughput for reinforcement learning (RL) training.
Installation
Install dependencies
One of the main dependencies is JAX, which in turn relies on NVCC, the CUDA Toolkit, and cuDNN. There are two ways to get them ready: conda or Docker (recommended).
For conda users, you can install them with the following commands.
$ conda install -c nvidia cuda-nvcc cuda-python
$ conda install cudnn
For Docker users, you can use the NVIDIA CUDA Docker image or the PyTorch Docker image, both of which ship with all of these dependencies installed and compatible with each other.
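For example, you can start an interactive GPU container from either image. This assumes the NVIDIA Container Toolkit is installed; the image tags below are only examples, pick whichever suits your setup.

$ docker run --gpus all -it --rm nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
$ docker run --gpus all -it --rm pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel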
Install JUX
First, you need to clone the repository.
$ git clone https://github.com/RoboEden/jux.git
$ cd jux
Then, upgrade your pip and install JUX.
$ pip install --upgrade pip
$ pip install . -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
For PyTorch users, you can install JUX with optional dependencies for PyTorch.
$ pip install ".[torch]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
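Either way, an optional sanity check is to confirm that JAX can see your GPU; the command below should list GPU devices rather than only the CPU.

$ python -c 'import jax; print(jax.devices())'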
Usage
See tutorial.ipynb for a quick start. JUX is guaranteed to implement the same game logic as luxai2022==1.1.4, as long as players' input actions are valid. When players submit invalid actions, JUX and LuxAI2022 may process them differently.
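As a rough sketch of what a quick start can look like (the class and function names below follow the tutorial and are assumptions here, not an authoritative API reference; tutorial.ipynb is the source of truth):

```python
from jux.config import EnvConfig, JuxBufferConfig  # names assumed from tutorial.ipynb
from jux.env import JuxEnv                          # names assumed from tutorial.ipynb

# Create one environment. MAX_N_UNITS bounds the static unit buffer
# (see the Performance section below).
env = JuxEnv(
    env_cfg=EnvConfig(),
    buf_cfg=JuxBufferConfig(MAX_N_UNITS=200),
)

# Reset to an initial state from an integer seed (signature assumed);
# the phase-specific step functions (bidding, factory placement, late game)
# are walked through in tutorial.ipynb.
state = env.reset(0)
```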
Performance
JUX maps all game logic to array operators in JAX, so it can harness the computational power of modern GPUs and run huge numbers of environments in parallel. We benchmarked JUX on several different GPUs and observed throughput hundreds to thousands of times higher than the original single-threaded Python implementation.
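The idea can be illustrated with a toy example that is not JUX code: when a single environment step is written purely with JAX array operations, jax.vmap turns it into a batched step over tens of thousands of environments, and jax.jit compiles the whole batch into one GPU program.

```python
import jax
import jax.numpy as jnp

def toy_step(state, action):
    # Stand-in for one environment transition, expressed only with array ops.
    new_state = state + action
    reward = (state * action).sum()
    return new_state, reward

# Vectorize over a leading batch dimension, then JIT-compile the batched step.
batched_step = jax.jit(jax.vmap(toy_step))

batch_size = 20_000
states = jnp.zeros((batch_size, 4))
actions = jnp.ones((batch_size, 4))
states, rewards = batched_step(states, actions)  # all 20k envs advance in parallel
print(states.shape, rewards.shape)               # (20000, 4) (20000,)
```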
LuxAI2022 is a game with a dynamic number of units, which makes it hard to accelerate with JAX, because jax.jit() only supports arrays of static shape. As a workaround, JUX stores units in a large buffer of static length. The buffer length (buf_cfg.MAX_N_UNITS) greatly affects performance. Theoretically, no player can build more than 1500 units under the current game configs, so MAX_N_UNITS=1500 is a safe choice. In practice, however, game replays show that no player builds more than 200 units, so MAX_N_UNITS=200 is a practical choice.
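The following toy sketch (again, not JUX's actual data structures) shows the pattern: units live in a zero-padded buffer of length MAX_N_UNITS together with a validity mask, so every array keeps a static shape and the update stays jit-compatible even as units are created or destroyed.

```python
import jax
import jax.numpy as jnp

MAX_N_UNITS = 200  # static buffer capacity

@jax.jit
def move_units(pos, mask, delta):
    # pos:  (MAX_N_UNITS, 2) padded unit positions
    # mask: (MAX_N_UNITS,)   True where a slot holds a real unit
    # Only real units move; padded slots are left untouched, so shapes never change.
    return jnp.where(mask[:, None], pos + delta, pos)

pos = jnp.zeros((MAX_N_UNITS, 2), dtype=jnp.int32)
mask = jnp.arange(MAX_N_UNITS) < 50  # pretend only 50 units exist right now
pos = move_units(pos, mask, jnp.array([0, 1], dtype=jnp.int32))
```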
Relative Throughput
Here, we report throughput relative to the original Python implementation (luxai2022==1.1.3) on several different GPUs and MAX_N_UNITS settings. The original single-threaded Python implementation running on an Intel Xeon Platinum 8255C CPU serves as the baseline (1.00x). Throughput is roughly proportional to GPU memory bandwidth, because the game logic is mostly memory-bound rather than compute-bound: byte-access operators account for a large portion of the JUX implementation.
Relative throughput on GPUs (columns are MAX_N_UNITS settings):

| GPU | GPU Mem. Bandwidth | Batch Size | UNITS=100 | UNITS=200 | UNITS=400 | UNITS=600 | UNITS=800 | UNITS=1000 |
|---|---|---|---|---|---|---|---|---|
| A100-SXM4-40GB | 1555 GB/s | 20k | 1166x | 985x | 748x | 598x | 508x | 437x |
| Tesla V100-SXM2-32GB | 900 GB/s | 20k | 783x | 647x | 480x | 375x | 317x | 269x |
| Tesla T4 | 320 GB/s | 10k | 263x | 217x | 160x | 125x | 105x | 89x |
| GTX 1660 Ti | 288 GB/s | 3k | 218x | 178x | 130x | 103x | 84x | 71x |
Relative throughput on CPUs (batch size 1):

| CPU | Batch Size | Relative Throughput |
|---|---|---|
| Intel® Core™ i7-12700 | 1 | 2.12x |
| Intel® Xeon® Platinum 8255C CPU @ 2.50GHz | 1 | 1.00x |
| Intel® Xeon® Gold 6133 @ 2.50GHz | 1 | 0.89x |
Absolute Throughput
We also report absolute throughput in steps per second here.
Throughput (steps/s) on GPUs (columns are MAX_N_UNITS settings):

| GPU | Batch Size | UNITS=100 | UNITS=200 | UNITS=400 | UNITS=600 | UNITS=800 | UNITS=1000 |
|---|---|---|---|---|---|---|---|
| A100-SXM4-40GB | 20k | 363k | 307k | 233k | 186k | 158k | 136k |
| Tesla V100-SXM2-32GB | 20k | 244k | 202k | 149k | 117k | 99k | 84k |
| Tesla T4 | 10k | 82k | 68k | 50k | 39k | 33k | 28k |
| GTX 1660 Ti | 3k | 68k | 56k | 40k | 32k | 26k | 22k |
Throughput (steps/s) on CPUs (batch size 1):

| CPU | Batch Size | Throughput (steps/s) |
|---|---|---|
| Intel® Core™ i7-12700 | 1 | 0.66k |
| Intel® Xeon® Platinum 8255C CPU @ 2.50GHz | 1 | 0.31k |
| Intel® Xeon® Gold 6133 @ 2.50GHz | 1 | 0.28k |
Contributing
If you find any bugs or have any suggestions, please feel free to open an issue or a pull request. See CONTRIBUTING.md for how to set up a development environment.
Project details
Download files
Source Distribution: juxai2022-1.0.tar.gz (1.0 MB)
Built Distribution: juxai2022-1.0-py3-none-any.whl (57.3 kB)
File details
Details for the file juxai2022-1.0.tar.gz.
File metadata
- Download URL: juxai2022-1.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03fd266a04d172071d7fa2103d90a26d86319c07cf018bcda48634105ed5cfac |
| MD5 | fcc372ccb57ad75644ba00f0ea3d3b36 |
| BLAKE2b-256 | 2d1fad1cf1e16f346b7fb894d91d44f16d87ac5d56df36e2215d1fd043ec1ab9 |
File details
Details for the file juxai2022-1.0-py3-none-any.whl.
File metadata
- Download URL: juxai2022-1.0-py3-none-any.whl
- Upload date:
- Size: 57.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e8d58251c04fe1560fa668a3506172fc892d8889ed8b373082ec08e732bfa72 |
| MD5 | e93ba23ade0363f1ef8a15774f281d13 |
| BLAKE2b-256 | cb9d70e27e6320775d6568ba539e05597a4c51572290063204a65814b9b10bbd |