Fast reinforcement learning 💨

flashrl

flashrl does RL with millions of steps/second 💨 while being tiny: ~200 lines of code

🛠️ pip install flashrl or clone the repo & pip install -r requirements.txt

  • If cloned (or if envs changed), compile: python setup.py build_ext --inplace

💡 flashrl will always be tiny: Read the code (+paste into LLM) to understand it!

Quick Start 🚀

flashrl uses a Learner that holds an env and a model (default: Policy with LSTM)

import flashrl as frl

learn = frl.Learner(frl.envs.Pong(n_agents=2**14))
curves = learn.fit(40, steps=16, desc='done')
frl.print_curve(curves['loss'], label='loss')
frl.play(learn.env, learn.model, fps=8)
learn.env.close()

.fit does RL with ~10 million steps: 40 iterations × 16 steps × 2**14 agents!
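
The step count is simply the product of the three quantities named above. A quick check in plain Python (no flashrl required; the variable names are chosen here for illustration):

```python
# Total environment steps for the Quick Start run:
# iterations x rollout steps x parallel agents.
iters = 40
steps = 16
n_agents = 2**14  # 16,384 agents stepped in parallel

total_steps = iters * steps * n_agents
print(f"{total_steps:,}")  # 10,485,760, i.e. ~10 million
```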

Run it yourself via python train.py and play against the AI 🪄

Tiny doc 📑

Learner takes the arguments

  • env: RL environment
  • model: A Policy model
  • device: Defaults to mps or cuda if available, otherwise cpu
  • dtype: Defaults to torch.bfloat16 if device is cuda, otherwise torch.float32
  • compile_no_lstm: Speedup via torch.compile if the model has no LSTM
  • **kwargs: Passed to the Policy, e.g. hidden_size or lstm
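
The device and dtype defaults can be read as a small decision rule. The sketch below is not flashrl's code: pick_device and pick_dtype are hypothetical helpers, the availability flags are passed in as plain booleans so the snippet runs without PyTorch, and the mps-before-cuda order is an assumption taken from the order in the list above. In a real setup the flags would presumably come from torch.backends.mps.is_available() and torch.cuda.is_available().

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Return a default device name given what the machine offers."""
    # Assumption: checked in the order the doc lists them (mps, then cuda).
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def pick_dtype(device: str) -> str:
    """Return the default dtype name for a device, per the documented rule."""
    return "torch.bfloat16" if device == "cuda" else "torch.float32"


# Walk through three typical machines: CUDA box, Apple Silicon, CPU only.
for mps, cuda in [(False, True), (True, False), (False, False)]:
    device = pick_device(mps, cuda)
    print(device, pick_dtype(device))
```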

Learner.fit takes the arguments

  • iters: Number of iterations
  • steps: Number of steps in rollout
  • desc: Progress bar description (e.g. 'reward')
  • log: If True, tensorboard logging is enabled
    • Run tensorboard --logdir=runs and visit http://localhost:6006 in the browser!
  • stop_func: Function that stops training if it returns True, e.g.
...
def stop(kl, **kwargs):
  return kl > .1

curves = learn.fit(40, steps=16, stop_func=stop)
...
  • lr, anneal_lr & the arguments of ppo that come after bs: Hyperparameters
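
The stop(kl, **kwargs) signature suggests that the per-iteration metrics are handed to stop_func by keyword, so the function names the ones it cares about and absorbs the rest. A minimal sketch of that contract, assuming this calling convention: run_iters and fake_metrics are hypothetical stand-ins, not part of flashrl.

```python
def run_iters(iters, metrics_per_iter, stop_func=None):
    """Hypothetical stand-in for a training loop; returns iterations completed."""
    done = 0
    for i in range(iters):
        metrics = metrics_per_iter[i]  # e.g. {'loss': ..., 'kl': ...}
        done += 1
        # Metrics are passed by keyword, so stop_func picks what it needs
        # by name and swallows everything else via **kwargs.
        if stop_func is not None and stop_func(**metrics):
            break
    return done


def stop(kl, **kwargs):
    return kl > .1


fake_metrics = [
    {"loss": 0.9, "kl": 0.01},
    {"loss": 0.7, "kl": 0.05},
    {"loss": 0.6, "kl": 0.20},  # kl exceeds .1 here, so training stops
    {"loss": 0.5, "kl": 0.02},
]
print(run_iters(4, fake_metrics, stop_func=stop))  # 3
```

Without **kwargs, the call would raise a TypeError as soon as a metric other than kl is passed, which is presumably why the example includes it.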

The most important functions in flashrl/utils.py are

  • print_curve: Visualizes the loss across the iters
  • play: Plays the environment in the terminal and takes
    • model: A Policy model
    • playable: If True, allows you to act (or decide to let the model act)
    • steps: Number of steps
    • fps: Frames per second
    • obs: Argument of the env that should be rendered as observations
    • dump: If True, no frame refresh -> Frames accumulate in the terminal
    • idx: Agent index between 0 and n_agents (default: 0)
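
To give a feel for what visualizing a curve in the terminal can look like, here is a rough sketch. It is not flashrl's print_curve: sparkline is a hypothetical helper and the loss values are made up for illustration.

```python
def sparkline(values):
    """Map each value to one of eight block characters by relative height."""
    blocks = "▁▂▃▄▅▆▇█"
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for flat curves
    return "".join(
        blocks[int((v - lo) / span * (len(blocks) - 1))] for v in values
    )


# Made-up loss values, one per iteration.
loss = [1.0, 0.8, 0.65, 0.5, 0.42, 0.3, 0.28, 0.2]
print("loss", sparkline(loss))
```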

Environments 🕹️

Each env is one Cython (= .pyx) file in flashrl/envs. That's it!

To add custom envs, use grid.pyx, pong.pyx or multigrid.pyx as a template:

  • grid.pyx for single-agent envs (~110 LOC)
  • pong.pyx for 1 vs 1 agent envs (~150 LOC)
  • multigrid.pyx for multi-agent envs (~190 LOC)

Env        Task
Grid       Agent must reach goal
Pong       Agent must score
MultiGrid  Agent must reach goal first
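
For orientation before opening grid.pyx, the sketch below shows the general shape of a many-agents-at-once "reach the goal" env in plain Python. Everything in it is an assumption for illustration: TinyGrid, its methods, the action encoding and the reward scheme are hypothetical and are not flashrl's interface. The real envs are Cython, and the .pyx templates remain the reference.

```python
import random


class TinyGrid:
    """Hypothetical pure-Python grid env: each agent walks to its own goal."""

    # Action index -> (dx, dy): stay, left, right, up, down
    MOVES = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

    def __init__(self, n_agents=4, size=5, seed=0):
        # size must be at least 2 so that a goal distinct from the start exists.
        self.n_agents, self.size = n_agents, size
        self.rng = random.Random(seed)
        self.pos = [None] * n_agents
        self.goal = [None] * n_agents
        for i in range(n_agents):
            self._reset_agent(i)

    def _random_cell(self):
        return (self.rng.randrange(self.size), self.rng.randrange(self.size))

    def _reset_agent(self, i):
        self.pos[i] = self._random_cell()
        self.goal[i] = self._random_cell()
        while self.goal[i] == self.pos[i]:  # goal must differ from start
            self.goal[i] = self._random_cell()

    def step(self, actions):
        """Apply one action per agent; return per-agent rewards and dones."""
        rewards, dones = [], []
        for i, a in enumerate(actions):
            dx, dy = self.MOVES[a]
            x, y = self.pos[i]
            # Clamp to the grid so agents cannot walk off the edge.
            x = min(max(x + dx, 0), self.size - 1)
            y = min(max(y + dy, 0), self.size - 1)
            self.pos[i] = (x, y)
            reached = self.pos[i] == self.goal[i]
            rewards.append(1.0 if reached else 0.0)
            dones.append(reached)
            if reached:
                self._reset_agent(i)  # finished agents restart immediately
        return rewards, dones


env = TinyGrid(n_agents=4)
rewards, dones = env.step([env.rng.randrange(5) for _ in range(env.n_agents)])
print(rewards, dones)
```

The per-agent Python loop is exactly the part a Cython env would replace with typed loops over arrays, which is presumably where the speed comes from.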

Acknowledgements 🙌

I want to thank

and last but not least...

Built Distributions (flashrl 0.2.1)

No source distribution files are available for this release.

  • flashrl-0.2.1-cp313-cp313-win_amd64.whl (694.6 kB): CPython 3.13, Windows x86-64
  • flashrl-0.2.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB): CPython 3.13, manylinux (glibc 2.17+) x86-64
  • flashrl-0.2.1-cp313-cp313-macosx_10_13_universal2.whl (961.5 kB): CPython 3.13, macOS 10.13+ universal2 (ARM64, x86-64)
  • flashrl-0.2.1-cp312-cp312-win_amd64.whl (694.8 kB): CPython 3.12, Windows x86-64
  • flashrl-0.2.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB): CPython 3.12, manylinux (glibc 2.17+) x86-64
  • flashrl-0.2.1-cp312-cp312-macosx_10_13_universal2.whl (965.7 kB): CPython 3.12, macOS 10.13+ universal2 (ARM64, x86-64)
  • flashrl-0.2.1-cp311-cp311-win_amd64.whl (689.1 kB): CPython 3.11, Windows x86-64
  • flashrl-0.2.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB): CPython 3.11, manylinux (glibc 2.17+) x86-64
  • flashrl-0.2.1-cp311-cp311-macosx_10_9_universal2.whl (960.7 kB): CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)
  • flashrl-0.2.1-cp310-cp310-win_amd64.whl (689.0 kB): CPython 3.10, Windows x86-64
  • flashrl-0.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • flashrl-0.2.1-cp310-cp310-macosx_10_9_universal2.whl (960.0 kB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)
