A highly configurable implementation of the approach from the Aftab paper, which benchmarks different convolutional neural network encoders and their effect on final performance.

Overview

Aftab (آفتاب) is a benchmarking framework for evaluating CNN-based encoders in PQN (Parallelised Q-Network) across Atari environments.
It provides standardized training, evaluation, and reproducibility tools for deep reinforcement learning research.

[Figures: IQM HNS; IQM HNS (Last 50M Frames)]

Global performance of base encoders.
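
The figures report IQM HNS, which is not defined in this README. Assuming the usual meaning (interquartile mean of human-normalized scores), a minimal sketch of the metric looks like the following. The raw scores are hypothetical and are not results from this project, and the framework's own aggregation may differ.

```python
from statistics import mean


def human_normalized_score(agent, random, human):
    """HNS: 0.0 matches a random policy, 1.0 matches the human reference."""
    return (agent - random) / (human - random)


def interquartile_mean(values):
    """Mean of the middle 50% of the values (lowest and highest 25% dropped)."""
    ordered = sorted(values)
    cut = int(0.25 * len(ordered))  # number of values trimmed from each end
    return mean(ordered[cut:len(ordered) - cut])


# Hypothetical (agent, random, human) raw scores, one triple per run.
runs = [
    (50.0, 0.0, 100.0),
    (30.0, 10.0, 50.0),
    (400.0, 100.0, 200.0),
    (5.0, 5.0, 25.0),
]

hns = [human_normalized_score(*run) for run in runs]
iqm_hns = interquartile_mean(hns)
print(f"IQM HNS: {iqm_hns:.3f}")
```

Trimming the extremes keeps a single outlier run (here the one with an HNS of 3.0) from dominating the aggregate.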

[Figures: Hadamax IQM HNS; Hadamax IQM HNS (Last 50M Frames)]

Comparison of two Gamma encoder variants based on findings from "Hadamax Encoding: Elevating Performance in Model-Free Atari".

Installation

Install via pip:

pip install aftab

Usage

from aftab import Aftab, aftab_environments

# Seeds to train with.
seeds = [1, 2, 3, 4]

for environment in aftab_environments:
    # Build an agent that uses the Gamma encoder.
    agent = Aftab(encoder="gamma", frames="pilot")
    for seed in seeds:
        # Train on this environment with the given seed, then log the run.
        agent.train(environment=environment, seed=seed)
        agent.log()

Defining a Custom Encoder

You can define your own encoder as a PyTorch module and pass it to the agent:

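import torch
from aftab import Aftab

class CustomImageEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Define the encoder's layers here.

    def forward(self, x):
        # Map the observation batch x to a feature tensor and return it.
        raise NotImplementedError

# The encoder class itself (not an instance) is passed to the agent.
agent = Aftab(encoder=CustomImageEncoder, frames="pilot")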

Results

Base Encoder Experiments

Hadamax Experiments

Note: The Eta variant has significantly more parameters than other variants, primarily due to the encoder producing a large number of features.

Parameter Count

Base Encoder Variations

| Variant | Encoder Parameters | Q Regression Head | Total Parameters |
|---|---:|---:|---:|
| PQN | 78,304 | 1,686,500 | 1,764,804 |
| Alpha | 174,752 | 1,782,948 | 1,957,700 |
| Beta | 89,008 | 1,782,948 | 1,871,956 |
| Gamma | 117,168 | 1,725,364 | 1,842,532 |
| Delta | 78,552 | 1,850,588 | 1,929,140 |
| Epsilon | 80,112 | 2,179,828 | 2,259,940 |
| Zeta | 77,232 | 2,537,396 | 2,614,628 |
| Eta | 78,400 | 23,739,460 | 23,817,860 |
| Theta | 76,288 | 1,127,428 | 1,203,716 |

Hadamax Variants

| Variant | Encoder Parameters | Q Regression Head | Total Parameters |
|---|---:|---:|---:|
| PQN Hadamax | 156,608 | 3,968,516 | 4,125,124 |
| Gamma Hadamax V1 | 234,336 | 1,609,220 | 1,843,556 |
| Gamma Hadamax V2 | 234,336 | 3,280,388 | 3,514,724 |
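
As a quick sanity check on the parameter counts, the sketch below recomputes each total as encoder plus Q regression head and reports the fraction of parameters that sits in the head. The numbers are copied from the tables in this section; the dictionary and variable names are illustrative and are not part of the aftab API.

```python
# variant: (encoder parameters, Q regression head parameters, reported total)
PARAMETERS = {
    "PQN": (78_304, 1_686_500, 1_764_804),
    "Alpha": (174_752, 1_782_948, 1_957_700),
    "Beta": (89_008, 1_782_948, 1_871_956),
    "Gamma": (117_168, 1_725_364, 1_842_532),
    "Delta": (78_552, 1_850_588, 1_929_140),
    "Epsilon": (80_112, 2_179_828, 2_259_940),
    "Zeta": (77_232, 2_537_396, 2_614_628),
    "Eta": (78_400, 23_739_460, 23_817_860),
    "Theta": (76_288, 1_127_428, 1_203_716),
    "PQN Hadamax": (156_608, 3_968_516, 4_125_124),
    "Gamma Hadamax V1": (234_336, 1_609_220, 1_843_556),
    "Gamma Hadamax V2": (234_336, 3_280_388, 3_514_724),
}

# Variants whose reported total differs from encoder + head (expected: none).
mismatches = [
    name
    for name, (encoder, head, total) in PARAMETERS.items()
    if encoder + head != total
]

# Fraction of each variant's parameters that belongs to the Q regression head.
head_share = {
    name: head / total for name, (encoder, head, total) in PARAMETERS.items()
}

for name, share in sorted(head_share.items(), key=lambda item: -item[1]):
    print(f"{name:<18} head share = {share:.1%}")
```

Eta's encoder is about the same size as the others, so its much larger total comes almost entirely from the head that consumes the encoder's large feature output.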

Hyperparameters

| Hyperparameter | Value |
|---|---|
| Learning rate | $2.5 \times 10^{-4}$ |
| Training environments | 128 |
| Test environments | 8 |
| Optimizer | Rectified Adam |
| Weight decay | 0 |
| $\epsilon$ | $1 \times 10^{-5}$ |
| $\beta_{1}$ | 0.9 |
| $\beta_{2}$ | 0.999 |
| Total frames | 200,000,000 |
| Loss function | Mean Squared Error |
| Scheduler | Linear Annealing |
| $\epsilon$-greedy exploration | 10% of total frames |
| Discount factor ($\gamma$) | 0.99 |
| GAE ($\lambda$) | 0.65 |
| Epochs | 2 |
| Batch size | 4096 |

These hyperparameters were used in both the base encoder and the Hadamax experiments.
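
To make the exploration entry concrete, here is a minimal sketch of a linear annealing schedule that decays over the first 10% of the 200,000,000 total frames and then holds its final value. The start and end values of $\epsilon$ are assumptions chosen for illustration (the table does not state them), and `linear_anneal` is a hypothetical helper, not a function of the aftab package.

```python
TOTAL_FRAMES = 200_000_000
EXPLORATION_PERCENT = 10  # "10% of total frames" from the table
EXPLORATION_FRAMES = TOTAL_FRAMES * EXPLORATION_PERCENT // 100

# Assumed endpoints for illustration; the table does not state them.
EPSILON_START = 1.0
EPSILON_END = 0.01


def linear_anneal(start, end, duration, step):
    """Interpolate linearly from start to end over duration steps, then hold end."""
    fraction = min(max(step / duration, 0.0), 1.0)
    return start + fraction * (end - start)


for frame in (0, 10_000_000, 20_000_000, 200_000_000):
    epsilon = linear_anneal(EPSILON_START, EPSILON_END, EXPLORATION_FRAMES, frame)
    print(f"frame {frame:>11,}: epsilon = {epsilon:.3f}")
```

The same helper could describe a linearly annealed learning rate by passing the initial learning rate as `start` and the full frame budget as `duration`, although this README does not say which quantities the scheduler applies to.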

Statistical Significance

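| | PQN | Alpha | Beta | Gamma | Delta | Epsilon | Zeta | Eta | Theta |
|---|---|---|---|---|---|---|---|---|---|
| PQN | - | - | - | - | - | - | - | - | - |
| Alpha | 0 | - | - | - | - | - | - | - | - |
| Beta | 0 | 0.847 | - | - | - | - | - | - | - |
| Gamma | 0 | 0.295 | 0.802 | - | - | - | - | - | - |
| Delta | 0 | 0 | 0 | 0 | - | - | - | - | - |
| Epsilon | 0 | 0.104 | 0.068 | 0.01 | 0 | - | - | - | - |
| Zeta | 0 | 0.145 | 0.293 | 0.024 | 0 | 0.552 | - | - | - |
| Eta | 0.001 | 0.337 | 0.757 | 0.221 | 0 | 0.819 | 0.967 | - | - |
| Theta | 0.431 | 0 | 0.004 | 0 | 0.046 | 0.001 | 0.001 | 0.002 | - |

| | Gamma | Hadamax Gamma V1 | Hadamax Gamma V2 | Hadamax |
|---|---|---|---|---|
| Gamma | - | - | - | - |
| Hadamax Gamma V1 | 0 | - | - | - |
| Hadamax Gamma V2 | 0 | 0.72 | - | - |
| Hadamax Nature DQN | 0 | 0.078 | 0.151 | - |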
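
One way to read the base encoder matrix is to split the pairs by a significance threshold. The sketch below assumes the entries are pairwise p-values and uses a conventional threshold of 0.05. Both are assumptions, since this README states neither the test nor the threshold, and entries shown as 0 are presumably values below the displayed precision.

```python
NAMES = ["PQN", "Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta", "Eta", "Theta"]

# Lower triangle of the base encoder table: row i is compared against NAMES[:i].
LOWER = [
    [],
    [0],
    [0, 0.847],
    [0, 0.295, 0.802],
    [0, 0, 0, 0],
    [0, 0.104, 0.068, 0.01, 0],
    [0, 0.145, 0.293, 0.024, 0, 0.552],
    [0.001, 0.337, 0.757, 0.221, 0, 0.819, 0.967],
    [0.431, 0, 0.004, 0, 0.046, 0.001, 0.001, 0.002],
]

ALPHA = 0.05  # assumed significance threshold, not stated in the README

significant = []
not_significant = []
for i, row in enumerate(LOWER):
    for j, p_value in enumerate(row):
        pair = (NAMES[i], NAMES[j])
        if p_value < ALPHA:
            significant.append(pair)
        else:
            not_significant.append(pair)

print(f"{len(significant)} of {len(significant) + len(not_significant)} pairs differ at alpha={ALPHA}")
for row_name, column_name in not_significant:
    print(f"no significant difference: {row_name} vs {column_name}")
```

Under these assumptions, every variant except Theta differs from the PQN baseline, and Delta differs from every other variant.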

Reproducibility

Due to the stochastic nature of deep reinforcement learning, exact reproducibility via fixed datasets is not feasible.
Instead, we provide a set of random seeds used in our experiments.

from aftab import aftab_seeds

print(aftab_seeds)

Full experiment replication:

from aftab import Aftab
from aftab import aftab_environments
from aftab import aftab_seeds

for environment in aftab_environments:
    agent = Aftab()
    for seed in aftab_seeds:
        agent.train(environment=environment, seed=seed)
        agent.log()

A comprehensive set of Atari environments is available via EnvPool:
https://envpool.readthedocs.io/en/latest/env/atari.html#available-tasks

Citation

@article{aftab2026benchmarking,
  title={Aftab: Benchmarking {CNN} Encoders in {PQN}},
  author={Shieenavaz, Taha and Zareshahraki, Shabnam and Nanni, Loris},
  journal={arXiv preprint arXiv:YYMM.NNNNN},
  year={2026}
}

Related Works

@misc{farebrother2024stop,
  title={Stop {Regressing}: {Training} {Value} {Functions} via {Classification} for {Scalable} {Deep} {RL}},
  author={Farebrother, Jesse and Orbay, Jordi and Vuong, Quan and Taïga, Adrien Ali and Chebotar, Yevgen and Xiao, Ted and Irpan, Alex and Levine, Sergey and Castro, Pablo Samuel and Faust, Aleksandra and Kumar, Aviral and Agarwal, Rishabh},
  year={2024},
  publisher={arXiv},
  doi={10.48550/arXiv.2403.03950},
  url={http://arxiv.org/abs/2403.03950}
}

License

© 2025 Taha Shieenavaz.
Licensed under CC BY-NC 4.0: https://creativecommons.org/licenses/by-nc/4.0/
