
A highly configurable implementation of our approach from the Aftab paper, which benchmarks different convolutional neural network (CNN) encoders and their effect on final agent performance.

Project description

Overview

Aftab (Persian: آفتاب, meaning "sun" or "sun rays") is a benchmarking framework for evaluating CNN-based encoders in PQN (Parallelised Q-Network) across Atari environments.
It provides standardized training, evaluation, and reproducibility tools for deep reinforcement learning research.

[Figure: global performance as interquartile mean of human-normalized scores (IQM HNS), over all frames and over the last 50M frames]

Global performance of base encoders.

[Figure: Hadamax global performance, IQM HNS over all frames and over the last 50M frames]

Comparison of two Gamma encoder variants based on findings from "Hadamax Encoding: Elevating Performance in Model-Free Atari".
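Both figures report IQM HNS: the interquartile mean of human-normalized scores. A minimal sketch of the metric follows, assuming the common convention (as in the rliable library) of pooling normalized scores over all games and seeds and then taking a 25%-trimmed mean. The helper names and the reference scores are hypothetical placeholders, not real Atari random/human baselines and not part of aftab.

```python
def human_normalized_score(score: float, random_score: float, human_score: float) -> float:
    """Rescale a raw game score so random play maps to 0 and human play to 1."""
    return (score - random_score) / (human_score - random_score)


def interquartile_mean(values) -> float:
    """Mean of the middle 50% of values (25% trimmed from each end)."""
    xs = sorted(values)
    k = int(0.25 * len(xs))  # same cut rule as scipy.stats.trim_mean(x, 0.25)
    middle = xs[k:len(xs) - k]
    return sum(middle) / len(middle)


# Hypothetical raw scores pooled over (game, seed) runs; placeholder baselines.
raw_scores = [0, 25, 50, 75, 100, 125, 150, 1000]
scores = [human_normalized_score(s, random_score=0.0, human_score=100.0) for s in raw_scores]
print(interquartile_mean(scores))  # 0.875
```

The single outlier run (normalized score 10.0) is trimmed away, which is why IQM is preferred over the plain mean for aggregating a small number of noisy runs.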

Installation

Install via pip:

pip install aftab

Usage

from aftab import Aftab
from aftab import aftab_environments

seeds = [1, 2, 3, 4]

for environment in aftab_environments:
    agent = Aftab(encoder="gamma", frames="pilot")
    for seed in seeds:
        agent.train(environment=environment, seed=seed)
        agent.log()

Defining a Custom Encoder

You can define your own encoder as a PyTorch module and pass it to the agent:

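import torch
from aftab import Aftab

class CustomImageEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Example architecture (Nature DQN-style); replace with your own.
        # Assumes inputs of 4 stacked 84x84 frames: (batch, 4, 84, 84).
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(4, 32, kernel_size=8, stride=4), torch.nn.ReLU(),
            torch.nn.Conv2d(32, 64, kernel_size=4, stride=2), torch.nn.ReLU(),
            torch.nn.Conv2d(64, 64, kernel_size=3, stride=1), torch.nn.ReLU(),
            torch.nn.Flatten(),
        )

    def forward(self, x):
        # Map a batch of observations to one flat feature vector per sample.
        return self.net(x)

agent = Aftab(encoder=CustomImageEncoder, frames="pilot")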

Results

Encoder Experiments:

  • Tables:
  • Charts:
    • Loss Evolution
    • IQM HNS

Hadamax Experiments:

  • Tables:
  • Charts:
    • Loss Evolution
    • IQM HNS

Final Experiments: in progress (the GPUs are still running).

Model Complexity

Base Variants

| Variant | Encoder Parameters | Regression Head Parameters | Total Parameters | Encoder FLOPs | Regression Head FLOPs | Total FLOPs |
|---|---|---|---|---|---|---|
| PQN | 78,304 | 1,686,500 | 1,764,804 | 7.734 | 1.610 | 9.347 |
| Alpha | 174,752 | 1,782,948 | 1,957,700 | 27.541 | 1.610 | 29.151 |
| Beta | 89,008 | 1,782,948 | 1,871,956 | 61.515 | 1.610 | 63.126 |
| Gamma | 117,168 | 1,725,364 | 1,842,532 | 22.901 | 1.610 | 24.512 |
| Delta | 78,552 | 1,850,588 | 1,929,140 | 6.143 | 1.774 | 7.917 |
| Epsilon | 80,112 | 2,179,828 | 2,259,940 | 13.252 | 2.101 | 15.354 |
| Zeta | 77,232 | 2,537,396 | 2,614,628 | 25.362 | 2.462 | 27.824 |
| Eta | 78,400 | 23,739,460 | 23,817,860 | 28.422 | 23.663 | 52.085 |
| Theta | 76,288 | 1,127,428 | 1,203,716 | 9.065 | 1.053 | 10.118 |

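Note: The Eta variant has far more parameters than the other variants (about 23.8M in total). Its encoder is no larger than the others (78,400 parameters); almost all of the extra parameters sit in the regression head (about 23.7M), because the encoder outputs a large number of features and the head's input layer grows with that number.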
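The arithmetic behind this can be sketched in a few lines: a convolution's output size depends on kernel and stride, the flattened feature count is height × width × channels, and a fully connected layer with F inputs and H outputs has F·H + H parameters. The sketch below uses the classic Nature DQN convolution stack on 84×84 inputs and a 512-unit hidden layer purely as illustrative assumptions; it does not reproduce the exact Aftab heads, and all helper names are hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size: int, layers) -> int:
    """Feature count after flattening a stack of square, unpadded conv layers.

    layers: list of (out_channels, kernel, stride) tuples.
    """
    size, channels = input_size, 0
    for out_channels, kernel, stride in layers:
        size = conv_output_size(size, kernel, stride)
        channels = out_channels
    return size * size * channels


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# Illustrative Nature DQN-style encoder: 84 -> 20 -> 9 -> 7 spatially.
nature_cnn = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
features = flattened_features(84, nature_cnn)
print(features)                      # 3136
print(linear_params(features, 512))  # 1606144
```

Because the head's first layer scales linearly with the feature count, an encoder with few parameters but a large output (for example, one that downsamples less aggressively) can make the head dominate the total, which is consistent with the Eta row.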


Hadamax Variants

| Variant | Encoder Parameters | Regression Head Parameters | Total Parameters | Encoder FLOPs | Regression Head FLOPs | Total FLOPs |
|---|---|---|---|---|---|---|
| PQN Hadamax | 156,608 | 3,968,516 | 4,125,124 | 159.014 | 3.969 | 162.984 |
| Gamma Hadamax V1 | 234,336 | 1,609,220 | 1,843,556 | 122.001 | 1.610 | 123.611 |
| Gamma Hadamax V2 | 234,336 | 3,280,388 | 3,514,724 | 129.300 | 3.281 | 132.581 |

Hyperparameters

| Hyperparameter | Value |
|---|---|
| Learning rate | $2.5 \times 10^{-4}$ |
| Training environments | 128 |
| Test environments | 8 |
| Optimizer | Rectified Adam (RAdam) |
| Weight decay | 0 |
| Optimizer $\epsilon$ | $1 \times 10^{-5}$ |
| $\beta_{1}$ | 0.9 |
| $\beta_{2}$ | 0.999 |
| Total frames | 200,000,000 |
| Loss function | Mean squared error |
| Scheduler | Linear annealing |
| $\epsilon$-greedy exploration | 10% of total frames |
| Discount factor ($\gamma$) | 0.99 |
| GAE ($\lambda$) | 0.65 |
| Epochs | 2 |
| Batch size | 4096 |

These hyperparameters were used in both the encoder and the Hadamax experiments.
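To make a few of the table entries concrete, the sketch below shows one way a linearly annealed learning rate, an ε-greedy decay over the first 10% of frames, and (γ, λ) could enter training. Several details are assumptions not stated in the table: the learning rate is annealed to zero, ε decays from 1.0 to 0.01, and λ is used for PQN-style Q(λ) targets (the recursion used in Gallici et al., 2024), although the table labels it GAE. All names are hypothetical; this is not the Aftab implementation.

```python
TOTAL_FRAMES = 200_000_000
BASE_LR = 2.5e-4
EPS_START, EPS_FINISH = 1.0, 0.01  # assumed values, not stated in the table
EPS_DECAY_FRACTION = 0.10          # "10% of total frames"
GAMMA, LAMBDA = 0.99, 0.65


def linear_lr(frame: int) -> float:
    """Linearly anneal the learning rate from BASE_LR to 0 (assumed end value)."""
    return BASE_LR * max(0.0, 1.0 - frame / TOTAL_FRAMES)


def epsilon(frame: int) -> float:
    """Linearly decay epsilon over the first 10% of frames, then hold it."""
    decay_frames = EPS_DECAY_FRACTION * TOTAL_FRAMES
    progress = min(1.0, frame / decay_frames)
    return EPS_START + progress * (EPS_FINISH - EPS_START)


def lambda_returns(rewards, dones, next_max_q, gamma=GAMMA, lam=LAMBDA):
    """Q(lambda)-style targets computed backwards over one rollout.

    rewards[t], dones[t]: reward and episode-termination flag for step t.
    next_max_q[t]: max_a Q(s_{t+1}, a) estimated by the current network.
    """
    n = len(rewards)
    targets = [0.0] * n
    # The last step has no later return to mix in: use the one-step bootstrap.
    g = rewards[-1] + gamma * (1.0 - dones[-1]) * next_max_q[-1]
    targets[-1] = g
    for t in range(n - 2, -1, -1):
        if dones[t]:
            g = rewards[t]  # episode ended: no bootstrapping past this step
        else:
            # Blend the one-step bootstrap with the longer return G_{t+1}.
            g = rewards[t] + gamma * ((1.0 - lam) * next_max_q[t] + lam * g)
        targets[t] = g
    return targets
```

With λ = 0 the targets reduce to ordinary one-step Q-learning targets; with λ = 1 they become discounted returns bootstrapped only at the end of the rollout. λ = 0.65 sits in between.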

Statistical Significance

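Base Variants

| | PQN | Alpha | Beta | Gamma | Delta | Epsilon | Zeta | Eta | Theta |
|---|---|---|---|---|---|---|---|---|---|
| PQN | - | - | - | - | - | - | - | - | - |
| Alpha | 0 | - | - | - | - | - | - | - | - |
| Beta | 0 | 0.847 | - | - | - | - | - | - | - |
| Gamma | 0 | 0.295 | 0.802 | - | - | - | - | - | - |
| Delta | 0 | 0 | 0 | 0 | - | - | - | - | - |
| Epsilon | 0 | 0.104 | 0.068 | 0.01 | 0 | - | - | - | - |
| Zeta | 0 | 0.145 | 0.293 | 0.024 | 0 | 0.552 | - | - | - |
| Eta | 0.001 | 0.337 | 0.757 | 0.221 | 0 | 0.819 | 0.967 | - | - |
| Theta | 0.431 | 0 | 0.004 | 0 | 0.046 | 0.001 | 0.001 | 0.002 | - |

Hadamax Variants

| | Gamma | Hadamax Gamma V1 | Hadamax Gamma V2 | Hadamax Nature DQN |
|---|---|---|---|---|
| Gamma | - | - | - | - |
| Hadamax Gamma V1 | 0 | - | - | - |
| Hadamax Gamma V2 | 0 | 0.72 | - | - |
| Hadamax Nature DQN | 0 | 0.078 | 0.151 | - |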
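The README does not state which test produced the values above. Assuming they are pairwise p-values computed from per-run scores, one common choice is a two-sided permutation test on the difference of means, sketched below in plain Python. `permutation_test` is a hypothetical helper for illustration, not part of aftab and not necessarily the authors' method.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Returns an estimated p-value: the fraction of random relabelings of the
    pooled runs whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_permutations + 1)


# Hypothetical per-seed IQM HNS values for two encoders (illustrative only).
encoder_a = [1.02, 0.97, 1.10, 1.05]
encoder_b = [0.81, 0.88, 0.79, 0.85]
print(permutation_test(encoder_a, encoder_b))
```

With only a handful of seeds per encoder, the number of distinct relabelings is small, so the smallest achievable p-value is bounded; this is worth keeping in mind when reading entries reported as 0.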

Reproducibility

Deep reinforcement learning is stochastic, and the agent generates its own training data through interaction, so results cannot be reproduced from a fixed dataset.
Instead, we provide the set of random seeds used in our experiments:

from aftab import aftab_seeds

print(aftab_seeds)

Full experiment replication:

from aftab import Aftab
from aftab import aftab_environments
from aftab import aftab_seeds

for environment in aftab_environments:
    agent = Aftab()
    for seed in aftab_seeds:
        agent.train(environment=environment, seed=seed)
        agent.log()
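aftab may already seed all random number generators internally when `seed` is passed to `train`. For custom scripts built around the package, a generic helper like the one below seeds Python, NumPy, and (if installed) PyTorch from a single value. `set_global_seeds` is a hypothetical name, not part of aftab. Note that fixed seeds alone do not guarantee bit-identical results on GPUs, because some CUDA kernels are nondeterministic.

```python
import random

import numpy as np


def set_global_seeds(seed: int) -> None:
    """Seed Python's, NumPy's, and (if available) PyTorch's global RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: nothing more to seed
    torch.manual_seed(seed)           # CPU (and default CUDA) generators
    torch.cuda.manual_seed_all(seed)  # all visible GPUs; safe without CUDA


# Re-seeding reproduces the same random draws.
set_global_seeds(1)
first = (random.random(), float(np.random.rand()))
set_global_seeds(1)
second = (random.random(), float(np.random.rand()))
print(first == second)  # True
```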

A comprehensive set of Atari environments is available via EnvPool:
https://envpool.readthedocs.io/en/latest/env/atari.html#available-tasks

Hardware

All experiments were run on NVIDIA A40 GPUs.

| Specification | Details |
|---|---|
| GPU Memory | 48 GB GDDR6 with error-correcting code (ECC) |
| GPU Memory Bandwidth | 696 GB/s |
| Interconnect | NVIDIA NVLink 112.5 GB/s (bidirectional); PCIe Gen4: 64 GB/s |
| NVLink | 2-way low profile (2-slot) |
| Display Ports | 3x DisplayPort 1.4* |
| Max Power Consumption | 300 W |
| Form Factor | 4.4" (H) x 10.5" (L), Dual Slot |
| Thermal | Passive |
| vGPU Software Support | NVIDIA Virtual PC, NVIDIA Virtual Applications, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server, NVIDIA AI Enterprise |
| vGPU Profiles Supported | See the Virtual GPU Licensing Guide |
| NVENC / NVDEC | 1x / 2x (includes AV1 decode) |
| Secure Boot | Secure and Measured Boot with Hardware Root of Trust (optional) |
| NEBS Ready | Level 3 |
| Power Connector | 8-pin CPU |

Citation

@article{aftab2026benchmarking,
  title={Aftab: Benchmarking {CNN} Encoders in {PQN}},
  author={Shieenavaz, Taha and Zareshahraki, Shabnam and Nanni, Loris},
  journal={arXiv preprint arXiv:YYMM.NNNNN},
  year={2026}
}

Related Works

@misc{2407.04811,
  Title = {Simplifying Deep Temporal Difference Learning},
  Author = {Matteo Gallici and Mattie Fellows and Benjamin Ellis and Bartomeu Pou and Ivan Masmitja and Jakob Nicolaus Foerster and Mario Martin},
  Year = {2024},
  Eprint = {arXiv:2407.04811},
}
@misc{2403.03950,
  Title = {Stop Regressing: Training Value Functions via Classification for Scalable Deep RL},
  Author = {Jesse Farebrother and Jordi Orbay and Quan Vuong and Adrien Ali Taïga and Yevgen Chebotar and Ted Xiao and Alex Irpan and Sergey Levine and Pablo Samuel Castro and Aleksandra Faust and Aviral Kumar and Rishabh Agarwal},
  Year = {2024},
  Eprint = {arXiv:2403.03950},
}
@misc{1511.06581,
  Title = {Dueling Network Architectures for Deep Reinforcement Learning},
  Author = {Ziyu Wang and Tom Schaul and Matteo Hessel and Hado van Hasselt and Marc Lanctot and Nando de Freitas},
  Year = {2015},
  Eprint = {arXiv:1511.06581},
}
@misc{1806.04613,
  Title = {Improving Regression Performance with Distributional Losses},
  Author = {Ehsan Imani and Martha White},
  Year = {2018},
  Eprint = {arXiv:1806.04613},
}
@misc{1602.04621,
  Title = {Deep Exploration via Bootstrapped DQN},
  Author = {Ian Osband and Charles Blundell and Alexander Pritzel and Benjamin Van Roy},
  Year = {2016},
  Eprint = {arXiv:1602.04621},
}

License

© 2025 Taha Shieenavaz.
Licensed under CC BY-NC 4.0: https://creativecommons.org/licenses/by-nc/4.0/


Download files


Source Distribution

aftab-0.1.46.tar.gz (2.2 MB)

Uploaded Source

Built Distribution


aftab-0.1.46-py3-none-any.whl (65.2 kB)

Uploaded Python 3

File details

Details for the file aftab-0.1.46.tar.gz.

File metadata

  • Download URL: aftab-0.1.46.tar.gz
  • Upload date:
  • Size: 2.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for aftab-0.1.46.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 30f8c16a919b871c66ccec8d4d4f57fb363891d9b51424ab9bf853fc4fd1edca |
| MD5 | 3e14df454771b5c4afc7503f73e143fd |
| BLAKE2b-256 | 58f3a1413175ccda7c2bb0600526a36b092fcaa8483ea368ced0641366fe14e3 |


File details

Details for the file aftab-0.1.46-py3-none-any.whl.

File metadata

  • Download URL: aftab-0.1.46-py3-none-any.whl
  • Upload date:
  • Size: 65.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for aftab-0.1.46-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca9ff9a94d5394b0c1a8a9f79ad7d4551a884a153f84f1d41378b7e0b69f9db3 |
| MD5 | 4dd826cf4d739e0560806ac26f8bf997 |
| BLAKE2b-256 | 5e392c40dc1cc580fa0635d821d55a425013dd47103bc36f25e97a2cd0c971e8 |

