A highly configurable implementation of the approach in the Aftab paper, which benchmarks different convolutional neural network encoders and their effect on final performance.

Project description

Overview

Aftab (Persian: آفتاب, meaning "sun" or "sun rays") is a benchmarking framework for evaluating CNN-based encoders in PQN (Parallelised Q-Network) across Atari environments.
It provides standardized training, evaluation, and reproducibility tools for deep reinforcement learning research.

[Figure: IQM HNS, two panels: Global Performance; Last 50M Frames]

Global performance of base encoders.

[Figure: Hadamax IQM HNS, two panels: Global Performance; Last 50M Frames]

Comparison of two Gamma encoder variants based on findings from "Hadamax Encoding: Elevating Performance in Model-Free Atari".
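The charts report IQM HNS. Assuming the usual definitions (a human-normalized score, and an interquartile mean taken as the mean of the middle 50% of values), the metric can be sketched as below. The function names and the sample numbers are hypothetical, and Aftab's own aggregation may differ in detail.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Rescale a raw game score so that random play is 0 and human play is 1."""
    return (agent - random) / (human - random)


def interquartile_mean(values) -> float:
    """Mean of the middle 50% of values: drop the lowest and highest quarter."""
    ordered = sorted(values)
    cut = int(len(ordered) * 0.25)  # number of values trimmed from each end
    middle = ordered[cut:len(ordered) - cut]
    return mean(middle)


# Hypothetical normalized scores from eight runs (not results from the paper).
scores = [0.2, 0.5, 0.9, 1.1, 1.3, 1.4, 2.0, 6.5]
print(interquartile_mean(scores))
```

Trimming both tails makes the aggregate less sensitive to a single outlier game or seed than a plain mean, while still using more of the data than a median.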

Installation

Install via pip:

pip install aftab

Usage

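from aftab import Aftab
from aftab import aftab_environments

# Seeds to repeat every run with.
seeds = [1, 2, 3, 4]

for environment in aftab_environments:
    # Build an agent that uses the Gamma encoder.
    agent = Aftab(encoder="gamma", frames="pilot")
    for seed in seeds:
        # Train once per seed, then log the results of that run.
        agent.train(environment=environment, seed=seed)
        agent.log()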

Defining a Custom Encoder

You can define your own encoder as a PyTorch module and pass it to the agent:

import torch
from aftab import Aftab

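class CustomImageEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Define the encoder's layers here.

    def forward(self, x):
        # Replace with the real forward pass; it should return the encoded features.
        raise NotImplementedError

# Pass the encoder class to the agent.
agent = Aftab(encoder=CustomImageEncoder, frames="pilot")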
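As a concrete illustration, here is a minimal sketch of a filled-in encoder. The class name `SmallConvEncoder`, the layer sizes, and the input shape (four stacked 84x84 frames) are assumptions chosen for the example, not values documented by Aftab. Check the shapes the library actually expects before using it.

```python
import torch
from torch import nn


class SmallConvEncoder(nn.Module):
    """Hypothetical encoder: three convolutions followed by a flatten."""

    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),  # (batch, 64, 7, 7) -> (batch, 3136) for 84x84 inputs
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


# Sanity check on a dummy batch of two observations (assumed shape).
encoder = SmallConvEncoder()
frames = torch.zeros(2, 4, 84, 84)
features = encoder(frames)
print(features.shape)
```

Such a class would then be passed in the same way as above, for example `Aftab(encoder=SmallConvEncoder, frames="pilot")`.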

Results

Encoder Experiments:

  • Tables:
  • Charts:
    • Loss Evolution
    • IQM HNS

Hadamax Experiments:

  • Tables:
  • Charts:
    • Loss Evolution
    • IQM HNS

Final Experiments: (GPUs are working :D)

Model Complexity

Base Variants

| Variant | Encoder Parameters | Regression Head Parameters | Total Parameters | Encoder FLOPs | Regression Head FLOPs | Total FLOPs |
|---|---|---|---|---|---|---|
| PQN | 78,304 | 1,686,500 | 1,764,804 | 7.734 | 1.610 | 9.347 |
| Alpha | 174,752 | 1,782,948 | 1,957,700 | 27.541 | 1.610 | 29.151 |
| Beta | 89,008 | 1,782,948 | 1,871,956 | 61.515 | 1.610 | 63.126 |
| Gamma | 117,168 | 1,725,364 | 1,842,532 | 22.901 | 1.610 | 24.512 |
| Delta | 78,552 | 1,850,588 | 1,929,140 | 6.143 | 1.774 | 7.917 |
| Epsilon | 80,112 | 2,179,828 | 2,259,940 | 13.252 | 2.101 | 15.354 |
| Zeta | 77,232 | 2,537,396 | 2,614,628 | 25.362 | 2.462 | 27.824 |
| Eta | 78,400 | 23,739,460 | 23,817,860 | 28.422 | 23.663 | 52.085 |
| Theta | 76,288 | 1,127,428 | 1,203,716 | 9.065 | 1.053 | 10.118 |

Note: The Eta variant has far more parameters than the other variants. Almost all of them sit in the regression head, because the Eta encoder outputs a large number of features.

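To see why a wide encoder output inflates the head, recall that a fully connected layer has `in_features * out_features` weights plus `out_features` biases. The sketch below applies that formula to made-up sizes. The hidden width and the feature counts are hypothetical and are not taken from the table.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of learnable parameters in one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


HIDDEN_UNITS = 512  # hypothetical width of the head's first layer

# Hypothetical flattened encoder output sizes, from small to large.
for flat_features in (1_024, 4_096, 32_768):
    count = linear_params(flat_features, HIDDEN_UNITS)
    print(f"{flat_features:>6} features -> {count:,} parameters")
```

The count grows linearly with the number of encoder features, so an encoder with few parameters of its own can still lead to a very large head.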
Hadamax Variants

| Variant | Encoder Parameters | Regression Head Parameters | Total Parameters | Encoder FLOPs | Regression Head FLOPs | Total FLOPs |
|---|---|---|---|---|---|---|
| PQN Hadamax | 156,608 | 3,968,516 | 4,125,124 | 159.014 | 3.969 | 162.984 |
| Gamma Hadamax V1 | 234,336 | 1,609,220 | 1,843,556 | 122.001 | 1.610 | 123.611 |
| Gamma Hadamax V2 | 234,336 | 3,280,388 | 3,514,724 | 129.300 | 3.281 | 132.581 |

Hyperparameters

| Hyperparameter | Value |
|---|---|
| Learning rate | $2.5 \times 10^{-4}$ |
| Training environments | 128 |
| Test environments | 8 |
| Optimizer | Rectified Adam |
| Weight decay | 0 |
| $\epsilon$ | $1 \times 10^{-5}$ |
| $\beta_{1}$ | 0.9 |
| $\beta_{2}$ | 0.999 |
| Total frames | 200,000,000 |
| Loss function | Mean Squared Error |
| Scheduler | Linear Annealing |
| $\epsilon$-greedy exploration | 10% of total frames |
| Discount factor ($\gamma$) | 0.99 |
| GAE ($\lambda$) | 0.65 |
| Epochs | 2 |
| Batch size | 4096 |

These hyperparameters were used in both the encoder and Hadamax experiments.
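The table lists linear annealing and an $\epsilon$-greedy exploration phase covering 10% of the total frames. As a rough sketch of how such a schedule behaves, the snippet below anneals an exploration rate linearly over the first 20,000,000 frames and then holds it. The start and end values of $\epsilon$ and the function names are assumptions, since the table does not state them.

```python
def linear_anneal(step: int, duration: int, start: float, end: float) -> float:
    """Interpolate linearly from start to end over duration steps, then hold end."""
    if duration <= 0:
        return end
    fraction = min(max(step / duration, 0.0), 1.0)
    return start + fraction * (end - start)


TOTAL_FRAMES = 200_000_000               # from the hyperparameter table
EXPLORATION_FRAMES = TOTAL_FRAMES // 10  # 10% of total frames, from the table
EPS_START, EPS_END = 1.0, 0.01           # assumed values, not from the table


def epsilon_at(frame: int) -> float:
    """Exploration rate at a given frame under the assumed schedule."""
    return linear_anneal(frame, EXPLORATION_FRAMES, EPS_START, EPS_END)


for frame in (0, 10_000_000, 20_000_000, 100_000_000):
    print(frame, round(epsilon_at(frame), 4))
```

The same helper could describe a learning-rate schedule by passing the full frame budget as `duration`. Whether Aftab anneals the learning rate, the exploration rate, or both is not specified here.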

Statistical Significance

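| | PQN | Alpha | Beta | Gamma | Delta | Epsilon | Zeta | Eta | Theta |
|---|---|---|---|---|---|---|---|---|---|
| PQN | - | - | - | - | - | - | - | - | - |
| Alpha | 0 | - | - | - | - | - | - | - | - |
| Beta | 0 | 0.847 | - | - | - | - | - | - | - |
| Gamma | 0 | 0.295 | 0.802 | - | - | - | - | - | - |
| Delta | 0 | 0 | 0 | 0 | - | - | - | - | - |
| Epsilon | 0 | 0.104 | 0.068 | 0.01 | 0 | - | - | - | - |
| Zeta | 0 | 0.145 | 0.293 | 0.024 | 0 | 0.552 | - | - | - |
| Eta | 0.001 | 0.337 | 0.757 | 0.221 | 0 | 0.819 | 0.967 | - | - |
| Theta | 0.431 | 0 | 0.004 | 0 | 0.046 | 0.001 | 0.001 | 0.002 | - |

| | Gamma | Hadamax Gamma V1 | Hadamax Gamma V2 | Hadamax |
|---|---|---|---|---|
| Gamma | - | - | - | - |
| Hadamax Gamma V1 | 0 | - | - | - |
| Hadamax Gamma V2 | 0 | 0.72 | - | - |
| Hadamax Nature DQN | 0 | 0.078 | 0.151 | - |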

Reproducibility

Due to the stochastic nature of deep reinforcement learning, exact reproducibility via fixed datasets is not feasible.
Instead, we provide a set of random seeds used in our experiments.

from aftab import aftab_seeds

print(aftab_seeds)

Full experiment replication:

from aftab import Aftab
from aftab import aftab_environments
from aftab import aftab_seeds

for environment in aftab_environments:
    agent = Aftab()
    for seed in aftab_seeds:
        agent.train(environment=environment, seed=seed)
        agent.log()

A comprehensive set of Atari environments is available via EnvPool:
https://envpool.readthedocs.io/en/latest/env/atari.html#available-tasks

Hardware

All experiments in this study were run on NVIDIA A40 GPUs.

| Specification | Details |
|---|---|
| GPU Memory | 48 GB GDDR6 with error-correcting code (ECC) |
| GPU Memory Bandwidth | 696 GB/s |
| Interconnect | NVIDIA NVLink 112.5 GB/s (bidirectional); PCIe Gen4: 64 GB/s |
| NVLink | 2-way low profile (2-slot) |
| Display Ports | 3x DisplayPort 1.4 |
| Max Power Consumption | 300 W |
| Form Factor | 4.4" (H) x 10.5" (L), Dual Slot |
| Thermal | Passive |
| vGPU Software Support | NVIDIA Virtual PC, NVIDIA Virtual Applications, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server, NVIDIA AI Enterprise |
| vGPU Profiles Supported | See the Virtual GPU Licensing Guide |
| NVENC / NVDEC | 1x / 2x (includes AV1 decode) |
| Secure Boot | Secure and Measured Boot with Hardware Root of Trust (optional) |
| NEBS Ready | Level 3 |
| Power Connector | 8-pin CPU |

Citation

@article{aftab2026benchmarking,
  title={Aftab: Benchmarking {CNN} Encoders in {PQN}},
  author={Shieenavaz, Taha and Zareshahraki, Shabnam and Nanni, Loris},
  journal={arXiv preprint arXiv:YYMM.NNNNN},
  year={2026}
}

Related Works

@misc{2407.04811,
  Title = {Simplifying Deep Temporal Difference Learning},
  Author = {Matteo Gallici and Mattie Fellows and Benjamin Ellis and Bartomeu Pou and Ivan Masmitja and Jakob Nicolaus Foerster and Mario Martin},
  Year = {2024},
  Eprint = {arXiv:2407.04811},
}
@misc{2403.03950,
  Title = {Stop Regressing: Training Value Functions via Classification for Scalable Deep RL},
  Author = {Jesse Farebrother and Jordi Orbay and Quan Vuong and Adrien Ali Taïga and Yevgen Chebotar and Ted Xiao and Alex Irpan and Sergey Levine and Pablo Samuel Castro and Aleksandra Faust and Aviral Kumar and Rishabh Agarwal},
  Year = {2024},
  Eprint = {arXiv:2403.03950},
}
@misc{1511.06581,
  Title = {Dueling Network Architectures for Deep Reinforcement Learning},
  Author = {Ziyu Wang and Tom Schaul and Matteo Hessel and Hado van Hasselt and Marc Lanctot and Nando de Freitas},
  Year = {2015},
  Eprint = {arXiv:1511.06581},
}
@misc{1806.04613,
  Title = {Improving Regression Performance with Distributional Losses},
  Author = {Ehsan Imani and Martha White},
  Year = {2018},
  Eprint = {arXiv:1806.04613},
}
@misc{1602.04621,
  Title = {Deep Exploration via Bootstrapped DQN},
  Author = {Ian Osband and Charles Blundell and Alexander Pritzel and Benjamin Van Roy},
  Year = {2016},
  Eprint = {arXiv:1602.04621},
}

License

© 2025 Taha Shieenavaz.
Licensed under CC BY-NC 4.0: https://creativecommons.org/licenses/by-nc/4.0/
