A highly configurable implementation of our approach in the Aftab paper, benchmarking different convolutional neural networks and their effects on the final results.

Overview

Aftab (Persian: آفتاب, meaning "sun" or "sun rays") is a benchmarking framework for evaluating CNN-based encoders in PQN across Atari environments.
It provides standardized training, evaluation, and reproducibility tools for deep reinforcement learning research.

[Figure: two panels, "Global Performance" (IQM HNS) and "Last 50M Frames" (IQM HNS over the last 50M frames).]

Global performance of base encoders.

[Figure: two panels, "Hadamax Global Performance" (IQM HNS) and "Last 50M Frames" (IQM HNS over the last 50M frames).]

Comparison of two Gamma encoder variants based on findings from "Hadamax Encoding: Elevating Performance in Model-Free Atari".
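The plots above report IQM HNS, the interquartile mean of human-normalized scores. The sketch below shows how such a number is commonly computed. It is a minimal illustration, not Aftab's own evaluation code: the function names and all scores are hypothetical, and the trimming rule (drop the lowest and highest quarter, rounding the cut down) is a simplification.

```python
def human_normalized_score(agent_score, random_score, human_score):
    """Rescale a raw game score so that random play is 0 and human play is 1."""
    return (agent_score - random_score) / (human_score - random_score)


def interquartile_mean(values):
    """Mean of the middle 50% of the values (lowest and highest quarters dropped)."""
    if not values:
        raise ValueError("interquartile_mean needs at least one value")
    ordered = sorted(values)
    cut = len(ordered) // 4  # simplification: round the quarter size down
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)


# Hypothetical normalized scores collected over games and seeds.
scores = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 10.0]
print(interquartile_mean(scores))  # the outlier 10.0 is trimmed away
```

Because the extreme quarters are discarded, a single game with a very large normalized score does not dominate the aggregate the way it would in a plain mean.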

Installation

Install via pip:

pip install aftab

Usage

Note: the JAX API is still under development. With the current PyTorch version, expect training your agents to take up to 13 hours for the best configuration. We hope the JAX version will be much faster.

from aftab import Aftab
from aftab import aftab_environments

seeds = [1, 2, 3, 4]

for environment in aftab_environments:
    agent = Aftab(encoder="gamma", frames="pilot")
    for seed in seeds:
        agent.train(environment=environment, seed=seed)
        agent.log()

Defining a Custom Encoder

You can define your own encoder as a PyTorch module and pass it to the agent:

import torch
from aftab import Aftab

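class CustomImageEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Define your layers here.

    def forward(self, x):
        # Compute and return the encoded features for the input batch x.
        raise NotImplementedError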

agent = Aftab(encoder=CustomImageEncoder, frames="pilot")
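To make the skeleton above concrete, here is one possible encoder. It is only a sketch under stated assumptions: the input is taken to be a batch of 4 stacked 84x84 frames and the output a flat feature vector, which is a common Atari convention but is not confirmed here as what Aftab expects. The class name `SmallAtariEncoder` and the layer sizes are illustrative.

```python
import torch
from torch import nn


class SmallAtariEncoder(nn.Module):
    """Three-layer CNN; assumes input of shape (batch, 4, 84, 84)."""

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 84x84 -> 20x20
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),  # 20x20 -> 9x9
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),  # 9x9 -> 7x7
            nn.ReLU(),
            nn.Flatten(),                                # 64 * 7 * 7 features
        )

    def forward(self, x):
        return self.layers(x)


# Quick shape check on a dummy batch (assumed input layout).
encoder = SmallAtariEncoder()
frames = torch.zeros(2, 4, 84, 84)
features = encoder(frames)
print(features.shape)
```

If the assumed input and output shapes match what the agent expects, the class could then be passed in the same way as `CustomImageEncoder` above, for example `Aftab(encoder=SmallAtariEncoder, frames="pilot")`.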

Results

Encoder Experiments:

  • Tables:
  • Charts:
    • Loss Evolution
    • IQM HNS

Hadamax Experiments:

  • Tables:
  • Charts:
    • Loss Evolution
    • IQM HNS

Final Experiments: (GPUs are working :D)

Model Complexity

Base Variants

| Variant | Encoder Parameters | Regression Head Parameters | Total Parameters | Encoder FLOPs | Regression Head FLOPs | Total FLOPs |
|---|---|---|---|---|---|---|
| PQN | 78,304 | 1,686,500 | 1,764,804 | 7.734 | 1.610 | 9.347 |
| Alpha | 174,752 | 1,782,948 | 1,957,700 | 27.541 | 1.610 | 29.151 |
| Beta | 89,008 | 1,782,948 | 1,871,956 | 61.515 | 1.610 | 63.126 |
| Gamma | 117,168 | 1,725,364 | 1,842,532 | 22.901 | 1.610 | 24.512 |
| Delta | 78,552 | 1,850,588 | 1,929,140 | 6.143 | 1.774 | 7.917 |
| Epsilon | 80,112 | 2,179,828 | 2,259,940 | 13.252 | 2.101 | 15.354 |
| Zeta | 77,232 | 2,537,396 | 2,614,628 | 25.362 | 2.462 | 27.824 |
| Eta | 78,400 | 23,739,460 | 23,817,860 | 28.422 | 23.663 | 52.085 |
| Theta | 76,288 | 1,127,428 | 1,203,716 | 9.065 | 1.053 | 10.118 |

Note: The Eta variant has significantly more parameters than other variants, primarily due to the encoder producing a large number of features.
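The effect described in the note can be illustrated with simple arithmetic: a fully connected layer that maps F encoder features to H hidden units has F * H weights plus H biases, so its size grows linearly with F. The sketch below uses hypothetical feature counts and a hypothetical hidden width of 512. It does not reproduce the actual Aftab head architecture or the numbers in the table.

```python
def linear_layer_parameters(in_features, out_features, bias=True):
    """Number of trainable parameters in one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


hidden_units = 512  # hypothetical head width, for illustration only

# Hypothetical flattened feature counts produced by different encoders.
for feature_count in (3136, 6272, 50176):
    params = linear_layer_parameters(feature_count, hidden_units)
    print(f"{feature_count:>6} features -> {params:,} parameters in the first head layer")
```

An encoder that emits many more features therefore inflates the head even when the encoder itself stays small.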


Hadamax Variants

| Variant | Encoder Parameters | Regression Head Parameters | Total Parameters | Encoder FLOPs | Regression Head FLOPs | Total FLOPs |
|---|---|---|---|---|---|---|
| PQN Hadamax | 156,608 | 3,968,516 | 4,125,124 | 159.014 | 3.969 | 162.984 |
| Gamma Hadamax V1 | 234,336 | 1,609,220 | 1,843,556 | 122.001 | 1.610 | 123.611 |
| Gamma Hadamax V2 | 234,336 | 3,280,388 | 3,514,724 | 129.300 | 3.281 | 132.581 |

Hyperparameters

| Hyperparameter | Value |
|---|---|
| Learning rate | $2.5 \times 10^{-4}$ |
| Training environments | 128 |
| Test environments | 8 |
| Optimizer | Rectified Adam |
| Weight decay | 0 |
| $\epsilon$ | $1 \times 10^{-5}$ |
| $\beta_{1}$ | 0.9 |
| $\beta_{2}$ | 0.999 |
| Total Frames | 200,000,000 |
| Loss function | Mean Squared Error |
| Scheduler | Linear Annealing |
| $\epsilon$-greedy exploration | 10% of total frames |
| Discount factor ($\gamma$) | 0.99 |
| GAE ($\lambda$) | 0.65 |
| Epochs | 2 |
| Batch size | 4096 |

These hyperparameters were used in both the encoder and Hadamax experiments.
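To show how the discount factor and $\lambda$ interact, the sketch below computes $\lambda$-return targets with the recursion $R_t = r_t + \gamma\,[(1-\lambda)\,V_{t+1} + \lambda\,R_{t+1}]$, where $V_{t+1}$ stands for the bootstrapped value of the next state. This assumes the $\lambda$ entry in the table refers to this kind of multi-step target, as used in PQN. The function name, argument layout, and the numbers in the example are hypothetical, and this is not Aftab's training code.

```python
def lambda_returns(rewards, dones, next_values, gamma=0.99, lam=0.65):
    """Compute lambda-return targets for one trajectory, back to front.

    rewards[t]     -- reward received at step t
    dones[t]       -- True if the episode ended at step t
    next_values[t] -- bootstrapped value estimate of the state after step t
    """
    targets = [0.0] * len(rewards)
    next_return = next_values[-1]  # the last step falls back to a one-step target
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # Blend the one-step bootstrap with the longer return from step t + 1.
        blended = (1.0 - lam) * next_values[t] + lam * next_return
        targets[t] = rewards[t] + gamma * not_done * blended
        next_return = targets[t]
    return targets


# Hypothetical three-step trajectory that ends in a terminal state.
print(lambda_returns(
    rewards=[1.0, 0.0, 1.0],
    dones=[False, False, True],
    next_values=[0.5, 0.4, 0.0],
))
```

With `lam=0` the targets reduce to one-step bootstrapped targets, and with `lam=1` they become full discounted returns up to the end of the trajectory.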

Statistical Significance

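| | PQN | Alpha | Beta | Gamma | Delta | Epsilon | Zeta | Eta | Theta |
|---|---|---|---|---|---|---|---|---|---|
| PQN | - | - | - | - | - | - | - | - | - |
| Alpha | 0 | - | - | - | - | - | - | - | - |
| Beta | 0 | 0.847 | - | - | - | - | - | - | - |
| Gamma | 0 | 0.295 | 0.802 | - | - | - | - | - | - |
| Delta | 0 | 0 | 0 | 0 | - | - | - | - | - |
| Epsilon | 0 | 0.104 | 0.068 | 0.01 | 0 | - | - | - | - |
| Zeta | 0 | 0.145 | 0.293 | 0.024 | 0 | 0.552 | - | - | - |
| Eta | 0.001 | 0.337 | 0.757 | 0.221 | 0 | 0.819 | 0.967 | - | - |
| Theta | 0.431 | 0 | 0.004 | 0 | 0.046 | 0.001 | 0.001 | 0.002 | - |

| | Gamma | Hadamax Gamma V1 | Hadamax Gamma V2 | Hadamax Nature DQN |
|---|---|---|---|---|
| Gamma | - | - | - | - |
| Hadamax Gamma V1 | 0 | - | - | - |
| Hadamax Gamma V2 | 0 | 0.72 | - | - |
| Hadamax Nature DQN | 0 | 0.078 | 0.151 | - |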

Reproducibility

Due to the stochastic nature of deep reinforcement learning, exact reproducibility via fixed datasets is not feasible.
Instead, we provide a set of random seeds used in our experiments.

from aftab import aftab_seeds

print(aftab_seeds)

Full experiment replication:

from aftab import Aftab
from aftab import aftab_environments
from aftab import aftab_seeds

for environment in aftab_environments:
    agent = Aftab()
    for seed in aftab_seeds:
        agent.train(environment=environment, seed=seed)
        agent.log()

A comprehensive set of Atari environments is available via EnvPool:
https://envpool.readthedocs.io/en/latest/env/atari.html#available-tasks

Hardware

All experiments in this work were run on NVIDIA A40 GPUs.

| Specification | Details |
|---|---|
| GPU Memory | 48 GB GDDR6 with error-correcting code (ECC) |
| GPU Memory Bandwidth | 696 GB/s |
| Interconnect | NVIDIA NVLink 112.5 GB/s (bidirectional); PCIe Gen4: 64 GB/s |
| NVLink | 2-way low profile (2-slot) |
| Display Ports | 3x DisplayPort 1.4* |
| Max Power Consumption | 300 W |
| Form Factor | 4.4" (H) x 10.5" (L), Dual Slot |
| Thermal | Passive |
| vGPU Software Support | NVIDIA Virtual PC, NVIDIA Virtual Applications, NVIDIA RTX Virtual Workstation, NVIDIA Virtual Compute Server, NVIDIA AI Enterprise |
| vGPU Profiles Supported | See the Virtual GPU Licensing Guide |
| NVENC / NVDEC | 1x / 2x (includes AV1 decode) |
| Secure Boot | Secure and Measured Boot with Hardware Root of Trust (optional) |
| NEBS Ready | Level 3 |
| Power Connector | 8-pin CPU |

Citation

@article{aftab2026benchmarking,
  title={Aftab: Benchmarking {CNN} Encoders in {PQN}},
  author={Shieenavaz, Taha and Zareshahraki, Shabnam and Nanni, Loris},
  journal={arXiv preprint arXiv:YYMM.NNNNN},
  year={2026}
}

Related Works

@misc{2407.04811,
  Title = {Simplifying Deep Temporal Difference Learning},
  Author = {Matteo Gallici and Mattie Fellows and Benjamin Ellis and Bartomeu Pou and Ivan Masmitja and Jakob Nicolaus Foerster and Mario Martin},
  Year = {2024},
  Eprint = {arXiv:2407.04811},
}
@misc{2403.03950,
  Title = {Stop Regressing: Training Value Functions via Classification for Scalable Deep RL},
  Author = {Jesse Farebrother and Jordi Orbay and Quan Vuong and Adrien Ali Taïga and Yevgen Chebotar and Ted Xiao and Alex Irpan and Sergey Levine and Pablo Samuel Castro and Aleksandra Faust and Aviral Kumar and Rishabh Agarwal},
  Year = {2024},
  Eprint = {arXiv:2403.03950},
}
@misc{1511.06581,
  Title = {Dueling Network Architectures for Deep Reinforcement Learning},
  Author = {Ziyu Wang and Tom Schaul and Matteo Hessel and Hado van Hasselt and Marc Lanctot and Nando de Freitas},
  Year = {2015},
  Eprint = {arXiv:1511.06581},
}
@misc{1806.04613,
  Title = {Improving Regression Performance with Distributional Losses},
  Author = {Ehsan Imani and Martha White},
  Year = {2018},
  Eprint = {arXiv:1806.04613},
}
@misc{1602.04621,
  Title = {Deep Exploration via Bootstrapped DQN},
  Author = {Ian Osband and Charles Blundell and Alexander Pritzel and Benjamin Van Roy},
  Year = {2016},
  Eprint = {arXiv:1602.04621},
}

License

© 2025 Taha Shieenavaz.
Licensed under CC BY-NC 4.0: https://creativecommons.org/licenses/by-nc/4.0/
