
A smart way to create a DDQL agent

Project description

BaseAgent Class

BaseAgent is an abstract class that implements a reinforcement-learning agent. It can be extended to create specific agents for different environments.

Constructor

The constructor accepts the following parameters:

  • num_actions: The number of possible actions that the agent can take.
  • environment: The environment in which the agent operates, represented as a numpy array.
  • fit_each_n_steps: The number of steps after which the agent trains its models.
  • exploration_rate: The probability that the agent chooses a random action instead of the optimal action.
  • exploration_rate_decay: The decay rate of the exploration rate.
  • gamma: The discount factor used in the Q-value update equation.
  • cumulative_rewards_max_length: The maximum length of the array that keeps track of cumulative rewards.
  • memory_max_length: The maximum length of the agent's memory.
  • memory_batch_size: The batch size used for training the models.
  • allow_episode_tracking: A boolean flag indicating whether the agent should track episodes.
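
The exact internals are not shown in the source, but a minimal sketch of how exploration_rate and exploration_rate_decay typically drive an epsilon-greedy policy (the function name choose_action and the decay-per-step schedule are assumptions, not the library's actual code) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_action(q_values, exploration_rate):
    """Pick a random action with probability exploration_rate,
    otherwise the action with the highest Q-value."""
    if rng.random() < exploration_rate:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

exploration_rate = 1.0
exploration_rate_decay = 0.99

for _ in range(10):
    action = choose_action(np.array([0.1, 0.5, 0.2, 0.3]), exploration_rate)
    exploration_rate *= exploration_rate_decay  # decay after each step
```

With exploration_rate at 0 the agent is fully greedy; values near 1 make it explore almost uniformly at random, and the decay gradually shifts it from exploring to exploiting.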

Methods

The BaseAgent class implements the following methods:

  • start_episode: Starts a new episode.
  • stop_episode: Ends the current episode.
  • get_episodes: Returns all recorded episodes.
  • reset_episodes: Resets all recorded episodes.
  • is_memory_ready: Checks whether the agent's memory is ready for training.
  • step: Performs a step of the agent, choosing an action, receiving a reward, and updating the models.
  • get_last_cumulative_rewards: Returns the sum of the last cumulative rewards.
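
The source does not show how the memory works, but a minimal sketch of the replay buffer implied by memory_max_length, memory_batch_size, and is_memory_ready (an assumption about the design, not the library's actual code) could be:

```python
import random
from collections import deque

memory_max_length = 1000
memory_batch_size = 32

# A bounded buffer: once full, the oldest transitions are dropped.
memory = deque(maxlen=memory_max_length)

def is_memory_ready():
    # Training only makes sense once at least one full batch is stored.
    return len(memory) >= memory_batch_size

# Store some (state, action, reward, next_state) transitions.
for t in range(50):
    memory.append((t, 0, 0.0, t + 1))

# Sample a random batch for training, as step() would every
# fit_each_n_steps steps.
batch = random.sample(memory, memory_batch_size) if is_memory_ready() else []
```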

In addition, it requires the following methods to be implemented in subclasses:

  • reset_state: Resets the agent's state at the start of a new episode.
  • _get_reward: Calculates the reward received by the agent for undertaking an action in a given state.
  • _get_model: Returns the model used by the agent to learn the Q-value function.

Usage Example

To use the class, you need to extend it and implement the abstract methods. Here's an example of how this might be done:

class MyAgent(BaseAgent):
    def reset_state(self):
        # Implementation of state reset
        pass

    def _get_reward(self, action, environment):
        # Implementation of reward calculation
        return 0.0

    def _get_model(self, state_features):
        # Implementation of the model
        pass

Once the subclass is defined, you can create an instance of the agent and use it as follows:

import numpy as np

agent = MyAgent(num_actions=4, environment=np.array([0, 0, 0, 0]))
agent.start_episode()
for _ in range(100):
    agent.step()
agent.stop_episode()
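
For reference, the update that gamma parameterizes in double deep Q-learning can be sketched as follows. This is the standard double-Q target, not code taken from the library; the array values are illustrative, and the library's exact update may differ:

```python
import numpy as np

gamma = 0.95   # discount factor, as passed to the constructor
reward = 1.0   # reward returned by _get_reward for the current step

# Q-values for the next state from the two networks: the online
# network selects the next action, the target network evaluates it.
q_online_next = np.array([0.2, 0.8, 0.5, 0.1])
q_target_next = np.array([0.3, 0.6, 0.9, 0.0])

best_next = int(np.argmax(q_online_next))          # action chosen online
td_target = reward + gamma * q_target_next[best_next]
```

Decoupling action selection from action evaluation in this way is what distinguishes double Q-learning from plain DQN and reduces its overestimation bias.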

Notes

BaseAgent is designed to be used with discrete environments and deep learning models. If you wish to use a continuous environment or a different learning model, you may need to make some modifications to the class.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ddqla-0.1.25.tar.gz (123.3 kB)

Uploaded Source

Built Distribution

ddqla-0.1.25-py3-none-any.whl (9.0 kB)

Uploaded Python 3

File details

Details for the file ddqla-0.1.25.tar.gz.

File metadata

  • Download URL: ddqla-0.1.25.tar.gz
  • Upload date:
  • Size: 123.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for ddqla-0.1.25.tar.gz:

  • SHA256: c236f789a1b98ab413beaa52a46aa57a2d1e69cf2f49b964b6d8caa9907b5ed1
  • MD5: 1929f0bc109092002f39f93cd9479c75
  • BLAKE2b-256: fbb7983a648f79fa3aba9ff35bcb4005a5993ea07146fbec239fc394ba131b15


File details

Details for the file ddqla-0.1.25-py3-none-any.whl.

File metadata

  • Download URL: ddqla-0.1.25-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for ddqla-0.1.25-py3-none-any.whl:

  • SHA256: 245f3d5db38d441297bf6d193ee36cc09ae5057e383f25609159f8ce1a96d065
  • MD5: c52fdda3b9ef9f8c0604fa18de58220d
  • BLAKE2b-256: f0a695914e45fa5c4420954b0e15619af7cda058a8d51937a5d85f17d799a7f0

