Unsupervised pre-training with PPG
Project description
Unsupervised On-Policy Reinforcement Learning
This work combines Active Pre-Training (APT) with an on-policy algorithm, Phasic Policy Gradient (PPG).
Active Pre-Training
Active Pre-Training is used to pre-train a model-free algorithm before a downstream task is defined. It computes an intrinsic reward from a particle-based estimate of the entropy of the visited states. This reduces training time when several downstream tasks will later be defined on the same environment, e.g. robots for a warehouse. A sketch of the intrinsic reward is given below.
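As a rough illustration, the following is a minimal sketch of such a particle-based entropy reward, using the log of the mean distance to the k nearest neighbours within a batch of state representations. The function name, the choice of `k`, and the exact reward shaping are assumptions for illustration, not this package's actual API.

```python
import torch

def particle_entropy_reward(z: torch.Tensor, k: int = 5) -> torch.Tensor:
    """k-nearest-neighbour particle-based entropy estimate as intrinsic reward.

    z: (batch, dim) tensor of state representations.
    Returns a (batch,) tensor of rewards.
    """
    # Pairwise Euclidean distances between all representations in the batch.
    dists = torch.cdist(z, z)
    # Smallest k+1 distances per row; the first is the zero self-distance.
    knn_dists, _ = dists.topk(k + 1, largest=False)
    knn_dists = knn_dists[:, 1:]
    # Sparser neighbourhoods (larger k-NN distances) imply lower local
    # density, i.e. higher state entropy, and hence a larger reward.
    return torch.log(1.0 + knn_dists.mean(dim=1))
```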
Phasic Policy Gradient
An improved version of Proximal Policy Optimization (PPO) that adds auxiliary epochs to train representations shared between the policy and the value network; a sketch of the auxiliary phase follows.
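To make the auxiliary phase concrete, here is a minimal sketch under the assumption of a single network whose forward pass returns policy logits, a value estimate, and an auxiliary value head for a discrete action space. The names `auxiliary_phase`, `network`, and `beta_clone` are hypothetical and not this package's API.

```python
import torch
import torch.nn.functional as F

def auxiliary_phase(network, optimizer, states, value_targets, old_logits,
                    n_aux_epochs: int = 6, beta_clone: float = 1.0):
    """Distill value information into the shared trunk while a KL penalty
    keeps the policy close to the behaviour it had before this phase."""
    for _ in range(n_aux_epochs):
        logits, value, aux_value = network(states)
        # Train both value heads against the same return targets.
        value_loss = (F.mse_loss(value, value_targets)
                      + F.mse_loss(aux_value, value_targets))
        # Behavioural-cloning term: KL(pi_old || pi), where old_logits were
        # recorded (and detached) before the auxiliary phase started.
        kl = F.kl_div(F.log_softmax(logits, dim=-1),
                      F.softmax(old_logits, dim=-1),
                      reduction="batchmean")
        loss = value_loss + beta_clone * kl
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Keeping these auxiliary epochs separate from the PPO policy updates is the point of PPG: the shared representation can absorb value information without the value loss interfering with policy optimization.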
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
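Since the distribution filenames below identify the project as unsupervised-on-policy on PyPI, it should also be installable with `pip install unsupervised-on-policy`.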
Source Distribution: unsupervised-on-policy-0.1.2.tar.gz
Built Distribution: unsupervised_on_policy-0.1.2-py3-none-any.whl
Hashes for unsupervised-on-policy-0.1.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | eebaa2e0c0c6d9647d60955ff64a8804b666ff0a40d9330b214f6d3c47fa2f2b
MD5 | cc2690cc1b03fcf7d80facfaec0d9811
BLAKE2b-256 | c65bd41ce1ba55cdf79c2680cd2883b18a843e5d096b64f7afae84f6db066cbf
Hashes for unsupervised_on_policy-0.1.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a43aa28eaaf0c53de8e28fc1918b28002758732409dabd0f299e2ab1dc1ec3db
MD5 | 02d4ee76fc8e0be8c3cda0647f5ae0dd
BLAKE2b-256 | 594bdab164f48d54d2e539cc11aa5f22839386653311fbe2de064960ed5459e7