
An OpenAI Gym Env for continuous control

Project description

Gym-style API environment

A write up

Here's the most recent write up regarding the environment and the algorithms applied to it.

Comments

  • In general, if we record a transition only up to "Done", or update as soon as "Done" is reached, very little information is collected, because "Done" is reached after 1 or 2 transitions. A different termination condition should be specified.

Environment dynamics

The functions used:

  • $f_e(x^s, x^a) = \mathbb{E}[Y_e|X_e(1) = (x^s, x^a)]$: Causal mechanism determining the probability of $Y_e = 1$ given $X_e(1)$. We will take $f_e(x^s, x^a) = (1 + \exp(-x^s - x^a))^{-1}$
  • $g^a_e(\rho, x^a) \in \{g : [0, 1] \times \Omega \rightarrow \Omega \}$: Intervention process on $X^a$ in response to a predictive score $\rho$, updating $X^a_e(0) \rightarrow X^a_e(1)$
  • $\rho_e(x^s, x^a) \in \{\rho_e : \Omega^s \times \Omega^a \rightarrow [0, 1]\}$: Predictive score trained at epoch $e$
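
As a rough illustration of how these three functions fit together, here is a minimal sketch in plain Python. The logistic form of `f` is the one given above. The linear form of `g` and its `strength` parameter are assumptions made only for this example, since the text requires no more than $g : [0, 1] \times \Omega \rightarrow \Omega$.

```python
import math


def f(x_s: float, x_a: float) -> float:
    """Causal mechanism from the text: P(Y = 1 | X(1) = (x_s, x_a))."""
    return 1.0 / (1.0 + math.exp(-x_s - x_a))


def g(rho: float, x_a: float, strength: float = 1.0) -> float:
    """Hypothetical intervention: lower x_a in proportion to the score rho.

    This linear form is an assumption for illustration, not the
    environment's actual intervention process.
    """
    return x_a - strength * rho


x_s, x_a = 0.5, 1.0
rho = f(x_s, x_a)               # score computed at t = 0
x_a_after = g(rho, x_a)         # actionable covariate at t = 1
risk_after = f(x_s, x_a_after)  # risk once the intervention has acted
```

Under this assumed `g`, a higher score produces a larger reduction of the actionable covariate, so `risk_after` is lower than the score that triggered the intervention.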

Additional information:

  • At epoch $e$, the predictive score $\rho$ uses $X^a_e(0), X^s_e(0)$ and $Y_e$ as training data; previous epochs are ignored and $X^a_e(1), X^s_e(1)$ are not observed. The predictive score is computed at time $t=0$.
  • We allow $\rho_e$ to be an arbitrary function, but generally presume it is an estimator of the conditional expectation: $\rho_e(x^s, x^a) \approx E[Y_e|X^s_e(0) = x^s, X^a_e(0) = x^a] = f_e(x^s, g^a_e(\rho_{e-1}, x^a)) \triangleq \tilde{f}_e(x^s, x^a)$
  • $\forall e,\ f_e = E[Y_e|X_e] = E[Y_e|X_e(1)]$: $Y_e$ depends on $X_e(1)$, that is, on the covariates after any potential interventions
  • A higher value of $\rho$ means a larger intervention is made (we assume $g^a_e$ to be deterministic, but random-valued functions may more accurately capture the uncertainty linked to real-world interventions)

Naive updating

‘Naive’ updating means that a new score $\rho_e$ is fitted in each epoch and then used as a drop-in replacement for the existing score $\rho_{e-1}$. This leads to estimates $\rho_e(x^s, x^a)$ converging as $e \rightarrow \infty$ to a setting in which $\rho_e$ accurately estimates its own effect: conceptually, $\rho_e(x^s, x^a)$ estimates the probability of $Y$ after interventions have been made on the basis of $\rho_e(x^s, x^a)$ itself.
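
The convergence described here can be sketched as a fixed-point iteration for a single patient. The sketch idealizes each fitted score as exactly equal to the conditional expectation (no estimation noise), and it uses a hypothetical intervention `g(rho, x_a) = x_a - rho` that is not specified in the text.

```python
import math


def f(x_s, x_a):
    """Logistic causal mechanism from the text."""
    return 1.0 / (1.0 + math.exp(-x_s - x_a))


def g(rho, x_a):
    """Hypothetical intervention (an assumption for this sketch)."""
    return x_a - rho


def naive_update_scores(x_s, x_a, n_epochs=50):
    """Iterate rho_e = f(x_s, g(rho_{e-1}, x_a)), starting from rho_0 = f."""
    rho = f(x_s, x_a)  # epoch 0: no intervention
    history = [rho]
    for _ in range(n_epochs):
        rho = f(x_s, g(rho, x_a))
        history.append(rho)
    return history


history = naive_update_scores(0.5, 1.0)
rho_star = history[-1]
# At the limit, the score predicts the risk that results from acting on it.
residual = abs(rho_star - f(0.5, g(rho_star, 1.0)))
```

With this choice of `g` the update map has slope of magnitude at most 1/4, so the iteration contracts and `residual` shrinks towards zero. Other intervention functions need not converge.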

EPOCH 0
t=0

  • observe a population of patients $(X_0^a(0),X_0^s(0))_{i=1}^N$

t=1

  • there are no interventions, hence $X_0^a(1) = X_0^a(0)$
  • the risk of observing $Y = 1$ depends only on covariates at $t=1$ through $f_0$ and is $E[Y_0|X_0(0) = (x^s, x^a)] = f_0(x^s, x^a)$
  • the score $\rho_0$ is therefore defined as $\rho_0(x^s, x^a) = f_0(x^s, x^a)$
  • $Y_0$ is observed
  • the analyst decides on a function $\rho_0$, which is retained into epoch 1. We will use initialized actions $\theta = (\theta^0, \theta^1, \theta^2)$

The model performance under non-intervention is equivalent to performance at epoch 0
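
Epoch 0 can be sketched as follows. The standard normal distribution for the covariates, the population size, and the name `simulate_epoch_0` are assumptions for illustration; the text does not specify how the population is drawn.

```python
import math
import random


def f(x_s, x_a):
    """Logistic causal mechanism from the text."""
    return 1.0 / (1.0 + math.exp(-x_s - x_a))


def simulate_epoch_0(n, rng):
    """Draw a population, then sample Y_0 with no intervention applied."""
    # Assumed covariate distribution: independent standard normals.
    population = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    # With no intervention X(1) = X(0), so the risk is f at the t = 0 values.
    scores = [f(x_s, x_a) for x_s, x_a in population]
    outcomes = [1 if rng.random() < p else 0 for p in scores]
    return population, scores, outcomes


rng = random.Random(0)
population, scores, outcomes = simulate_epoch_0(1000, rng)
mean_risk = sum(scores) / len(scores)
event_rate = sum(outcomes) / len(outcomes)
```

Because $\rho_0 = f_0$ here, the observed event rate should sit close to the mean score, which gives the non-intervention baseline for later epochs.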

EPOCH $>0$
t=0

  • observe a new population of patients $(X_e^a(0),X_e^s(0))_{i=1}^N$
  • analyst computes $\rho_{e-1}(X^s_e(0), X^a_e(0))$ (that is, $\rho_0$ when $e = 1$)

t=1

  • $X^s_e(0)$ cannot be intervened on and carries over unchanged as $X^s_e(1)$
  • $\rho_{e-1}$ is used to inform interventions $g^a_e$ that change the values $X^a_e(1) = g^a_e(\rho_{e-1}(x^s, x^a), x^a)$
  • $E[Y_e]$ is determined by covariates $X^s_e(1), X^a_e(1)$
  • the score $\rho_e$ is defined as $\rho_e(x^s, x^a) = f_e(x^s, g^a_e(\rho_{e-1}(x^s, x^a), x^a)) \triangleq h(\rho_{e-1}(x^s, x^a))$
  • $Y_e$ is observed
  • the analyst decides on a function $\rho_e$ using $X^s_e(0), X^a_e(0), Y_e$, which is retained into epoch $e+1$. We will use $\rho_e(x^s, x^a) = (1 + \exp(-\theta^0 - x^s \theta^1 - x^a \theta^2))^{-1}$
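
The steps of an epoch $e > 0$ can be sketched as a single function. The logistic score parameterized by $\theta$ follows the text. The intervention `g`, the function name `run_epoch`, and the normally distributed population are assumptions for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rho(theta, x_s, x_a):
    """Logistic score with coefficients theta = (theta0, theta1, theta2)."""
    theta0, theta1, theta2 = theta
    return sigmoid(theta0 + theta1 * x_s + theta2 * x_a)


def g(score, x_a):
    """Hypothetical intervention (an assumption for this sketch)."""
    return x_a - score


def run_epoch(theta_prev, population, rng):
    """Score at t = 0, intervene on x_a, then sample outcomes at t = 1."""
    post, outcomes = [], []
    for x_s, x_a in population:
        score = rho(theta_prev, x_s, x_a)  # previous epoch's score, t = 0
        x_a1 = g(score, x_a)               # x_s is carried over unchanged
        risk = sigmoid(x_s + x_a1)         # f_e evaluated at X_e(1)
        post.append((x_s, x_a1))
        outcomes.append(1 if rng.random() < risk else 0)
    return post, outcomes


rng = random.Random(1)
population = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
theta_prev = (0.0, 1.0, 1.0)  # with these coefficients rho equals f
post, outcomes = run_epoch(theta_prev, population, rng)
```

The returned outcomes, paired with the covariates observed at $t=0$, are what a new $\theta$ would be chosen from for the next epoch.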

Then the episodes repeat

State and action spaces

Action space: a 3D space with each coordinate in $[-2, 2]$. Actions represent the coefficients $\theta$ of a logistic regression that will be run on the dataset of patients.

Observation space: aD space $\in [0, \infty)$. States represent values for the predictive score $f_e$
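
A small dependency-free sketch of the action bounds described above. The helper name `clip_action` and the choice to clip out-of-range values rather than reject them are assumptions, and this is not the package's own code. The observation space is left out because its dimension is not stated here.

```python
ACTION_LOW, ACTION_HIGH, ACTION_DIM = -2.0, 2.0, 3


def clip_action(theta):
    """Return theta as a 3-tuple with each coefficient clipped to [-2, 2]."""
    if len(theta) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} coefficients, got {len(theta)}")
    return tuple(min(max(float(t), ACTION_LOW), ACTION_HIGH) for t in theta)


clipped = clip_action([0.5, -3.0, 2.7])
```

In a Gym environment these bounds would normally be declared through the environment's action space, so an agent's raw output can be checked against them before it is used as regression coefficients.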

To install

To change version
