
econirl

Behavioral inference. IRL and DDC with standard errors.

Benchmarking dynamic discrete choice and inverse RL algorithms on a variety of MDPs — comparing reward recovery, imitation, and generalization.

Install

pip install econirl

Try It

Load a bundled dataset and inspect it in under a second.

from econirl.datasets import load_rust_bus

df = load_rust_bus()
print(df.head())
print(f"{len(df)} observations, {df['bus_id'].nunique()} buses")

Benchmark

Compare 10 estimators on a simulated 5-state bus engine replacement MDP. This runs each estimator sequentially and takes a few minutes.

from econirl.evaluation.benchmark import BenchmarkDGP, run_single, get_default_estimator_specs

dgp = BenchmarkDGP(n_states=5, discount_factor=0.95)
specs = get_default_estimator_specs()

for spec in specs:
    result = run_single(dgp, spec, n_agents=100, n_periods=50, seed=42)
    print(f"{result.estimator:12s}  {result.pct_optimal:6.1f}%  {result.time_seconds:5.1f}s")


Results

The table below shows all 18 algorithms (10 default from get_default_estimator_specs() plus 8 additional from econirl.contrib).

| Estimator | Category | Recovers Params | Recovers Reward | % Optimal | % Transfer | Time |
|---|---|---|---|---|---|---|
| **Structural Estimators** | | | | | | |
| NFXP | Structural | Yes | Yes | 99.7% | 99.8% | 13.9s |
| CCP | Structural | Yes | Yes | 99.7% | 99.8% | 18.6s |
| SEES | Structural | Yes | Yes | 99.6% | 99.6% | 28.6s |
| NNES | Structural | Yes | Yes | 99.6% | 99.1% | 13.7s |
| **Entropy-Based IRL** | | | | | | |
| MCE IRL | IRL | Yes | Yes | 99.7% | 99.7% | 20.6s |
| MaxEnt IRL | IRL | No | Yes | 98.2% | 97.8% | 9.1s |
| Deep MaxEnt | IRL | No | Yes | 98.3% | 98.2% | 52.3s |
| BIRL | IRL | No | Yes | 99.5% | 99.5% | 237.8s |
| **Margin-Based IRL** | | | | | | |
| Max Margin | IRL | Yes | Yes | 99.3% | 99.3% | 64.8s |
| Max Margin IRL | IRL | No | Yes | 31.1% | 34.2% | 0.3s |
| **Distribution Matching** | | | | | | |
| f-IRL | IRL | No | Yes | 99.1% | 99.1% | 44.9s |
| **Neural Estimators** | | | | | | |
| TD-CCP | Neural | Yes | Yes | 99.8% | 99.7% | 16.3s |
| GLADIUS | Neural | Yes | Yes | 99.6% | 88.7% | 4.2s |
| **Adversarial Methods** | | | | | | |
| GAIL | Adversarial | No | No | 54.3% | 50.9% | 112.9s |
| AIRL | Adversarial | No | Yes | 99.4% | 99.5% | 123.0s |
| GCL | Adversarial | No | Yes | 92.7% | 95.3% | 166.5s |
| **Inverse Q-Learning** | | | | | | |
| IQ-Learn | IRL | No | Yes | 99.5% | 99.1% | 0.0s |
| **Baseline** | | | | | | |
| BC | Baseline | No | No | 99.5% | 99.5% | 0.1s |

5-state MDP, 100 agents x 50 periods, seed=42. % Optimal = value achieved vs true optimal on training dynamics (baseline-normalized). % Transfer = same metric on held-out transition dynamics (same rewards, different wear rates). Recovers Params = recovers interpretable structural parameters. Recovers Reward = recovers a reward function (enables transfer to new dynamics).
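Both percentage columns are baseline-normalized. As a concrete sketch of what that normalization means (the function name and the uniform-random baseline here are illustrative assumptions, not econirl's implementation):

```python
def pct_optimal(v_policy, v_optimal, v_baseline):
    """Baseline-normalized performance: 0% = baseline policy, 100% = optimal.

    v_policy:   value achieved by the estimated policy
    v_optimal:  value of the true optimal policy
    v_baseline: value of a reference (e.g. uniform-random) policy
    """
    return 100.0 * (v_policy - v_baseline) / (v_optimal - v_baseline)

# Example: an estimated policy achieving 9.9 when optimal is 10 and random is 2
print(f"{pct_optimal(9.9, 10.0, 2.0):.1f}%")  # → 98.8%
```

Under this convention the metric can go negative for policies worse than the baseline.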

Figures (plots on the project page): "Internal Validity — Policy Execution on Training Dynamics", "External Validity — Policy Execution on Transfer Dynamics", and "Estimated vs True Rewards".

Algorithms

Structural Estimators

These estimators assume the econometrician knows the model and recover the flow-utility parameters by maximum likelihood.

| Algorithm | Paper | Method |
|---|---|---|
| NFXP | Rust (1987) | Full-solution MLE via nested fixed point |
| CCP | Hotz & Miller (1993) | Two-step conditional choice probability with NPL iterations |
| SEES | Luo & Sang (2024) | Sieve basis V(s) approximation + penalized joint MLE |
| NNES | Nguyen (2025) | Neural V(s) network (Bellman residual) + structural MLE |
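As a toy illustration of the nested-fixed-point idea (not econirl's implementation), the inner loop below solves a logit value function by contraction for a fixed parameter guess; an outer MLE loop would then search over (theta, rc). The deterministic capped mileage transition is a simplifying assumption.

```python
import math

def solve_ev(theta, rc, beta=0.95, n_states=5, tol=1e-10):
    """Inner loop: solve the logit value function by successive approximation.

    Stylized flow utilities: keep = -theta * s, replace = -rc.
    Mileage s advances by one (capped) under 'keep' and resets under 'replace'.
    """
    v = [0.0] * n_states
    while True:
        v_new = []
        for s in range(n_states):
            keep = -theta * s + beta * v[min(s + 1, n_states - 1)]
            repl = -rc + beta * v[0]
            m = max(keep, repl)  # log-sum-exp with the max trick for stability
            v_new.append(m + math.log(math.exp(keep - m) + math.exp(repl - m)))
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new
        v = v_new

def replacement_probs(theta, rc, beta=0.95, n_states=5):
    """Logit probability of replacing the engine at each mileage state."""
    v = solve_ev(theta, rc, beta, n_states)
    probs = []
    for s in range(n_states):
        keep = -theta * s + beta * v[min(s + 1, n_states - 1)]
        repl = -rc + beta * v[0]
        probs.append(1.0 / (1.0 + math.exp(keep - repl)))
    return probs

p = replacement_probs(theta=0.5, rc=3.0)
# Replacement becomes more likely as mileage accumulates
```

An outer loop would evaluate the log-likelihood of observed choices at these probabilities and maximize over (theta, rc); that nesting is what makes NFXP a full-solution method.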

Entropy-Based IRL

Recover reward functions from demonstrations using maximum entropy or Bayesian principles.

| Algorithm | Paper | Method |
|---|---|---|
| MCE IRL | Ziebart (2010) | Maximum causal entropy IRL with soft value iteration |
| MaxEnt IRL | Ziebart et al. (2008) | Maximum entropy IRL with state visitation frequencies |
| Deep MaxEnt | Wulfmeier et al. (2016) | Neural reward network + MaxEnt feature matching |
| BIRL | Ramachandran & Amir (2007) | Bayesian MCMC (Metropolis-Hastings) over reward parameters |
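The backbone of these methods is soft value iteration, which replaces the hard max in the Bellman backup with a log-sum-exp. A minimal tabular sketch (generic, not econirl's API):

```python
import math

def soft_value_iteration(reward, transitions, gamma=0.95, n_iters=500):
    """Soft backup: V(s) = logsumexp_a [ r(s,a) + gamma * sum_s' P(s'|s,a) V(s') ].

    reward:      reward[s][a]
    transitions: transitions[s][a][s'] = P(s' | s, a)
    Returns soft Q-values; the stochastic policy is pi(a|s) ∝ exp(Q(s,a)).
    """
    n_s, n_a = len(reward), len(reward[0])
    v = [0.0] * n_s
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(n_iters):
        for s in range(n_s):
            for a in range(n_a):
                q[s][a] = reward[s][a] + gamma * sum(
                    transitions[s][a][sp] * v[sp] for sp in range(n_s)
                )
        for s in range(n_s):
            m = max(q[s])
            v[s] = m + math.log(sum(math.exp(x - m) for x in q[s]))
    return q

# Two states, two actions: action 1 in state 0 leads to the rewarding state 1
reward = [[0.0, 0.0], [1.0, 1.0]]
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: a0 stays put, a1 moves to state 1
    [[0.0, 1.0], [0.0, 1.0]],  # state 1: absorbing under either action
]
q = soft_value_iteration(reward, transitions)
# q[0][1] > q[0][0], so the soft policy prefers moving toward the reward
```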

Margin-Based IRL

Recover rewards by maximizing the margin between expert and non-expert behavior.

| Algorithm | Paper | Method |
|---|---|---|
| Max Margin | Ratliff et al. (2006) | Structured max-margin planning |
| Max Margin IRL | Abbeel & Ng (2004) | Apprenticeship learning via margin maximization |
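Both methods operate on discounted feature expectations of expert versus candidate policies. A sketch of the empirical estimator (the function and toy features are illustrative, not package code):

```python
def feature_expectations(trajectories, features, gamma=0.95):
    """Empirical discounted feature expectations mu = E[ sum_t gamma^t phi(s_t) ].

    trajectories: list of state sequences
    features:     features[s] = feature vector phi(s)
    """
    k = len(next(iter(features.values())))
    mu = [0.0] * k
    for traj in trajectories:
        for t, s in enumerate(traj):
            for i in range(k):
                mu[i] += (gamma ** t) * features[s][i]
    return [m / len(trajectories) for m in mu]

features = {"low": [1.0, 0.0], "high": [0.0, 1.0]}
mu = feature_expectations([["low", "high"], ["low", "low"]], features, gamma=0.5)
# Apprenticeship learning then seeks w maximizing the margin w·(mu_expert - mu_policy)
```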

Distribution Matching

Match state-marginal distributions rather than feature expectations.

| Algorithm | Paper | Method |
|---|---|---|
| f-IRL | Ni et al. (2022) | State-marginal matching via f-divergences (KL, chi-squared, TV) |
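For intuition, the KL instance of the f-divergence family compares expert and learner state-visitation marginals; a minimal sketch of the quantity being minimized (illustrative values, not the f-IRL training loop, which differentiates through this objective):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_s p(s) * log(p(s) / q(s)) between two state marginals."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

expert = [0.7, 0.2, 0.1]   # expert's state-visitation distribution
policy = [0.5, 0.3, 0.2]   # current learner's state-visitation distribution
print(round(kl_divergence(expert, policy), 4))  # → 0.0851
```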

Neural Estimators

Approximate value functions with neural networks for scalability to large state spaces.

| Algorithm | Paper | Method |
|---|---|---|
| TD-CCP | Adusumilli & Eckardt (2022) | TD-learning + CCP with neural approximate value iteration |
| GLADIUS | Kang, Yoganarasimhan & Jain (2025) | Dual Q + EV networks with Bellman consistency penalty |
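The Bellman consistency penalty (and the Bellman residual in NNES above) penalizes violations of the Bellman equation between the value approximators. A tabular sketch of such a residual term; in the papers this is a training loss over neural networks, and all inputs below are toy values:

```python
def bellman_residual_penalty(q, v, reward, transitions, gamma=0.95):
    """Squared Bellman residual summed over (s, a):

    sum [ Q(s,a) - (r(s,a) + gamma * sum_s' P(s'|s,a) V(s')) ]^2
    """
    penalty = 0.0
    n_s = len(q)
    for s in range(n_s):
        for a in range(len(q[s])):
            target = reward[s][a] + gamma * sum(
                transitions[s][a][sp] * v[sp] for sp in range(n_s)
            )
            penalty += (q[s][a] - target) ** 2
    return penalty

# A (Q, V) pair consistent with the Bellman equation incurs zero penalty:
# one state, one action, r = 1, V = 20, gamma = 0.5 → target Q = 11
consistent = bellman_residual_penalty([[11.0]], [20.0], [[1.0]], [[[1.0]]], gamma=0.5)
```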

Inverse Q-Learning

Recover reward and policy by learning a single soft Q-function, avoiding adversarial training.

| Algorithm | Paper | Method |
|---|---|---|
| IQ-Learn | Garg et al. (2021) | Inverse soft-Q learning with chi-squared divergence |
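The identity behind this approach is the inverse soft Bellman operator: given a soft Q-function, the implied reward is r(s,a) = Q(s,a) - gamma * E[V(s')] with V(s) = log sum_a exp Q(s,a), so learning Q alone pins down both policy and reward. A tabular sketch with illustrative values (not the paper's actual training loop):

```python
import math

def soft_v(q_row):
    """Soft state value V(s) = log sum_a exp(Q(s, a))."""
    m = max(q_row)
    return m + math.log(sum(math.exp(x - m) for x in q_row))

def recover_reward(q, transitions, gamma=0.95):
    """Inverse soft Bellman operator: r(s,a) = Q(s,a) - gamma * E_{s'}[V(s')]."""
    n_s, n_a = len(q), len(q[0])
    v = [soft_v(q[s]) for s in range(n_s)]
    return [
        [q[s][a] - gamma * sum(transitions[s][a][sp] * v[sp] for sp in range(n_s))
         for a in range(n_a)]
        for s in range(n_s)
    ]

# Deterministic 2-state chain: both actions in state 0 lead to state 1
q = [[1.0, 2.0], [0.5, 0.5]]
P = [[[0.0, 1.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]]
r = recover_reward(q, P)
# Q-differences within a state carry over to reward differences exactly
```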

Adversarial Methods

Learn reward or policy via a discriminator that distinguishes expert from generated behavior.

| Algorithm | Paper | Method |
|---|---|---|
| GAIL | Ho & Ermon (2016) | Generative adversarial imitation learning |
| AIRL | Fu et al. (2018) | Adversarial inverse RL with disentangled reward |
| GCL | Finn et al. (2016) | Guided cost learning with importance sampling |
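All three train a discriminator to separate expert from generated behavior. A minimal sketch of the binary cross-entropy objective such a discriminator minimizes (generic, not any package's implementation):

```python
import math

def discriminator_loss(scores_expert, scores_generated):
    """Binary cross-entropy: push expert scores toward 1, generated toward 0.

    Scores are discriminator logits; sigmoid(logit) = estimated P(expert).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    loss = 0.0
    for s in scores_expert:
        loss -= math.log(sigmoid(s))        # label 1 for expert samples
    for s in scores_generated:
        loss -= math.log(1.0 - sigmoid(s))  # label 0 for generated samples
    return loss / (len(scores_expert) + len(scores_generated))

# A discriminator that separates the two sets achieves lower loss
good = discriminator_loss([2.0, 3.0], [-2.0, -3.0])
bad = discriminator_loss([0.0, 0.0], [0.0, 0.0])  # uninformative: loss = log 2
```

The generator (policy) is then updated to increase the discriminator's loss, and in AIRL/GCL the discriminator's structure additionally yields a reward estimate.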

Baseline

| Algorithm | Paper | Method |
|---|---|---|
| BC | - | Supervised: empirical P(a|s) from demonstrations |
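The baseline described above is just counting; a sketch of the tabular estimator (illustrative, not econirl's BehavioralCloningEstimator class):

```python
from collections import Counter, defaultdict

def fit_bc(demos):
    """Tabular behavioral cloning: P(a | s) = count(s, a) / count(s)."""
    counts = defaultdict(Counter)
    for s, a in demos:
        counts[s][a] += 1
    return {
        s: {a: c / sum(cnt.values()) for a, c in cnt.items()}
        for s, cnt in counts.items()
    }

policy = fit_bc([(0, "keep"), (0, "keep"), (0, "replace"), (1, "replace")])
# policy[0] == {"keep": 2/3, "replace": 1/3}; policy[1] == {"replace": 1.0}
```

Because it never recovers a reward, BC matches the expert on the training dynamics but has no principled way to transfer to new dynamics.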


License

MIT
