Behavioral inference. IRL and DDC with standard errors.

econirl

Benchmarking dynamic discrete choice and inverse RL algorithms on a variety of MDPs — comparing reward recovery, imitation, and generalization.

Install

uv pip install -e .

Try It

from econirl.evaluation.benchmark import BenchmarkDGP, run_single, get_default_estimator_specs

# 5-state bus engine replacement MDP (Rust 1987)
dgp = BenchmarkDGP(n_states=5, discount_factor=0.95)
specs = get_default_estimator_specs()

# Run all 18 estimators with benchmark-tuned defaults
for spec in specs:
    result = run_single(dgp, spec, n_agents=100, n_periods=50, seed=42)
    print(f"{result.estimator:12s}  {result.pct_optimal:6.1f}%  {result.time_seconds:5.1f}s")

5-State Bus Engine Replacement MDP

Results

| Estimator | Category | Recovers Params | Recovers Reward | % Optimal | % Transfer | Time |
|---|---|---|---|---|---|---|
| **Structural Estimators** | | | | | | |
| NFXP | Structural | Yes | Yes | 99.7% | 99.8% | 13.9s |
| CCP | Structural | Yes | Yes | 99.7% | 99.8% | 18.6s |
| SEES | Structural | Yes | Yes | 99.6% | 99.6% | 28.6s |
| NNES | Structural | Yes | Yes | 99.6% | 99.1% | 13.7s |
| **Entropy-Based IRL** | | | | | | |
| MCE IRL | IRL | Yes | Yes | 99.7% | 99.7% | 20.6s |
| MaxEnt IRL | IRL | No | Yes | 98.2% | 97.8% | 9.1s |
| Deep MaxEnt | IRL | No | Yes | 98.3% | 98.2% | 52.3s |
| BIRL | IRL | No | Yes | 99.5% | 99.5% | 237.8s |
| **Margin-Based IRL** | | | | | | |
| Max Margin | IRL | Yes | Yes | 99.3% | 99.3% | 64.8s |
| Max Margin IRL | IRL | No | Yes | 31.1% | 34.2% | 0.3s |
| **Distribution Matching** | | | | | | |
| f-IRL | IRL | No | Yes | 99.1% | 99.1% | 44.9s |
| **Neural Estimators** | | | | | | |
| TD-CCP | Neural | Yes | Yes | 99.8% | 99.7% | 16.3s |
| GLADIUS | Neural | Yes | Yes | 99.6% | 88.7% | 4.2s |
| **Adversarial Methods** | | | | | | |
| GAIL | Adversarial | No | No | 54.3% | 50.9% | 112.9s |
| AIRL | Adversarial | No | Yes | 99.4% | 99.5% | 123.0s |
| GCL | Adversarial | No | Yes | 92.7% | 95.3% | 166.5s |
| **Inverse Q-Learning** | | | | | | |
| IQ-Learn | IRL | No | Yes | 99.5% | 99.1% | 0.0s |
| **Baseline** | | | | | | |
| BC | Baseline | No | No | 99.5% | 99.5% | 0.1s |

5-state MDP, 100 agents × 50 periods, seed = 42. % Optimal = value achieved vs. true optimal on training dynamics (baseline-normalized). % Transfer = the same metric on held-out transition dynamics (same rewards, different wear rates). Recovers Params = recovers interpretable structural parameters. Recovers Reward = recovers a reward function (enables transfer to new dynamics).
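On a tabular MDP the % Optimal metric can be computed exactly. A minimal sketch, assuming the baseline normalization pins a uniform-random policy at 0% and the optimal policy at 100% (our reading of "baseline-normalized"; `policy_value` and `pct_optimal` are hypothetical helpers, not the econirl API):

```python
import numpy as np

def policy_value(P, R, policy, beta=0.95):
    """Exact value V(s) of a stationary policy via the linear Bellman system.

    P: (A, S, S) transition matrices, R: (S, A) flow rewards,
    policy: (S, A) action probabilities.
    """
    S = R.shape[0]
    r_pi = np.einsum("sa,sa->s", policy, R)    # expected flow reward per state
    P_pi = np.einsum("sa,ast->st", policy, P)  # policy-induced transition matrix
    return np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)

def pct_optimal(P, R, pi_hat, pi_opt, mu, beta=0.95):
    """Baseline-normalized value: 0% = uniform-random policy, 100% = optimal."""
    S, A = R.shape
    pi_rand = np.full((S, A), 1.0 / A)
    v = lambda pi: mu @ policy_value(P, R, pi, beta)  # value at initial dist mu
    return 100.0 * (v(pi_hat) - v(pi_rand)) / (v(pi_opt) - v(pi_rand))
```

By construction the optimal policy scores 100% and the uniform-random policy 0%, so the column is comparable across estimators regardless of the reward scale each one recovers.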

Internal Validity — Policy Execution on Training Dynamics

External Validity — Policy Execution on Transfer Dynamics

Estimated vs True Rewards

Algorithms

Structural Estimators

These estimators assume the econometrician knows the structural model and recover flow-utility parameters by maximum likelihood.

| Algorithm | Paper | Method |
|---|---|---|
| NFXP | Rust (1987) | Full-solution MLE via nested fixed point |
| CCP | Hotz & Miller (1993) | Two-step conditional choice probability with NPL iterations |
| SEES | Luo & Sang (2024) | Sieve basis V(s) approximation + penalized joint MLE |
| NNES | Nguyen (2025) | Neural V(s) network (Bellman residual) + structural MLE |
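The nested-fixed-point idea is easiest to see on a toy bus-engine problem. A minimal sketch of the inner loop only, solving the expected-value function for fixed trial parameters `theta1` (mileage cost) and `RC` (replacement cost); the outer MLE search is omitted, and this is a didactic sketch rather than econirl's implementation:

```python
import numpy as np

def solve_ev(theta1, RC, F, beta=0.95, tol=1e-10, max_iter=10_000):
    """Inner fixed point of Rust (1987): EV(x) under type-1 extreme value
    shocks, solved by successive approximation (a contraction for beta < 1).

    F: (S, S) mileage transition matrix conditional on keeping the engine.
    """
    S = F.shape[0]
    u_keep = -theta1 * np.arange(S)  # flow utility of keeping at mileage x
    ev = np.zeros(S)
    for _ in range(max_iter):
        v_keep = u_keep + beta * ev              # choice-specific value: keep
        v_rep = -RC + beta * ev[0]               # replace: pay RC, reset to 0
        ev_new = F @ np.logaddexp(v_keep, v_rep) # expected log-sum-exp surplus
        if np.max(np.abs(ev_new - ev)) < tol:
            return ev_new
        ev = ev_new
    raise RuntimeError("EV iteration did not converge")

def p_keep(theta1, RC, F, beta=0.95):
    """Conditional choice probability of keeping: a logit in the value gap."""
    ev = solve_ev(theta1, RC, F, beta)
    v_keep = -theta1 * np.arange(F.shape[0]) + beta * ev
    v_rep = -RC + beta * ev[0]
    return 1.0 / (1.0 + np.exp(v_rep - v_keep))
```

The outer loop would maximize the logit likelihood of observed keep/replace decisions over (theta1, RC), re-solving this fixed point at every trial parameter vector, which is why NFXP is accurate but slow relative to two-step CCP methods.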

Entropy-Based IRL

Recover reward functions from demonstrations using maximum entropy or Bayesian principles.

| Algorithm | Paper | Method |
|---|---|---|
| MCE IRL | Ziebart (2010) | Maximum causal entropy IRL with soft value iteration |
| MaxEnt IRL | Ziebart et al. (2008) | Maximum entropy IRL with state visitation frequencies |
| Deep MaxEnt | Wulfmeier et al. (2016) | Neural reward network + MaxEnt feature matching |
| BIRL | Ramachandran & Amir (2007) | Bayesian MCMC (Metropolis-Hastings) over reward parameters |
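The core MaxEnt computation is a backward soft-value pass followed by a forward occupancy pass. A minimal finite-horizon sketch (tabular, state-only reward; a didactic sketch with a hypothetical helper name, not econirl's code):

```python
import numpy as np

def maxent_state_visits(P, r, mu, T):
    """Expected state-visitation counts of the MaxEnt policy for reward r.

    P: (A, S, S) transitions, r: (S,) state reward, mu: (S,) initial
    distribution, T: horizon. Returns (S,) visit counts summing to T.
    """
    A, S, _ = P.shape
    # Backward pass: finite-horizon soft (log-sum-exp) value iteration.
    V = np.zeros(S)
    policies = []
    for _ in range(T):
        Q = r + P @ V                       # (A, S) choice-specific values
        V = np.logaddexp.reduce(Q, axis=0)  # soft max over actions
        policies.append(np.exp(Q - V))      # stochastic MaxEnt policy (A, S)
    # Forward pass: propagate the state distribution and accumulate visits.
    d = mu.copy()
    visits = np.zeros(S)
    for pi in reversed(policies):           # policies[-1] is the t=0 policy
        visits += d
        d = np.einsum("s,as,ast->t", d, pi, P)
    return visits
```

These model visitation counts are matched against the expert's empirical counts: the gradient of the MaxEnt log-likelihood in the reward parameters is exactly expert counts minus model counts.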

Margin-Based IRL

Recover rewards by maximizing the margin between expert and non-expert behavior.

| Algorithm | Paper | Method |
|---|---|---|
| Max Margin | Ratliff et al. (2006) | Structured max-margin planning |
| Max Margin IRL | Abbeel & Ng (2004) | Apprenticeship learning via margin maximization |
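Both margin methods operate on discounted feature expectations. A minimal sketch of computing them from trajectories, plus one step of an Abbeel & Ng-style update (`phi` is any user-supplied feature map; both helper names are hypothetical, not the econirl API):

```python
import numpy as np

def feature_expectations(trajs, phi, beta=0.95):
    """Empirical mu = (1/N) sum_i sum_t beta^t phi(s_it) over N trajectories."""
    mus = [
        sum(beta**t * phi(s) for t, s in enumerate(traj))
        for traj in trajs
    ]
    return np.mean(mus, axis=0)

def margin_step(mu_expert, mu_policy):
    """One margin update: the reward weights point from the current policy's
    feature expectations toward the expert's; the gap's norm is the margin."""
    w = mu_expert - mu_policy
    return w, np.linalg.norm(w)
```

The full loop alternates: fit an optimal policy for the reward w·phi, recompute that policy's feature expectations, update w, and stop once the margin falls below a tolerance.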

Distribution Matching

Match state-marginal distributions rather than feature expectations.

| Algorithm | Paper | Method |
|---|---|---|
| f-IRL | Ni et al. (2022) | State-marginal matching via f-divergences (KL, chi-squared, TV) |
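The objective being minimized is an f-divergence between the expert's and the agent's state marginals. A minimal sketch of estimating marginals from trajectories and evaluating the KL instance (hypothetical helper names, not the econirl API):

```python
import numpy as np

def state_marginal(trajs, n_states):
    """Empirical state-marginal distribution pooled over all trajectories."""
    counts = np.zeros(n_states)
    for traj in trajs:
        for s in traj:
            counts[s] += 1
    return counts / counts.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the f-divergence for f(t) = t log t (eps avoids log 0)."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

f-IRL differentiates this divergence with respect to the reward parameters; other choices of f give the chi-squared and TV variants listed in the table.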

Neural Estimators

Approximate value functions with neural networks for scalability to large state spaces.

| Algorithm | Paper | Method |
|---|---|---|
| TD-CCP | Adusumilli & Eckardt (2022) | TD-learning + CCP with neural approximate value iteration |
| GLADIUS | Kang, Yoganarasimhan & Jain (2025) | Dual Q + EV networks with Bellman consistency penalty |

Inverse Q-Learning

Recover reward and policy by learning a single soft Q-function, avoiding adversarial training.

| Algorithm | Paper | Method |
|---|---|---|
| IQ-Learn | Garg et al. (2021) | Inverse soft-Q learning with chi-squared divergence |

Adversarial Methods

Learn reward or policy via a discriminator that distinguishes expert from generated behavior.

| Algorithm | Paper | Method |
|---|---|---|
| GAIL | Ho & Ermon (2016) | Generative adversarial imitation learning |
| AIRL | Fu et al. (2018) | Adversarial inverse RL with disentangled reward |
| GCL | Finn et al. (2016) | Guided cost learning with importance sampling |
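At the core of all three is a discriminator trained to separate expert from generated state-action pairs. A minimal logistic-discriminator sketch with GAIL's surrogate reward, using plain gradient descent on numpy feature vectors (a didactic stand-in with hypothetical helper names, not any package's implementation):

```python
import numpy as np

def train_discriminator(X_expert, X_gen, lr=0.1, steps=1000):
    """Logistic D(x) = sigmoid(x @ w + b); expert labeled 1, generated 0."""
    X = np.vstack([X_expert, X_gen])
    y = np.r_[np.ones(len(X_expert)), np.zeros(len(X_gen))]
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y                      # gradient of the cross-entropy loss
        w -= lr * X.T @ g / len(X)
        b -= lr * g.mean()
    return w, b

def gail_reward(x, w, b):
    """GAIL's surrogate reward -log(1 - D(x)): high where D says 'expert'."""
    d = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return -np.log(1.0 - d + 1e-12)
```

The generator (policy) is then trained with RL against this reward while the discriminator keeps updating, which is the unstable minimax loop behind GAIL's weaker numbers in the table; AIRL constrains the discriminator's form so a transferable reward can be read off it.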

Baseline

| Algorithm | Paper | Method |
|---|---|---|
| BC | | Supervised: empirical P(a\|s) from demonstrations |
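On a tabular MDP the BC baseline needs no optimization at all: estimate P(a|s) by counting. A minimal sketch with Laplace smoothing (`fit_bc` is a hypothetical helper, not the econirl API):

```python
import numpy as np

def fit_bc(trajs, n_states, n_actions, alpha=1.0):
    """Behavior cloning as smoothed empirical P(a|s) from (state, action) pairs.

    alpha > 0 is Laplace smoothing so unvisited states get a uniform policy.
    """
    counts = np.full((n_states, n_actions), alpha)
    for traj in trajs:
        for s, a in traj:
            counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```

Because BC recovers no reward, any transfer performance it shows comes from the cloned policy itself happening to remain near-optimal under the new dynamics, not from re-solving the changed MDP.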

License

MIT

Download files

Source distribution: econirl-0.0.1.tar.gz (2.2 MB)

Built distribution: econirl-0.0.1-py3-none-any.whl (1.3 MB, Python 3)

File details

Details for the file econirl-0.0.1.tar.gz.

File metadata

  • Download URL: econirl-0.0.1.tar.gz
  • Upload date:
  • Size: 2.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for econirl-0.0.1.tar.gz:

  • SHA256: 474485d459c39df89347d765eeb6fbef7280673b60588cfca09cf2ae122a0727
  • MD5: e47a6670fb622a6620d4060109676500
  • BLAKE2b-256: 6224f3d80c2b75fc83013f2c5c28daf64d01181c7a1d921ce1811074e4f0c0a9

File details

Details for the file econirl-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: econirl-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for econirl-0.0.1-py3-none-any.whl:

  • SHA256: 2ed31816abd71c90f25f7d4623dd386e6f55c72297da89a7165c27ab40204464
  • MD5: 5c864f6278c1e349bc6a7316f2e52c74
  • BLAKE2b-256: 4bf6407d7b806c1b4465ccaa57f48a35ba9701640e6071f764a8f4c4c719c42a
