
Behavioral inference. IRL and DDC with standard errors.


econirl

Benchmarking dynamic discrete choice and inverse RL algorithms on a variety of MDPs — comparing reward recovery, imitation, and generalization.

Install

pip install econirl

Try It

Load a bundled dataset and fit one estimator in under a second.

from econirl.datasets import load_rust_bus
from econirl import CCP

df = load_rust_bus()
model = CCP(n_states=90, discount=0.9999)
model.fit(df, state="mileage_bin", action="replaced", id="bus_id")
print(model.params_)
print(model.summary())

Benchmark

Compare 10 estimators on a simulated 5-state bus engine replacement MDP. This runs each estimator sequentially and takes a few minutes.

from econirl.evaluation.benchmark import BenchmarkDGP, run_single, get_default_estimator_specs

dgp = BenchmarkDGP(n_states=5, discount_factor=0.95)
specs = get_default_estimator_specs()

for spec in specs:
    result = run_single(dgp, spec, n_agents=100, n_periods=50, seed=42)
    print(f"{result.estimator:12s}  {result.pct_optimal:6.1f}%  {result.time_seconds:5.1f}s")

5-State Bus Engine Replacement MDP

Results

The table below shows all 18 algorithms (10 default from get_default_estimator_specs() plus 8 additional from econirl.contrib).

| Estimator | Category | Recovers Params | Recovers Reward | % Optimal | % Transfer | Time |
|---|---|---|---|---|---|---|
| **Structural Estimators** | | | | | | |
| NFXP | Structural | Yes | Yes | 99.7% | 99.8% | 13.9s |
| CCP | Structural | Yes | Yes | 99.7% | 99.8% | 18.6s |
| SEES | Structural | Yes | Yes | 99.6% | 99.6% | 28.6s |
| NNES | Structural | Yes | Yes | 99.6% | 99.1% | 13.7s |
| **Entropy-Based IRL** | | | | | | |
| MCE IRL | IRL | Yes | Yes | 99.7% | 99.7% | 20.6s |
| MaxEnt IRL | IRL | No | Yes | 98.2% | 97.8% | 9.1s |
| Deep MaxEnt | IRL | No | Yes | 98.3% | 98.2% | 52.3s |
| BIRL | IRL | No | Yes | 99.5% | 99.5% | 237.8s |
| **Margin-Based IRL** | | | | | | |
| Max Margin | IRL | Yes | Yes | 99.3% | 99.3% | 64.8s |
| Max Margin IRL | IRL | No | Yes | 31.1% | 34.2% | 0.3s |
| **Distribution Matching** | | | | | | |
| f-IRL | IRL | No | Yes | 99.1% | 99.1% | 44.9s |
| **Neural Estimators** | | | | | | |
| TD-CCP | Neural | Yes | Yes | 99.8% | 99.7% | 16.3s |
| GLADIUS | Neural | Yes | Yes | 99.6% | 88.7% | 4.2s |
| **Adversarial Methods** | | | | | | |
| GAIL | Adversarial | No | No | 54.3% | 50.9% | 112.9s |
| AIRL | Adversarial | No | Yes | 99.4% | 99.5% | 123.0s |
| GCL | Adversarial | No | Yes | 92.7% | 95.3% | 166.5s |
| **Inverse Q-Learning** | | | | | | |
| IQ-Learn | IRL | No | Yes | 99.5% | 99.1% | 0.0s |
| **Baseline** | | | | | | |
| BC | Baseline | No | No | 99.5% | 99.5% | 0.1s |

5-state MDP, 100 agents x 50 periods, seed=42. % Optimal = value achieved vs true optimal on training dynamics (baseline-normalized). % Transfer = same metric on held-out transition dynamics (same rewards, different wear rates). Recovers Params = recovers interpretable structural parameters. Recovers Reward = recovers a reward function (enables transfer to new dynamics).
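The baseline-normalized metric can be sketched as follows. This is an illustration of the normalization described above, not the library's implementation; the function name and the value estimates are hypothetical.

```python
def pct_optimal(v_policy, v_optimal, v_random):
    """Baseline-normalized performance: 100% means the estimated policy
    matches the optimal value on the evaluation dynamics; 0% means it
    does no better than a random-policy baseline."""
    return 100.0 * (v_policy - v_random) / (v_optimal - v_random)

# e.g. an estimated policy achieving value 9.9 when the optimal policy
# achieves 10.0 and a uniform-random policy achieves 5.0:
score = pct_optimal(9.9, 10.0, 5.0)  # 98.0
```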

Internal Validity — Policy Execution on Training Dynamics

External Validity — Policy Execution on Transfer Dynamics

Estimated vs True Rewards

Algorithms

Structural Estimators

These estimators assume the econometrician knows the model and recover flow-utility parameters by maximum likelihood.

| Algorithm | Paper | Method |
|---|---|---|
| NFXP | Rust (1987) | Full-solution MLE via nested fixed point |
| CCP | Hotz & Miller (1993) | Two-step conditional choice probability with NPL iterations |
| SEES | Luo & Sang (2024) | Sieve basis V(s) approximation + penalized joint MLE |
| NNES | Nguyen (2025) | Neural V(s) network (NPL Bellman) + structural MLE |
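As a sketch of the full-solution idea (not econirl's implementation, and with illustrative names throughout): the inner loop solves the fixed point of the expected-value function under type-1 extreme value shocks, and the outer loop would maximize the resulting logit likelihood over structural parameters.

```python
import numpy as np

def solve_ev(u, P, beta, tol=1e-10):
    """Inner fixed point: expected value EV(s) for a binary choice under
    T1EV shocks. u: (S, 2) flow utilities; P: (2, S, S) per-action
    transition matrices; beta: discount factor."""
    ev = np.zeros(u.shape[0])
    while True:
        # choice-specific values v(s, a) = u(s, a) + beta * E[EV(s') | s, a]
        v = u + beta * np.stack([P[a] @ ev for a in range(2)], axis=1)
        # log-sum-exp over actions gives the new expected value
        m = v.max(axis=1)
        ev_new = m + np.log(np.exp(v - m[:, None]).sum(axis=1))
        if np.max(np.abs(ev_new - ev)) < tol:
            return ev_new, v
        ev = ev_new

def log_likelihood(v, states, actions):
    """Outer-loop objective: log-likelihood of observed choices under
    the logit choice probabilities implied by v."""
    p = np.exp(v - v.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return np.log(p[states, actions]).sum()
```

The contraction mapping guarantees the inner loop converges for any discount factor below one; the outer optimizer (omitted here) re-solves this fixed point at every trial parameter value, which is exactly what makes two-step CCP methods attractive in larger state spaces.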

Entropy-Based IRL

Recover reward functions from demonstrations using maximum entropy or Bayesian principles.

| Algorithm | Paper | Method |
|---|---|---|
| MCE IRL | Ziebart (2010) | Maximum causal entropy IRL with soft value iteration |
| MaxEnt IRL | Ziebart et al. (2008) | Maximum entropy IRL with state visitation frequencies |
| Deep MaxEnt | Wulfmeier et al. (2016) | Neural reward network + MaxEnt feature matching |
| BIRL | Ramachandran & Amir (2007) | Bayesian MCMC (Metropolis-Hastings) over reward parameters |
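The soft value iteration at the core of these methods can be sketched in a few lines (a minimal tabular illustration under an assumed reward, not the library's code):

```python
import numpy as np

def soft_value_iteration(r, P, beta, n_iter=500):
    """Soft (maximum-entropy) value iteration: the implied policy is a
    softmax over choice-specific values rather than an argmax.
    r: (S, A) rewards; P: (A, S, S) transitions; beta: discount."""
    V = np.zeros(r.shape[0])
    for _ in range(n_iter):
        Q = r + beta * np.einsum('ast,t->sa', P, V)
        # soft backup: V(s) = log sum_a exp Q(s, a)
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
    policy = np.exp(Q - V[:, None])  # softmax policy pi(a|s)
    return V, policy
```

In the IRL loop, the reward `r` is the quantity being estimated: one alternates this soft backup with a gradient step that matches expert state-visitation (or causal-entropy) statistics.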

Margin-Based IRL

Recover rewards by maximizing the margin between expert and non-expert behavior.

| Algorithm | Paper | Method |
|---|---|---|
| Max Margin | Ratliff et al. (2006) | Structured max-margin planning |
| Max Margin IRL | Abbeel & Ng (2004) | Apprenticeship learning via margin maximization |
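The object these methods compare is the discounted feature expectation of a policy; the margin is taken between the expert's and the learner's. A minimal sketch (illustrative names, assuming tabular state features `phi`):

```python
import numpy as np

def feature_expectations(trajs, phi, gamma=0.95):
    """Discounted empirical feature expectations mu = E[sum_t gamma^t phi(s_t)],
    the quantity apprenticeship learning matches between expert and learner.
    trajs: list of state sequences; phi: (n_states, d) feature matrix."""
    mu = np.zeros(phi.shape[1])
    for traj in trajs:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi[s]
    return mu / len(trajs)
```

The learner then searches for a reward weight vector w maximizing the margin between w·mu_expert and w·mu_policy over candidate policies.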

Distribution Matching

Match state-marginal distributions rather than feature expectations.

| Algorithm | Paper | Method |
|---|---|---|
| f-IRL | Ni et al. (2022) | State-marginal matching via f-divergences (KL, chi-squared, TV) |
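For the KL choice of f, the objective being driven to zero is the divergence between the expert's and the agent's state-visitation distributions. A tabular sketch (illustrative, not the library's implementation):

```python
import numpy as np

def state_marginal_kl(rho_expert, rho_agent, eps=1e-12):
    """Forward KL between state-visitation distributions; f-IRL adjusts
    the reward so the agent's marginal rho_agent approaches rho_expert."""
    rho_e = np.clip(rho_expert, eps, None)
    rho_a = np.clip(rho_agent, eps, None)
    return float(np.sum(rho_e * np.log(rho_e / rho_a)))
```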

Neural Estimators

Approximate value functions with neural networks for scalability to large state spaces.

| Algorithm | Paper | Method |
|---|---|---|
| TD-CCP | Adusumilli & Eckardt (2022) | TD-learning + CCP with neural approximate value iteration |
| GLADIUS | Kang, Yoganarasimhan & Jain (2025) | Dual Q + EV networks with Bellman consistency penalty |

Inverse Q-Learning

Recover reward and policy by learning a single soft Q-function, avoiding adversarial training.

| Algorithm | Paper | Method |
|---|---|---|
| IQ-Learn | Garg et al. (2021) | Inverse soft-Q learning with chi-squared divergence |

Adversarial Methods

Learn reward or policy via a discriminator that distinguishes expert from generated behavior.

| Algorithm | Paper | Method |
|---|---|---|
| GAIL | Ho & Ermon (2016) | Generative adversarial imitation learning |
| AIRL | Fu et al. (2018) | Adversarial inverse RL with disentangled reward |
| GCL | Finn et al. (2016) | Guided cost learning with importance sampling |
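The discriminator at the center of these methods is a binary classifier between expert and generated state-action pairs; its loss can be sketched as below (a generic logistic objective, not econirl's code; `d_expert` and `d_generated` are hypothetical discriminator outputs in (0, 1)):

```python
import numpy as np

def discriminator_loss(d_expert, d_generated, eps=1e-12):
    """GAIL-style discriminator objective: classify expert vs generated
    samples. The generator's imitation reward is then derived from the
    discriminator's score on generated samples."""
    d_e = np.clip(d_expert, eps, 1 - eps)
    d_g = np.clip(d_generated, eps, 1 - eps)
    return float(-(np.log(d_e).mean() + np.log(1 - d_g).mean()))
```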

Baseline

| Algorithm | Paper | Method |
|---|---|---|
| BC | — | Supervised: empirical P(a\|s) from demonstrations |
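In tabular form the baseline is just a smoothed conditional frequency table. A minimal sketch (illustrative names, not the library's implementation):

```python
import numpy as np

def behavioral_cloning(states, actions, n_states, n_actions, alpha=1.0):
    """Tabular BC: the policy is the Laplace-smoothed empirical
    conditional P(a|s) from demonstrations. No reward is recovered,
    so the policy cannot transfer to new dynamics."""
    counts = np.full((n_states, n_actions), alpha)
    np.add.at(counts, (states, actions), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)
```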

Pseudocode

License

MIT
