Behavioral inference. IRL and DDC with standard errors.

econirl

Benchmarking dynamic discrete choice and inverse RL algorithms on a variety of MDPs — comparing reward recovery, imitation, and generalization.

Install

uv pip install -e .

Try It

from econirl.evaluation.benchmark import BenchmarkDGP, run_single, get_default_estimator_specs

# 5-state bus engine replacement MDP (Rust 1987)
dgp = BenchmarkDGP(n_states=5, discount_factor=0.95)
specs = get_default_estimator_specs()

# Run all 18 estimators with benchmark-tuned defaults
for spec in specs:
    result = run_single(dgp, spec, n_agents=100, n_periods=50, seed=42)
    print(f"{result.estimator:12s}  {result.pct_optimal:6.1f}%  {result.time_seconds:5.1f}s")

5-State Bus Engine Replacement MDP

Results

| Estimator | Category | Recovers Params | Recovers Reward | % Optimal | % Transfer | Time |
|---|---|---|---|---|---|---|
| **Structural Estimators** | | | | | | |
| NFXP | Structural | Yes | Yes | 99.7% | 99.8% | 13.9s |
| CCP | Structural | Yes | Yes | 99.7% | 99.8% | 18.6s |
| SEES | Structural | Yes | Yes | 99.6% | 99.6% | 28.6s |
| NNES | Structural | Yes | Yes | 99.6% | 99.1% | 13.7s |
| **Entropy-Based IRL** | | | | | | |
| MCE IRL | IRL | Yes | Yes | 99.7% | 99.7% | 20.6s |
| MaxEnt IRL | IRL | No | Yes | 98.2% | 97.8% | 9.1s |
| Deep MaxEnt | IRL | No | Yes | 98.3% | 98.2% | 52.3s |
| BIRL | IRL | No | Yes | 99.5% | 99.5% | 237.8s |
| **Margin-Based IRL** | | | | | | |
| Max Margin | IRL | Yes | Yes | 99.3% | 99.3% | 64.8s |
| Max Margin IRL | IRL | No | Yes | 31.1% | 34.2% | 0.3s |
| **Distribution Matching** | | | | | | |
| f-IRL | IRL | No | Yes | 99.1% | 99.1% | 44.9s |
| **Neural Estimators** | | | | | | |
| TD-CCP | Neural | Yes | Yes | 99.8% | 99.7% | 16.3s |
| GLADIUS | Neural | Yes | Yes | 99.6% | 88.7% | 4.2s |
| **Adversarial Methods** | | | | | | |
| GAIL | Adversarial | No | No | 54.3% | 50.9% | 112.9s |
| AIRL | Adversarial | No | Yes | 99.4% | 99.5% | 123.0s |
| GCL | Adversarial | No | Yes | 92.7% | 95.3% | 166.5s |
| **Inverse Q-Learning** | | | | | | |
| IQ-Learn | IRL | No | Yes | 99.5% | 99.1% | 0.0s |
| **Baseline** | | | | | | |
| BC | Baseline | No | No | 99.5% | 99.5% | 0.1s |

5-state MDP, 100 agents × 50 periods, seed = 42. % Optimal = value achieved vs. true optimal on training dynamics (baseline-normalized). % Transfer = the same metric on held-out transition dynamics (same rewards, different wear rates). Recovers Params = recovers interpretable structural parameters. Recovers Reward = recovers a reward function (enables transfer to new dynamics).
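On a tabular MDP the % Optimal metric can be computed exactly. A minimal sketch, assuming the baseline normalization pins a uniform-random policy at 0% and the optimal policy at 100% (our reading of "baseline-normalized"; `policy_value` and `pct_optimal` are hypothetical helpers, not the econirl API):

```python
import numpy as np

def policy_value(P, R, policy, beta=0.95):
    """Exact value V(s) of a stationary policy via the linear Bellman system.

    P: (A, S, S) transition matrices, R: (S, A) flow rewards,
    policy: (S, A) action probabilities.
    """
    S = R.shape[0]
    r_pi = np.einsum("sa,sa->s", policy, R)    # expected flow reward per state
    P_pi = np.einsum("sa,ast->st", policy, P)  # policy-induced transition matrix
    return np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)

def pct_optimal(P, R, pi_hat, pi_opt, mu, beta=0.95):
    """Baseline-normalized value: 0% = uniform-random policy, 100% = optimal."""
    S, A = R.shape
    pi_rand = np.full((S, A), 1.0 / A)
    v = lambda pi: mu @ policy_value(P, R, pi, beta)  # value at initial dist mu
    return 100.0 * (v(pi_hat) - v(pi_rand)) / (v(pi_opt) - v(pi_rand))
```

By construction the optimal policy scores 100% and the uniform-random policy 0%, so the column is comparable across estimators regardless of the reward scale each one recovers.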

Internal Validity — Policy Execution on Training Dynamics

External Validity — Policy Execution on Transfer Dynamics

Estimated vs True Rewards

Algorithms

Structural Estimators

These estimators assume the econometrician knows the structural model and recover flow-utility parameters by maximum likelihood.

| Algorithm | Paper | Method |
|---|---|---|
| NFXP | Rust (1987) | Full-solution MLE via nested fixed point |
| CCP | Hotz & Miller (1993) | Two-step conditional choice probability with NPL iterations |
| SEES | Luo & Sang (2024) | Sieve basis V(s) approximation + penalized joint MLE |
| NNES | Nguyen (2025) | Neural V(s) network (Bellman residual) + structural MLE |
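The nested-fixed-point idea is easiest to see on a toy bus-engine problem. A minimal sketch of the inner loop only, solving the expected-value function for fixed trial parameters `theta1` (mileage cost) and `RC` (replacement cost); the outer MLE search is omitted, and this is a didactic sketch rather than econirl's implementation:

```python
import numpy as np

def solve_ev(theta1, RC, F, beta=0.95, tol=1e-10, max_iter=10_000):
    """Inner fixed point of Rust (1987): EV(x) under type-1 extreme value
    shocks, solved by successive approximation (a contraction for beta < 1).

    F: (S, S) mileage transition matrix conditional on keeping the engine.
    """
    S = F.shape[0]
    u_keep = -theta1 * np.arange(S)  # flow utility of keeping at mileage x
    ev = np.zeros(S)
    for _ in range(max_iter):
        v_keep = u_keep + beta * ev              # choice-specific value: keep
        v_rep = -RC + beta * ev[0]               # replace: pay RC, reset to 0
        ev_new = F @ np.logaddexp(v_keep, v_rep) # expected log-sum-exp surplus
        if np.max(np.abs(ev_new - ev)) < tol:
            return ev_new
        ev = ev_new
    raise RuntimeError("EV iteration did not converge")

def p_keep(theta1, RC, F, beta=0.95):
    """Conditional choice probability of keeping: a logit in the value gap."""
    ev = solve_ev(theta1, RC, F, beta)
    v_keep = -theta1 * np.arange(F.shape[0]) + beta * ev
    v_rep = -RC + beta * ev[0]
    return 1.0 / (1.0 + np.exp(v_rep - v_keep))
```

The outer loop would maximize the logit likelihood of observed keep/replace decisions over (theta1, RC), re-solving this fixed point at every trial parameter vector, which is why NFXP is accurate but slow relative to two-step CCP methods.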

Entropy-Based IRL

Recover reward functions from demonstrations using maximum entropy or Bayesian principles.

| Algorithm | Paper | Method |
|---|---|---|
| MCE IRL | Ziebart (2010) | Maximum causal entropy IRL with soft value iteration |
| MaxEnt IRL | Ziebart et al. (2008) | Maximum entropy IRL with state visitation frequencies |
| Deep MaxEnt | Wulfmeier et al. (2016) | Neural reward network + MaxEnt feature matching |
| BIRL | Ramachandran & Amir (2007) | Bayesian MCMC (Metropolis-Hastings) over reward parameters |
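The core MaxEnt computation is a backward soft-value pass followed by a forward occupancy pass. A minimal finite-horizon sketch (tabular, state-only reward; a didactic sketch with a hypothetical helper name, not econirl's code):

```python
import numpy as np

def maxent_state_visits(P, r, mu, T):
    """Expected state-visitation counts of the MaxEnt policy for reward r.

    P: (A, S, S) transitions, r: (S,) state reward, mu: (S,) initial
    distribution, T: horizon. Returns (S,) visit counts summing to T.
    """
    A, S, _ = P.shape
    # Backward pass: finite-horizon soft (log-sum-exp) value iteration.
    V = np.zeros(S)
    policies = []
    for _ in range(T):
        Q = r + P @ V                       # (A, S) choice-specific values
        V = np.logaddexp.reduce(Q, axis=0)  # soft max over actions
        policies.append(np.exp(Q - V))      # stochastic MaxEnt policy (A, S)
    # Forward pass: propagate the state distribution and accumulate visits.
    d = mu.copy()
    visits = np.zeros(S)
    for pi in reversed(policies):           # policies[-1] is the t=0 policy
        visits += d
        d = np.einsum("s,as,ast->t", d, pi, P)
    return visits
```

These model visitation counts are matched against the expert's empirical counts: the gradient of the MaxEnt log-likelihood in the reward parameters is exactly expert counts minus model counts.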

Margin-Based IRL

Recover rewards by maximizing the margin between expert and non-expert behavior.

| Algorithm | Paper | Method |
|---|---|---|
| Max Margin | Ratliff et al. (2006) | Structured max-margin planning |
| Max Margin IRL | Abbeel & Ng (2004) | Apprenticeship learning via margin maximization |
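Both margin methods operate on discounted feature expectations. A minimal sketch of computing them from trajectories, plus one step of an Abbeel & Ng-style update (`phi` is any user-supplied feature map; both helper names are hypothetical, not the econirl API):

```python
import numpy as np

def feature_expectations(trajs, phi, beta=0.95):
    """Empirical mu = (1/N) sum_i sum_t beta^t phi(s_it) over N trajectories."""
    mus = [
        sum(beta**t * phi(s) for t, s in enumerate(traj))
        for traj in trajs
    ]
    return np.mean(mus, axis=0)

def margin_step(mu_expert, mu_policy):
    """One margin update: the reward weights point from the current policy's
    feature expectations toward the expert's; the gap's norm is the margin."""
    w = mu_expert - mu_policy
    return w, np.linalg.norm(w)
```

The full loop alternates: fit an optimal policy for the reward w·phi, recompute that policy's feature expectations, update w, and stop once the margin falls below a tolerance.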

Distribution Matching

Match state-marginal distributions rather than feature expectations.

| Algorithm | Paper | Method |
|---|---|---|
| f-IRL | Ni et al. (2022) | State-marginal matching via f-divergences (KL, chi-squared, TV) |
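The objective being minimized is an f-divergence between the expert's and the agent's state marginals. A minimal sketch of estimating marginals from trajectories and evaluating the KL instance (hypothetical helper names, not the econirl API):

```python
import numpy as np

def state_marginal(trajs, n_states):
    """Empirical state-marginal distribution pooled over all trajectories."""
    counts = np.zeros(n_states)
    for traj in trajs:
        for s in traj:
            counts[s] += 1
    return counts / counts.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the f-divergence for f(t) = t log t (eps avoids log 0)."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

f-IRL differentiates this divergence with respect to the reward parameters; other choices of f give the chi-squared and TV variants listed in the table.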

Neural Estimators

Approximate value functions with neural networks for scalability to large state spaces.

| Algorithm | Paper | Method |
|---|---|---|
| TD-CCP | Adusumilli & Eckardt (2022) | TD-learning + CCP with neural approximate value iteration |
| GLADIUS | Kang, Yoganarasimhan & Jain (2025) | Dual Q + EV networks with Bellman consistency penalty |

Inverse Q-Learning

Recover reward and policy by learning a single soft Q-function, avoiding adversarial training.

| Algorithm | Paper | Method |
|---|---|---|
| IQ-Learn | Garg et al. (2021) | Inverse soft-Q learning with chi-squared divergence |

Adversarial Methods

Learn reward or policy via a discriminator that distinguishes expert from generated behavior.

| Algorithm | Paper | Method |
|---|---|---|
| GAIL | Ho & Ermon (2016) | Generative adversarial imitation learning |
| AIRL | Fu et al. (2018) | Adversarial inverse RL with disentangled reward |
| GCL | Finn et al. (2016) | Guided cost learning with importance sampling |
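At the core of all three is a discriminator trained to separate expert from generated state-action pairs. A minimal logistic-discriminator sketch with GAIL's surrogate reward, using plain gradient descent on numpy feature vectors (a didactic stand-in with hypothetical helper names, not any package's implementation):

```python
import numpy as np

def train_discriminator(X_expert, X_gen, lr=0.1, steps=1000):
    """Logistic D(x) = sigmoid(x @ w + b); expert labeled 1, generated 0."""
    X = np.vstack([X_expert, X_gen])
    y = np.r_[np.ones(len(X_expert)), np.zeros(len(X_gen))]
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y                      # gradient of the cross-entropy loss
        w -= lr * X.T @ g / len(X)
        b -= lr * g.mean()
    return w, b

def gail_reward(x, w, b):
    """GAIL's surrogate reward -log(1 - D(x)): high where D says 'expert'."""
    d = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return -np.log(1.0 - d + 1e-12)
```

The generator (policy) is then trained with RL against this reward while the discriminator keeps updating, which is the unstable minimax loop behind GAIL's weaker numbers in the table; AIRL constrains the discriminator's form so a transferable reward can be read off it.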

Baseline

| Algorithm | Paper | Method |
|---|---|---|
| BC | | Supervised: empirical P(a\|s) from demonstrations |
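On a tabular MDP the BC baseline needs no optimization at all: estimate P(a|s) by counting. A minimal sketch with Laplace smoothing (`fit_bc` is a hypothetical helper, not the econirl API):

```python
import numpy as np

def fit_bc(trajs, n_states, n_actions, alpha=1.0):
    """Behavior cloning as smoothed empirical P(a|s) from (state, action) pairs.

    alpha > 0 is Laplace smoothing so unvisited states get a uniform policy.
    """
    counts = np.full((n_states, n_actions), alpha)
    for traj in trajs:
        for s, a in traj:
            counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```

Because BC recovers no reward, any transfer performance it shows comes from the cloned policy itself happening to remain near-optimal under the new dynamics, not from re-solving the changed MDP.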

License

MIT

Download files

Source distribution: econirl-0.0.1.tar.gz (2.2 MB)

Built distribution: econirl-0.0.1-py3-none-any.whl (1.3 MB, Python 3)

File details

Details for the file econirl-0.0.1.tar.gz.

File metadata

  • Download URL: econirl-0.0.1.tar.gz
  • Upload date:
  • Size: 2.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for econirl-0.0.1.tar.gz:

  • SHA256: 474485d459c39df89347d765eeb6fbef7280673b60588cfca09cf2ae122a0727
  • MD5: e47a6670fb622a6620d4060109676500
  • BLAKE2b-256: 6224f3d80c2b75fc83013f2c5c28daf64d01181c7a1d921ce1811074e4f0c0a9

File details

Details for the file econirl-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: econirl-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for econirl-0.0.1-py3-none-any.whl:

  • SHA256: 2ed31816abd71c90f25f7d4623dd386e6f55c72297da89a7165c27ab40204464
  • MD5: 5c864f6278c1e349bc6a7316f2e52c74
  • BLAKE2b-256: 4bf6407d7b806c1b4465ccaa57f48a35ba9701640e6071f764a8f4c4c719c42a
