Skip to main content

Design-based MLE to estimate always taker, complier, defier, never taker counts.

Project description

dbmle

This package provides a design-based maximum likelihood estimate of the numbers of always takers, compliers, defiers, and never takers in the sample of people in an experiment, following Christy and Kowalski (2025) "Counting Defiers: A Design-Based Model of an Experiment Can Reveal Evidence Beyond the Average Effect". The package is compatible with experiments that use either a Bernoulli randomized design or a completely randomized design, and results are exact in both cases. This package also reports other statistics from Christy and Kowalski (2025), including credible sets. See below for instructions on how to use this package in Stata.


Installation

Open a terminal or command prompt, navigate to your project directory, and run:

pip install dbmle

This installs the package and the command-line tool dbmle.

Requirements:

  • Python ≥ 3.9
  • pip installed (usually included with Python)

Usage

The MLE can be calculated from aggregated counts $(x_{I1},x_{I0},x_{C1},x_{C0})$, which are the counts of subjects who take up in intervention $x_{I1}$, do not take up in intervention $x_{I0}$, take up in control $x_{C1}$, and do not take up in control $x_{C0}$.

1. Within Python

A sample script in Python using dbmle with aggregate counts looks as follows:

from dbmle import dbmle

# Inputs correspond to aggregated count data:
# (xI1, xI0, xC1, xC0) = (50, 11, 23, 31), from Johnson and Goldstein (2003)
res = dbmle(50, 11, 23, 31)
print(res.report())

It can also be calculated from the individual-level data using the dbmle_from_ZD command:

from dbmle import dbmle_from_ZD

# Z: randomized assignment indicator (1 = intervention, 0 = control)
# D: observed takeup indicator (1 = takeup, 0 = no takeup)
Z = [1, 1, 1, 0, 0, 0]
D = [1, 1, 0, 1, 0, 0]
# Data is equivalent to stylized example in Figure 1 of Christy and Kowalski (2025)

res = dbmle_from_ZD(Z, D)
print(res.report())

Both commands return a DBMLEResult object whose .report() method prints a formatted summary of standard statistics, and MLEs.

2. Command Line Usage

Once installed, you can also use dbmle directly in the command line. The first example Python code above would equivalently be

dbmle --xI1 50 --xI0 11 --xC1 23 --xC0 31 

Using dbmle in Stata

Below is a guide to using Python within Stata. The first step is installing Python. If you already have Python 3.9 or higher installed, skip this step.

1. Install Python

Go to https://www.python.org/downloads/ and download the latest Python installer for your operating system. Run the installer and check the option "Add Python to PATH" after running the installer.

2. Open Stata and tell Stata how to use Python

In the Stata command prompt, run

python query

You should get an output similar to

Python Settings
  set python_exec      /path/to/python
  set python_userpath  

Python system information
  initialized          no
  version              3.12.8
  architecture         64-bit
  library path         /.../lib64/libpython3.12.so.1.0

that is, set python_exec should have a valid path to Python and version should be 3.9 or greater. Once this is done, dbmle can be used directly in the command line of Stata by typing a command like

! dbmle --xI1 50 --xI0 11 --xC1 23 --xC0 31

You can also use Python in a Stata script by starting the script with python: and ending with end:

python:
    from dbmle import dbmle
    res = dbmle(50, 11, 23, 31)
    print(res.report())
end

You can find more information about using Python in Stata here: https://www.stata.com/python/


Parameters

Aside from the data input, each command supports the following parameters:

  • output: "basic", "auxiliary", or "approx"
    Default: "basic"
    What statistics are to be calculated and displayed. "basic" performs an exhaustive grid search and returns the MLE(s) along with the smallest credible set. "auxiliary" returns the statistics that "basic" returns along with the largest possible support, estimated Fréchet bounds, and the smallest credible set conditional on being within the estimated Fréchet set. "approx" uses a significantly faster approximation algorithm to calculate the MLE(s) and only returns the MLE(s). All three return a standard statistics table as well.

  • level: float
    Default: 0.95
    Smallest credible-set level (e.g. 0.95 for 95%).

  • show_progress: bool
    Default: True
    Whether to display a tqdm progress bar for the exhaustive grid search (not relevant when output="approx"). For the Tappin et al. (2015) data, a sample of 612 where $(x_{I1},x_{I0},x_{C1},x_{C0})=(69, 237, 26, 280)$, the tqdm bar looks as follows:

      Enumerating Joint Distributions:   7%|▋         | 2814961/38579155 [00:36<09:53, 60237.47Joint Distribution/s]
    

    The 7%|▋ | is a visual indicator of how many distributions in the grid have been evaluated so far. The fraction 2814961/38579155 tells you exactly how many distributions have been calculated over how many distributions total there are to calculate. The times 00:36<09:53 tell you the time elapsed so far and the predicted time left to complete the grid search, so in this case, the total run time is expected to be about 10.5 minutes. Finally, 60237.47Joint Distribution/s tells you how many distributions can be calculated in a second (the predicted time left is based on the history of this value).

Note that there is no parameter for the design of the randomization (Bernoulli randomized or completely randomized) since the results are the same, as their respective design-based likelihoods are proportional to each other.


Note on Approximation

When method="approx", the package estimates MLEs using a fast local search rather than testing every possible distribution of always takers, compliers, defiers, and never takers. For all outcomes resulting from experiments with an equal number of individuals in intervention and control up to a sample size of 200, we have verified the approximation is correct.

The approximation begins by considering three candidate joint distributions:

  • The endpoint of the estimated Fréchet set with the highest likelihood.
  • A joint distribution with only always takers and never takers, so $x_{I1}+x_{C1}$ always takers and $x_{I0}+x_{C0}$ never takers.
  • A joint distribution with only compliers and defiers, so $x_{I1}+x_{C0}$ compliers and $x_{I0}+x_{C1}$ defiers.

If either of the two-type joint distributions attains the highest likelihood, it is immediately returned as the MLE. Otherwise, the algorithm performs a local cube search. In this case, we search a small four-dimensional integer cube around the endpoint of the estimated Fréchet set with the highest likelihood. The width of the search increases with sample size. If the distribution with the highest likelihood lies on the boundary of the cube, we search a slightly larger cube.


Example Usages

To get the basic output, in Python, you can run

from dbmle import dbmle

# (xI1, xI0, xC1, xC0) = (50, 11, 23, 31)
res = dbmle(50, 11, 23, 31)
print(res.report())

or equivalently in Stata,

python:
    from dbmle import dbmle
    res = dbmle(50, 11, 23, 31)
    print(res.report())
end

and in the command line interface,

dbmle --xI1 50 --xI0 11 --xC1 23 --xC0 31

All will give the same output:

Standard Statistics
----------------------------------------------
Average Effect              50/61 - 23/54 = 39.37%
95% Confidence Interval     [23.03%, 55.72%]
Fisher's Exact Test p-value 1.552e-05
Intervention Takeup Rate    50/61 = 81.97%
Control Takeup Rate         23/54 = 42.59%
Sample Size                  115


Christy and Kowalski Design-Based Maximum Likelihood Estimates
------------------------------------------------------------------------------------
Always takers
  MLE: 28/115 = 24.35%
  95% Smallest Credible Set: [0,63]/115 = [0.00%, 54.78%]

Compliers
  MLE: 66/115 = 57.39%
  95% Smallest Credible Set: [23,81]/115 = [20.00%, 70.43%]

Defiers
  MLE: 21/115 = 18.26%
  95% Smallest Credible Set: [0,34]/115 = [0.00%, 29.57%]

Never takers
  MLE: 0/115 = 0.00%
  95% Smallest Credible Set: [0,32]/115 = [0.00%, 27.83%]

To get the auxiliary table output, you can run

from dbmle import dbmle

res = dbmle(50, 11, 23, 31, output="auxiliary")
print(res.report())

or in Stata,

python:
    from dbmle import dbmle
    res = dbmle(50, 11, 23, 31, output="auxiliary")
    print(res.report())
end

or in the command line interface

dbmle --xI1 50 --xI0 11 --xC1 23 --xC0 31 --output auxiliary

which will all yield the same output:

Standard Statistics
----------------------------------------------
Average Effect              50/61 - 23/54 = 39.37%
95% Confidence Interval     [23.03%, 55.72%]
Fisher's Exact Test p-value 1.552e-05
Intervention Takeup Rate    50/61 = 81.97%
Control Takeup Rate         23/54 = 42.59%
Sample Size                  115

Christy and Kowalski Design-Based Maximum Likelihood Estimates and Auxiliary Statistics
------------------------------------------------------------------------------------
Always takers
  MLE: 28/115 = 24.35%
  95% Smallest Credible Set: [0,63]/115 = [0.00%, 54.78%]
  Largest Possible Support: [0,73]/115 = [0.00%, 63.48%]
  Estimated Frechet Bounds: [28,49]/115 = [24.35%, 42.61%]
  95% SCS within Est. Frechet: [28,39]/115 U [41,49]/115 = [24.35%, 33.91%] U [35.65%, 42.61%]

Compliers
  MLE: 66/115 = 57.39%
  95% Smallest Credible Set: [23,81]/115 = [20.00%, 70.43%]
  Largest Possible Support: [0,81]/115 = [0.00%, 70.43%]
  Estimated Frechet Bounds: [45,66]/115 = [39.13%, 57.39%]
  95% SCS within Est. Frechet: [45,53]/115 U [55,66]/115 = [39.13%, 46.09%] U [47.83%, 57.39%]

Defiers
  MLE: 21/115 = 18.26%
  95% Smallest Credible Set: [0,34]/115 = [0.00%, 29.57%]
  Largest Possible Support: [0,34]/115 = [0.00%, 29.57%]
  Estimated Frechet Bounds: [0,21]/115 = [0.00%, 18.26%]
  95% SCS within Est. Frechet: [0,8]/115 U [10,21]/115 = [0.00%, 6.96%] U [8.70%, 18.26%]

Never takers
  MLE: 0/115 = 0.00%
  95% Smallest Credible Set: [0,32]/115 = [0.00%, 27.83%]
  Largest Possible Support: [0,42]/115 = [0.00%, 36.52%]
  Estimated Frechet Bounds: [0,21]/115 = [0.00%, 18.26%]
  95% SCS within Est. Frechet: [0,11]/115 U [13,21]/115 = [0.00%, 9.57%] U [11.30%, 18.26%]

Replace "auxiliary" with "approx" to use the fast MLE approximation, which will return the output

Standard Statistics
----------------------------------------------
Average Effect              50/61 - 23/54 = 39.37%
95% Confidence Interval     [23.03%, 55.72%]
Fisher's Exact Test p-value 1.552e-05
Intervention Takeup Rate    50/61 = 81.97%
Control Takeup Rate         23/54 = 42.59%
Sample Size                  115


Christy and Kowalski Design-Based Maximum Likelihood Estimates*
------------------------------------------------------------------------------------
Always takers
  MLE: 28/115 = 24.35%

Compliers
  MLE: 66/115 = 57.39%

Defiers
  MLE: 21/115 = 18.26%

Never takers
  MLE: 0/115 = 0.00%

* MLE estimates obtained from an approximation algorithm implemented 
by the dbmle package (Christy, Kowalski, and Zhang 2025)

Project Links


Citation

If you use dbmle in your academic work, please cite Christy and Kowalski (2025) along with this package:

@misc{christy2025dbmle,
  author       = {Christy, Neil and Kowalski, Amanda and Zhang, Shuheng},
  title        = {dbmle: Design-Based Maximum Likelihood Estimation for Always Takers, Compliers, Defiers, and Never Takers},
  year         = {2025},
  howpublished = {\url{https://pypi.org/project/dbmle/}},
  note         = {Python package version 0.0.2}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbmle-0.0.2.tar.gz (23.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbmle-0.0.2-py3-none-any.whl (20.2 kB view details)

Uploaded Python 3

File details

Details for the file dbmle-0.0.2.tar.gz.

File metadata

  • Download URL: dbmle-0.0.2.tar.gz
  • Upload date:
  • Size: 23.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for dbmle-0.0.2.tar.gz
Algorithm Hash digest
SHA256 ba3a566af60a590e4aa4c394d71a93057335d7c8c28bcbc24a932ae6f02a53e9
MD5 464a12852df2ad43cc00e9086627cec3
BLAKE2b-256 e98a0b1469c545a1b0c94fdefc42c4a23472ee53bfea004dd7cbab5c6e24454b

See more details on using hashes here.

File details

Details for the file dbmle-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: dbmle-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 20.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for dbmle-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 308c4613a01b0dd0bbecaa4bcd7a5f60063c250f67952e1647a61b3cd8f57e39
MD5 ad524e780ef1895cd37222ce82260eea
BLAKE2b-256 a4f2596038f66c716d80ff2fd678189d0c8529b674574da34c84ca7d29e05c25

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page