
A Protocol for data-generating processes; minimal interface for analog-estimation toolkits.

A minimal Python Protocol for data-generating processes (DGPs).

What this is

A Protocol (DataGeneratingProcess) with two members – data (a read-only property returning the observed, frozen realization) and draw(size=...) (a method returning a fresh realization, using the random generator the DGP itself owns) – plus a small set of composition primitives (TwoStageDGP, with_data) and thin convenience wrappers (EmpiricalDGP, ParametricDGP) for working with DGPs as first-class objects.
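
To make the two-member contract concrete, here is a rough sketch of how such a structural Protocol could be written with typing.Protocol. It is not the package's source. The names SketchDGP and ResampleDGP are hypothetical, and the sketch assumes (following the minimal example further down) that the object owns its random generator and that draw takes only an optional size.

```python
from typing import Optional, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class SketchDGP(Protocol):
    """Hypothetical stand-in for the contract: one property, one method."""

    @property
    def data(self) -> np.ndarray:
        """The observed (frozen) realization."""
        ...

    def draw(self, size: Optional[int] = None) -> np.ndarray:
        """A fresh realization; by default as many rows as `data`."""
        ...


class ResampleDGP:
    """Toy implementation: resample rows of the observed data with replacement."""

    def __init__(self, observation: np.ndarray, seed: Optional[int] = None) -> None:
        self._data = np.array(observation, copy=True)
        self._data.setflags(write=False)        # freeze the stored realization
        self._rng = np.random.default_rng(seed)  # the object owns its generator

    @property
    def data(self) -> np.ndarray:
        return self._data

    def draw(self, size: Optional[int] = None) -> np.ndarray:
        n = self._data.shape[0] if size is None else size
        rows = self._rng.integers(0, self._data.shape[0], size=n)
        return self._data[rows]


toy = ResampleDGP(np.arange(12.0).reshape(6, 2), seed=0)
conforms = isinstance(toy, SketchDGP)   # structural check, no inheritance needed
sample_shape = toy.draw(size=4).shape
```

ResampleDGP never inherits from SketchDGP; it conforms simply by having the right members. That is what lets consumer packages ship their own DGPs without depending on a shared base class.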

The package is not a library of working DGPs. Concrete DGPs live in consumer packages – e.g. ManifoldGMM ships its own moment-side DGPs. The role of DGP_Protocol is to define the contract that lets such consumers interoperate.

Conceptual lineage

The Protocol promotes the stand-in distribution from Manski's analog estimation framework (Manski 1988, Analog Estimation Methods in Econometrics) to a first-class Python object. In that framework, an estimator is defined by a population functional plus a sample-based stand-in for the population; DataGeneratingProcess is that stand-in. Different stand-ins yield different analog estimators:

  • The empirical distribution -> nonparametric plug-in estimators.
  • A parametric family fitted to the data -> MLE-style estimators.
  • A bootstrap distribution -> bootstrap inference.
  • A null-imposed restriction -> constrained estimators.
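
The first three stand-ins in the list above can be illustrated with plain NumPy, without using this package at all. The sketch below fixes one population functional, the tail probability P(X > c), and evaluates it under different stand-ins. The data, the threshold c, and the choice of a normal family are assumptions made only for illustration; the null-imposed case is omitted.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)   # the observed sample
c = 3.0                                         # threshold for the tail probability

# Population functional of interest: theta(F) = P_F(X > c).

# Stand-in 1: the empirical distribution gives the plug-in estimator.
theta_plugin = float(np.mean(x > c))

# Stand-in 2: a normal family fitted by maximum likelihood.
mu_hat = x.mean()
sigma_hat = x.std()             # ddof=0 is the normal MLE of the std. deviation
z = (c - mu_hat) / sigma_hat
theta_parametric = 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

# Stand-in 3: the bootstrap distribution gives a standard error for the plug-in.
boot = np.array([
    np.mean(rng.choice(x, size=x.size, replace=True) > c)
    for _ in range(200)
])
se_plugin = float(boot.std(ddof=1))
```

The functional stays the same throughout; only the distribution it is evaluated on changes. Treating that distribution as an object is the idea the Protocol captures.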

Installation

pip install DGP_Protocol

The import path is PEP-8 lowercase:

from dgp_protocol import DataGeneratingProcess, EmpiricalDGP, TwoStageDGP

Minimal example

import numpy as np
from dgp_protocol import EmpiricalDGP

data = np.random.default_rng(0).standard_normal(size=(100, 3))

# The DGP owns its own RNG.  Pass `seed` for reproducibility;
# `draw()` itself takes no `rng` argument.
dgp = EmpiricalDGP(observation=data, seed=1)
print(dgp.data.shape)                  # (100, 3) -- the frozen realization
print(dgp.draw().shape)                # (100, 3) -- a fresh bootstrap resample

# Rebind to a different realization while keeping the distributional
# structure.  The child gets an independent (spawned) Generator.
fresh = dgp.with_data(np.random.default_rng(2).standard_normal(size=(50, 3)))
print(fresh.data.shape)                # (50, 3)

For more substantial examples – parametric DGPs, two-stage composition (hierarchical sampling), cluster-block bootstrap – see the test suite under tests/.
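
The rebinding step in the minimal example (new data, same distributional structure, independent child generator) can be sketched with NumPy's SeedSequence.spawn. The class RebindableDGP below is hypothetical and is not the package's EmpiricalDGP; it only shows one way such a with_data could hand the child an independent random stream.

```python
import numpy as np


class RebindableDGP:
    """Hypothetical illustration of rebinding with a spawned child generator."""

    def __init__(self, observation, seed=None):
        # Accept an int seed or a SeedSequence so that children can be spawned.
        if isinstance(seed, np.random.SeedSequence):
            self._seed_seq = seed
        else:
            self._seed_seq = np.random.SeedSequence(seed)
        self._rng = np.random.default_rng(self._seed_seq)
        self.data = np.asarray(observation)

    def draw(self, size=None):
        # Bootstrap resample of the stored observations.
        n = len(self.data) if size is None else size
        return self.data[self._rng.integers(0, len(self.data), size=n)]

    def with_data(self, observation):
        # Each call to spawn() yields a new child sequence whose stream is
        # statistically independent of the parent's and of earlier children.
        child_seq = self._seed_seq.spawn(1)[0]
        return RebindableDGP(observation, seed=child_seq)


parent = RebindableDGP(np.arange(10.0), seed=1)
child = parent.with_data(np.arange(5.0))
sibling = parent.with_data(np.arange(5.0))
```

Here child and sibling hold the same data but different spawned seed sequences, so their draws do not repeat each other or the parent's.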

Design

The design is intentionally minimal: data + draw are the only required members. Composition primitives (TwoStageDGP, with_data) take DGPs and return DGPs without expanding the Protocol.
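
A sketch of what "take DGPs and return DGPs" can mean in practice: the composed object below exposes the same two members as its inputs, so it can be passed anywhere a DGP is expected. Both classes are hypothetical helpers written for this illustration. ComposedDGP is not the package's TwoStageDGP, whose actual constructor and behavior may differ.

```python
import numpy as np


class ArrayDGP:
    """Toy DGP: bootstrap rows of a fixed array (hypothetical helper)."""

    def __init__(self, observation, seed=None):
        self._data = np.asarray(observation)
        self._rng = np.random.default_rng(seed)

    @property
    def data(self):
        return self._data

    def draw(self, size=None):
        n = len(self._data) if size is None else size
        return self._data[self._rng.integers(0, len(self._data), size=n)]


class ComposedDGP:
    """Hypothetical two-stage composition: draw clusters, then rows within them."""

    def __init__(self, cluster_dgp, within):
        self._clusters = cluster_dgp   # DGP over integer cluster labels
        self._within = within          # dict: label -> DGP over that cluster's rows

    @property
    def data(self):
        labels = self._clusters.data
        return np.concatenate([self._within[int(k)].data for k in labels])

    def draw(self, size=None):
        # Stage 1: resample cluster labels. Stage 2: resample rows inside each.
        labels = self._clusters.draw(size)
        return np.concatenate([self._within[int(k)].draw() for k in labels])


within = {
    0: ArrayDGP(np.zeros((3, 2)), seed=10),   # cluster 0 has 3 rows
    1: ArrayDGP(np.ones((4, 2)), seed=11),    # cluster 1 has 4 rows
}
composed = ComposedDGP(ArrayDGP(np.array([0, 1]), seed=0), within)
out = composed.draw()
```

Because ComposedDGP has only data and draw, nothing new has to be added to the contract to support hierarchical sampling.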

The design note that motivated this package lives in the sibling ManifoldGMM repo at docs/design/dgp.org – DGP_Protocol was extracted from that design conversation. See also AGENTS.md for the package's scope discipline and the list of intentionally deferred features.

How to cite

If you use DGP_Protocol in academic work, please cite it. The repository's CITATION.cff is recognised by GitHub and provides one-click citation export in APA, BibTeX, and other formats from the repo's main page.

A BibTeX entry suitable for paper drafts:

@software{ligon_dgp_protocol_2026,
  author    = {Ligon, Ethan},
  title     = {DGP\_Protocol: A Protocol for data-generating processes},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/ligon/DGP_Protocol},
  version   = {0.1.0a0},
  license   = {BSD-3-Clause},
}

License

BSD 3-Clause (BSD-3-Clause). See the LICENSE file at the root of this repository. In short: permissive use including commercial, modification, and redistribution; preserve the copyright notice and license text in redistributions; no use of the author's name to endorse derived products.

Author

Ethan Ligon, UC Berkeley.
