PolicyEngine US Data

A package to create representative microdata for the US.

Installation

While it is possible to install via PyPI:

pip install policyengine-us-data

the recommended installation is

pip install -e .[dev]

which installs the package in editable mode along with the development dependencies, so that changes to the package code are reflected immediately. policyengine-us-data is a development package and is not intended for direct end-user installation.

Pull Requests

PRs must come from branches pushed to PolicyEngine/policyengine-us-data, not from personal forks. The PR workflow fails fast on fork-based PRs, before the real test suite runs, because the required secrets are unavailable there.

Before opening a PR, push the current branch to the upstream repo:

make push-pr-branch

That target pushes the current branch to the upstream remote and sets tracking so gh pr create opens the PR from PolicyEngine/policyengine-us-data.

SSA Data Sources

The following SSA data sources are used in this project:

Pipeline Overview

PolicyEngine constructs its representative household datasets through a multi-step pipeline. Public survey data is merged, stratified, and cloned to geographic variants per household. Each clone is simulated through PolicyEngine US with stochastic take-up, then calibrated via L0-regularized optimization against administrative targets at the national, state, and congressional district levels, producing geographically representative datasets.
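The "stochastic take-up" step can be sketched as follows. This is a hedged illustration, not the actual PolicyEngine implementation: the function name and the 82% take-up rate are invented for the example. The key idea is that eligible households enroll probabilistically, with a seeded RNG keeping the draw reproducible across pipeline runs.

```python
# Illustrative sketch of stochastic take-up (not PolicyEngine's actual code):
# eligible households enroll with a fixed take-up probability, drawn from a
# seeded RNG so repeated pipeline runs produce identical enrollment.
import random

def simulate_take_up(eligible_ids, take_up_rate, seed=0):
    """Return the subset of eligible households that take up the benefit."""
    rng = random.Random(seed)
    return [hid for hid in eligible_ids if rng.random() < take_up_rate]

enrolled = simulate_take_up(range(10_000), take_up_rate=0.82, seed=42)
print(len(enrolled))  # roughly 8,200 of the 10,000 eligible households
```

Because the seed is fixed, the same households enroll on every run, which keeps downstream calibration deterministic.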

The Enhanced CPS (make data-legacy) produces a national-only calibrated dataset. For the current geography-specific pipeline, see docs/calibration.md.

The repo currently contains two calibration tracks:

  • Legacy Enhanced CPS (make data-legacy), which uses the older EnhancedCPS / build_loss_matrix() path for national-only calibration.
  • Unified calibration (docs/calibration.md), which uses storage/calibration/policy_data.db and the sparse matrix + L0 pipeline for current national and geography-specific builds.
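Both tracks share the same underlying idea: adjust household weights so weighted totals hit administrative targets, with a sparsity penalty so many weights go exactly to zero. A minimal hedged sketch of that idea (not the repo's actual solver; `calibrate` and `sparsify` are invented for the example, and real L0 regularization is an optimization penalty rather than a post-hoc threshold):

```python
# Toy reweighting sketch: gradient descent on the squared error between
# weighted totals and targets, followed by hard thresholding to mimic the
# sparsity an L0 penalty induces. Not the production pipeline.

def calibrate(matrix, weights, targets, lr=0.01, steps=500):
    """matrix[i][j] is household j's contribution to target i."""
    w = list(weights)
    n_targets, n_hh = len(matrix), len(w)
    for _ in range(steps):
        totals = [sum(matrix[i][j] * w[j] for j in range(n_hh))
                  for i in range(n_targets)]
        grads = [sum(2 * (totals[i] - targets[i]) * matrix[i][j]
                     for i in range(n_targets))
                 for j in range(n_hh)]
        # Weights stay non-negative.
        w = [max(0.0, w[j] - lr * grads[j]) for j in range(n_hh)]
    return w

def sparsify(w, threshold=0.01):
    """Drop households whose fitted weight is tiny (L0-style sparsity)."""
    return [x if x >= threshold else 0.0 for x in w]

# One target (total household count = 30) over three identical households:
w = calibrate([[1.0, 1.0, 1.0]], [1.0, 1.0, 1.0], [30.0])
print(round(sum(w), 3))  # 30.0
```

The unified track's sparse-matrix representation matters because the real matrix spans thousands of targets across national, state, and district geographies, and most entries are zero.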

For detailed calibration usage, see docs/calibration.md and modal_app/README.md.

Running the Full Pipeline

The pipeline runs as sequential steps in Modal:

make pipeline   # prints the steps below

# 1. Build data (CPS/PUF/ACS → source-imputed stratified CPS)
make build-data-modal

# 2. Build calibration matrices (CPU, ~10h)
make build-matrices

# 3. Fit weights (GPU, county + national in parallel)
make calibrate-both

# 4. Build H5 files (state/district/city + national in parallel)
make stage-all-h5s

# 5. Promote to versioned HF paths
make promote

Building the Paper

Prerequisites

The paper requires a LaTeX distribution (e.g., TeXLive or MiKTeX) with the following packages:

  • graphicx (for figures)
  • amsmath (for mathematical notation)
  • natbib (for bibliography management)
  • hyperref (for PDF links)
  • booktabs (for tables)
  • geometry (for page layout)
  • microtype (for typography)
  • xcolor (for colored links)

On Ubuntu/Debian, you can install these with:

sudo apt-get install texlive-latex-base texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended

On macOS with Homebrew:

brew install --cask mactex

Building

To build the paper:

make paper

To clean LaTeX build files:

make clean-paper

The output PDF will be at paper/main.pdf.

Building the Documentation

Prerequisites

The documentation uses Jupyter Book 2 (pre-release) with MyST. To install:

# Install Jupyter Book 2 pre-release
pip install --pre "jupyter-book==2.*"

# Install MyST CLI
npm install -g mystmd

Building

To build and serve the documentation locally:

cd docs
myst start

Or alternatively from the project root:

jupyter book start docs

Both commands will start a local server at http://localhost:3001 where you can view the documentation.

The legacy Makefile command:

make documentation

Note: the Makefile uses the older jb command syntax, which may not work with Jupyter Book 2. Use myst start or jupyter book start docs instead.

TRACE provenance output

Each US data release now publishes both:

  • release_manifest.json
  • trace.tro.jsonld

The release manifest remains the operational source of truth for:

  • published artifact paths and checksums
  • build IDs and timestamps
  • build-time policyengine-us provenance

trace.tro.jsonld is a generated TRACE declaration built from that manifest. It gives a standards-based provenance export over the same release artifacts, including a composition fingerprint across the release manifest and the artifacts it describes.
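A composition fingerprint of this kind can be sketched as below. This is a hypothetical illustration of the concept, not the repo's actual algorithm or manifest schema: a single digest is folded over the canonicalized manifest plus the digest of each artifact it describes, in a deterministic order.

```python
# Hypothetical sketch of a composition fingerprint: one digest covering the
# release manifest and every artifact it lists. Field names and the folding
# order are illustrative, not the actual implementation.
import hashlib
import json

def composition_fingerprint(manifest: dict, artifact_bytes: dict) -> str:
    h = hashlib.sha256()
    # Canonicalize the manifest so key order cannot change the digest.
    h.update(json.dumps(manifest, sort_keys=True).encode())
    # Fold in each artifact's own digest in sorted (deterministic) order.
    for name in sorted(artifact_bytes):
        h.update(hashlib.sha256(artifact_bytes[name]).digest())
    return h.hexdigest()

fp = composition_fingerprint({"artifacts": ["a.h5"]}, {"a.h5": b"data"})
print(len(fp))  # 64 hex characters
```

Any change to either the manifest or an artifact's bytes changes the fingerprint, which is what lets a verifier detect a mismatched release.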

The TRO uses the canonical TROv 0.1 vocabulary and surfaces PolicyEngine-specific build provenance under the https://policyengine.org/trace/0.1# extension namespace. Structured fields on the performance node (pe:dataBuildFingerprint, pe:builtWithModelVersion, pe:builtWithModelGitSha, pe:dataBuildId, pe:emittedIn) let a verifier cross-check this TRO against the certified-bundle TRO emitted by policyengine.py without parsing prose.
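For illustration only, a fragment of what those structured fields on the performance node might look like. All values are placeholders, and the surrounding TRO structure and trov vocabulary terms are elided; only the pe: extension namespace URL is taken from the text above:

```json
{
  "@context": {
    "pe": "https://policyengine.org/trace/0.1#"
  },
  "pe:dataBuildFingerprint": "sha256:<digest>",
  "pe:builtWithModelVersion": "<model version>",
  "pe:builtWithModelGitSha": "<git sha>",
  "pe:dataBuildId": "<build id>",
  "pe:emittedIn": "<release id>"
}
```

A verifier can compare these fields against the corresponding values in the certified-bundle TRO without parsing prose.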

The emitted TRO is validated against policyengine_us_data/schemas/trace_tro.schema.json.

Important boundaries:

  • the TRACE file does not replace the release manifest
  • the TRACE file does not decide model/data compatibility

For the broader certified-bundle architecture, see policyengine.py release bundles and the official TRACE specification.
