An open data processing pipeline for US energy data

Project description

What is PUDL?

The PUDL Project (pronounced puddle) is an open source data processing pipeline that makes US energy data easier to access and use programmatically.

US government agencies publish hundreds of gigabytes of valuable data, but that data is often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.

The project is focused on serving researchers, activists, journalists, policy makers, and small businesses that might not otherwise be able to afford access to this data from commercial sources and who may not have the time or expertise to do all the data processing themselves from scratch.

We want to make this data accessible and easy to work with for as wide an audience as possible: anyone from grassroots youth climate organizers working with Google Sheets to university researchers with access to scalable cloud computing resources, and everyone in between!

PUDL comprises three core components:

Raw Data Archives

PUDL archives all our raw inputs on Zenodo to ensure permanent, versioned access to the data. In the event that an agency changes how it publishes data or deletes old files, the data processing pipeline will still have access to the original inputs. Each of the data inputs may have several different versions archived, and all are assigned a unique DOI (digital object identifier) and made available through Zenodo’s REST API. You can read more about the Raw Data Archives in the docs.
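As a rough illustration of what versioned access through a REST API can look like, the sketch below summarizes the files attached to a single Zenodo record. It assumes that a record's JSON is served from `https://zenodo.org/api/records/<record_id>` and contains a `files` list whose entries carry `key`, `size`, and `checksum` fields. The helper names, the DOI, and the sample payload are hypothetical placeholders, not real PUDL archives.

```python
import json
from urllib.request import urlopen

# Assumed URL layout for a single Zenodo record; not taken from the PUDL docs.
ZENODO_RECORD_URL = "https://zenodo.org/api/records/{record_id}"


def fetch_record(record_id: str) -> dict:
    """Download the JSON metadata for one Zenodo record (needs network access)."""
    with urlopen(ZENODO_RECORD_URL.format(record_id=record_id)) as response:
        return json.load(response)


def summarize_files(record: dict) -> list[tuple[str, int, str]]:
    """Return (filename, size in bytes, checksum) for each file in a record."""
    return [
        (entry["key"], entry["size"], entry["checksum"])
        for entry in record.get("files", [])
    ]


# Hypothetical payload with the assumed shape, so the example runs offline.
sample_record = {
    "doi": "10.5281/zenodo.0000000",
    "files": [
        {"key": "example-2023.zip", "size": 1024, "checksum": "md5:0123abcd"},
        {"key": "example-2024.zip", "size": 2048, "checksum": "md5:4567ef01"},
    ],
}

for name, size, checksum in summarize_files(sample_record):
    print(f"{name}\t{size}\t{checksum}")
```

With network access, `summarize_files(fetch_record("<record_id>"))` would produce the same kind of listing for a real record, provided the response has the shape assumed here.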

Data Pipeline

The data pipeline (this repo) ingests raw data from the archives, cleans and integrates it, and writes the resulting tables to SQLite and Apache Parquet files, with some accompanying metadata stored as JSON. Each release of the PUDL software contains a set of DOIs indicating which versions of the raw inputs it processes. This helps ensure that the outputs are replicable. You can read more about our ETL (extract, transform, load) process in the PUDL documentation.

Data Warehouse

The SQLite, Parquet, and JSON outputs from the data pipeline, sometimes called “PUDL outputs”, are updated each night by an automated build process, and periodically archived so that users can access the data without having to install and run our data processing system. These outputs contain hundreds of tables and comprise a small file-based data warehouse that can be used for a variety of energy system analyses. Learn more about how to access the PUDL data.
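Because the outputs are ordinary SQLite files, they can be explored with nothing more than the Python standard library. The sketch below lists the tables in a database and runs a simple aggregate. To stay runnable without a download it builds a tiny in-memory stand-in; the table `example_plants` and its columns are invented for illustration and are not real PUDL table names.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# In-memory stand-in for a downloaded output file. With real data this would
# be sqlite3.connect("<path to the downloaded .sqlite file>").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_plants ("
    "plant_id INTEGER PRIMARY KEY, plant_name TEXT, capacity_mw REAL)"
)
conn.executemany(
    "INSERT INTO example_plants VALUES (?, ?, ?)",
    [(1, "Alpha", 120.5), (2, "Beta", 48.0)],
)

print(list_tables(conn))

# Plain SQL works for analysis once the table names are known.
total = conn.execute("SELECT SUM(capacity_mw) FROM example_plants").fetchone()[0]
print(total)
```

Listing tables first is a reasonable starting point for a database with hundreds of tables, since the names then guide which queries are worth writing.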

What data is available?

PUDL currently integrates data from:

High Priority Target Datasets

If you’re interested in any of these datasets, we’d love to integrate them into PUDL. Get in touch!

How do I access the data?

For details on how to access PUDL data, see the data access documentation. A quick summary:

Organizations using PUDL

This is a partial list of organizations that have used PUDL in their work. If your organization uses PUDL, we’d love to list you here! Please open a pull request or email us at hello@catalyst.coop.

Contributing to PUDL

Find PUDL useful? Want to help make it better? There are lots of ways to help!

PUDL Sustainers

The PUDL Sustainers provide ongoing financial support to ensure that the open data keeps flowing and that the project is sustainable over the long term. They’re also involved in our quarterly planning process. To learn more, see the PUDL Project page on Open Collective.

Gigawatt Tier (≥$25,000/year)

RMI, GridLab

Megawatt Tier (≥$16,000/year)

Become our first Megawatt tier sustainer!

Kilowatt Tier (≥$8,000/year)

Become our first Kilowatt tier sustainer!

Major Grant Funders

Alfred P. Sloan Foundation

The PUDL Project has been supported by three grants from the Alfred P. Sloan Foundation’s Energy and Environment Program, in 2019, 2021, and 2024.

National Science Foundation

The PUDL Project was awarded a grant from the National Science Foundation’s Pathways to Enable Open Source Ecosystems (POSE) program (award 2346139) in 2024.

Licensing

In general, our code, data, and other work are permissively licensed for use by anybody, for any purpose, so long as you give us credit for the work we’ve done.

Contact Us

About Catalyst Cooperative

Catalyst Cooperative is a small group of data wranglers and policy wonks organized as a worker-owned cooperative consultancy. Our goal is a more just, livable, and sustainable world. We integrate public data and perform custom analyses to inform public policy (Hire us!). Our focus is primarily on mitigating climate change and improving electric utility regulation in the United States.

Download files

Source Distribution

catalystcoop_pudl-2025.12.1.tar.gz (57.6 MB view details)

Uploaded Source

Built Distribution

catalystcoop_pudl-2025.12.1-py3-none-any.whl (3.9 MB view details)

Uploaded Python 3

File details

Details for the file catalystcoop_pudl-2025.12.1.tar.gz.

File metadata

  • Download URL: catalystcoop_pudl-2025.12.1.tar.gz
  • Upload date:
  • Size: 57.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for catalystcoop_pudl-2025.12.1.tar.gz
Algorithm Hash digest
SHA256 0c5715114ab03b0e4f64a534b953accfa00aabb66ad396b5b22cff6ef6c061d8
MD5 adb867a64904e85171f0e8193c8c2886
BLAKE2b-256 d2705f67edb24c4c2b2d876961d42ab41aacb3441daede44f29a4d504bb8c6d6
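A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names are made up for this example, and it assumes the source distribution has been saved to the current working directory under its published file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED_SDIST_SHA256 = (
    "0c5715114ab03b0e4f64a534b953accfa00aabb66ad396b5b22cff6ef6c061d8"
)

# Assumed download location: the current working directory.
sdist = Path("catalystcoop_pudl-2025.12.1.tar.gz")
if sdist.exists():
    ok = matches_published_hash(sdist, EXPECTED_SDIST_SHA256)
    print("hash OK" if ok else "hash MISMATCH")
else:
    print(f"{sdist} not found; download it first")
```

Reading in chunks keeps memory use flat, which matters for a 57.6 MB archive less than for larger files but costs nothing here.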

Provenance

The following attestation bundles were made for catalystcoop_pudl-2025.12.1.tar.gz:

Publisher: release.yml on catalyst-cooperative/pudl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file catalystcoop_pudl-2025.12.1-py3-none-any.whl.

File metadata

File hashes

Hashes for catalystcoop_pudl-2025.12.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1f751284c1938bbb937934e947b6a3a3ef543e59c7f519a2b3498d88ca1c463d
MD5 e9bbe49f4ff80f0f57b4e47228b15fdb
BLAKE2b-256 f003e96df9c2aead28219ddbe7dc3c4e7bba2cfc4492b4e6e46aa615abf000e2

Provenance

The following attestation bundles were made for catalystcoop_pudl-2025.12.1-py3-none-any.whl:

Publisher: release.yml on catalyst-cooperative/pudl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
