An open data processing pipeline for public US utility data.

Project description

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

PUDL makes US energy data easier to access and work with. Hundreds of gigabytes of supposedly public information are published by government agencies, but in a bunch of different formats that can be hard to work with and combine. PUDL takes these spreadsheets, CSV files, and databases and turns them into easy-to-parse, well-documented tabular data packages that can be used to create a database or used directly with Python, R, Microsoft Access, and lots of other tools.

The project currently contains data from:

We are especially interested in serving researchers, activists, journalists, and policy makers who might not otherwise be able to afford access to this data from commercial data providers.

Getting Started TEMPORARILY OUT OF DATE

Just want to play with some example data? Install Anaconda (or miniconda if you like the command line) with at least Python 3.7. Then run the following commands in your terminal:

NOTE: (2019-09-03) this next code block won’t work unless you have the old PostgreSQL PUDL database set up. We are in the process of deprecating that database, and using tabular datapackages that feed into SQLite instead. However, the code is temporarily out of sync with the docs. The last version of the guide to setting up the PostgreSQL database can be found in this commit if you need to get it set up in the interim.

$ git clone https://github.com/catalyst-cooperative/pudl.git
$ conda env create --name pudl --file pudl/environment.yml
$ conda activate pudl
$ pip install -e pudl
$ mkdir pudl-work
$ pudl_setup --pudl_in=pudl-work --pudl_out=pudl-work
$ pudl_data --sources eia923 eia860 ferc1 epacems epaipm --years 2017 --states id
$ pudl_etl pudl-work/settings/pudl_etl_example.yml
$ jupyter-lab --notebook-dir=pudl-work/notebooks

This will install the PUDL Python package, create some local directories inside a directory called pudl-work, download the most recent year of data from the public agencies, load it into a local PostgreSQL database, and open up a folder with some example Jupyter notebooks in your web browser.

We are transitioning to generating CSV/JSON based tabular data packages, which are then loaded into a local SQLite database to make setting up PUDL easier.
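
As a rough illustration of the data package to SQLite step described above, the sketch below uses only the Python standard library. It is not PUDL's own loader: the function name load_datapackage, the file layout, and the tiny demo table are assumptions made for this example. It assumes a datapackage.json descriptor whose "resources" entries each have a "name" and a CSV "path", and it stores every column as text instead of honoring any declared field types.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path


def load_datapackage(descriptor_path, conn):
    """Load every CSV resource listed in a datapackage.json into SQLite.

    Hypothetical helper for illustration; not part of the PUDL API.
    All columns are stored as TEXT for simplicity.
    """
    descriptor_path = Path(descriptor_path)
    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
    loaded = []
    for resource in descriptor["resources"]:
        table = resource["name"]
        csv_path = descriptor_path.parent / resource["path"]
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)  # first row holds the column names
            # Quote identifiers so unusual names do not break the SQL.
            columns = ", ".join(f'"{name}" TEXT' for name in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE "{table}" ({columns})')
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', reader
            )
        loaded.append(table)
    conn.commit()
    return loaded


def demo():
    """Build a tiny made-up data package on disk and load it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "plants.csv").write_text(
            "plant_id,plant_name,state\n1,Example Plant,ID\n2,Another Plant,ID\n",
            encoding="utf-8",
        )
        descriptor = {
            "name": "demo-package",
            "resources": [{"name": "plants", "path": "plants.csv"}],
        }
        (root / "datapackage.json").write_text(
            json.dumps(descriptor), encoding="utf-8"
        )
        conn = sqlite3.connect(":memory:")
        tables = load_datapackage(root / "datapackage.json", conn)
        rows = conn.execute(
            'SELECT plant_name FROM "plants" ORDER BY plant_id'
        ).fetchall()
        conn.close()
        return tables, rows


if __name__ == "__main__":
    print(demo())
```

Pointing load_datapackage at a real descriptor and an on-disk SQLite file (for example sqlite3.connect("pudl.sqlite"), a file name chosen here only for illustration) would give a database that other tools can query.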

NOTE: The example above requires a computer with at least 4 GB of RAM and several GB of free disk space. You will also need to download about 500 MB of data. This could take a while if you have a slow internet connection.

For more details, see the full PUDL documentation.

Contributing to PUDL

Find PUDL useful? Want to help make it better? There are lots of ways to contribute!

  • Please be sure to read our Code of Conduct

  • You can file a bug report, make a feature request, or ask questions in the GitHub issue tracker.

  • Feel free to fork the project and make a pull request with new code, better documentation, or example notebooks.

  • Make a financial contribution to support our work liberating public energy data.

  • Hire us to do some custom analysis, and let us add the code to the project.

  • For more information, check out our Contribution Guidelines.

Licensing

The PUDL software is released under the MIT License. The PUDL documentation and the data packages we distribute are released under the Creative Commons Attribution 4.0 License.

Contact Us

For help with initial setup, usage questions, bug reports, suggestions to make PUDL better, and anything else that could conceivably be of use or interest to the broader community of users, use the PUDL issue tracker on GitHub. For private communication about the project, you can email the team: pudl@catalyst.coop

About Catalyst Cooperative

Catalyst Cooperative is a small group of data scientists and policy wonks. We’re organized as a worker-owned cooperative consultancy. Our goal is a more just, livable, and sustainable world. We integrate public data and perform custom analyses to inform public policy making. Our focus is primarily on mitigating climate change and improving electric utility regulation in the United States.

Do you work on renewable energy or climate policy? Have you found yourself scraping data from government PDFs, spreadsheets, websites, and databases, without getting something reusable? We build tools to pull this kind of information together reliably and automatically so you can focus on your real work instead — whether that’s political advocacy, energy journalism, academic research, or public policy making.

Download files

Source Distribution

catalystcoop.pudl-0.1.0a3.tar.gz (6.1 MB)

Uploaded Source

Built Distribution

catalystcoop.pudl-0.1.0a3-py3-none-any.whl (5.9 MB)

Uploaded Python 3

File details

Details for the file catalystcoop.pudl-0.1.0a3.tar.gz.

File metadata

  • Download URL: catalystcoop.pudl-0.1.0a3.tar.gz
  • Size: 6.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.3

File hashes

Hashes for catalystcoop.pudl-0.1.0a3.tar.gz
Algorithm Hash digest
SHA256 8b90fe49a3ed4915545c098ddc6fe09dcdf2d567b99bb8cf007a7f22ae70ae60
MD5 ec30b69034463720a70207de3af39763
BLAKE2b-256 9429af511cdc3b8d3f95dc07486cad12321f0d267f8fba5771533b7871463f62
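
To check a downloaded file against the SHA256 digest listed above, something like the following sketch can be used. It relies only on hashlib from the Python standard library. The helper names sha256_of and matches_published_digest are made up for this example, and the path you pass in is assumed to be wherever you saved the download.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are not held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest.

    Surrounding whitespace and letter case in expected_hex are ignored.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling matches_published_digest with the path to the downloaded catalystcoop.pudl-0.1.0a3.tar.gz and the SHA256 value from the table should return True for an intact download. A False result suggests the file is incomplete or was altered.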

File details

Details for the file catalystcoop.pudl-0.1.0a3-py3-none-any.whl.

File metadata

  • Download URL: catalystcoop.pudl-0.1.0a3-py3-none-any.whl
  • Size: 5.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.3

File hashes

Hashes for catalystcoop.pudl-0.1.0a3-py3-none-any.whl
Algorithm Hash digest
SHA256 b070a41eff63fb5ce878c46ea8deb7694797a47b6fca9cae2b38d758f70c38c0
MD5 e1bd87bdd85ff28067a532aefc512fa7
BLAKE2b-256 876fe7aa09464943281155f0c15370a22ced1487148f6e2ff00387d2df2512ef
