
A tool for extracting data from FERC XBRL Filings.


The Federal Energy Regulatory Commission (FERC) has moved to collecting and distributing data using XBRL. XBRL is primarily designed for financial reporting, and has been adopted by regulators in the US and other countries. Much of the tooling in the XBRL ecosystem targets filers and the rendering of individual filings in a human-readable way, but very little of it supports accessing and analyzing large collections of filings.

The FERC XBRL Extractor is designed to provide that functionality for FERC XBRL data. The library can extract data from a set of XBRL filings, and write that data to SQLite or DuckDB databases whose structure is derived from an XBRL Taxonomy. While each XBRL instance contains a reference to a taxonomy, this tool requires a path to a single taxonomy that will be used to interpret all instances being processed. This means even if instances were created from different versions of a taxonomy, the provided taxonomy will be used when processing all of these instances, so the output database will have a consistent structure. For more information on the technical details of the XBRL extraction, see the docs.

Catalyst Cooperative is currently using this tool to extract and publish the following FERC data. These outputs are updated at least annually, and typically quarterly.

FERC Form                         Taxonomy  Raw Data                SQLite    DuckDB
--------------------------------  --------  ----------------------  --------  --------
Form 1 (Electricity)              Browse    10.5281/zenodo.4127043  Download  Download
Form 2 (Natural Gas)              Browse    10.5281/zenodo.5879542  Download  Download
Form 6 (Oil)                      Browse    10.5281/zenodo.7126395  Download  Download
Form 60 (Service Companies)       Browse    10.5281/zenodo.7126434  Download  Download
Form 714 (Balancing Authorities)  Browse    10.5281/zenodo.4127100  Download  Download

Usage

Installation

The package can be installed from PyPI or conda-forge using your package manager of choice:

From PyPI

pip install catalystcoop.ferc-xbrl-extractor
uv pip install catalystcoop.ferc-xbrl-extractor

From conda-forge

conda install catalystcoop.ferc_xbrl_extractor
mamba install catalystcoop.ferc_xbrl_extractor
pixi add catalystcoop.ferc_xbrl_extractor
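
After installing, one way to confirm that the package is visible to the active Python environment is to query its installed version. The following is a minimal sketch using only the standard library; installed_version is a hypothetical helper name, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as published on PyPI (see the pip command above).
    # Prints a version string if the package is installed, otherwise None.
    print(installed_version("catalystcoop.ferc-xbrl-extractor"))
```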

Input Data

The FERC XBRL Extractor is generally intended to consume raw XBRL filings and taxonomy information from one of the archives Catalyst Cooperative has published on Zenodo. Each supported form has its own archive lineage, with new snapshots captured from FERC’s XBRL filing RSS feeds on a regular basis (see links in the table above). The tool also expects to receive a zipfile containing archived taxonomies.

The archived filings and taxonomies are both produced using the pudl-archiver. The extractor will parse all taxonomies in the archive, then use the taxonomy referenced in each filing while parsing it.
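
Before running an extraction, it can help to confirm what a filings or taxonomy archive actually contains. The sketch below lists the files in a zip archive with Python's standard zipfile module. The helper list_archive_members is hypothetical and is not part of the extractor; it makes no assumptions about how the archive is laid out internally.

```python
import zipfile
from pathlib import Path


def list_archive_members(archive: Path) -> list[str]:
    """Return the sorted names of all files (not directories) in a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            info.filename for info in zf.infolist() if not info.is_dir()
        )
```

For example, calling list_archive_members(Path("examples/ferc1-xbrl-taxonomies.zip")) from a checkout of the repository would return the file names stored in the sample taxonomy archive.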

CLI

This tool can be used as a library, as it is in PUDL. A CLI is also provided for interacting with XBRL data. The only required options for the CLI are a path to the filings to be extracted and a path to the output database. The path to the filings can point to a directory of XBRL filings, a single XBRL filing, or a zipfile of XBRL filings. If the specified output database already exists, it will be overwritten.

xbrl_extract {path_to_filings} --sqlite-path {path_to_database}
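
Because an existing output database is overwritten, a script that calls the CLI may want to refuse to run when the target file is already present. The following is a minimal sketch of such a guard, assuming xbrl_extract is on the PATH. The functions build_extract_command and run_extract are hypothetical wrappers written for this example, not part of the package; they use only the two required options shown above.

```python
import subprocess
from pathlib import Path


def build_extract_command(filings: str, sqlite_path: str) -> list[str]:
    """Build the argument list for the minimal xbrl_extract invocation."""
    return ["xbrl_extract", filings, "--sqlite-path", sqlite_path]


def run_extract(
    filings: str, sqlite_path: str, allow_overwrite: bool = False
) -> None:
    """Run xbrl_extract, refusing to replace an existing database by default."""
    if Path(sqlite_path).exists() and not allow_overwrite:
        raise FileExistsError(
            f"{sqlite_path} already exists and would be overwritten"
        )
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(build_extract_command(filings, sqlite_path), check=True)
```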

This repo contains a small selection of FERC Form 1 filings from 2021, along with an archive of taxonomies in the examples directory. To test the tool on these filings, use the command:

xbrl_extract examples/ferc1-2021-sample.zip \
    --sqlite-path ./ferc1-2021-sample.sqlite \
    --taxonomy examples/ferc1-xbrl-taxonomies.zip
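
Once the command finishes, the resulting database can be inspected with any SQLite client. As a sketch, the snippet below lists the tables in the output file using Python's standard sqlite3 module. The helper list_tables is hypothetical, and no particular table names are assumed, since they are derived from the taxonomy.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted by name."""
    # sqlite3.connect() silently creates a missing file, so check first.
    if not Path(db_path).is_file():
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("./ferc1-2021-sample.sqlite") after the command above would return the table names the extractor created for the sample filings.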

Parsing XBRL filings can be time-consuming and CPU-intensive, so this tool uses a process pool to parse filings in parallel. Two options configure the pool: --batch-size sets how many filings each child process handles at a time, and --workers sets how many child processes the pool creates. It may take some experimentation to find the best values for a given machine and set of filings. The following command uses 5 worker processes to process batches of 50 filings at a time, and writes both SQLite and DuckDB outputs.

xbrl_extract examples/ferc1-2021-sample.zip \
    --sqlite-path ferc1-2021-sample.sqlite \
    --duckdb-path ferc1-2021-sample.duckdb \
    --taxonomy examples/ferc1-xbrl-taxonomies.zip \
    --workers 5 \
    --batch-size 50
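
To make the relationship between the two options concrete, the sketch below shows the general pattern of splitting work into fixed-size batches and handing those batches to a process pool. It illustrates the technique only and is not the extractor's own implementation; make_batches, process_in_batches, and handle_batch are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(items[i : i + batch_size])
        for i in range(0, len(items), batch_size)
    ]


def process_in_batches(
    items: Sequence[str],
    batch_size: int,
    workers: int,
    handle_batch: Callable[[list[str]], int],
) -> list[int]:
    """Run handle_batch on every batch using a pool of worker processes.

    handle_batch must be a module-level function so it can be pickled
    and sent to the child processes.
    """
    batches = make_batches(items, batch_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves batch order in the returned results.
        return list(pool.map(handle_batch, batches))
```

With 120 filings and a batch size of 50, make_batches yields three batches of 50, 50, and 20 filings, so a pool of 5 workers would leave two workers idle. Smaller batches spread the work more evenly at the cost of more scheduling overhead, which is one reason the best settings depend on the workload.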

You can also pass the --metadata-path option, which writes extensive taxonomy metadata to a JSON file, grouped by table name. See the ferc_xbrl_extractor.arelle_interface module for more information on the extracted metadata.

xbrl_extract examples/ferc1-2021-sample.zip \
    --sqlite-path ./ferc1-2021-sample.sqlite \
    --taxonomy examples/ferc1-xbrl-taxonomies.zip \
    --metadata-path metadata.json
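
A quick way to see which tables the metadata file covers is to load it and list its top-level keys. The sketch below assumes that "grouped by table name" means the top level of the JSON document is an object keyed by table name; that layout is an assumption, so the helper raises an error if it does not hold. The name metadata_table_names is hypothetical.

```python
import json


def metadata_table_names(path: str) -> list[str]:
    """Return the sorted top-level keys of a metadata JSON file."""
    with open(path, encoding="utf-8") as f:
        metadata = json.load(f)
    # Assumption: the top level is an object mapping table name -> metadata.
    if not isinstance(metadata, dict):
        raise TypeError("expected a JSON object keyed by table name")
    return sorted(metadata)
```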

Contributing / Development

This project uses uv for dependency management and Hatch for environment and task management. It also includes several git pre-commit hooks that help enforce standard coding practices. To set up a development environment, first ensure you have uv installed, then run:

# Clone the repository to your local machine
git clone https://github.com/catalyst-cooperative/ferc-xbrl-extractor.git
cd ferc-xbrl-extractor
# Install Hatch with uv, then create the development environment
uv tool install hatch
hatch env create
# Install the pre-commit hooks
hatch run pre-commit install

All available development environments and commands can be shown with:

hatch env show

Some of the available commands:

# Run all tests and collect coverage
hatch run test:all
# Run only unit tests
hatch run test:unit
# Run only integration tests
hatch run test:integration
# Run linters and formatters
hatch run lint:all
# Check code without modifying
hatch run lint:check
# Format code
hatch run lint:format
# Build documentation
hatch run docs:build
# Check documentation formatting
hatch run docs:check

Code style is enforced using ruff with configuration in pyproject.toml.

PUDL Sustainers

This package is part of the Public Utility Data Liberation (PUDL) project.

The PUDL Sustainers provide ongoing financial support to ensure the open data keeps flowing, and the project is sustainable long term. They’re also involved in our quarterly planning process. To learn more see the PUDL Project on Open Collective.


File details

Details for the file catalystcoop_ferc_xbrl_extractor-1.10.0.tar.gz.

File hashes

Hashes for catalystcoop_ferc_xbrl_extractor-1.10.0.tar.gz

Algorithm    Hash digest
-----------  ----------------------------------------------------------------
SHA256       01eac20a18cb1e22ec12066e38a93f9e379d524d10568fd81a0306195ceff386
MD5          7e14f2fc542079f4a3d1432c6fcd7e9e
BLAKE2b-256  d1003e6de2978df747657710be2dee3e62ad941e82e5114b662cb0fde8d31353

Provenance

The following attestation bundles were made for catalystcoop_ferc_xbrl_extractor-1.10.0.tar.gz:

Publisher: release.yml on catalyst-cooperative/ferc-xbrl-extractor

File details

Details for the file catalystcoop_ferc_xbrl_extractor-1.10.0-py3-none-any.whl.

File hashes

Hashes for catalystcoop_ferc_xbrl_extractor-1.10.0-py3-none-any.whl

Algorithm    Hash digest
-----------  ----------------------------------------------------------------
SHA256       fd7519dbd9b05013fd652282a7457a8ec902ae1b2fafd2dfb95d852e410930fc
MD5          8291decf64b62d930ccb1aae56238c43
BLAKE2b-256  31d6ad8cebc25ca1ac2e880326977bfd9624c9b4fc11a42fe2a913420a4ce206

Provenance

The following attestation bundles were made for catalystcoop_ferc_xbrl_extractor-1.10.0-py3-none-any.whl:

Publisher: release.yml on catalyst-cooperative/ferc-xbrl-extractor
