A tool for extracting data from FERC XBRL Filings.

The Federal Energy Regulatory Commission (FERC) has moved to collecting and distributing data using XBRL. XBRL is primarily designed for financial reporting and has been adopted by regulators in the US and other countries. Much of the tooling in the XBRL ecosystem is aimed at filers and at rendering individual filings in a human-readable way, but very little of it is aimed at accessing and analyzing large collections of filings.

The FERC XBRL Extractor is designed to provide that functionality for FERC XBRL data. The library can extract data from a set of XBRL filings and write it to SQLite or DuckDB databases whose structure is derived from an XBRL taxonomy. Although each XBRL instance references a taxonomy, this tool requires a path to a single taxonomy that is used to interpret every instance being processed. Even if the instances were created from different versions of a taxonomy, the provided taxonomy is used for all of them, so the output database has a consistent structure. For more information on the technical details of the XBRL extraction, see the docs.

Catalyst Cooperative is currently using this tool to extract and publish the following FERC data. These outputs are updated at least annually and typically quarterly.

| FERC Form                        | Taxonomy | Raw Data               | SQLite   | DuckDB   |
|----------------------------------|----------|------------------------|----------|----------|
| Form 1 (Electricity)             | Browse   | 10.5281/zenodo.4127043 | Download | Download |
| Form 2 (Natural Gas)             | Browse   | 10.5281/zenodo.5879542 | Download | Download |
| Form 6 (Oil)                     | Browse   | 10.5281/zenodo.7126395 | Download | Download |
| Form 60 (Service Companies)      | Browse   | 10.5281/zenodo.7126434 | Download | Download |
| Form 714 (Balancing Authorities) | Browse   | 10.5281/zenodo.4127100 | Download | Download |

Usage

Installation

The package can be installed from PyPI or conda-forge using your package manager of choice:

From PyPI

pip install catalystcoop.ferc-xbrl-extractor
uv pip install catalystcoop.ferc-xbrl-extractor

From conda-forge

conda install -c conda-forge catalystcoop.ferc_xbrl_extractor
mamba install -c conda-forge catalystcoop.ferc_xbrl_extractor
pixi add catalystcoop.ferc_xbrl_extractor

Input Data

The FERC XBRL Extractor is generally intended to consume raw XBRL filings and taxonomy information from one of the archives Catalyst Cooperative has published on Zenodo. Each supported form has its own archive lineage, with new snapshots captured from FERC’s XBRL filing RSS feeds on a regular basis (see links in the table above). The tool also expects to receive a zipfile containing archived taxonomies.

The archived filings and taxonomies are both produced using the pudl-archiver. The extractor will parse all taxonomies in the archive, then use the taxonomy referenced in each filing while parsing it.

CLI

This tool can be used as a library, as it is in PUDL, and it also provides a CLI for working with XBRL data. The only required CLI arguments are a path to the filings to be extracted and a path to the output database. The filings path can point to a directory of XBRL filings, a single XBRL filing, or a zipfile containing XBRL filings. If the specified output database already exists, it will be overwritten.

xbrl_extract {path_to_filings} --sqlite-path {path_to_database}
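To make the three accepted input forms concrete, here is a minimal Python sketch of how a filings path might be resolved into a list of filings. It is an illustration, not the extractor's own code: `list_filings` is a hypothetical helper, and it assumes every regular file in a directory (or every non-directory member of a zipfile) is a filing, without checking that it is valid XBRL.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def list_filings(path: str | Path) -> list[str]:
    """Return the sorted names of candidate filings found at ``path``.

    Hypothetical helper: accepts a directory of filings, a zipfile of
    filings, or a single filing, mirroring the three input forms of the CLI.
    """
    path = Path(path)
    if path.is_dir():
        # A directory of filings: take every regular file directly inside it.
        return sorted(p.name for p in path.iterdir() if p.is_file())
    if zipfile.is_zipfile(path):
        # A zipfile of filings: list archive members, skipping directory entries.
        with zipfile.ZipFile(path) as archive:
            return sorted(
                info.filename for info in archive.infolist() if not info.is_dir()
            )
    # Anything else is treated as a single filing.
    return [path.name]
```

For example, `list_filings("examples/ferc1-2021-sample.zip")` would list the members of the sample archive shipped in this repo.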

The examples directory of this repo contains a small selection of FERC Form 1 filings from 2021, along with an archive of taxonomies. To test the tool on these filings, run:

xbrl_extract examples/ferc1-2021-sample.zip \
    --sqlite-path ./ferc1-2021-sample.sqlite \
    --taxonomy examples/ferc1-xbrl-taxonomies.zip
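Once the command finishes, the output is an ordinary SQLite file, so it can be inspected with Python's built-in `sqlite3` module. The sketch below lists the tables in the database; `list_tables` is a hypothetical helper name, and the table names you see will depend on the taxonomy used for the extraction.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted alphabetically.

    Note: connecting to a path that does not exist silently creates an
    empty database instead of raising an error.
    """
    # closing() ensures the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `list_tables("ferc1-2021-sample.sqlite")` after the command above should print the tables derived from the Form 1 taxonomy.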

Parsing XBRL filings can be time-consuming and CPU-intensive, so this tool uses a process pool to parse filings in parallel. Two options configure the pool: --workers sets how many child processes the pool creates, and --batch-size sets how many filings each child process handles at a time. Finding the best settings for your machine may take some experimentation. The following command uses 5 worker processes to process batches of 50 filings at a time, and writes both SQLite and DuckDB output.

xbrl_extract examples/ferc1-2021-sample.zip \
    --sqlite-path ferc1-2021-sample.sqlite \
    --duckdb-path ferc1-2021-sample.duckdb \
    --taxonomy examples/ferc1-xbrl-taxonomies.zip \
    --workers 5 \
    --batch-size 50
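The interaction between the two options can be sketched with the standard library's `concurrent.futures.ProcessPoolExecutor`. This is a generic illustration, assuming filings are split into consecutive batches and each batch is handed to one worker; the extractor's actual scheduling may differ, and `parse_batch` is a hypothetical placeholder for the real parsing work.

```python
from __future__ import annotations

from concurrent.futures import ProcessPoolExecutor


def make_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


def parse_batch(batch: list[str]) -> list[str]:
    """Hypothetical placeholder for the per-filing work done in a child process."""
    # A real worker would parse each filing here; this one just tags the names.
    return [f"parsed:{name}" for name in batch]


def process_all(filings: list[str], workers: int, batch_size: int) -> list[str]:
    """Hand each batch to a pool of `workers` processes and collect results in order."""
    results: list[str] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input batches.
        for batch_result in pool.map(parse_batch, make_batches(filings, batch_size)):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    filings = [f"filing_{i:03d}.xbrl" for i in range(120)]
    # Roughly analogous to --workers 5 --batch-size 50.
    print([len(b) for b in make_batches(filings, 50)])  # prints [50, 50, 20]
    print(len(process_all(filings, workers=5, batch_size=50)))  # prints 120
```

In this sketch, 120 filings with a batch size of 50 yield only three batches, so two of the five workers sit idle. Larger batches reduce per-task overhead, while smaller batches spread the work more evenly across workers.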

There are also several options for extracting metadata from the taxonomy. The --datapackage-path option saves a Frictionless datapackage descriptor as JSON, which annotates the generated SQLite database. The --metadata-path option writes more extensive taxonomy metadata to a JSON file, grouped by table name. See the ferc_xbrl_extractor.arelle_interface module for more information on the extracted metadata. To create both files from the example filings and taxonomy, run the following command:

xbrl_extract examples/ferc1-2021-sample.zip \
    --sqlite-path ./ferc1-2021-sample.sqlite \
    --taxonomy examples/ferc1-xbrl-taxonomies.zip \
    --metadata-path metadata.json \
    --datapackage-path datapackage.json
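The datapackage descriptor is plain JSON, so it can be examined without extra dependencies. The sketch below assumes the descriptor follows the standard Frictionless Data Package layout (a top-level `resources` list whose entries carry a `name` and an optional `schema.fields` list); `summarize_datapackage` is a hypothetical helper, not part of the extractor's API.

```python
from __future__ import annotations

import json
from pathlib import Path


def summarize_datapackage(path: str | Path) -> dict[str, list[str]]:
    """Map each resource name in a Frictionless descriptor to its field names.

    Assumes the standard Data Package layout: a top-level "resources" list,
    where each resource has a "name" and, for tabular data, a "schema" with "fields".
    """
    descriptor = json.loads(Path(path).read_text(encoding="utf-8"))
    summary: dict[str, list[str]] = {}
    for resource in descriptor.get("resources", []):
        # Resources without a schema are reported with an empty field list.
        fields = resource.get("schema", {}).get("fields", [])
        summary[resource["name"]] = [field["name"] for field in fields]
    return summary
```

After running the command above, `summarize_datapackage("datapackage.json")` should return one entry per table described in the descriptor, mapping each table name to its column names.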

Contributing / Development

This project uses uv for dependency management and Hatch for environment and task management. It also includes several git pre-commit hooks that help enforce standard coding practices. To set up the development environment, first make sure uv is installed, then run:

# Clone the repository to your local machine
git clone https://github.com/catalyst-cooperative/ferc-xbrl-extractor.git
cd ferc-xbrl-extractor
# Create the development environment with hatch
uv tool install hatch
hatch env create
# Install the pre-commit hooks
hatch run pre-commit install

All available development environments and commands can be shown with:

hatch env show

Some of the available commands:

# Run all tests and collect coverage
hatch run test:all
# Run only unit tests
hatch run test:unit
# Run only integration tests
hatch run test:integration
# Run linters and formatters
hatch run lint:all
# Check code without modifying
hatch run lint:check
# Format code
hatch run lint:format
# Build documentation
hatch run docs:build
# Check documentation formatting
hatch run docs:check

Code style is enforced using ruff with configuration in pyproject.toml.

PUDL Sustainers

This package is part of the Public Utility Data Liberation (PUDL) project.

The PUDL Sustainers provide ongoing financial support to ensure that the open data keeps flowing and that the project is sustainable in the long term. They’re also involved in our quarterly planning process. To learn more, see the PUDL Project on Open Collective.

Download files

Source Distribution

catalystcoop_ferc_xbrl_extractor-1.8.0.tar.gz (30.4 MB)

Built Distribution

catalystcoop_ferc_xbrl_extractor-1.8.0-py3-none-any.whl

File details

Details for the file catalystcoop_ferc_xbrl_extractor-1.8.0.tar.gz.

File hashes

Hashes for catalystcoop_ferc_xbrl_extractor-1.8.0.tar.gz
Algorithm Hash digest
SHA256 4af312476eb7d4188fbef43c2c18d892db09e6b27f4aef5bff072516ed0ce871
MD5 4134855199db83d37f0447aec4a59fa7
BLAKE2b-256 17cf0e3b94fe4956f9f2ea85d4e3ec790559b15d7222da10bda67e87b655257d

Provenance

The following attestation bundles were made for catalystcoop_ferc_xbrl_extractor-1.8.0.tar.gz:

Publisher: release.yml on catalyst-cooperative/ferc-xbrl-extractor

File details

Details for the file catalystcoop_ferc_xbrl_extractor-1.8.0-py3-none-any.whl.

File hashes

Hashes for catalystcoop_ferc_xbrl_extractor-1.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3e3492b794717c3347c55a059cef2b6c38f7dcf41ed97c2f6e7a482dd5236512
MD5 d44e4af34f7dc8f7175fff39a7799658
BLAKE2b-256 8fde56175476ab9192c7833263da8b20ea5c6cc88f4519c9245618760aaa6001

Provenance

The following attestation bundles were made for catalystcoop_ferc_xbrl_extractor-1.8.0-py3-none-any.whl:

Publisher: release.yml on catalyst-cooperative/ferc-xbrl-extractor
