A tool for extracting data from FERC XBRL Filings.
The Federal Energy Regulatory Commission (FERC) has moved to collecting and distributing data using XBRL. XBRL is primarily designed for financial reporting, and has been adopted by regulators in the US and other countries. Much of the tooling in the XBRL ecosystem is aimed at filers and at rendering individual filings in a human-readable way, but very little of it is aimed at accessing and analyzing large collections of filings.
The FERC XBRL Extractor is designed to provide that functionality for FERC XBRL data. The library can extract data from a set of XBRL filings, and write that data to SQLite or DuckDB databases whose structure is derived from an XBRL Taxonomy. While each XBRL instance contains a reference to a taxonomy, this tool requires a path to a single taxonomy that will be used to interpret all instances being processed. This means even if instances were created from different versions of a taxonomy, the provided taxonomy will be used when processing all of these instances, so the output database will have a consistent structure. For more information on the technical details of the XBRL extraction, see the docs.
Catalyst Cooperative is currently using this tool to extract and publish the following FERC data. These outputs are updated at least annually, and typically quarterly.
| FERC Form | Taxonomy | Raw Data | SQLite | DuckDB |
|---|---|---|---|---|
Usage
Installation
The package can be installed from PyPI or conda-forge using your package manager of choice:
From PyPI
pip install catalystcoop.ferc-xbrl-extractor
uv pip install catalystcoop.ferc-xbrl-extractor
From conda-forge
conda install catalystcoop.ferc_xbrl_extractor
mamba install catalystcoop.ferc_xbrl_extractor
pixi install catalystcoop.ferc_xbrl_extractor
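After installing, one quick way to confirm the package is visible to the current Python environment is to ask the standard library for its version. This is a minimal sketch: the helper name `installed_version` is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Example call (prints None if the package is not installed here):
# print(installed_version("catalystcoop.ferc-xbrl-extractor"))
```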
Input Data
The FERC XBRL Extractor is generally intended to consume raw XBRL filings and taxonomy information from one of the archives Catalyst Cooperative has published on Zenodo. Each supported form has its own archive lineage, with new snapshots captured from FERC’s XBRL filing RSS feeds on a regular basis (see links in the table above). The tool also expects to receive a zipfile containing archived taxonomies.
The archived filings and taxonomies are both produced using the pudl-archiver. The extractor will parse all taxonomies in the archive, then use the taxonomy referenced in each filing while parsing it.
CLI
This tool can be used as a library, as it is in PUDL. There is also a CLI provided for interacting with XBRL data. The only required options for the CLI are a path to the filings to be extracted, and a path to the output database. The path to the filings can point to a directory full of XBRL Filings, a single XBRL filing, or a zipfile with XBRL filings. If the specified output database already exists, it will be overwritten.
xbrl_extract {path_to_filings} --sqlite-path {path_to_database}
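When the CLI needs to be driven from a Python script, the same invocation can be assembled as an argument list and handed to `subprocess`. The sketch below is illustrative only: the helper names `build_xbrl_extract_command` and `run_xbrl_extract` are hypothetical, and the only CLI pieces it relies on are the `xbrl_extract` command and the `--sqlite-path` option described above. Any further options are passed through verbatim rather than guessed at.

```python
from __future__ import annotations

import subprocess
from typing import Sequence


def build_xbrl_extract_command(
    filings: str, sqlite_path: str, extra_args: Sequence[str] = ()
) -> list[str]:
    """Assemble the argument list for the xbrl_extract CLI.

    `extra_args` is appended unchanged, so the caller is responsible for
    supplying only options that the CLI actually supports.
    """
    return ["xbrl_extract", filings, "--sqlite-path", sqlite_path, *extra_args]


def run_xbrl_extract(
    filings: str, sqlite_path: str, extra_args: Sequence[str] = ()
) -> None:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_xbrl_extract_command(filings, sqlite_path, extra_args), check=True
    )
```

Passing a list instead of a shell string avoids quoting problems when paths contain spaces.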
This repo contains a small selection of FERC Form 1 filings from 2021, along with an archive of taxonomies in the examples directory. To test the tool on these filings, use the command:
xbrl_extract examples/ferc1-2021-sample.zip \
--sqlite-path ./ferc1-2021-sample.sqlite \
--taxonomy examples/ferc1-xbrl-taxonomies.zip
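Once a SQLite file has been written, it can be inspected with Python's built-in `sqlite3` module. The following sketch lists each table with its row count. The helper name `table_row_counts` is hypothetical, and the table names you see will depend on the taxonomy used, so none are assumed here. The database is opened read-only so that a mistyped path raises an error instead of creating an empty file.

```python
from __future__ import annotations

import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path: str) -> dict[str, int]:
    """Map every table in a SQLite database to its number of rows."""
    # Open read-only via a file: URI so a missing file is an error.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier, doubling any embedded double quotes.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
```

For the sample above, this would be called as `table_row_counts("ferc1-2021-sample.sqlite")`.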
Parsing XBRL filings can be time-consuming and CPU-heavy, so this tool uses a process pool to parse filings in parallel. Two options configure the pool: --batch-size sets how many filings each child process handles at a time, and --workers sets how many child processes the pool creates. It may take some experimentation to find the best values for these options. The following command uses 5 worker processes to process batches of 50 filings at a time, and writes both SQLite and DuckDB output.
xbrl_extract examples/ferc1-2021-sample.zip \
--sqlite-path ferc1-2021-sample.sqlite \
--duckdb-path ferc1-2021-sample.duckdb \
--taxonomy examples/ferc1-xbrl-taxonomies.zip \
--workers 5 \
--batch-size 50
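To make the relationship between workers and batch size concrete, here is a generic sketch of the batching pattern using Python's `concurrent.futures`. It is not the extractor's actual implementation: `make_batches`, `process_batch`, and `process_all` are hypothetical names, and `process_batch` is a stand-in that only tags each filing name instead of parsing XBRL.

```python
from __future__ import annotations

from concurrent.futures import ProcessPoolExecutor
from typing import Sequence


def make_batches(items: Sequence[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)]


def process_batch(batch: list[str]) -> list[str]:
    """Placeholder for real work; a real worker would parse each filing here."""
    return [f"parsed:{name}" for name in batch]


def process_all(filings: Sequence[str], workers: int, batch_size: int) -> list[str]:
    """Hand each batch to a pool of worker processes and collect results in order."""
    results: list[str] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for parsed in pool.map(process_batch, make_batches(filings, batch_size)):
            results.extend(parsed)
    return results
```

With 120 filings and a batch size of 50, this produces three batches (50, 50, and 20 filings), so more than three workers would sit idle. Larger batches reduce inter-process overhead, while smaller batches spread work more evenly. On platforms that start child processes by spawning, call `process_all` from under an `if __name__ == "__main__":` guard.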
There are also several options for extracting metadata from the taxonomy. The --datapackage-path option saves a frictionless datapackage descriptor as JSON, which annotates the generated SQLite database. The --metadata-path option writes more extensive taxonomy metadata to a JSON file, grouped by table name. See the ferc_xbrl_extractor.arelle_interface module for more info on the extracted metadata. To create both of these files using the example filings and taxonomy, run the following command.
xbrl_extract examples/ferc1-2021-sample.zip \
--sqlite-path ./ferc1-2021-sample.sqlite \
--taxonomy examples/ferc1-xbrl-taxonomies.zip \
--metadata-path metadata.json \
--datapackage-path datapackage.json
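The datapackage descriptor can then be read with the standard `json` module. The sketch below assumes the file follows the usual Frictionless Data Package layout, with a top-level `resources` list whose entries carry a `name` and an inline `schema` containing `fields`. The helper names are hypothetical, and the exact contents of the file produced by this tool are not verified here.

```python
from __future__ import annotations

import json


def load_descriptor(path: str) -> dict:
    """Load a datapackage descriptor from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_datapackage(descriptor: dict) -> dict[str, list[str]]:
    """Map each resource name to the list of its field names.

    Resources without an inline schema (for example, a schema given as a
    path or URL string) are reported with an empty field list.
    """
    summary: dict[str, list[str]] = {}
    for resource in descriptor.get("resources", []):
        schema = resource.get("schema")
        fields = schema.get("fields", []) if isinstance(schema, dict) else []
        summary[resource["name"]] = [field["name"] for field in fields]
    return summary
```

For the command above, `summarize_datapackage(load_descriptor("datapackage.json"))` would give a quick overview of the tables and columns described by the descriptor.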
Contributing / Development
This project uses uv for dependency management and Hatch for environment and task management. It also includes several git pre-commit hooks that help enforce standard coding practices. To set up the development environment, first ensure you have uv installed, then run:
# Clone the repository to your local machine
git clone https://github.com/catalyst-cooperative/ferc-xbrl-extractor.git
cd ferc-xbrl-extractor
# Create the development environment with hatch
uv tool install hatch
hatch env create
# Install the pre-commit hooks
hatch run pre-commit install
All available development environments and commands can be shown with:
hatch env show
Some of the available commands:
# Run all tests and collect coverage
hatch run test:all
# Run only unit tests
hatch run test:unit
# Run only integration tests
hatch run test:integration
# Run linters and formatters
hatch run lint:all
# Check code without modifying
hatch run lint:check
# Format code
hatch run lint:format
# Build documentation
hatch run docs:build
# Check documentation formatting
hatch run docs:check
Code style is enforced using ruff with configuration in pyproject.toml.
PUDL Sustainers
This package is part of the Public Utility Data Liberation (PUDL) project.
The PUDL Sustainers provide ongoing financial support to ensure the open data keeps flowing, and the project is sustainable long term. They’re also involved in our quarterly planning process. To learn more see the PUDL Project on Open Collective.