Lean (non-XML) approach to process XBRL

Project description

LeanRL

Python 3.8+ License: MIT

A lightweight, memory-efficient, and fast Python library for extracting specific information from XBRL filings and taxonomies, without loading the entire DTS (Discoverable Taxonomy Set).

Funding Acknowledgment (DFG): Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Collaborative Research Center (SFB/TRR) Project-ID 403041268 – TRR 266 Accounting for Transparency.

Motivation

XBRL is powerful but complex:

  • A single filing consists of the instance document and company linkbases, and it depends on a huge taxonomy (hundreds of XML files)
  • Traditional XBRL processors load the full DTS into memory, which is slow and memory-intensive
  • In many real-world scenarios (data extraction, analysis, reporting), you only need a small subset of the data

LeanRL takes a pragmatic, non-strict approach:

  • Process one file at a time (no full DTS loading)
  • Extract only what you need into simple Python structures (dict, list, pandas.DataFrame)
  • Skip strict XBRL validation and complex object models; focus on speed and simplicity
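To illustrate the one-file-at-a-time idea, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree.iterparse`. It is not LeanRL's own `stream_xml()`; the function `iter_link_elements` and the sample linkbase are hypothetical. It yields each matching linkbase element as soon as its end tag is parsed and then clears it, so element content is discarded as parsing proceeds instead of being kept in a full in-memory tree.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple, Union

LINK_NS = "http://www.xbrl.org/2003/linkbase"
XLINK_NS = "http://www.w3.org/1999/xlink"


def iter_link_elements(
    source: Union[str, BinaryIO], localname: str
) -> Iterator[Tuple[dict, str]]:
    """Yield (attributes, text) for each link:<localname> element as it is parsed."""
    target = f"{{{LINK_NS}}}{localname}"
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == target:
            # Copy what we need before the element is cleared below.
            yield dict(elem.attrib), (elem.text or "").strip()
        # Discard the element's text, attributes, and children once its end tag is seen.
        elem.clear()


# Hypothetical one-label linkbase used only for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
               xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:labelLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
    <link:label xlink:type="resource" xlink:label="lab_Assets" xml:lang="en-US"
                xlink:role="http://www.xbrl.org/2003/role/label">Assets</link:label>
  </link:labelLink>
</link:linkbase>"""

for attrs, text in iter_link_elements(io.BytesIO(SAMPLE), "label"):
    print(attrs[f"{{{XLINK_NS}}}label"], text)  # prints: lab_Assets Assets
```

Passing a file path instead of `io.BytesIO` works the same way, since `iterparse` accepts either.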

Features (Planned)

  • Parse presentation linkbases (build hierarchical trees, tables, roll-forwards)
  • Parse calculation linkbases (extract summation rules)
  • Parse definition linkbases (dimensions, tables, axes); See: Documentation
  • Parse label linkbases (English/translated labels)
  • Parse taxonomy schema files (elements and types from the elts/, dis/, and stm/ directories)
  • Convert XBRL structures to pandas DataFrames or nested dictionaries
  • Support for both company filings and raw US GAAP/IFRS taxonomies
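As a rough illustration of the calculation-linkbase feature, here is a minimal standard-library sketch that extracts summation rules. It assumes the XBRL 2.1 `summation-item` arcrole and that each locator's `xlink:href` ends in `#<element id>`. The names `summation_rules` and `concept_from_href` are hypothetical (the latter only loosely mirrors the idea behind `extract_concept_from_href()`), and the role URI in the sample is made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
SUMMATION_ITEM = "http://www.xbrl.org/2003/arcrole/summation-item"

Rules = Dict[str, Dict[str, List[Tuple[str, float]]]]


def concept_from_href(href: str) -> str:
    # Return the fragment identifier (the element id), e.g.
    # "us-gaap-2020-01-31.xsd#us-gaap_Assets" -> "us-gaap_Assets"
    return href.rsplit("#", 1)[-1]


def summation_rules(xml_bytes: bytes) -> Rules:
    """Return {extended link role: {total concept: [(item concept, weight), ...]}}."""
    root = ET.fromstring(xml_bytes)
    result: Rules = {}
    for link in root.iter(f"{LINK}calculationLink"):
        # Locator labels are only unique within one extended link, so resolve them per link.
        locs = {
            loc.get(f"{XLINK}label"): concept_from_href(loc.get(f"{XLINK}href"))
            for loc in link.iter(f"{LINK}loc")
        }
        merged = result.setdefault(link.get(f"{XLINK}role"), {})
        for arc in link.iter(f"{LINK}calculationArc"):
            if arc.get(f"{XLINK}arcrole") != SUMMATION_ITEM:
                continue
            total = locs[arc.get(f"{XLINK}from")]
            item = locs[arc.get(f"{XLINK}to")]
            merged.setdefault(total, []).append((item, float(arc.get("weight"))))
    return result


# Hypothetical calculation linkbase used only for illustration.
SAMPLE = b"""<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:calculationLink xlink:type="extended" xlink:role="http://example.com/role/BalanceSheet">
    <link:loc xlink:type="locator" xlink:href="us-gaap.xsd#us-gaap_LiabilitiesAndStockholdersEquity"
        xlink:label="loc_LSE"/>
    <link:loc xlink:type="locator" xlink:href="us-gaap.xsd#us-gaap_Liabilities" xlink:label="loc_L"/>
    <link:loc xlink:type="locator" xlink:href="us-gaap.xsd#us-gaap_StockholdersEquity" xlink:label="loc_SE"/>
    <link:calculationArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/summation-item"
        xlink:from="loc_LSE" xlink:to="loc_L" weight="1.0" order="1"/>
    <link:calculationArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/summation-item"
        xlink:from="loc_LSE" xlink:to="loc_SE" weight="1.0" order="2"/>
  </link:calculationLink>
</link:linkbase>"""

print(summation_rules(SAMPLE))
```

Rules are grouped by extended link role because the same total can be broken down differently in different statements or disclosures.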

Install

To install the released version, run:

pip install leanrl

To install the latest development version from this GitHub repository, run:

git clone https://github.com/reeyarn/LeanRL/
cd LeanRL
pip install -e .

Or, with uv (including the dev extras):

uv pip install -e ".[dev]"

Example

import os
from itertools import islice

from leanrl import parse_label_linkbase, Roles

# Folder with the unpacked US GAAP 2020 taxonomy (adjust to your download location)
path = "/tmp/us-gaap-2020-01-31/elts/"
# path = "LeanRL/tests/data/"  # or point at the repository's test data

# Get documentation from the documentation linkbase
docs = parse_label_linkbase(os.path.join(path, "us-gaap-doc-2020-01-31.xml"))

# Preview the first 34 entries
for i, (concept, doc) in enumerate(islice(docs.items(), 34)):
    print(f"{i}: {concept}: {doc}")

# Get display labels from the label linkbase
labels = parse_label_linkbase(os.path.join(path, "us-gaap-lab-2020-01-31.xml"), role=Roles.LABEL)

for i, (concept, label) in enumerate(islice(labels.items(), 34)):
    print(f"{i}: {concept}: {label}")
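A label linkbase does not store labels next to concept names: a `link:loc` points at the concept, a `link:label` resource holds the text together with a role and a language, and a `link:labelArc` connects the two. The sketch below resolves that chain with the standard library and filters by label role, which is presumably what a `role=` argument such as `Roles.LABEL` selects. That mapping is an assumption, and `labels_by_concept` is a hypothetical name, not LeanRL's `parse_label_linkbase`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
STANDARD_LABEL = "http://www.xbrl.org/2003/role/label"
DOCUMENTATION = "http://www.xbrl.org/2003/role/documentation"


def labels_by_concept(
    xml_bytes: bytes, role: str = STANDARD_LABEL, lang: Optional[str] = None
) -> Dict[str, str]:
    """Resolve loc -> labelArc -> label and return {concept: text} for one label role."""
    root = ET.fromstring(xml_bytes)
    out: Dict[str, str] = {}
    for link in root.iter(f"{LINK}labelLink"):
        # Locator label -> concept (the fragment after '#' in xlink:href).
        locs = {
            loc.get(f"{XLINK}label"): loc.get(f"{XLINK}href").rsplit("#", 1)[-1]
            for loc in link.iter(f"{LINK}loc")
        }
        # Resource label -> [(role, lang, text)]; several resources may share one xlink:label.
        resources: Dict[str, list] = {}
        for lab in link.iter(f"{LINK}label"):
            resources.setdefault(lab.get(f"{XLINK}label"), []).append(
                (lab.get(f"{XLINK}role"), lab.get(XML_LANG), (lab.text or "").strip())
            )
        for arc in link.iter(f"{LINK}labelArc"):
            concept = locs.get(arc.get(f"{XLINK}from"))
            if concept is None:
                continue
            for res_role, res_lang, text in resources.get(arc.get(f"{XLINK}to"), []):
                if res_role == role and (lang is None or res_lang == lang):
                    out.setdefault(concept, text)  # keep the first matching label
    return out


# Hypothetical label linkbase used only for illustration.
SAMPLE = b"""<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:labelLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
    <link:loc xlink:type="locator" xlink:href="us-gaap-2020-01-31.xsd#us-gaap_Assets" xlink:label="loc_Assets"/>
    <link:label xlink:type="resource" xlink:label="lab_Assets" xml:lang="en-US"
        xlink:role="http://www.xbrl.org/2003/role/label">Assets</link:label>
    <link:label xlink:type="resource" xlink:label="lab_Assets" xml:lang="en-US"
        xlink:role="http://www.xbrl.org/2003/role/documentation">Example documentation text.</link:label>
    <link:labelArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label"
        xlink:from="loc_Assets" xlink:to="lab_Assets"/>
  </link:labelLink>
</link:linkbase>"""

print(labels_by_concept(SAMPLE))                      # standard labels
print(labels_by_concept(SAMPLE, role=DOCUMENTATION))  # documentation texts
```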

Project Structure

leanrl/
├── src/leanrl/
│   ├── core/
│   │   ├── namespaces.py         # qname(), Roles, NS_LINK, etc.
│   │   ├── parser.py
│   │   └── streaming.py          # stream_xml()
│   ├── utils/
│   │   └── href.py               # extract_concept_from_href()
│   └── linkbases/
│       ├── __init__.py
│       ├── label.py              # Label linkbase only
│       ├── reference.py          # Reference linkbase only
│       ├── calculation.py        # Calculation linkbase only
│       ├── hierarchy.py          # Shared ConceptNode, ConceptTree (used by def & pre)
│       ├── definition.py         # Definition linkbase only (imports from hierarchy)
│       └── presentation.py       # Presentation linkbase only (imports from hierarchy)
└── tests/
    └── test1.py
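`hierarchy.py` provides shared tree classes for the definition and presentation linkbases. As a rough, hypothetical sketch of that idea (not the actual `ConceptNode`/`ConceptTree`), the code below builds ordered trees from parent-child arcs. It assumes the arcs were already extracted from `link:presentationArc` elements as `(parent, child, order)` tuples and that they contain no cycles; `Node`, `build_tree`, and `render` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Node:
    """Hypothetical tree node: a concept plus its ordered children."""
    concept: str
    children: List[Tuple[float, "Node"]] = field(default_factory=list)


def build_tree(arcs: Iterable[Tuple[str, str, float]]) -> List[Node]:
    """Build trees from (parent, child, order) arcs and return the root nodes."""
    nodes: Dict[str, Node] = {}
    has_parent = set()
    for parent, child, order in arcs:
        parent_node = nodes.setdefault(parent, Node(parent))
        child_node = nodes.setdefault(child, Node(child))
        parent_node.children.append((order, child_node))
        has_parent.add(child)
    for node in nodes.values():
        # Presentation arcs carry an 'order' attribute that fixes sibling order.
        node.children.sort(key=lambda pair: pair[0])
    # Roots are concepts that never appear as a child.
    return [node for name, node in nodes.items() if name not in has_parent]


def render(node: Node, depth: int = 0) -> List[str]:
    """Return an indented outline, two spaces per level."""
    lines = ["  " * depth + node.concept]
    for _order, child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Arcs deliberately listed out of order to show the sort by 'order'.
ARCS = [
    ("us-gaap_AssetsAbstract", "us-gaap_Assets", 2.0),
    ("us-gaap_AssetsAbstract", "us-gaap_AssetsCurrent", 1.0),
    ("us-gaap_StatementOfFinancialPositionAbstract", "us-gaap_AssetsAbstract", 1.0),
]

for root in build_tree(ARCS):
    print("\n".join(render(root)))
```

Keeping nodes in a dict keyed by concept means a concept reached through several arcs is represented once, which is what lets the same node be shared by multiple parents.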

Attribution & Legal Notices

ESEF Standard Acknowledgment

This project supports the European Single Electronic Format (ESEF), established by the European Securities and Markets Authority (ESMA) as the mandated digital reporting standard for annual financial reports of listed companies in the European Union. The ESEF specifications and guidelines are sourced from ESMA’s official publications and are adhered to in this implementation. For more information, visit esma.europa.eu.

IFRS Taxonomy & ESEF Standards

This project supports the processing of filings based on the International Financial Reporting Standards (IFRS) and the European Single Electronic Format (ESEF).

IFRS Taxonomy

The IFRS Taxonomy is developed and maintained by the IFRS Foundation. The taxonomy files included or referenced in this project are sourced from the IFRS Foundation’s official repository.

  • Copyright: The IFRS Taxonomy is Copyright © IFRS Foundation. All rights reserved.
  • Disclaimer: This project is an open-source tool and is not affiliated with, endorsed by, or commercially licensed by the IFRS Foundation. The files are used solely to facilitate the technical validation and creation of XBRL/iXBRL documents. For official standards, please visit ifrs.org.

ESEF Guidelines

The ESEF reporting standard is established by the European Securities and Markets Authority (ESMA) for listed companies in the European Union.

  • Source: ESEF specifications are sourced from ESMA’s official publications.
  • Attribution: Adherence to ESEF guidelines in this project is based on public technical standards available at esma.europa.eu.

US GAAP Taxonomy Acknowledgment & License

This project includes copies of the US GAAP Financial Reporting Taxonomy (e.g., us-gaap-YYYY-MM-DD.xsd), sourced from official locations (e.g., fasb.org and xbrl.us). These files are Copyright © Financial Accounting Foundation (FAF) and, for certain prior versions, XBRL US, Inc.

The taxonomy files are redistributed within this project as a "Permitted Work" pursuant to the FAF's Copyright Notice and policies. They are provided for public use to assist in the implementation and processing of XBRL data.

Compliance Conditions:

  1. Non-Modification: All original copyright notices, XML comments, disclaimers, and license statements embedded in the taxonomy files have been preserved unchanged.
  2. No Ownership Claim: This project does not claim ownership of the taxonomy; rights remain exclusively with the FAF and XBRL US.
  3. Authorized Use: Use of these files is subject to the Notice of Authorized Uses maintained by the FAF.

For full license terms, please see the Official Terms and Conditions.

General Disclaimer & Takedown Notice

The use of the standards, taxonomies, and schemas listed above is intended to support educational and research purposes in alignment with the open-source goals of this project.

Rights Infringement Contact: If any use herein is found to infringe upon the rights of the FASB, XBRL US, ESMA, or the IFRS Foundation, please contact the author immediately:

Contact: reeyarn+github.openesef@gmail.com

Upon receipt of a valid notice, the author will promptly remove or adjust the offending content to address any concerns.

Download files

Source Distribution

leanrl-0.1.5.tar.gz (2.8 MB)

Built Distribution

leanrl-0.1.5-py3-none-any.whl (30.5 kB)

File details

Details for the file leanrl-0.1.5.tar.gz.

File metadata

  • Download URL: leanrl-0.1.5.tar.gz
  • Upload date:
  • Size: 2.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for leanrl-0.1.5.tar.gz
Algorithm     Hash digest
SHA256        c92fa275daaf9a9155d7e5abe92ff96668709aaa0add8d2acf011fa2a0162a22
MD5           2cf28a20ff9f3e0b71c6353c887d51f7
BLAKE2b-256   bd7c5b415d4550eeca88052f7295c39608ef08d8949920b7285a670744733046
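To check a download against the published digests, here is a short standard-library sketch: it hashes the file in chunks with `hashlib.sha256` and compares the result with the SHA256 value listed on this page. The helper names `sha256_of` and `verify` are illustrative.

```python
import hashlib

# SHA256 digests published on this page for release 0.1.5.
SDIST_SHA256 = "c92fa275daaf9a9155d7e5abe92ff96668709aaa0add8d2acf011fa2a0162a22"
WHEEL_SHA256 = "873c9cf9ab3573db2a150da0f73a26542e51d17da12dacfe55eb7fe585d248f3"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 matches the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

After downloading, `verify("leanrl-0.1.5.tar.gz", SDIST_SHA256)` should return `True`. pip can also enforce this itself in hash-checking mode, e.g. with a requirements line `leanrl==0.1.5 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`.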

File details

Details for the file leanrl-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: leanrl-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 30.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for leanrl-0.1.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        873c9cf9ab3573db2a150da0f73a26542e51d17da12dacfe55eb7fe585d248f3
MD5           de196918cd51d9b8e84f8cb7296ad928
BLAKE2b-256   f63bec0b4ef5110d2d62ffbf538e1083fdb74d30cc638215ef19bc39489192a5
