LeanRL
Lean (non-XML) approach to process XBRL
A lightweight, memory-efficient, and fast Python library for extracting specific information from XBRL filings and taxonomies without loading the entire DTS (Discoverable Taxonomy Set).
Funding Acknowledgment (DFG): Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Collaborative Research Center (SFB/TRR) Project-ID 403041268 – TRR 266 Accounting for Transparency.
Motivation
XBRL is powerful but complex:
- A single filing includes the instance document, company linkbases, and a huge taxonomy (hundreds of XML files)
- Traditional XBRL processors load the full DTS into memory: slow and memory-intensive
- In many real-world scenarios (data extraction, analysis, reporting), you only need a small subset of the data
LeanRL takes a pragmatic, non-strict approach:
- Process one file at a time (no full DTS loading)
- Extract only what you need into simple Python structures (dict, list, pandas.DataFrame)
- Forget strict XBRL validation and complex object models; focus on speed and simplicity
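The one-file-at-a-time idea above can be sketched with the standard library alone. This is not LeanRL's implementation; `stream_labels` and `SAMPLE` are hypothetical names, and the inline XML is a tiny stand-in for a real label linkbase. The sketch streams a single file with `xml.etree.ElementTree.iterparse`, keeps only locators, labels, and arcs, and returns a plain `dict`.

```python
import io
import xml.etree.ElementTree as ET

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
LABEL_ROLE = "http://www.xbrl.org/2003/role/label"
DOC_ROLE = "http://www.xbrl.org/2003/role/documentation"

# Stand-in label linkbase (hypothetical content, not taken from a real taxonomy).
SAMPLE = b"""
<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
               xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:labelLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
    <link:loc xlink:type="locator" xlink:label="loc_Assets"
              xlink:href="us-gaap-2020-01-31.xsd#us-gaap_Assets"/>
    <link:label xlink:type="resource" xlink:label="lab_Assets"
                xlink:role="http://www.xbrl.org/2003/role/label">Assets</link:label>
    <link:label xlink:type="resource" xlink:label="doc_Assets"
                xlink:role="http://www.xbrl.org/2003/role/documentation">Stand-in documentation text.</link:label>
    <link:labelArc xlink:type="arc" xlink:from="loc_Assets" xlink:to="lab_Assets"
                   xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label"/>
    <link:labelArc xlink:type="arc" xlink:from="loc_Assets" xlink:to="doc_Assets"
                   xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label"/>
  </link:labelLink>
</link:linkbase>
"""


def stream_labels(source, role=LABEL_ROLE):
    """Map concept ids to label text for one role, reading a single file."""
    locs = {}   # xlink:label of a locator -> concept id (href fragment)
    texts = {}  # xlink:label of a label resource -> label text
    arcs = []   # (from, to) pairs taken from labelArc elements
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag
        if tag == LINK + "loc":
            href = elem.get(XLINK + "href", "")
            locs[elem.get(XLINK + "label")] = href.rpartition("#")[2]
        elif tag == LINK + "label":
            if elem.get(XLINK + "role") == role:
                texts[elem.get(XLINK + "label")] = (elem.text or "").strip()
        elif tag == LINK + "labelArc":
            arcs.append((elem.get(XLINK + "from"), elem.get(XLINK + "to")))
        elif tag != LINK + "labelLink":
            continue
        # Release text, attributes, and children of elements already processed.
        elem.clear()
    return {locs[f]: texts[t] for f, t in arcs if f in locs and t in texts}


labels = stream_labels(io.BytesIO(SAMPLE))
docs = stream_labels(io.BytesIO(SAMPLE), role=DOC_ROLE)
print(labels)
print(docs)
```

The sketch resolves arcs only after the file has been read, so it does not depend on locators appearing before the arcs that use them.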
Features (Planned)
- Parse presentation linkbases (build hierarchical trees, tables, roll-forwards)
- Parse calculation linkbases (extract summation rules)
- Parse definition linkbases (dimensions, tables, axes); See: Documentation
- Parse label linkbases (English/translated labels)
- Parse taxonomy schema files (elements, types, from elts/, dis/, stm/)
- Convert XBRL structures to pandas DataFrames or nested dictionaries
- Support for both company filings and raw US GAAP/IFRS taxonomies
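As an illustration of what "extract summation rules" from a calculation linkbase could look like, here is a minimal standard-library sketch. It is an assumption about the shape of the result, not LeanRL's API: `summation_rules` and `CALC_SAMPLE` are hypothetical names, and the inline XML is a stand-in. For brevity the sketch merges arcs from all extended link roles in the file into one mapping.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
SUMMATION_ITEM = "http://www.xbrl.org/2003/arcrole/summation-item"

# Stand-in calculation linkbase (hypothetical content).
CALC_SAMPLE = b"""
<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
               xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:calculationLink xlink:type="extended" xlink:role="http://example.com/role/IncomeStatement">
    <link:loc xlink:type="locator" xlink:label="loc_GrossProfit"
              xlink:href="us-gaap-2020-01-31.xsd#us-gaap_GrossProfit"/>
    <link:loc xlink:type="locator" xlink:label="loc_Revenues"
              xlink:href="us-gaap-2020-01-31.xsd#us-gaap_Revenues"/>
    <link:loc xlink:type="locator" xlink:label="loc_CostOfRevenue"
              xlink:href="us-gaap-2020-01-31.xsd#us-gaap_CostOfRevenue"/>
    <link:calculationArc xlink:type="arc" order="1" weight="1.0"
              xlink:arcrole="http://www.xbrl.org/2003/arcrole/summation-item"
              xlink:from="loc_GrossProfit" xlink:to="loc_Revenues"/>
    <link:calculationArc xlink:type="arc" order="2" weight="-1.0"
              xlink:arcrole="http://www.xbrl.org/2003/arcrole/summation-item"
              xlink:from="loc_GrossProfit" xlink:to="loc_CostOfRevenue"/>
  </link:calculationLink>
</link:linkbase>
"""


def summation_rules(source):
    """Return {parent concept: [(child concept, weight), ...]} for one file."""
    root = ET.parse(source).getroot()
    rules = defaultdict(list)
    for link in root.iter(LINK + "calculationLink"):
        # Locator labels are scoped to their extended link, so resolve per link.
        locs = {
            loc.get(XLINK + "label"): loc.get(XLINK + "href", "").rpartition("#")[2]
            for loc in link.findall(LINK + "loc")
        }
        for arc in link.findall(LINK + "calculationArc"):
            if arc.get(XLINK + "arcrole") != SUMMATION_ITEM:
                continue
            parent = locs.get(arc.get(XLINK + "from"))
            child = locs.get(arc.get(XLINK + "to"))
            if parent and child:
                rules[parent].append((child, float(arc.get("weight", "1"))))
    return dict(rules)


rules = summation_rules(io.BytesIO(CALC_SAMPLE))
for parent, children in rules.items():
    print(parent, "=", " + ".join(f"{w:g} * {c}" for c, w in children))
```

A nested dictionary like this is easy to flatten into rows of (parent, child, weight) if a tabular form is preferred.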
Install
To install the released version, run:
pip install leanrl
To install the latest development version from this GitHub repo, run:
git clone https://github.com/reeyarn/LeanRL/
cd LeanRL
pip install -e .
Or, using uv with the development extras:
uv pip install -e ".[dev]"
Example
from leanrl import parse_label_linkbase, Roles

# Get documentation
path = "/tmp/us-gaap-2020-01-31/elts/"
# path = "LeanRL/tests/data/"
filename = "us-gaap-doc-2020-01-31.xml"
docs = parse_label_linkbase(path + filename)
for i, (concept, doc) in enumerate(docs.items()):
    print(f"{i}: {concept}: {doc}")
    if i > 32:
        break

# Get display labels
labels = parse_label_linkbase(path + 'us-gaap-lab-2020-01-31.xml', role=Roles.LABEL)
for i, (concept, label) in enumerate(labels.items()):
    print(f"{i}: {concept}: {label}")
    if i > 32:
        break
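The example above iterates the two results with `.items()`, which suggests they are dict-like mappings from concept to text. Under that assumption, a small helper can combine them into one record per concept. `merge_labels_and_docs` is a hypothetical helper, not part of LeanRL, and the two dictionaries below are stand-ins for real parser output.

```python
def merge_labels_and_docs(labels, docs):
    """Combine two concept-keyed mappings into a list of row dictionaries."""
    rows = []
    # Take the union of keys so concepts missing from one mapping are kept.
    for concept in sorted(labels.keys() | docs.keys()):
        rows.append(
            {
                "concept": concept,
                "label": labels.get(concept),
                "documentation": docs.get(concept),
            }
        )
    return rows


# Stand-in data (hypothetical; real keys and texts come from the linkbase files).
sample_labels = {"us-gaap_Assets": "Assets", "us-gaap_Liabilities": "Liabilities"}
sample_docs = {"us-gaap_Assets": "Stand-in documentation text."}

rows = merge_labels_and_docs(sample_labels, sample_docs)
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` when a table is more convenient.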
Project Structure
leanrl/
├── src/leanrl/
│   ├── core/
│   │   ├── namespaces.py      # qname(), Roles, NS_LINK, etc.
│   │   ├── parser.py
│   │   └── streaming.py       # stream_xml()
│   ├── utils/
│   │   └── href.py            # extract_concept_from_href()
│   ├── linkbases/
│   │   ├── __init__.py
│   │   ├── label.py           # Label linkbase only
│   │   ├── reference.py       # Reference linkbase only
│   │   ├── calculation.py     # Calculation linkbase only
│   │   ├── hierarchy.py       # Shared ConceptNode, ConceptTree (used by def & pre)
│   │   ├── definition.py      # Definition linkbase only (imports from hierarchy)
│   │   └── presentation.py    # Presentation linkbase only (imports from hierarchy)
└── tests/
    ├── tests/
    └── test1.py
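The tree lists `extract_concept_from_href()` in `utils/href.py` without showing its signature or return value. The following sketch shows one way such a helper could work; `concept_from_href` is a hypothetical stand-in, and it assumes the common (but not guaranteed) convention that element ids in taxonomy schemas look like `prefix_LocalName`.

```python
from urllib.parse import urldefrag


def concept_from_href(href):
    """Split a locator href into (schema_url, prefix, local_name).

    Assumes the fragment follows the 'prefix_LocalName' id convention;
    if there is no underscore, the prefix is returned as an empty string.
    """
    url, fragment = urldefrag(href)
    prefix, sep, name = fragment.partition("_")
    if not sep:
        return url, "", fragment
    return url, prefix, name


href = "http://xbrl.fasb.org/us-gaap/2020/elts/us-gaap-2020-01-31.xsd#us-gaap_Assets"
schema_url, prefix, local_name = concept_from_href(href)
print(schema_url)
print(f"{prefix}:{local_name}")
```

Splitting on the first underscore works for prefixes such as `us-gaap` that contain no underscore themselves; a stricter approach would look the id up in the schema file instead of relying on the naming convention.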
Attribution & Legal Notices
ESEF Standard Acknowledgment
This project supports the European Single Electronic Format (ESEF), established by the European Securities and Markets Authority (ESMA) as the mandated digital reporting standard for annual financial reports of listed companies in the European Union. The ESEF specifications and guidelines are sourced from ESMA’s official publications and are adhered to in this implementation. For more information, visit esma.europa.eu.
IFRS Taxonomy & ESEF Standards
This project supports the processing of filings based on the International Financial Reporting Standards (IFRS) and the European Single Electronic Format (ESEF).
IFRS Taxonomy: The IFRS Taxonomy is developed and maintained by the IFRS Foundation. The taxonomy files included or referenced in this project are sourced from the IFRS Foundation’s official repository.
- Copyright: The IFRS Taxonomy is Copyright © IFRS Foundation. All rights reserved.
- Disclaimer: This project is an open-source tool and is not affiliated with, endorsed by, or commercially licensed by the IFRS Foundation. The files are used solely to facilitate the technical validation and creation of XBRL/iXBRL documents. For official standards, please visit ifrs.org.
ESEF Guidelines: The ESEF reporting standard is established by the European Securities and Markets Authority (ESMA) for listed companies in the European Union.
- Source: ESEF specifications are sourced from ESMA’s official publications.
- Attribution: Adherence to ESEF guidelines in this project is based on public technical standards available at esma.europa.eu.
US GAAP Taxonomy Acknowledgment & License
This project includes copies of the US GAAP Financial Reporting Taxonomy (e.g., us-gaap-YYYY-MM-DD.xsd), sourced from official locations (e.g., fasb.org and xbrl.us). These files are Copyright © Financial Accounting Foundation (FAF) and, for certain prior versions, XBRL US, Inc.
The taxonomy files are redistributed within this project as a "Permitted Work" pursuant to the FAF's Copyright Notice and policies. They are provided for public use to assist in the implementation and processing of XBRL data.
Compliance Conditions:
- Non-Modification: All original copyright notices, XML comments, disclaimers, and license statements embedded in the taxonomy files have been preserved unchanged.
- No Ownership Claim: This project does not claim ownership of the taxonomy; rights remain exclusively with the FAF and XBRL US.
- Authorized Use: Use of these files is subject to the Notice of Authorized Uses maintained by the FAF.
For full license terms, please see the Official Terms and Conditions.
General Disclaimer & Takedown Notice
The use of the standards, taxonomies, and schemas listed above is intended to support educational and research purposes in alignment with the open-source goals of this project.
Rights Infringement Contact: If any use herein is found to infringe upon the rights of the FASB, XBRL US, ESMA, or the IFRS Foundation, please contact the author immediately:
Contact: reeyarn+github.openesef@gmail.com
Upon receipt of a valid notice, the author will promptly remove or adjust the offending content to address any concerns.