Lightweight XBRL 2.1 / iXBRL 1.1 parser and structured data extraction library

xbrl-core — Lightweight XBRL 2.1 / iXBRL 1.1 Parser for Python

xbrl-core is a pure-Python parser and structured data extraction library for XBRL 2.1 instance documents and iXBRL (Inline XBRL) documents. It supports fact extraction, context/unit structuring, all five linkbase types (presentation, calculation, definition, label, reference), XSD schema parsing, calculation validation, text block extraction, pandas/DataFrame conversion, and Rich/HTML rendering. The only required dependency is lxml.

Installation

pip install xbrl-core

Optional dependencies:

# pandas + pyarrow (DataFrame conversion, Parquet export)
pip install 'xbrl-core[analysis]'

# Rich terminal display
pip install 'xbrl-core[display]'

# Excel export (pandas + openpyxl)
pip install 'xbrl-core[excel]'

# Everything
pip install 'xbrl-core[all]'

Quick Start

from xbrl_core import parse_xbrl_facts, structure_contexts, build_line_items

# 1. Parse an XBRL instance document
with open("instance.xbrl", "rb") as f:
    parsed = parse_xbrl_facts(f.read(), source_path="instance.xbrl")

print(f"Facts: {parsed.fact_count}")

# 2. Structure contexts and build typed LineItems
ctx_map = structure_contexts(parsed.contexts)
items = build_line_items(parsed.facts, ctx_map)

for item in items[:5]:
    print(item.local_name, item.value, item.period)

Parsing

XBRL Instance

parse_xbrl_facts() takes raw bytes and returns a ParsedXBRL containing facts, contexts, units, schema refs, footnote links, and ignored elements.

from xbrl_core import parse_xbrl_facts

parsed = parse_xbrl_facts(xbrl_bytes, source_path="example.xbrl")

# Extracted data
parsed.facts             # tuple[RawFact, ...]
parsed.contexts          # tuple[RawContext, ...]
parsed.units             # tuple[RawUnit, ...]
parsed.schema_refs       # tuple[RawSchemaRef, ...]
parsed.footnote_links    # tuple[RawFootnoteLink, ...]
parsed.ignored_elements  # tuple[IgnoredElement, ...]
parsed.fact_count        # int
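
For readers new to the format, the sketch below shows what "extracting facts" from an instance document amounts to. It uses only the standard library and is not how xbrl-core is implemented. The sample document, the "ex" namespace, and the helper name iter_item_facts are made up for illustration.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"
LINK = "http://www.xbrl.org/2003/linkbase"

# Hypothetical minimal instance; the "ex" taxonomy and all values are invented.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
            xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="CurrentYearDuration">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/id">E00001</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2023-04-01</xbrli:startDate>
      <xbrli:endDate>2024-03-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="JPY">
    <xbrli:measure>iso4217:JPY</xbrli:measure>
  </xbrli:unit>
  <ex:NetSales contextRef="CurrentYearDuration" unitRef="JPY" decimals="0">1234567890</ex:NetSales>
</xbrli:xbrl>
"""


def iter_item_facts(xml_bytes):
    """Yield (namespace, local_name, contextRef, unitRef, raw_text) for each top-level item fact."""
    root = ET.fromstring(xml_bytes)
    for child in root:
        if not child.tag.startswith("{"):
            continue  # element without a namespace: not a taxonomy concept
        namespace, _, local_name = child.tag[1:].partition("}")
        if namespace in (XBRLI, LINK):
            continue  # contexts, units, schemaRef and footnoteLink are not facts
        if "contextRef" not in child.attrib:
            continue  # tuples and unknown elements are out of scope for this sketch
        yield (
            namespace,
            local_name,
            child.get("contextRef"),
            child.get("unitRef"),
            (child.text or "").strip(),
        )


facts = list(iter_item_facts(SAMPLE))
for fact in facts:
    print(fact)
```

A real parser also has to handle tuples, nil facts, precision/decimals attributes, and source line tracking, which is the kind of work parse_xbrl_facts() takes care of.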

iXBRL (Inline XBRL)

parse_ixbrl_facts() parses iXBRL (XHTML-embedded XBRL) documents. The output is the same ParsedXBRL type, so downstream pipelines work identically.

from xbrl_core import parse_ixbrl_facts

parsed = parse_ixbrl_facts(ixbrl_bytes, source_path="report.htm")

for fact in parsed.facts[:5]:
    print(fact.local_name, fact.value_raw)

iXBRL format attributes (ixt:numdotdecimal, ixt:numcommadecimal, etc.) and scale/sign attributes are automatically applied. Custom formats can be registered:

from xbrl_core import FormatRegistry, parse_ixbrl_facts

registry = FormatRegistry()
registry.register("dateyearmonthdaycjk", my_cjk_date_func)

parsed = parse_ixbrl_facts(ixbrl_bytes, format_registry=registry)
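
The example above leaves my_cjk_date_func undefined. The sketch below shows what such a function might look like, together with a rough illustration of how format, scale, and sign combine for numeric facts. It assumes a format function takes the displayed text and returns the normalized string; the source does not state the callable signature that FormatRegistry expects, so check it before registering anything. All function names here are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional


def num_dot_decimal(text: str) -> str:
    """numdotdecimal-style: '.' is the decimal separator; other separators are dropped."""
    return re.sub(r"[^0-9.]", "", text)


def num_comma_decimal(text: str) -> str:
    """numcommadecimal-style: ',' is the decimal separator; other separators are dropped."""
    return re.sub(r"[^0-9,]", "", text).replace(",", ".")


def apply_scale_and_sign(normalized: str, scale: int = 0, sign: Optional[str] = None) -> Decimal:
    """Multiply by 10**scale, then negate when the sign attribute is '-'."""
    value = Decimal(normalized).scaleb(scale)
    return -value if sign == "-" else value


def cjk_date(text: str) -> str:
    """Hypothetical custom format: '2024年3月31日' becomes '2024-03-31'."""
    match = re.fullmatch(r"\s*(\d{4})年(\d{1,2})月(\d{1,2})日\s*", text)
    if match is None:
        raise ValueError(f"not a CJK date: {text!r}")
    year, month, day = (int(group) for group in match.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


# Displayed as "1,234.5" in thousands -> 1,234,500
print(apply_scale_and_sign(num_dot_decimal("1,234.5"), scale=3))
# Displayed as "1.234,5" with sign="-" -> -1234.5
print(apply_scale_and_sign(num_comma_decimal("1.234,5"), sign="-"))
print(cjk_date("2024年3月31日"))
```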

IXDS (Inline XBRL Document Set)

Multiple iXBRL files from a single filing can be merged:

from xbrl_core import parse_ixbrl_facts, merge_ixbrl_results

# ixbrl_files: an iterable of raw bytes, one entry per iXBRL document in the set
results = [parse_ixbrl_facts(data) for data in ixbrl_files]
merged = merge_ixbrl_results(results)

Strict / Lenient Mode

Both parsers accept a strict parameter. When strict=True (default), spec violations raise XbrlParseError. When strict=False, violations emit warnings and are recorded in ignored_elements.

parsed = parse_xbrl_facts(xbrl_bytes, strict=False)
for elem in parsed.ignored_elements:
    print(elem.reason, elem.source_line)

Context Structuring

structure_contexts() converts raw context XML fragments into typed StructuredContext objects with period, entity, and dimension information.

from xbrl_core import structure_contexts, ContextCollection

ctx_map = structure_contexts(parsed.contexts)

# Direct dict access
ctx = ctx_map["CurrentYearInstant"]
print(ctx.period)       # InstantPeriod(instant=datetime.date(2024, 3, 31))
print(ctx.entity_id)    # "E00001"
print(ctx.dimensions)   # tuple[DimensionMember, ...]

# ContextCollection for filtering
coll = ContextCollection(ctx_map)
coll.filter_instant()                   # instant contexts only
coll.filter_duration()                  # duration contexts only
coll.filter_no_dimensions()             # no dimension members
coll.filter_by_dimension(axis="{ns}ProductAxis", member="{ns}SegmentA")

coll.latest_instant_period              # most recent InstantPeriod
coll.unique_duration_periods            # unique DurationPeriods, sorted
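
To make the period, entity, and dimension fields concrete, here is a standard-library sketch of how they can be read from a single xbrli:context element. It is an illustration under simplifying assumptions (date-only periods, explicit dimensions only), not xbrl-core's implementation. The sample context and the helper names are invented.

```python
import datetime
import xml.etree.ElementTree as ET

NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "xbrldi": "http://xbrl.org/2006/xbrldi",
}

# Hypothetical context; the "ex" namespace and member names are made up.
CONTEXT_XML = """
<xbrli:context id="CurrentYearInstant_SegmentA"
    xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:entity>
    <xbrli:identifier scheme="http://example.com/id">E00001</xbrli:identifier>
    <xbrli:segment>
      <xbrldi:explicitMember dimension="ex:ProductAxis">ex:SegmentA</xbrldi:explicitMember>
    </xbrli:segment>
  </xbrli:entity>
  <xbrli:period>
    <xbrli:instant>2024-03-31</xbrli:instant>
  </xbrli:period>
</xbrli:context>
""".strip()


def read_period(context):
    """Return ('instant', date), ('duration', start, end) or ('forever',)."""
    period = context.find("xbrli:period", NS)
    instant = period.findtext("xbrli:instant", namespaces=NS)
    if instant is not None:
        return ("instant", datetime.date.fromisoformat(instant.strip()))
    start = period.findtext("xbrli:startDate", namespaces=NS)
    end = period.findtext("xbrli:endDate", namespaces=NS)
    if start is not None and end is not None:
        return (
            "duration",
            datetime.date.fromisoformat(start.strip()),
            datetime.date.fromisoformat(end.strip()),
        )
    return ("forever",)


def read_dimensions(context):
    """Collect explicit members from segment or scenario as (axis, member) QName strings."""
    members = context.iterfind(".//xbrldi:explicitMember", NS)
    return tuple((m.get("dimension"), (m.text or "").strip()) for m in members)


context = ET.fromstring(CONTEXT_XML)
period = read_period(context)
entity_id = context.findtext("xbrli:entity/xbrli:identifier", namespaces=NS)
dimensions = read_dimensions(context)
print(period, entity_id, dimensions)
```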

Unit Structuring

structure_units() converts raw unit XML fragments into typed StructuredUnit objects.

from xbrl_core import structure_units

unit_map = structure_units(parsed.units)

unit = unit_map["JPY"]
print(unit.is_monetary)     # True
print(unit.currency_code)   # "JPY"

unit = unit_map["pure"]
print(unit.is_pure)         # True

unit = unit_map["JPYPerShare"]
print(unit.is_per_share)    # True
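
The flags above correspond to the shape of the underlying xbrli:unit element. The following sketch classifies units with the standard library only. It assumes the conventional iso4217 and xbrli prefixes instead of resolving namespaces, so treat it as an illustration and not as the rule set structure_units() applies. The function name and the category strings are invented.

```python
import xml.etree.ElementTree as ET

NS = {"xbrli": "http://www.xbrl.org/2003/instance"}

# Hypothetical units matching the IDs used in the example above.
UNITS_XML = """
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:iso4217="http://www.xbrl.org/2003/iso4217">
  <xbrli:unit id="JPY"><xbrli:measure>iso4217:JPY</xbrli:measure></xbrli:unit>
  <xbrli:unit id="pure"><xbrli:measure>xbrli:pure</xbrli:measure></xbrli:unit>
  <xbrli:unit id="JPYPerShare">
    <xbrli:divide>
      <xbrli:unitNumerator><xbrli:measure>iso4217:JPY</xbrli:measure></xbrli:unitNumerator>
      <xbrli:unitDenominator><xbrli:measure>xbrli:shares</xbrli:measure></xbrli:unitDenominator>
    </xbrli:divide>
  </xbrli:unit>
</xbrli:xbrl>
""".strip()


def classify_unit(unit):
    """Return 'monetary', 'pure', 'shares', 'per_share', 'ratio' or 'other'."""

    def measures(path):
        return [(m.text or "").strip() for m in unit.iterfind(path, NS)]

    numerator = measures("xbrli:divide/xbrli:unitNumerator/xbrli:measure")
    denominator = measures("xbrli:divide/xbrli:unitDenominator/xbrli:measure")
    if numerator and denominator:
        # Assumption: a currency divided by shares is treated as a per-share unit.
        if numerator[0].startswith("iso4217:") and denominator == ["xbrli:shares"]:
            return "per_share"
        return "ratio"

    simple = measures("xbrli:measure")
    if simple == ["xbrli:pure"]:
        return "pure"
    if simple == ["xbrli:shares"]:
        return "shares"
    if len(simple) == 1 and simple[0].startswith("iso4217:"):
        return "monetary"
    return "other"


root = ET.fromstring(UNITS_XML)
kinds = {u.get("id"): classify_unit(u) for u in root.iterfind("xbrli:unit", NS)}
print(kinds)
```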

Building LineItems

build_line_items() merges RawFact + StructuredContext + optional LabelResolver into fully typed LineItem objects.

from xbrl_core import build_line_items

items = build_line_items(parsed.facts, ctx_map, langs=("en", "ja"))

for item in items:
    print(item.local_name)    # "NetSales"
    print(item.value)         # Decimal('1234567890')
    print(item.period)        # InstantPeriod / DurationPeriod
    print(item.entity_id)     # "E00001"
    print(item.dimensions)    # tuple[DimensionMember, ...]
    print(item.label("en"))   # "Net sales"
    print(item.label("ja"))   # "売上高"

Linkbase Parsing

Presentation Linkbase

from xbrl_core import parse_presentation_linkbase, merge_presentation_trees

trees = parse_presentation_linkbase(pre_xml_bytes)

for role_uri, tree in trees.items():
    # Flatten the tree (depth-first)
    for node in tree.flatten(skip_abstract=True, skip_dimension=True):
        print("  " * node.depth + node.concept)

    # Get only the line-items subtree
    for node in tree.line_items_roots():
        print(node.concept, node.order)

# Merge multiple presentation linkbases
merged = merge_presentation_trees(trees_a, trees_b)

Calculation Linkbase

from xbrl_core import parse_calculation_linkbase

calc_lb = parse_calculation_linkbase(cal_xml_bytes)

for role_uri in calc_lb.role_uris:
    tree = calc_lb.get_tree(role_uri)
    for arc in tree.arcs:
        sign = "+" if arc.weight > 0 else "-"  # weights are usually +1 or -1, but other non-zero values are allowed
        print(f"  {arc.parent} {sign}-> {arc.child}")

# Query relationships
calc_lb.children_of("GrossProfit")        # child arcs
calc_lb.parent_of("NetSales")             # parent arcs
calc_lb.ancestors_of("NetSales", role_uri=role)  # root-ward chain

Definition Linkbase

from xbrl_core import parse_definition_linkbase

def_trees = parse_definition_linkbase(def_xml_bytes)

for role_uri, tree in def_trees.items():
    for hc in tree.hypercubes:
        print(f"Table: {hc.table_concept}")
        for axis in hc.axes:
            print(f"  Axis: {axis.axis_concept}")
            if axis.domain:
                print(f"  Domain: {axis.domain.concept}")

Label Linkbase

from xbrl_core import parse_label_linkbase

labels = parse_label_linkbase(lab_xml_bytes)

for lab in labels:
    print(f"{lab.concept_name} [{lab.lang}] = {lab.text}")

Reference Linkbase

from xbrl_core import parse_reference_linkbase

refs = parse_reference_linkbase(ref_xml_bytes)

for ref in refs:
    print(f"{ref.concept_name}: {ref.role}")
    for part in ref.parts:
        print(f"  {part.local_name} = {part.value}")

Footnotes

from xbrl_core import parse_footnote_links

footnote_map = parse_footnote_links(parsed.footnote_links)

notes = footnote_map.get("IdFact1234") or ()  # guard in case the fact has no footnotes
for n in notes:
    print(n.text, n.lang)

print(footnote_map.fact_ids)       # Fact IDs with footnotes
print(len(footnote_map))           # number of Facts with footnotes

Schema Parsing

from xbrl_core import parse_xsd_elements

elements = parse_xsd_elements(xsd_bytes)

elem = elements["NetSales"]
print(elem.period_type)         # "duration"
print(elem.balance)             # "credit"
print(elem.abstract)            # False
print(elem.type_name)           # "xbrli:monetaryItemType"
print(elem.substitution_group)  # "xbrli:item"

Calculation Validation

Validates summation-item relationships per XBRL 2.1 section 5.2.5.2, with decimals-based rounding tolerance.

from xbrl_core import validate_calculations, parse_calculation_linkbase

calc_lb = parse_calculation_linkbase(cal_xml_bytes)
result = validate_calculations(items, calc_lb)

print(result)           # "Calculation validation: PASS (checked=42, passed=42, errors=0, skipped=3)"
print(result.is_valid)  # True

for issue in result.issues:
    print(issue.parent_concept, issue.expected, issue.actual, issue.severity)
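
As a rough picture of what a decimals-based check involves, the sketch below rounds each contributing value to its reported decimals, applies the arc weights, and compares the total with the parent at the parent's decimals. This is one simplified reading of the summation-item rule, written for illustration. The exact tolerance rule validate_calculations() applies is not described in the source, and the helper names and figures are made up.

```python
from decimal import ROUND_HALF_UP, Decimal


def round_to_decimals(value: Decimal, decimals: int) -> Decimal:
    """Round to `decimals` places; negative values round to tens, hundreds, and so on."""
    quantum = Decimal(1).scaleb(-decimals)  # decimals=-3 gives a quantum of 1E+3
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


def check_summation(parent, parent_decimals, children):
    """children is an iterable of (value, decimals, weight) tuples.

    Returns (is_consistent, expected, actual), all at the parent's decimals.
    """
    total = sum(
        (round_to_decimals(value, decimals) * weight for value, decimals, weight in children),
        Decimal(0),
    )
    expected = round_to_decimals(total, parent_decimals)
    actual = round_to_decimals(parent, parent_decimals)
    return expected == actual, expected, actual


# Hypothetical figures: GrossProfit = NetSales - CostOfSales
children = [
    (Decimal("1234567"), 0, 1),   # NetSales, reported to the unit
    (Decimal("834100"), -2, -1),  # CostOfSales, reported to the hundred
]

ok, expected, actual = check_summation(Decimal("400000"), -3, children)
print(ok, expected, actual)

bad, expected, actual = check_summation(Decimal("402000"), -3, children)
print(bad, expected, actual)
```

Working in Decimal avoids binary floating-point noise, which matters when the comparison is an exact equality after rounding.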

Text Block Extraction

Extracts textBlockItemType facts (e.g. MD&A, risk factors, notes) from filings.

from xbrl_core import extract_text_blocks, clean_html

blocks = extract_text_blocks(parsed.facts, ctx_map)

for block in blocks:
    print(block.concept)         # "BusinessRisksTextBlock"
    print(block.period)          # DurationPeriod(...)
    plain = clean_html(block.html)
    print(plain[:200])

clean_html() converts HTML fragments to plain text, preserving table structure with tabs and newlines — useful as preprocessing for LLM / RAG pipelines.
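
The idea of keeping table structure as tabs and newlines can be sketched with the standard library's html.parser. This is an independent illustration of the technique, not the code behind clean_html(). The class and function names are invented, and details such as which tags count as block-level are assumptions.

```python
import re
from html.parser import HTMLParser


class TableAwareText(HTMLParser):
    """Collect text, ending table cells with a tab and block elements with a newline."""

    BLOCK_TAGS = {"p", "div", "tr", "li", "h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.parts.append("\t")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Collapse whitespace runs so source indentation does not leak into the text.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(fragment: str) -> str:
    parser = TableAwareText()
    parser.feed(fragment)
    parser.close()
    lines = []
    for raw_line in "".join(parser.parts).split("\n"):
        cells = [cell.strip() for cell in raw_line.split("\t")]
        line = "\t".join(cells).rstrip("\t")  # trailing empty cells are dropped
        if line:
            lines.append(line)
    return "\n".join(lines)


fragment = (
    "<p>Segment sales</p>"
    "<table><tr><th>Segment</th><th>Sales</th></tr>"
    "<tr><td>A</td><td>1,200</td></tr></table>"
)
print(html_to_text(fragment))
```

One row per line with tab-separated cells keeps the output easy to split back into columns downstream.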

DataFrame Conversion

Requires pip install 'xbrl-core[analysis]'.

from xbrl_core import line_items_to_dataframe, to_csv, to_parquet

df = line_items_to_dataframe(items, label_lang="en")
print(df[["local_name", "label", "value", "period_end"]].head())

# Export
to_csv(df, "output.csv")
to_parquet(df, "output.parquet")

Requires pip install 'xbrl-core[excel]':

from xbrl_core import to_excel

to_excel(df, "output.xlsx", sheet_name="BalanceSheet")

Display

Rich (Terminal)

Requires pip install 'xbrl-core[display]'.

from rich.console import Console
from xbrl_core import render_statement

table = render_statement(items, title="Balance Sheet", label_lang="en")
Console().print(table)

Hierarchical Display

Use DisplayHint with presentation tree data for indented financial statements:

from xbrl_core import (
    build_display_rows,
    render_hierarchical_statement,
    DisplayHint,
)

hints = [
    DisplayHint(concept="AssetsAbstract", depth=0, is_abstract=True, label="Assets"),
    DisplayHint(concept="CashAndDeposits", depth=1),
    DisplayHint(concept="TotalAssets", depth=0, is_total=True),
]

# Rich Table
table = render_hierarchical_statement(items, hints=hints, title="BS")

# Or get raw DisplayRow objects
rows = build_display_rows(items, hints=hints)

HTML (Jupyter)

from xbrl_core import to_html

html = to_html(items, hints=hints, title="Balance Sheet")

Label Resolution

LabelResolver is a Protocol — implement it to inject taxonomy labels into build_line_items().

from xbrl_core import LabelResolver, LabelInfo, LabelSource

class MyResolver:
    def resolve(self, concept_qname, lang, role):
        # Look up label from your taxonomy data
        return LabelInfo(text="Net sales", role=role, lang=lang, source=LabelSource.STANDARD)

    def resolve_batch(self, concept_qnames, lang, role):
        return {qn: self.resolve(qn, lang, role) for qn in concept_qnames}

items = build_line_items(parsed.facts, ctx_map, resolver=MyResolver(), langs=("en",))
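
Because a Protocol is matched structurally, a resolver needs the right methods but no base class. The sketch below shows a dictionary-backed resolver checked against a stand-in protocol. StubLabel and ResolverLike are placeholders so the example runs without xbrl-core installed; they are not the library's LabelInfo or LabelResolver. Returning None for a missing label is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Protocol, Tuple, runtime_checkable

STANDARD_ROLE = "http://www.xbrl.org/2003/role/label"


@dataclass(frozen=True)
class StubLabel:
    """Placeholder for the label object a real resolver would return."""

    text: str
    role: str
    lang: str


@runtime_checkable
class ResolverLike(Protocol):
    """Stand-in protocol with the two methods shown in the example above."""

    def resolve(self, concept_qname: str, lang: str, role: str) -> Optional[StubLabel]: ...

    def resolve_batch(
        self, concept_qnames: Iterable[str], lang: str, role: str
    ) -> Dict[str, Optional[StubLabel]]: ...


class DictResolver:
    """Look labels up in a {(concept, lang, role): text} table."""

    def __init__(self, table: Dict[Tuple[str, str, str], str]):
        self._table = dict(table)

    def resolve(self, concept_qname, lang, role):
        text = self._table.get((concept_qname, lang, role))
        if text is None:
            return None  # assumption: a missing label is reported as None
        return StubLabel(text=text, role=role, lang=lang)

    def resolve_batch(self, concept_qnames, lang, role):
        return {qn: self.resolve(qn, lang, role) for qn in concept_qnames}


resolver = DictResolver({("NetSales", "en", STANDARD_ROLE): "Net sales"})
print(isinstance(resolver, ResolverLike))  # structural match, no inheritance needed
print(resolver.resolve("NetSales", "en", STANDARD_ROLE))
print(resolver.resolve_batch(["NetSales", "Unknown"], "en", STANDARD_ROLE))
```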

Error Handling

All errors inherit from XbrlError and carry a structured error code and context.

from xbrl_core import XbrlError, XbrlParseError, XbrlValidationError, parse_xbrl_facts

try:
    parsed = parse_xbrl_facts(bad_bytes)
except XbrlParseError as e:
    print(e.code)     # "XBRL_PARSE_001"
    print(e.context)  # {"source_path": "..."}

Error code prefix    Exception class        Description
XBRL_PARSE_xxx       XbrlParseError         XML/XBRL parse errors
XBRL_CTX_xxx         XbrlParseError         Context structuring errors
XBRL_UNIT_xxx        XbrlParseError         Unit structuring errors
XBRL_LINK_xxx        XbrlParseError         Linkbase parse errors
XBRL_IXBRL_xxx       XbrlParseError         iXBRL parse errors
XBRL_VAL_xxx         XbrlValidationError    Validation errors

XbrlWarning (a UserWarning subclass) is emitted for non-fatal issues.
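
Since the warning class derives from UserWarning, the standard warnings module can collect such warnings or turn them into exceptions. The sketch below uses a stand-in class, StubXbrlWarning, so that it runs without xbrl-core installed; in real code you would presumably pass XbrlWarning itself. The lenient_step function and its message are invented.

```python
import warnings


class StubXbrlWarning(UserWarning):
    """Stand-in for the library's warning class, used only to keep this sketch runnable."""


def lenient_step():
    """Pretend to do lenient parsing work that reports a non-fatal issue."""
    warnings.warn("unit 'U1' is referenced but not defined", StubXbrlWarning, stacklevel=2)
    return "partial result"


# 1. Collect warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", StubXbrlWarning)
    result = lenient_step()

messages = [str(w.message) for w in caught if issubclass(w.category, StubXbrlWarning)]
print(result, messages)

# 2. Escalate the same warnings to exceptions, for example in CI.
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error", StubXbrlWarning)
    try:
        lenient_step()
    except StubXbrlWarning as exc:
        escalated = True
        print("escalated:", exc)
```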

Requirements

Python 3.12+. The only required dependency is lxml >= 5.0.
