Lightweight XBRL 2.1 / iXBRL 1.1 parser and structured data extraction library

xbrl-core — Lightweight XBRL 2.1 / iXBRL 1.1 Parser for Python

xbrl-core is a pure-Python parser and structured data extraction library for XBRL 2.1 instance documents and iXBRL (Inline XBRL) documents. It supports fact extraction, context/unit structuring, all five linkbase types (presentation, calculation, definition, label, reference), XSD schema parsing, calculation validation, text block extraction, pandas/DataFrame conversion, and Rich/HTML rendering. The only required dependency is lxml.

Installation

pip install xbrl-core

Optional dependencies:

# pandas + pyarrow (DataFrame conversion, Parquet export)
pip install 'xbrl-core[analysis]'

# Rich terminal display
pip install 'xbrl-core[display]'

# Excel export (pandas + openpyxl)
pip install 'xbrl-core[excel]'

# Everything
pip install 'xbrl-core[all]'

Quick Start

from xbrl_core import parse_xbrl_facts, structure_contexts, build_line_items

# 1. Parse an XBRL instance document
with open("instance.xbrl", "rb") as f:
    parsed = parse_xbrl_facts(f.read(), source_path="instance.xbrl")

print(f"Facts: {parsed.fact_count}")

# 2. Structure contexts and build typed LineItems
ctx_map = structure_contexts(parsed.contexts)
items = build_line_items(parsed.facts, ctx_map)

for item in items[:5]:
    print(item.local_name, item.value, item.period)

Parsing

XBRL Instance

parse_xbrl_facts() takes raw bytes and returns a ParsedXBRL containing facts, contexts, units, schema refs, footnote links, and ignored elements.

from xbrl_core import parse_xbrl_facts

parsed = parse_xbrl_facts(xbrl_bytes, source_path="example.xbrl")

# Extracted data
parsed.facts             # tuple[RawFact, ...]
parsed.contexts          # tuple[RawContext, ...]
parsed.units             # tuple[RawUnit, ...]
parsed.schema_refs       # tuple[RawSchemaRef, ...]
parsed.footnote_links    # tuple[RawFootnoteLink, ...]
parsed.ignored_elements  # tuple[IgnoredElement, ...]
parsed.fact_count        # int

iXBRL (Inline XBRL)

parse_ixbrl_facts() parses iXBRL (XHTML-embedded XBRL) documents. The output is the same ParsedXBRL type, so downstream pipelines work identically.

from xbrl_core import parse_ixbrl_facts

parsed = parse_ixbrl_facts(ixbrl_bytes, source_path="report.htm")

for fact in parsed.facts[:5]:
    print(fact.local_name, fact.value_raw)

iXBRL format attributes (ixt:numdotdecimal, ixt:numcommadecimal, etc.) and scale/sign attributes are automatically applied. Custom formats can be registered:

from xbrl_core import FormatRegistry, parse_ixbrl_facts

registry = FormatRegistry()
registry.register("dateyearmonthdaycjk", my_cjk_date_func)

parsed = parse_ixbrl_facts(ixbrl_bytes, format_registry=registry)
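
To make the format, scale, and sign handling concrete, here is a minimal sketch of the kind of normalization those attributes imply. It is not xbrl-core's implementation: apply_ix_transform is a hypothetical helper, only two transformation names are modeled, and the real registry covers many more cases.

```python
from decimal import Decimal


def apply_ix_transform(text: str, fmt: str, scale: int = 0, sign: str = "") -> Decimal:
    """Turn displayed iXBRL text into a numeric value (illustrative only)."""
    cleaned = text.strip().replace(" ", "")
    if fmt == "numdotdecimal":
        # "1,234.5": comma groups digits, dot is the decimal mark
        cleaned = cleaned.replace(",", "")
    elif fmt == "numcommadecimal":
        # "1.234,5": dot groups digits, comma is the decimal mark
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # scale=3 means the displayed number is in thousands
    value = Decimal(cleaned) * (Decimal(10) ** scale)
    # sign="-" marks a value displayed without its minus sign (e.g. in brackets)
    return -value if sign == "-" else value


in_thousands = apply_ix_transform("1,234.5", "numdotdecimal", scale=3)
negated = apply_ix_transform("1.234,5", "numcommadecimal", sign="-")
```

Using Decimal rather than float keeps the reported digits exact, which matters once values are compared during calculation validation.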

IXDS (Inline XBRL Document Set)

Multiple iXBRL files from a single filing can be merged:

from xbrl_core import parse_ixbrl_facts, merge_ixbrl_results

results = [parse_ixbrl_facts(f) for f in ixbrl_files]
merged = merge_ixbrl_results(results)

Strict / Lenient Mode

Both parsers accept a strict parameter. When strict=True (default), spec violations raise XbrlParseError. When strict=False, violations emit warnings and are recorded in ignored_elements.

parsed = parse_xbrl_facts(xbrl_bytes, strict=False)
for elem in parsed.ignored_elements:
    print(elem.reason, elem.source_line)

Context Structuring

structure_contexts() converts raw context XML fragments into typed StructuredContext objects with period, entity, and dimension information.

from xbrl_core import structure_contexts, ContextCollection

ctx_map = structure_contexts(parsed.contexts)

# Direct dict access
ctx = ctx_map["CurrentYearInstant"]
print(ctx.period)       # InstantPeriod(instant=datetime.date(2024, 3, 31))
print(ctx.entity_id)    # "E00001"
print(ctx.dimensions)   # tuple[DimensionMember, ...]

# ContextCollection for filtering
coll = ContextCollection(ctx_map)
coll.filter_instant()                   # instant contexts only
coll.filter_duration()                  # duration contexts only
coll.filter_no_dimensions()             # no dimension members
coll.filter_by_dimension(axis="{ns}ProductAxis", member="{ns}SegmentA")

coll.latest_instant_period              # most recent InstantPeriod
coll.unique_duration_periods            # unique DurationPeriods, sorted
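
For readers unfamiliar with what a raw context contains, the following sketch pulls the same three pieces of information (entity, period, dimensions) out of a hand-written context using only the standard library. It is an illustration under stated assumptions, not how structure_contexts() is implemented: the sample XML, the ex: namespace, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "xbrldi": "http://xbrl.org/2006/xbrldi",
}

# Hypothetical context with one explicit dimension member
CONTEXT_XML = """
<xbrli:context xmlns:xbrli="http://www.xbrl.org/2003/instance"
               xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
               xmlns:ex="http://example.com/taxonomy"
               id="CurrentYearInstant_SegmentA">
  <xbrli:entity>
    <xbrli:identifier scheme="http://example.com/id">E00001</xbrli:identifier>
  </xbrli:entity>
  <xbrli:period><xbrli:instant>2024-03-31</xbrli:instant></xbrli:period>
  <xbrli:scenario>
    <xbrldi:explicitMember dimension="ex:ProductAxis">ex:SegmentA</xbrldi:explicitMember>
  </xbrli:scenario>
</xbrli:context>
"""

context = ET.fromstring(CONTEXT_XML)
entity_id = context.findtext("xbrli:entity/xbrli:identifier", namespaces=NS)
instant = date.fromisoformat(
    context.findtext("xbrli:period/xbrli:instant", namespaces=NS)
)
# Each explicit member pairs an axis QName (attribute) with a member QName (text)
dimensions = [
    (member.get("dimension"), member.text)
    for member in context.iterfind(".//xbrldi:explicitMember", NS)
]
```

A real parser would also resolve the ex: prefixes to full namespace URIs and handle duration periods and typed members, which this sketch leaves out.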

Unit Structuring

structure_units() converts raw unit XML fragments into typed StructuredUnit objects.

from xbrl_core import structure_units

unit_map = structure_units(parsed.units)

unit = unit_map["JPY"]
print(unit.is_monetary)     # True
print(unit.currency_code)   # "JPY"

unit = unit_map["pure"]
print(unit.is_pure)         # True

unit = unit_map["JPYPerShare"]
print(unit.is_per_share)    # True
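
The monetary / pure / per-share distinction can be sketched roughly as follows. This is a simplified illustration, not the logic inside structure_units(): classify_unit is a hypothetical helper, and it matches on QName prefixes (iso4217:, :shares, :pure) instead of resolving them to namespaces as a conformant parser would.

```python
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"


def classify_unit(unit_xml: str) -> str:
    """Return a coarse category for a single xbrli:unit element."""
    unit = ET.fromstring(unit_xml)
    divide = unit.find(f"{XBRLI}divide")
    if divide is not None:
        numerator = divide.findtext(f"{XBRLI}unitNumerator/{XBRLI}measure", "")
        denominator = divide.findtext(f"{XBRLI}unitDenominator/{XBRLI}measure", "")
        # Currency divided by shares, e.g. earnings per share
        if numerator.startswith("iso4217:") and denominator.endswith(":shares"):
            return "per_share"
        return "ratio"
    measure = unit.findtext(f"{XBRLI}measure", "")
    if measure.startswith("iso4217:"):
        return "monetary"
    if measure.endswith(":pure"):
        return "pure"
    return "other"


JPY = (
    '<unit xmlns="http://www.xbrl.org/2003/instance" id="JPY">'
    "<measure>iso4217:JPY</measure></unit>"
)
PURE = (
    '<unit xmlns="http://www.xbrl.org/2003/instance" id="pure">'
    "<measure>xbrli:pure</measure></unit>"
)
JPY_PER_SHARE = (
    '<unit xmlns="http://www.xbrl.org/2003/instance" id="JPYPerShare"><divide>'
    "<unitNumerator><measure>iso4217:JPY</measure></unitNumerator>"
    "<unitDenominator><measure>xbrli:shares</measure></unitDenominator>"
    "</divide></unit>"
)
```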

Building LineItems

build_line_items() merges RawFact + StructuredContext + optional LabelResolver into fully typed LineItem objects.

from xbrl_core import build_line_items

items = build_line_items(parsed.facts, ctx_map, langs=("en", "ja"))

for item in items:
    print(item.local_name)    # "NetSales"
    print(item.value)         # Decimal('1234567890')
    print(item.period)        # InstantPeriod / DurationPeriod
    print(item.entity_id)     # "E00001"
    print(item.dimensions)    # tuple[DimensionMember, ...]
    print(item.label("en"))   # "Net sales"
    print(item.label("ja"))   # "売上高"

Linkbase Parsing

Presentation Linkbase

from xbrl_core import parse_presentation_linkbase, merge_presentation_trees

trees = parse_presentation_linkbase(pre_xml_bytes)

for role_uri, tree in trees.items():
    # Flatten the tree (depth-first)
    for node in tree.flatten(skip_abstract=True, skip_dimension=True):
        print("  " * node.depth + node.concept)

    # Get only the line-items subtree
    for node in tree.line_items_roots():
        print(node.concept, node.order)

# Merge multiple presentation linkbases
merged = merge_presentation_trees(trees_a, trees_b)
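
As a rough mental model of depth-first flattening, the sketch below walks a small hand-built tree and yields (depth, concept) pairs with siblings sorted by their order attribute. The Node class and flatten function are hypothetical stand-ins written for this example; the library's tree type and its exact skip semantics may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical presentation node, not the library's class."""
    concept: str
    order: float = 1.0
    is_abstract: bool = False
    children: list["Node"] = field(default_factory=list)


def flatten(node, depth=0, skip_abstract=False):
    """Yield (depth, concept) depth-first, siblings sorted by `order`."""
    if not (skip_abstract and node.is_abstract):
        yield depth, node.concept
    for child in sorted(node.children, key=lambda n: n.order):
        yield from flatten(child, depth + 1, skip_abstract)


tree = Node(
    "AssetsAbstract",
    is_abstract=True,
    children=[Node("TotalAssets", order=2), Node("CashAndDeposits", order=1)],
)

for depth, concept in flatten(tree):
    print("  " * depth + concept)
```

In this sketch a skipped abstract node does not change the depth of its children; whether indentation should collapse in that case is a design choice.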

Calculation Linkbase

from xbrl_core import parse_calculation_linkbase

calc_lb = parse_calculation_linkbase(cal_xml_bytes)

for role_uri in calc_lb.role_uris:
    tree = calc_lb.get_tree(role_uri)
    for arc in tree.arcs:
        sign = "+" if arc.weight > 0 else "-"  # weights are not limited to +1 / -1
        print(f"  {arc.parent} {sign}-> {arc.child}")

# Query relationships
calc_lb.children_of("GrossProfit")        # child arcs
calc_lb.parent_of("NetSales")             # parent arcs
calc_lb.ancestors_of("NetSales", role_uri=role_uri)  # root-ward chain

Definition Linkbase

from xbrl_core import parse_definition_linkbase

def_trees = parse_definition_linkbase(def_xml_bytes)

for role_uri, tree in def_trees.items():
    for hc in tree.hypercubes:
        print(f"Table: {hc.table_concept}")
        for axis in hc.axes:
            print(f"  Axis: {axis.axis_concept}")
            if axis.domain:
                print(f"  Domain: {axis.domain.concept}")

Label Linkbase

from xbrl_core import parse_label_linkbase

labels = parse_label_linkbase(lab_xml_bytes)

for lab in labels:
    print(f"{lab.concept_name} [{lab.lang}] = {lab.text}")

Reference Linkbase

from xbrl_core import parse_reference_linkbase

refs = parse_reference_linkbase(ref_xml_bytes)

for ref in refs:
    print(f"{ref.concept_name}: {ref.role}")
    for part in ref.parts:
        print(f"  {part.local_name} = {part.value}")

Footnotes

from xbrl_core import parse_footnote_links

footnote_map = parse_footnote_links(parsed.footnote_links)

notes = footnote_map.get("IdFact1234") or ()  # guard in case get() returns None for an unknown ID
for n in notes:
    print(n.text, n.lang)

print(footnote_map.fact_ids)       # Fact IDs with footnotes
print(len(footnote_map))           # number of Facts with footnotes

Schema Parsing

from xbrl_core import parse_xsd_elements

elements = parse_xsd_elements(xsd_bytes)

elem = elements["NetSales"]
print(elem.period_type)         # "duration"
print(elem.balance)             # "credit"
print(elem.abstract)            # False
print(elem.type_name)           # "xbrli:monetaryItemType"
print(elem.substitution_group)  # "xbrli:item"

Calculation Validation

Validates summation-item relationships per XBRL 2.1 section 5.2.5.2, with decimals-based rounding tolerance.

from xbrl_core import validate_calculations, parse_calculation_linkbase

calc_lb = parse_calculation_linkbase(cal_xml_bytes)
result = validate_calculations(items, calc_lb)

print(result)           # "Calculation validation: PASS (checked=42, passed=42, errors=0, skipped=3)"
print(result.is_valid)  # True

for issue in result.issues:
    print(issue.parent_concept, issue.expected, issue.actual, issue.severity)
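
The idea behind a decimals-based tolerance can be sketched as follows: sum the weighted children, round both the sum and the reported parent to the parent's decimals, and compare. This is a simplified reading of the rule and not xbrl-core's validator; round_to_decimals and is_consistent are hypothetical helpers, the rounding mode is an assumption, and duplicate facts, nil values, and precision handling are ignored.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_decimals(value: Decimal, decimals: int) -> Decimal:
    """Round to `decimals` places; negative values round to tens, hundreds, ..."""
    quantum = Decimal(1).scaleb(-decimals)  # decimals=-3 gives 1E+3
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


def is_consistent(parent: Decimal, parent_decimals: int, children) -> bool:
    """`children` is a list of (value, weight) pairs from the calculation arcs."""
    total = sum((value * weight for value, weight in children), Decimal(0))
    return round_to_decimals(total, parent_decimals) == round_to_decimals(
        parent, parent_decimals
    )


# Parent reported to the nearest thousand (decimals=-3)
gross_profit = Decimal("1235000")
ok = is_consistent(
    gross_profit, -3, [(Decimal("1834600"), 1), (Decimal("599900"), -1)]
)
bad = is_consistent(
    gross_profit, -3, [(Decimal("1833000"), 1), (Decimal("599900"), -1)]
)
```

In the first case the children sum to 1,234,700, which rounds to the reported 1,235,000; in the second they sum to 1,233,100, which does not.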

Text Block Extraction

Extracts textBlockItemType facts (e.g. MD&A, risk factors, notes) from filings.

from xbrl_core import extract_text_blocks, clean_html

blocks = extract_text_blocks(parsed.facts, ctx_map)

for block in blocks:
    print(block.concept)         # "BusinessRisksTextBlock"
    print(block.period)          # DurationPeriod(...)
    plain = clean_html(block.html)
    print(plain[:200])

clean_html() converts HTML fragments to plain text, preserving table structure with tabs and newlines — useful as preprocessing for LLM / RAG pipelines.
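
To illustrate what "preserving table structure with tabs and newlines" can look like, here is a small standard-library sketch. It is not the library's clean_html(): html_to_text is a hypothetical function that handles only a handful of tags.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text, adding a tab after each cell and a newline after each block."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.parts.append("\t")
        elif tag in ("p", "div", "tr", "li"):
            self.parts.append("\n")

    def handle_data(self, data):
        # Collapse runs of whitespace so source indentation does not leak through
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for line in "".join(parser.parts).split("\n"):
        cells = [cell.strip() for cell in line.split("\t")]
        text = "\t".join(cells).rstrip("\t")  # drop the tab after the last cell
        if text:
            lines.append(text)
    return "\n".join(lines)


sample = (
    "<p>Segment  results</p><table>"
    "<tr><th>Segment</th><th>Sales</th></tr>"
    "<tr><td>A</td><td>100</td></tr></table>"
)
plain_text = html_to_text(sample)
```

Each table row becomes one line and each cell one tab-separated field, so the output can be split back into columns downstream.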

DataFrame Conversion

Requires pip install 'xbrl-core[analysis]'.

from xbrl_core import line_items_to_dataframe, to_csv, to_parquet

df = line_items_to_dataframe(items, label_lang="en")
print(df[["local_name", "label", "value", "period_end"]].head())

# Export
to_csv(df, "output.csv")
to_parquet(df, "output.parquet")

Excel export requires pip install 'xbrl-core[excel]':

from xbrl_core import to_excel

to_excel(df, "output.xlsx", sheet_name="BalanceSheet")

Display

Rich (Terminal)

Requires pip install 'xbrl-core[display]'.

from rich.console import Console
from xbrl_core import render_statement

table = render_statement(items, title="Balance Sheet", label_lang="en")
Console().print(table)

Hierarchical Display

Use DisplayHint with presentation tree data for indented financial statements:

from xbrl_core import (
    build_display_rows,
    render_hierarchical_statement,
    DisplayHint,
)

hints = [
    DisplayHint(concept="AssetsAbstract", depth=0, is_abstract=True, label="Assets"),
    DisplayHint(concept="CashAndDeposits", depth=1),
    DisplayHint(concept="TotalAssets", depth=0, is_total=True),
]

# Rich Table
table = render_hierarchical_statement(items, hints=hints, title="BS")

# Or get raw DisplayRow objects
rows = build_display_rows(items, hints=hints)

HTML (Jupyter)

from xbrl_core import to_html

html = to_html(items, hints=hints, title="Balance Sheet")

Label Resolution

LabelResolver is a Protocol — implement it to inject taxonomy labels into build_line_items().

from xbrl_core import LabelResolver, LabelInfo, LabelSource

class MyResolver:
    def resolve(self, concept_qname, lang, role):
        # Look up label from your taxonomy data
        return LabelInfo(text="Net sales", role=role, lang=lang, source=LabelSource.STANDARD)

    def resolve_batch(self, concept_qnames, lang, role):
        return {qn: self.resolve(qn, lang, role) for qn in concept_qnames}

items = build_line_items(parsed.facts, ctx_map, resolver=MyResolver(), langs=("en",))
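
Because the resolver is a Protocol, any object with matching methods qualifies; no inheritance is needed. The sketch below shows that structural typing with a dictionary-backed resolver. Label and Resolver are hypothetical stand-ins defined here so the example runs without xbrl-core installed; they are not the library's LabelInfo or LabelResolver, and returning None for a missing label is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for LabelInfo."""
    text: str
    lang: str
    role: str


@runtime_checkable
class Resolver(Protocol):
    """Hypothetical stand-in for LabelResolver (single-lookup method only)."""

    def resolve(self, concept_qname: str, lang: str, role: str) -> Optional[Label]: ...


class DictResolver:
    """Satisfies Resolver structurally; it does not inherit from it."""

    def __init__(self, table):
        self._table = table  # maps (concept, lang) to label text

    def resolve(self, concept_qname, lang, role):
        text = self._table.get((concept_qname, lang))
        return Label(text, lang, role) if text is not None else None


resolver = DictResolver({("NetSales", "en"): "Net sales"})
found = resolver.resolve("NetSales", "en", "standard")
missing = resolver.resolve("NetSales", "ja", "standard")
```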

Error Handling

All errors inherit from XbrlError and carry a structured error code and context.

from xbrl_core import XbrlError, XbrlParseError, XbrlValidationError, parse_xbrl_facts

try:
    parsed = parse_xbrl_facts(bad_bytes)
except XbrlParseError as e:
    print(e.code)     # "XBRL_PARSE_001"
    print(e.context)  # {"source_path": "..."}

Error code prefix   Exception class        Description
XBRL_PARSE_xxx      XbrlParseError         XML/XBRL parse errors
XBRL_CTX_xxx        XbrlParseError         Context structuring errors
XBRL_UNIT_xxx       XbrlParseError         Unit structuring errors
XBRL_LINK_xxx       XbrlParseError         Linkbase parse errors
XBRL_IXBRL_xxx      XbrlParseError         iXBRL parse errors
XBRL_VAL_xxx        XbrlValidationError    Validation errors

XbrlWarning (a UserWarning subclass) is emitted for non-fatal issues.
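
Since these are ordinary Python warnings, the standard warnings module can capture them for inspection or escalate them to exceptions. The sketch below uses DemoWarning and lenient_step as hypothetical stand-ins for XbrlWarning and a lenient parse call, so that it runs without the library.

```python
import warnings


class DemoWarning(UserWarning):
    """Hypothetical stand-in for XbrlWarning."""


def lenient_step():
    """Hypothetical operation that warns instead of raising."""
    warnings.warn("skipped malformed element", DemoWarning)
    return "partial result"


# Capture warnings for later inspection
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = lenient_step()
messages = [str(w.message) for w in caught if issubclass(w.category, DemoWarning)]

# Or escalate them to exceptions, for example in a test suite
with warnings.catch_warnings():
    warnings.simplefilter("error", DemoWarning)
    try:
        lenient_step()
        escalated = False
    except DemoWarning:
        escalated = True
```

Both filters are scoped to their with block, so the process-wide warning configuration is left untouched.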

Requirements

Python 3.12+. The only required dependency is lxml >= 5.0.
