xbrl-core — Lightweight XBRL 2.1 / iXBRL 1.1 Parser for Python

xbrl-core is a pure-Python parser and structured data extraction library for XBRL 2.1 instance documents and iXBRL (Inline XBRL) documents. It supports fact extraction, context/unit structuring, all five linkbase types (presentation, calculation, definition, label, reference), XSD schema parsing, calculation validation, text block extraction, pandas/DataFrame conversion, and Rich/HTML rendering. The only required dependency is lxml.

Installation

pip install xbrl-core

Optional dependencies:

# pandas + pyarrow (DataFrame conversion, Parquet export)
pip install 'xbrl-core[analysis]'

# Rich terminal display
pip install 'xbrl-core[display]'

# Excel export (pandas + openpyxl)
pip install 'xbrl-core[excel]'

# Everything
pip install 'xbrl-core[all]'

Quick Start

from xbrl_core import parse_xbrl_facts, structure_contexts, build_line_items

# 1. Parse an XBRL instance document
with open("instance.xbrl", "rb") as f:
    parsed = parse_xbrl_facts(f.read(), source_path="instance.xbrl")

print(f"Facts: {parsed.fact_count}")

# 2. Structure contexts and build typed LineItems
ctx_map = structure_contexts(parsed.contexts)
items = build_line_items(parsed.facts, ctx_map)

for item in items[:5]:
    print(item.local_name, item.value, item.period)

Parsing

XBRL Instance

parse_xbrl_facts() takes raw bytes and returns a ParsedXBRL containing facts, contexts, units, schema refs, footnote links, and ignored elements.

from xbrl_core import parse_xbrl_facts

parsed = parse_xbrl_facts(xbrl_bytes, source_path="example.xbrl")

# Extracted data
parsed.facts             # tuple[RawFact, ...]
parsed.contexts          # tuple[RawContext, ...]
parsed.units             # tuple[RawUnit, ...]
parsed.schema_refs       # tuple[RawSchemaRef, ...]
parsed.footnote_links    # tuple[RawFootnoteLink, ...]
parsed.ignored_elements  # tuple[IgnoredElement, ...]
parsed.fact_count        # int

iXBRL (Inline XBRL)

parse_ixbrl_facts() parses iXBRL (XHTML-embedded XBRL) documents. The output is the same ParsedXBRL type, so downstream pipelines work identically.

from xbrl_core import parse_ixbrl_facts

parsed = parse_ixbrl_facts(ixbrl_bytes, source_path="report.htm")

for fact in parsed.facts[:5]:
    print(fact.local_name, fact.value_raw)

iXBRL format attributes (ixt:numdotdecimal, ixt:numcommadecimal, etc.) and scale/sign attributes are automatically applied. Custom formats can be registered:

from xbrl_core import FormatRegistry, parse_ixbrl_facts

registry = FormatRegistry()
registry.register("dateyearmonthdaycjk", my_cjk_date_func)

parsed = parse_ixbrl_facts(ixbrl_bytes, format_registry=registry)
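The scale/sign handling and the custom format hook can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: `apply_scale_and_sign` is a hypothetical helper, and the callback signature of `my_cjk_date_func` (displayed string in, normalized string out) is an assumption, since the signature expected by `FormatRegistry.register` is not stated above.

```python
import re
from decimal import Decimal


def apply_scale_and_sign(displayed: str, scale: int = 0, sign: str = "") -> Decimal:
    """Hypothetical helper: numdotdecimal-style text -> signed, scaled Decimal."""
    number = Decimal(displayed.replace(",", ""))  # drop thousands separators
    number = number.scaleb(scale)                 # multiply by 10 ** scale
    return -number if sign == "-" else number


def my_cjk_date_func(displayed: str) -> str:
    """Assumed callback shape: '2024年3月31日' -> '2024-03-31'."""
    match = re.fullmatch(r"\s*(\d{4})年(\d{1,2})月(\d{1,2})日\s*", displayed)
    if match is None:
        raise ValueError(f"not a CJK year-month-day date: {displayed!r}")
    year, month, day = (int(part) for part in match.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


# "1,234.5" shown in millions (scale=6) with sign="-"
print(apply_scale_and_sign("1,234.5", scale=6, sign="-"))
print(my_cjk_date_func("2024年3月31日"))
```

Using Decimal rather than float keeps the scaled value exact, which matters once the numbers feed into calculation checks.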

IXDS (Inline XBRL Document Set)

Multiple iXBRL files from a single filing can be merged:

from xbrl_core import parse_ixbrl_facts, merge_ixbrl_results

# ixbrl_files: an iterable of raw bytes, one entry per iXBRL file in the filing
results = [parse_ixbrl_facts(data) for data in ixbrl_files]
merged = merge_ixbrl_results(results)

Strict / Lenient Mode

Both parsers accept a strict parameter. When strict=True (default), spec violations raise XbrlParseError. When strict=False, violations emit warnings and are recorded in ignored_elements.

parsed = parse_xbrl_facts(xbrl_bytes, strict=False)
for elem in parsed.ignored_elements:
    print(elem.reason, elem.source_line)
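One way to combine the two modes is to try strict parsing first and fall back to lenient parsing only when it fails. The sketch below is a hypothetical helper, not part of xbrl-core. It takes the parser and the exception type as arguments, so it runs without the library installed, and it assumes only what is described above: a `strict` keyword, an exception in strict mode, and warnings in lenient mode.

```python
import warnings


def parse_with_fallback(parse, data, error_type=Exception):
    """Try strict parsing; on failure, retry leniently and collect warnings.

    Returns (parsed_result, list_of_warning_messages).
    """
    try:
        return parse(data, strict=True), []
    except error_type:
        # Record every warning emitted during the lenient retry.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            parsed = parse(data, strict=False)
        return parsed, [str(w.message) for w in caught]
```

With the library, this would presumably be called as `parse_with_fallback(parse_xbrl_facts, xbrl_bytes, XbrlParseError)`.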

Context Structuring

structure_contexts() converts raw context XML fragments into typed StructuredContext objects with period, entity, and dimension information.

from xbrl_core import structure_contexts, ContextCollection

ctx_map = structure_contexts(parsed.contexts)

# Direct dict access
ctx = ctx_map["CurrentYearInstant"]
print(ctx.period)       # InstantPeriod(instant=datetime.date(2024, 3, 31))
print(ctx.entity_id)    # "E00001"
print(ctx.dimensions)   # tuple[DimensionMember, ...]

# ContextCollection for filtering
coll = ContextCollection(ctx_map)
coll.filter_instant()                   # instant contexts only
coll.filter_duration()                  # duration contexts only
coll.filter_no_dimensions()             # no dimension members
coll.filter_by_dimension(axis="{ns}ProductAxis", member="{ns}SegmentA")

coll.latest_instant_period              # most recent InstantPeriod
coll.unique_duration_periods            # unique DurationPeriods, sorted

Unit Structuring

structure_units() converts raw unit XML fragments into typed StructuredUnit objects.

from xbrl_core import structure_units

unit_map = structure_units(parsed.units)

unit = unit_map["JPY"]
print(unit.is_monetary)     # True
print(unit.currency_code)   # "JPY"

unit = unit_map["pure"]
print(unit.is_pure)         # True

unit = unit_map["JPYPerShare"]
print(unit.is_per_share)    # True

Building LineItems

build_line_items() merges RawFact + StructuredContext + optional LabelResolver into fully typed LineItem objects.

from xbrl_core import build_line_items

items = build_line_items(parsed.facts, ctx_map, langs=("en", "ja"))

for item in items:
    print(item.local_name)    # "NetSales"
    print(item.value)         # Decimal('1234567890')
    print(item.period)        # InstantPeriod / DurationPeriod
    print(item.entity_id)     # "E00001"
    print(item.dimensions)    # tuple[DimensionMember, ...]
    print(item.label("en"))   # "Net sales"
    print(item.label("ja"))   # "売上高"

Linkbase Parsing

Presentation Linkbase

from xbrl_core import parse_presentation_linkbase, merge_presentation_trees

trees = parse_presentation_linkbase(pre_xml_bytes)

for role_uri, tree in trees.items():
    # Flatten the tree (depth-first)
    for node in tree.flatten(skip_abstract=True, skip_dimension=True):
        print("  " * node.depth + node.concept)

    # Get only the line-items subtree
    for node in tree.line_items_roots():
        print(node.concept, node.order)

# Merge multiple presentation linkbases
merged = merge_presentation_trees(trees_a, trees_b)

Calculation Linkbase

from xbrl_core import parse_calculation_linkbase

calc_lb = parse_calculation_linkbase(cal_xml_bytes)

for role_uri in calc_lb.role_uris:
    tree = calc_lb.get_tree(role_uri)
    for arc in tree.arcs:
        sign = "+" if arc.weight > 0 else "-"  # weights need not be exactly +1 / -1
        print(f"  {arc.parent} {sign}-> {arc.child}")

# Query relationships
calc_lb.children_of("GrossProfit")        # child arcs
calc_lb.parent_of("NetSales")             # parent arcs
calc_lb.ancestors_of("NetSales", role_uri=role)  # root-ward chain
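To illustrate what the arc weights mean, the sketch below rolls child values up into expected parent totals. `Arc` is a stand-in type carrying the `parent`, `child`, and `weight` attributes used in the loop above, and `rollup_totals` is a hypothetical helper, not a library function.

```python
from collections import defaultdict
from decimal import Decimal
from typing import NamedTuple


class Arc(NamedTuple):
    """Stand-in for a calculation arc (parent, child, weight)."""
    parent: str
    child: str
    weight: Decimal


def rollup_totals(arcs, values):
    """Sum weight * child value per parent, skipping children with no value."""
    totals = defaultdict(Decimal)  # Decimal() is 0
    for arc in arcs:
        if arc.child in values:
            totals[arc.parent] += values[arc.child] * arc.weight
    return dict(totals)


arcs = [
    Arc("GrossProfit", "NetSales", Decimal(1)),
    Arc("GrossProfit", "CostOfSales", Decimal(-1)),
]
values = {"NetSales": Decimal(1000), "CostOfSales": Decimal(600)}
print(rollup_totals(arcs, values))
```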

Definition Linkbase

from xbrl_core import parse_definition_linkbase

def_trees = parse_definition_linkbase(def_xml_bytes)

for role_uri, tree in def_trees.items():
    for hc in tree.hypercubes:
        print(f"Table: {hc.table_concept}")
        for axis in hc.axes:
            print(f"  Axis: {axis.axis_concept}")
            if axis.domain:
                print(f"  Domain: {axis.domain.concept}")

Label Linkbase

from xbrl_core import parse_label_linkbase

labels = parse_label_linkbase(lab_xml_bytes)

for lab in labels:
    print(f"{lab.concept_name} [{lab.lang}] = {lab.text}")
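For repeated lookups, it can help to index the parsed labels by concept and language. The sketch below relies only on the `concept_name`, `lang`, and `text` attributes shown above. `build_label_index` and `lookup_label` are hypothetical helpers, and the `SimpleNamespace` records are stand-ins for parsed labels. When a concept has several labels in one language, this sketch keeps the first one it sees.

```python
from types import SimpleNamespace


def build_label_index(labels):
    """Map (concept_name, lang) -> text, keeping the first label seen."""
    index = {}
    for lab in labels:
        index.setdefault((lab.concept_name, lab.lang), lab.text)
    return index


def lookup_label(index, concept_name, langs=("en", "ja")):
    """Return the first available label in language order, else the concept name."""
    for lang in langs:
        text = index.get((concept_name, lang))
        if text is not None:
            return text
    return concept_name


# Stand-in records; real ones would come from parse_label_linkbase().
labels = [
    SimpleNamespace(concept_name="NetSales", lang="ja", text="売上高"),
    SimpleNamespace(concept_name="NetSales", lang="en", text="Net sales"),
]
index = build_label_index(labels)
print(lookup_label(index, "NetSales"))
```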

Reference Linkbase

from xbrl_core import parse_reference_linkbase

refs = parse_reference_linkbase(ref_xml_bytes)

for ref in refs:
    print(f"{ref.concept_name}: {ref.role}")
    for part in ref.parts:
        print(f"  {part.local_name} = {part.value}")

Footnotes

from xbrl_core import parse_footnote_links

footnote_map = parse_footnote_links(parsed.footnote_links)

notes = footnote_map.get("IdFact1234") or ()  # guard in case the ID has no footnotes
for n in notes:
    print(n.text, n.lang)

print(footnote_map.fact_ids)       # Fact IDs with footnotes
print(len(footnote_map))           # number of Facts with footnotes

Schema Parsing

from xbrl_core import parse_xsd_elements

elements = parse_xsd_elements(xsd_bytes)

elem = elements["NetSales"]
print(elem.period_type)         # "duration"
print(elem.balance)             # "credit"
print(elem.abstract)            # False
print(elem.type_name)           # "xbrli:monetaryItemType"
print(elem.substitution_group)  # "xbrli:item"

Calculation Validation

Validates summation-item relationships per XBRL 2.1 section 5.2.5.2, with decimals-based rounding tolerance.

from xbrl_core import validate_calculations, parse_calculation_linkbase

calc_lb = parse_calculation_linkbase(cal_xml_bytes)
result = validate_calculations(items, calc_lb)

print(result)           # "Calculation validation: PASS (checked=42, passed=42, errors=0, skipped=3)"
print(result.is_valid)  # True

for issue in result.issues:
    print(issue.parent_concept, issue.expected, issue.actual, issue.severity)
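The idea behind a decimals-based tolerance can be sketched as follows: compute the weighted sum of the contributing items, round both that sum and the reported total to the total's `decimals`, and compare. This is one common reading of the XBRL 2.1 rule and may differ in detail from what `validate_calculations` does. The helper names and the half-up rounding mode are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_decimals(value: Decimal, decimals: int) -> Decimal:
    """Round to `decimals` places; negative values round to tens, thousands, ..."""
    quantum = Decimal(1).scaleb(-decimals)  # decimals=-3 -> 1E+3
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


def is_consistent(total: Decimal, total_decimals: int, contributions) -> bool:
    """contributions: iterable of (value, weight) pairs for the child items."""
    expected = sum((value * weight for value, weight in contributions), Decimal(0))
    return round_to_decimals(expected, total_decimals) == round_to_decimals(
        total, total_decimals
    )


contributions = [
    (Decimal("1234567"), Decimal(1)),   # NetSales
    (Decimal("734100"), Decimal(-1)),   # CostOfSales
]
# Total reported to the nearest thousand (decimals=-3)
print(is_consistent(Decimal("500000"), -3, contributions))
print(is_consistent(Decimal("502000"), -3, contributions))
```

Here the exact weighted sum is 500,467, which rounds to 500,000 at thousands precision, so the first total is accepted and the second is not.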

Text Block Extraction

Extracts textBlockItemType facts (e.g. MD&A, risk factors, notes) from filings.

from xbrl_core import extract_text_blocks, clean_html

blocks = extract_text_blocks(parsed.facts, ctx_map)

for block in blocks:
    print(block.concept)         # "BusinessRisksTextBlock"
    print(block.period)          # DurationPeriod(...)
    plain = clean_html(block.html)
    print(plain[:200])

clean_html() converts HTML fragments to plain text, preserving table structure with tabs and newlines — useful as preprocessing for LLM / RAG pipelines.
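For intuition about the table-preserving conversion, here is a minimal standard-library sketch of the same idea: cells are separated by tabs and rows by newlines. It is not the implementation of `clean_html()`, and `html_fragment_to_text` is a hypothetical name.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    CELL_TAGS = {"td", "th"}
    LINE_TAGS = {"p", "div", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        # Collapse whitespace runs but keep word boundaries between inline tags.
        self.parts.append(re.sub(r"\s+", " ", data))

    def handle_endtag(self, tag):
        if tag in self.CELL_TAGS:
            self.parts.append("\t")   # end of a table cell
        elif tag in self.LINE_TAGS:
            self.parts.append("\n")   # end of a row or block


def html_fragment_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for raw in "".join(parser.parts).split("\n"):
        cells = [cell.strip() for cell in raw.split("\t")]
        line = "\t".join(cells).rstrip("\t")  # drop the trailing cell separator
        if line:
            lines.append(line)
    return "\n".join(lines)


sample = (
    "<p>Net <b>sales</b> rose.</p>"
    "<table><tr><th>Item</th><th>Amount</th></tr>"
    "<tr><td>Sales</td><td>100</td></tr></table>"
)
print(html_fragment_to_text(sample))
```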

DataFrame Conversion

Requires pip install 'xbrl-core[analysis]'.

from xbrl_core import line_items_to_dataframe, to_csv, to_parquet

df = line_items_to_dataframe(items, label_lang="en")
print(df[["local_name", "label", "value", "period_end"]].head())

# Export
to_csv(df, "output.csv")
to_parquet(df, "output.parquet")

Requires pip install 'xbrl-core[excel]':

from xbrl_core import to_excel

to_excel(df, "output.xlsx", sheet_name="BalanceSheet")

Display

Rich (Terminal)

Requires pip install 'xbrl-core[display]'.

from rich.console import Console
from xbrl_core import render_statement

table = render_statement(items, title="Balance Sheet", label_lang="en")
Console().print(table)

Hierarchical Display

Use DisplayHint with presentation tree data for indented financial statements:

from xbrl_core import (
    build_display_rows,
    render_hierarchical_statement,
    DisplayHint,
)

hints = [
    DisplayHint(concept="AssetsAbstract", depth=0, is_abstract=True, label="Assets"),
    DisplayHint(concept="CashAndDeposits", depth=1),
    DisplayHint(concept="TotalAssets", depth=0, is_total=True),
]

# Rich Table
table = render_hierarchical_statement(items, hints=hints, title="BS")

# Or get raw DisplayRow objects
rows = build_display_rows(items, hints=hints)
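Instead of writing hints by hand, they could be derived from flattened presentation nodes. The sketch below is a hypothetical helper that copies only `concept` and `depth`, the two node attributes shown in the Presentation Linkbase section. It takes the hint constructor as an argument so it runs without the library, with `dict` standing in for `DisplayHint` in the demo.

```python
from types import SimpleNamespace


def hints_from_nodes(nodes, hint_factory):
    """Build one hint per node from its concept name and tree depth."""
    return [hint_factory(concept=node.concept, depth=node.depth) for node in nodes]


# Stand-in nodes; real ones would come from tree.flatten(...).
nodes = [
    SimpleNamespace(concept="AssetsAbstract", depth=0),
    SimpleNamespace(concept="CashAndDeposits", depth=1),
]
hints = hints_from_nodes(nodes, dict)
print(hints)
```

With the library, the call would presumably look like `hints_from_nodes(tree.flatten(skip_abstract=False, skip_dimension=True), DisplayHint)`. Whether a node's depth maps directly onto the depth a DisplayHint expects is an assumption to verify.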

HTML (Jupyter)

from xbrl_core import to_html

html = to_html(items, hints=hints, title="Balance Sheet")

Label Resolution

LabelResolver is a Protocol — implement it to inject taxonomy labels into build_line_items().

from xbrl_core import LabelResolver, LabelInfo, LabelSource

class MyResolver:
    def resolve(self, concept_qname, lang, role):
        # Look up label from your taxonomy data
        return LabelInfo(text="Net sales", role=role, lang=lang, source=LabelSource.STANDARD)

    def resolve_batch(self, concept_qnames, lang, role):
        return {qn: self.resolve(qn, lang, role) for qn in concept_qnames}

items = build_line_items(parsed.facts, ctx_map, resolver=MyResolver(), langs=("en",))

Error Handling

All errors inherit from XbrlError and carry a structured error code and context.

from xbrl_core import XbrlError, XbrlParseError, XbrlValidationError, parse_xbrl_facts

try:
    parsed = parse_xbrl_facts(bad_bytes)
except XbrlParseError as e:
    print(e.code)     # "XBRL_PARSE_001"
    print(e.context)  # {"source_path": "..."}

Error code prefix   Exception class       Description
XBRL_PARSE_xxx      XbrlParseError        XML/XBRL parse errors
XBRL_CTX_xxx        XbrlParseError        Context structuring errors
XBRL_UNIT_xxx       XbrlParseError        Unit structuring errors
XBRL_LINK_xxx       XbrlParseError        Linkbase parse errors
XBRL_IXBRL_xxx      XbrlParseError        iXBRL parse errors
XBRL_VAL_xxx        XbrlValidationError   Validation errors

XbrlWarning (a UserWarning subclass) is emitted for non-fatal issues.
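Because the codes are structured, callers can route on the prefix rather than on the message text. The sketch below maps a code to the description from the table above. `describe_error_code` is a hypothetical helper, and it assumes every code has the `PREFIX_NNN` shape seen in `XBRL_PARSE_001`.

```python
ERROR_CATEGORIES = {
    "XBRL_PARSE": "XML/XBRL parse errors",
    "XBRL_CTX": "Context structuring errors",
    "XBRL_UNIT": "Unit structuring errors",
    "XBRL_LINK": "Linkbase parse errors",
    "XBRL_IXBRL": "iXBRL parse errors",
    "XBRL_VAL": "Validation errors",
}


def describe_error_code(code: str) -> str:
    """Strip the trailing number and look up the category for the prefix."""
    prefix, _, _number = code.rpartition("_")
    return ERROR_CATEGORIES.get(prefix, "Unknown error category")


print(describe_error_code("XBRL_PARSE_001"))  # XML/XBRL parse errors
```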

Customizing Error / Warning Classes

All linkbase parsers and structure_units() accept optional error_class and warning_class parameters. This allows downstream libraries to substitute their own exception and warning types — useful when wrapping xbrl-core in a domain-specific package (e.g. an EDINET library).

from xbrl_core import XbrlParseError, XbrlWarning, parse_calculation_linkbase

class EdinetParseError(XbrlParseError):
    """EDINET-specific parse error."""

class EdinetWarning(UserWarning):
    """EDINET-specific warning."""

lb = parse_calculation_linkbase(
    xml_bytes,
    error_class=EdinetParseError,
    warning_class=EdinetWarning,
)

Supported by: parse_calculation_linkbase, parse_definition_linkbase, parse_presentation_linkbase, parse_label_linkbase, parse_reference_linkbase (error_class only), parse_footnote_links, structure_units.

Customizing Concept Extraction

Linkbase parsers extract concept local names from xlink:href fragments. The default logic handles standard XBRL taxonomy patterns ({prefix}_{YYYY-MM-DD}.xsd#prefix_ConceptName), but jurisdiction-specific taxonomies may use different naming conventions.

All linkbase parsers accept a concept_extractor parameter (Callable[[str], str | None]) to override this logic:

import re
from xbrl_core import ConceptExtractor, parse_label_linkbase

def edinet_concept_extractor(href: str) -> str | None:
    """EDINET Strategy 2: extract local name by backward _[A-Z] scan."""
    if "#" not in href:
        return None
    fragment = href.rsplit("#", 1)[1]
    m = re.search(r"_([A-Z][A-Za-z0-9]*)$", fragment)
    return m.group(1) if m else fragment

labels = parse_label_linkbase(xml_bytes, concept_extractor=edinet_concept_extractor)

Supported by: parse_calculation_linkbase, parse_definition_linkbase, parse_presentation_linkbase, parse_label_linkbase, parse_reference_linkbase.
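For comparison, the standard pattern mentioned above ({prefix}_{YYYY-MM-DD}.xsd#prefix_ConceptName) could be handled by reading the prefix from the schema file name and stripping it from the fragment. This is a sketch of that idea, not the library's default logic. `prefix_aware_extractor` is a hypothetical name and the example URLs are made up.

```python
import re
from typing import Optional


def prefix_aware_extractor(href: str) -> Optional[str]:
    """Strip the schema's own prefix from the fragment, if the file name reveals it."""
    if "#" not in href:
        return None
    path, fragment = href.rsplit("#", 1)
    filename = path.rsplit("/", 1)[-1]
    match = re.match(r"(.+)_\d{4}-\d{2}-\d{2}\.xsd$", filename)
    if match and fragment.startswith(match.group(1) + "_"):
        return fragment[len(match.group(1)) + 1:]
    return fragment or None  # fall back to the raw fragment


print(prefix_aware_extractor(
    "http://example.com/taxonomy/jppfs_cor_2023-12-01.xsd#jppfs_cor_NetSales"
))
```

Unlike the backward scan, this variant keeps underscores that belong to the concept name itself, at the cost of depending on the file naming convention.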

Requirements

Python 3.12+. The only required dependency is lxml >= 5.0.
