Lightweight XBRL 2.1 / iXBRL 1.1 parser and structured data extraction library

xbrl-core — Lightweight XBRL 2.1 / iXBRL 1.1 Parser for Python

xbrl-core is a pure-Python parser and structured data extraction library for XBRL 2.1 instance documents and iXBRL (Inline XBRL) documents. It supports fact extraction, context/unit structuring, all five linkbase types (presentation, calculation, definition, label, reference), XSD schema parsing, calculation validation, text block extraction, pandas/DataFrame conversion, and Rich/HTML rendering. The only required dependency is lxml.

Installation

pip install xbrl-core

Optional dependencies:

# pandas + pyarrow (DataFrame conversion, Parquet export)
pip install 'xbrl-core[analysis]'

# Rich terminal display
pip install 'xbrl-core[display]'

# Excel export (pandas + openpyxl)
pip install 'xbrl-core[excel]'

# Everything
pip install 'xbrl-core[all]'

Quick Start

from xbrl_core import parse_xbrl_facts, structure_contexts, build_line_items

# 1. Parse an XBRL instance document
with open("instance.xbrl", "rb") as f:
    parsed = parse_xbrl_facts(f.read(), source_path="instance.xbrl")

print(f"Facts: {parsed.fact_count}")

# 2. Structure contexts and build typed LineItems
ctx_map = structure_contexts(parsed.contexts)
items = build_line_items(parsed.facts, ctx_map)

for item in items[:5]:
    print(item.local_name, item.value, item.period)

Parsing

XBRL Instance

parse_xbrl_facts() takes raw bytes and returns a ParsedXBRL containing facts, contexts, units, schema refs, footnote links, and ignored elements.

from xbrl_core import parse_xbrl_facts

parsed = parse_xbrl_facts(xbrl_bytes, source_path="example.xbrl")

# Extracted data
parsed.facts             # tuple[RawFact, ...]
parsed.contexts          # tuple[RawContext, ...]
parsed.units             # tuple[RawUnit, ...]
parsed.schema_refs       # tuple[RawSchemaRef, ...]
parsed.footnote_links    # tuple[RawFootnoteLink, ...]
parsed.ignored_elements  # tuple[IgnoredElement, ...]
parsed.fact_count        # int

iXBRL (Inline XBRL)

parse_ixbrl_facts() parses iXBRL (XHTML-embedded XBRL) documents. The output is the same ParsedXBRL type, so downstream pipelines work identically.

from xbrl_core import parse_ixbrl_facts

parsed = parse_ixbrl_facts(ixbrl_bytes, source_path="report.htm")

for fact in parsed.facts[:5]:
    print(fact.local_name, fact.value_raw)

iXBRL format attributes (ixt:numdotdecimal, ixt:numcommadecimal, etc.) and scale/sign attributes are automatically applied. Custom formats can be registered:

from xbrl_core import FormatRegistry, parse_ixbrl_facts

registry = FormatRegistry()
registry.register("dateyearmonthdaycjk", my_cjk_date_func)

parsed = parse_ixbrl_facts(ixbrl_bytes, format_registry=registry)

IXDS (Inline XBRL Document Set)

Multiple iXBRL files from a single filing can be merged:

from xbrl_core import parse_ixbrl_facts, merge_ixbrl_results

# ixbrl_files: an iterable of raw bytes, one entry per iXBRL document
results = [parse_ixbrl_facts(data) for data in ixbrl_files]
merged = merge_ixbrl_results(results)

Strict / Lenient Mode

Both parsers accept a strict parameter. When strict=True (default), spec violations raise XbrlParseError. When strict=False, violations emit warnings and are recorded in ignored_elements.

parsed = parse_xbrl_facts(xbrl_bytes, strict=False)
for elem in parsed.ignored_elements:
    print(elem.reason, elem.source_line)

Context Structuring

structure_contexts() converts raw context XML fragments into typed StructuredContext objects with period, entity, and dimension information.

from xbrl_core import structure_contexts, ContextCollection

ctx_map = structure_contexts(parsed.contexts)

# Direct dict access
ctx = ctx_map["CurrentYearInstant"]
print(ctx.period)       # InstantPeriod(instant=datetime.date(2024, 3, 31))
print(ctx.entity_id)    # "E00001"
print(ctx.dimensions)   # tuple[DimensionMember, ...]

# ContextCollection for filtering
coll = ContextCollection(ctx_map)
coll.filter_instant()                   # instant contexts only
coll.filter_duration()                  # duration contexts only
coll.filter_no_dimensions()             # no dimension members
coll.filter_by_dimension(axis="{ns}ProductAxis", member="{ns}SegmentA")

coll.latest_instant_period              # most recent InstantPeriod
coll.unique_duration_periods            # unique DurationPeriods, sorted
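
The period helpers above can be pictured with plain dataclasses. This is only a sketch: DemoInstant and DemoDuration are stand-ins defined here so the example runs without xbrl-core, and the field names start_date and end_date are assumptions rather than the library's actual attributes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True, order=True)
class DemoInstant:
    instant: date


@dataclass(frozen=True, order=True)
class DemoDuration:
    start_date: date
    end_date: date


periods = [
    DemoInstant(date(2023, 3, 31)),
    DemoDuration(date(2023, 4, 1), date(2024, 3, 31)),
    DemoInstant(date(2024, 3, 31)),
    DemoDuration(date(2022, 4, 1), date(2023, 3, 31)),
    DemoDuration(date(2023, 4, 1), date(2024, 3, 31)),  # duplicate on purpose
]

# Split by period type, as filter_instant() / filter_duration() do for contexts
instants = [p for p in periods if isinstance(p, DemoInstant)]
durations = [p for p in periods if isinstance(p, DemoDuration)]

# Most recent instant, and de-duplicated durations in sorted order
latest_instant = max(instants)
unique_durations = sorted(set(durations))
```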

Unit Structuring

structure_units() converts raw unit XML fragments into typed StructuredUnit objects.

from xbrl_core import structure_units

unit_map = structure_units(parsed.units)

unit = unit_map["JPY"]
print(unit.is_monetary)     # True
print(unit.currency_code)   # "JPY"

unit = unit_map["pure"]
print(unit.is_pure)         # True

unit = unit_map["JPYPerShare"]
print(unit.is_per_share)    # True

Building LineItems

build_line_items() merges RawFact + StructuredContext + optional LabelResolver into fully typed LineItem objects.

from xbrl_core import build_line_items

items = build_line_items(parsed.facts, ctx_map, langs=("en", "ja"))

for item in items:
    print(item.local_name)    # "NetSales"
    print(item.value)         # Decimal('1234567890')
    print(item.period)        # InstantPeriod / DurationPeriod
    print(item.entity_id)     # "E00001"
    print(item.dimensions)    # tuple[DimensionMember, ...]
    print(item.label("en"))   # "Net sales"
    print(item.label("ja"))   # "売上高"

Linkbase Parsing

Presentation Linkbase

from xbrl_core import parse_presentation_linkbase, merge_presentation_trees

trees = parse_presentation_linkbase(pre_xml_bytes)

for role_uri, tree in trees.items():
    # Flatten the tree (depth-first)
    for node in tree.flatten(skip_abstract=True, skip_dimension=True):
        print("  " * node.depth + node.concept)

    # Get only the line-items subtree
    for node in tree.line_items_roots():
        print(node.concept, node.order)

# Merge multiple presentation linkbases
merged = merge_presentation_trees(trees_a, trees_b)
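
The depth-first flattening used above can be sketched without the library. DemoNode and flatten below are hypothetical stand-ins, not xbrl-core's tree types; the sketch assumes children are visited in ascending order and that a skipped abstract node still has its children emitted.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class DemoNode:
    concept: str
    order: float = 1.0
    is_abstract: bool = False
    children: list["DemoNode"] = field(default_factory=list)


def flatten(node: DemoNode, depth: int = 0,
            skip_abstract: bool = False) -> Iterator[tuple[int, str]]:
    """Yield (depth, concept) pairs in depth-first, order-sorted sequence."""
    if not (skip_abstract and node.is_abstract):
        yield depth, node.concept
    for child in sorted(node.children, key=lambda n: n.order):
        yield from flatten(child, depth + 1, skip_abstract)


root = DemoNode("BalanceSheetAbstract", is_abstract=True, children=[
    DemoNode("Liabilities", order=2),
    DemoNode("Assets", order=1, children=[DemoNode("CashAndDeposits")]),
])

for depth, concept in flatten(root, skip_abstract=True):
    print("  " * depth + concept)
```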

Calculation Linkbase

from xbrl_core import parse_calculation_linkbase

calc_lb = parse_calculation_linkbase(cal_xml_bytes)

for role_uri in calc_lb.role_uris:
    tree = calc_lb.get_tree(role_uri)
    for arc in tree.arcs:
        sign = "+" if arc.weight > 0 else "-"  # weights are typically +1 or -1
        print(f"  {arc.parent} {sign}-> {arc.child}")

# Query relationships
calc_lb.children_of("GrossProfit")        # child arcs
calc_lb.parent_of("NetSales")             # parent arcs
calc_lb.ancestors_of("NetSales", role_uri=role)  # root-ward chain
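
As an illustration of what a root-ward chain means, the sketch below walks parent links over a small hand-written arc list. The arc tuples and the helper functions are hypothetical stand-ins, not xbrl-core's data model, and the sketch assumes each concept has at most one parent within a role.

```python
# (parent, child, weight) arcs for a single role; illustrative data only
arcs = [
    ("GrossProfit", "NetSales", 1),
    ("GrossProfit", "CostOfSales", -1),
    ("OperatingIncome", "GrossProfit", 1),
    ("OperatingIncome", "SellingExpenses", -1),
]

parent_of = {child: parent for parent, child, _ in arcs}


def children_of(concept: str) -> list[tuple[str, int]]:
    """Return (child, weight) pairs that sum into the given concept."""
    return [(child, weight) for parent, child, weight in arcs if parent == concept]


def ancestors_of(concept: str) -> list[str]:
    """Follow parent links toward the root, guarding against cycles."""
    chain: list[str] = []
    seen = {concept}
    while concept in parent_of:
        concept = parent_of[concept]
        if concept in seen:
            break
        seen.add(concept)
        chain.append(concept)
    return chain
```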

Definition Linkbase

from xbrl_core import parse_definition_linkbase

def_trees = parse_definition_linkbase(def_xml_bytes)

for role_uri, tree in def_trees.items():
    for hc in tree.hypercubes:
        print(f"Table: {hc.table_concept}")
        for axis in hc.axes:
            print(f"  Axis: {axis.axis_concept}")
            if axis.domain:
                print(f"  Domain: {axis.domain.concept}")

Label Linkbase

from xbrl_core import parse_label_linkbase

labels = parse_label_linkbase(lab_xml_bytes)

for lab in labels:
    print(f"{lab.concept_name} [{lab.lang}] = {lab.text}")

Reference Linkbase

from xbrl_core import parse_reference_linkbase

refs = parse_reference_linkbase(ref_xml_bytes)

for ref in refs:
    print(f"{ref.concept_name}: {ref.role}")
    for part in ref.parts:
        print(f"  {part.local_name} = {part.value}")

Footnotes

from xbrl_core import parse_footnote_links

footnote_map = parse_footnote_links(parsed.footnote_links)

notes = footnote_map.get("IdFact1234") or ()  # guard in case this fact has no footnotes
for n in notes:
    print(n.text, n.lang)

print(footnote_map.fact_ids)       # Fact IDs with footnotes
print(len(footnote_map))           # number of Facts with footnotes

Schema Parsing

from xbrl_core import parse_xsd_elements

elements = parse_xsd_elements(xsd_bytes)

elem = elements["NetSales"]
print(elem.period_type)         # "duration"
print(elem.balance)             # "credit"
print(elem.abstract)            # False
print(elem.type_name)           # "xbrli:monetaryItemType"
print(elem.substitution_group)  # "xbrli:item"

Calculation Validation

Validates summation-item relationships per XBRL 2.1 section 5.2.5.2, with decimals-based rounding tolerance.

from xbrl_core import validate_calculations, parse_calculation_linkbase

calc_lb = parse_calculation_linkbase(cal_xml_bytes)
result = validate_calculations(items, calc_lb)

print(result)           # "Calculation validation: PASS (checked=42, passed=42, errors=0, skipped=3)"
print(result.is_valid)  # True

for issue in result.issues:
    print(issue.parent_concept, issue.expected, issue.actual, issue.severity)
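
The idea behind a decimals-based tolerance can be sketched with the standard library. This is a simplified reading of the summation-item rule, not necessarily what validate_calculations() does internally: round_to_decimals and is_consistent are hypothetical names, and the rounding mode (half-even) is an assumption.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_to_decimals(value: Decimal, decimals: int) -> Decimal:
    """Round to the given XBRL decimals (negative means tens, thousands, ...)."""
    return value.quantize(Decimal(1).scaleb(-decimals), rounding=ROUND_HALF_EVEN)


def is_consistent(parent: Decimal, parent_decimals: int,
                  children: list[tuple[Decimal, int, int]]) -> bool:
    """children holds (value, decimals, weight) for each contributing item."""
    total = sum(
        (round_to_decimals(value, decimals) * Decimal(weight)
         for value, decimals, weight in children),
        Decimal(0),
    )
    # Compare both sides at the precision the summation item was reported with
    return (round_to_decimals(total, parent_decimals)
            == round_to_decimals(parent, parent_decimals))


# GrossProfit = NetSales - CostOfSales, all reported to the unit
exact = is_consistent(Decimal("400"), 0,
                      [(Decimal("1000"), 0, 1), (Decimal("600"), 0, -1)])

# Parent reported in thousands: 1,234,700 rounds to 1,235,000
rounded = is_consistent(Decimal("1235000"), -3,
                        [(Decimal("1000400"), 0, 1), (Decimal("234300"), 0, 1)])
```

Rounding both sides to the parent's decimals is what lets a total reported in thousands agree with children reported to the unit.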

Text Block Extraction

Extracts textBlockItemType facts (e.g. MD&A, risk factors, notes) from filings.

from xbrl_core import extract_text_blocks, clean_html

blocks = extract_text_blocks(parsed.facts, ctx_map)

for block in blocks:
    print(block.concept)         # "BusinessRisksTextBlock"
    print(block.period)          # DurationPeriod(...)
    plain = clean_html(block.html)
    print(plain[:200])

clean_html() converts HTML fragments to plain text, preserving table structure with tabs and newlines — useful as preprocessing for LLM / RAG pipelines.
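
The following is a rough standard-library sketch of this kind of conversion, with table cells joined by tabs and rows and paragraphs ended by newlines. html_to_text and its helper class are hypothetical and are not how clean_html() is implemented; real filings contain markup this sketch does not handle.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    LINE_END_TAGS = {"p", "div", "tr", "li"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._row_has_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row_has_cell = False
        elif tag in ("td", "th"):
            if self._row_has_cell:
                self.parts.append("\t")   # separate cells within a row
            self._row_has_cell = True
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.LINE_END_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(re.sub(r"\s+", " ", data))  # collapse whitespace runs


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for line in "".join(parser.parts).split("\n"):
        line = re.sub(r" *\t *", "\t", line).strip(" ")
        if line:
            lines.append(line)
    return "\n".join(lines)


sample = ("<p>Risk  factors</p><table><tr><th>Item</th><th>Amount</th></tr>"
          "<tr><td>Cash</td><td>100</td></tr></table>")
text = html_to_text(sample)
```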

DataFrame Conversion

Requires pip install 'xbrl-core[analysis]'.

from xbrl_core import line_items_to_dataframe, to_csv, to_parquet

df = line_items_to_dataframe(items, label_lang="en")
print(df[["local_name", "label", "value", "period_end"]].head())

# Export
to_csv(df, "output.csv")
to_parquet(df, "output.parquet")

Requires pip install 'xbrl-core[excel]':

from xbrl_core import to_excel

to_excel(df, "output.xlsx", sheet_name="BalanceSheet")

Display

Rich (Terminal)

Requires pip install 'xbrl-core[display]'.

from rich.console import Console
from xbrl_core import render_statement

table = render_statement(items, title="Balance Sheet", label_lang="en")
Console().print(table)

Hierarchical Display

Use DisplayHint with presentation tree data for indented financial statements:

from xbrl_core import (
    build_display_rows,
    render_hierarchical_statement,
    DisplayHint,
)

hints = [
    DisplayHint(concept="AssetsAbstract", depth=0, is_abstract=True, label="Assets"),
    DisplayHint(concept="CashAndDeposits", depth=1),
    DisplayHint(concept="TotalAssets", depth=0, is_total=True),
]

# Rich Table
table = render_hierarchical_statement(items, hints=hints, title="BS")

# Or get raw DisplayRow objects
rows = build_display_rows(items, hints=hints)

HTML (Jupyter)

from xbrl_core import to_html

html = to_html(items, hints=hints, title="Balance Sheet")

Label Resolution

LabelResolver is a Protocol — implement it to inject taxonomy labels into build_line_items().

from xbrl_core import LabelResolver, LabelInfo, LabelSource

class MyResolver:
    def resolve(self, concept_qname, lang, role):
        # Look up label from your taxonomy data
        return LabelInfo(text="Net sales", role=role, lang=lang, source=LabelSource.STANDARD)

    def resolve_batch(self, concept_qnames, lang, role):
        return {qn: self.resolve(qn, lang, role) for qn in concept_qnames}

items = build_line_items(parsed.facts, ctx_map, resolver=MyResolver(), langs=("en",))

Error Handling

All errors inherit from XbrlError and carry a structured error code and context.

from xbrl_core import XbrlError, XbrlParseError, XbrlValidationError, parse_xbrl_facts

try:
    parsed = parse_xbrl_facts(bad_bytes)
except XbrlParseError as e:
    print(e.code)     # "XBRL_PARSE_001"
    print(e.context)  # {"source_path": "..."}

Error code prefix	Exception class	Description
`XBRL_PARSE_xxx`	`XbrlParseError`	XML/XBRL parse errors
`XBRL_CTX_xxx`	`XbrlParseError`	Context structuring errors
`XBRL_UNIT_xxx`	`XbrlParseError`	Unit structuring errors
`XBRL_LINK_xxx`	`XbrlParseError`	Linkbase parse errors
`XBRL_IXBRL_xxx`	`XbrlParseError`	iXBRL parse errors
`XBRL_VAL_xxx`	`XbrlValidationError`	Validation errors

XbrlWarning (a UserWarning subclass) is emitted for non-fatal issues.
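
Since XbrlWarning is a UserWarning subclass, Python's standard warnings module can collect it or escalate it to an exception. The sketch below uses a stand-in class (DemoXbrlWarning) and a stand-in function (lenient_step) so that it runs without xbrl-core installed; in real code XbrlWarning would take the place of the stand-in.

```python
import warnings


class DemoXbrlWarning(UserWarning):
    """Stand-in for xbrl_core.XbrlWarning."""


def lenient_step() -> str:
    # Stand-in for a parser call that reports a non-fatal issue
    warnings.warn("duplicate context id ignored", DemoXbrlWarning, stacklevel=2)
    return "ok"


# 1. Collect warnings instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = lenient_step()
messages = [str(w.message) for w in caught
            if issubclass(w.category, DemoXbrlWarning)]

# 2. Escalate this warning category to an exception, e.g. in CI
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error", category=DemoXbrlWarning)
    try:
        lenient_step()
    except DemoXbrlWarning:
        escalated = True
```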

Customizing Error / Warning Classes

The linkbase parsers, parse_footnote_links(), and structure_units() accept optional error_class and warning_class parameters (parse_reference_linkbase() accepts error_class only). This lets downstream libraries substitute their own exception and warning types, which is useful when wrapping xbrl-core in a domain-specific package (e.g. an EDINET library).

from xbrl_core import XbrlParseError, XbrlWarning, parse_calculation_linkbase

class EdinetParseError(XbrlParseError):
    """EDINET-specific parse error."""

class EdinetWarning(UserWarning):
    """EDINET-specific warning."""

lb = parse_calculation_linkbase(
    xml_bytes,
    error_class=EdinetParseError,
    warning_class=EdinetWarning,
)

Supported by: parse_calculation_linkbase, parse_definition_linkbase, parse_presentation_linkbase, parse_label_linkbase, parse_reference_linkbase (error_class only), parse_footnote_links, structure_units.

Customizing Concept Extraction

Linkbase parsers extract concept local names from xlink:href fragments. The default logic handles standard XBRL taxonomy patterns ({prefix}_{YYYY-MM-DD}.xsd#prefix_ConceptName), but jurisdiction-specific taxonomies may use different naming conventions.

All linkbase parsers accept a concept_extractor parameter (Callable[[str], str | None]) to override this logic:

import re
from xbrl_core import ConceptExtractor, parse_label_linkbase

def edinet_concept_extractor(href: str) -> str | None:
    """EDINET Strategy 2: extract local name by backward _[A-Z] scan."""
    if "#" not in href:
        return None
    fragment = href.rsplit("#", 1)[1]
    m = re.search(r"_([A-Z][A-Za-z0-9]*)$", fragment)
    return m.group(1) if m else fragment

labels = parse_label_linkbase(xml_bytes, concept_extractor=edinet_concept_extractor)

Supported by: parse_calculation_linkbase, parse_definition_linkbase, parse_presentation_linkbase, parse_label_linkbase, parse_reference_linkbase.

Requirements

Python 3.12+. The only required dependency is lxml >= 5.0.

Release history

0.2.1 (Mar 20, 2026)
0.2.0 (Mar 12, 2026)
0.1.2 (Mar 6, 2026), this version
0.1.1 (Mar 5, 2026)
0.1.0 (Mar 5, 2026)

Download files

Source Distribution

xbrl_core-0.1.2.tar.gz (69.6 kB), uploaded Mar 6, 2026

Built Distribution

xbrl_core-0.1.2-py3-none-any.whl (90.6 kB), uploaded Mar 6, 2026, Python 3

File details

Details for the file xbrl_core-0.1.2.tar.gz.

File metadata

Download URL: xbrl_core-0.1.2.tar.gz
Upload date: Mar 6, 2026
Size: 69.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.12

File hashes

Hashes for xbrl_core-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`2a35028d6a63dc86aff94054fc5d815f1c88818cb0a54af36979e4a3914f7b08`
MD5	`038e8da48863df73e63ee6bd317f37de`
BLAKE2b-256	`08215ad1b25bf7c2d65c3849f4eeb7a76d434c6b301b254a3a8dd6f539a06853`

File details

Details for the file xbrl_core-0.1.2-py3-none-any.whl.

File metadata

Download URL: xbrl_core-0.1.2-py3-none-any.whl
Upload date: Mar 6, 2026
Size: 90.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.12

File hashes

Hashes for xbrl_core-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8e588812517d3a14ae2b702ff23b235cfef3ff06178e78433821abc943008b88`
MD5	`8fbf11c0b0bec57fa0e4e3e2657b7572`
BLAKE2b-256	`cb9230fc7a8b0cb555ca8af6d1952a1161faf33a6092daa1dba8d15594041095`
