

ixbrl-parse


A Python module for extracting useful data from iXBRL files. The library is at an early stage, so feedback and improvements are very welcome.

New in version 0.5.4: Added backreferences to BeautifulSoup objects. Thanks to @avyfain for the PR.

New in version 0.5.3: Support for exclude and continuation elements within XBRL documents. Thanks to @wcollinscw for adding support for exclude elements.

New in version 0.5: Support for Python 3.11 has been added. I've had some problems with Python 3.11 and Windows as lxml binaries aren't yet available. Also new in version 0.5 is type checking - the whole library now has types added.

New in version 0.4: I've added initial support for pure XBRL files as well as tagged HTML iXBRL files. Feedback on this feature is welcome - particularly around getting values out of numeric items.

Requirements

The module requires BeautifulSoup and lxml to parse the documents.

word2number is used to process the numeric items with the numsenwords format.

How to install

You can install from PyPI using pip:

pip install ixbrlparse

How to use

Run the python module

You can run the module directly to extract data from an IXBRL file.

python -m ixbrlparse example_file.html

The various options for using this can be found through:

python -m ixbrlparse -h
# optional arguments:
#   -h, --help            show this help message and exit
#   --outfile OUTFILE     Where to output the file
#   --format {csv,json,jsonlines,jsonl}
#                         format of the output
#   --fields {numeric,nonnumeric,all}
#                         Which fields to output
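If you want to call the command line from a script, the argument list can be assembled in Python. The following is a minimal sketch: `build_ixbrlparse_command` is a hypothetical helper, not part of the library. It uses only the flags listed in the help output above and assumes the options may be given before the input file.

```python
import sys

# Choices copied from the help output above.
FORMATS = ("csv", "json", "jsonlines", "jsonl")
FIELDS = ("numeric", "nonnumeric", "all")


def build_ixbrlparse_command(infile, outfile=None, fmt=None, fields=None):
    """Hypothetical helper: build an argv list for `python -m ixbrlparse`."""
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if fields is not None and fields not in FIELDS:
        raise ValueError(f"unsupported fields: {fields!r}")
    cmd = [sys.executable, "-m", "ixbrlparse"]
    if outfile is not None:
        cmd += ["--outfile", outfile]
    if fmt is not None:
        cmd += ["--format", fmt]
    if fields is not None:
        cmd += ["--fields", fields]
    cmd.append(infile)  # assumption: the input file is given last
    return cmd


cmd = build_ixbrlparse_command(
    "example_file.html", outfile="facts.csv", fmt="csv", fields="numeric"
)
print(" ".join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed.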

Use as a python module

An example of usage is shown in test.py.

Import the IXBRL class which parses the file.

from ixbrlparse import IXBRL

Initialise an object and parse the file

You need to pass a file handle or other object with a .read() method.

with open('sample_ixbrl.html', encoding="utf8") as a:
  x = IXBRL(a)

If your iXBRL data is held in a string, wrap it in an io.StringIO object before passing it to the class:

import io
from ixbrlparse import IXBRL

content = '''<some ixbrl content>'''
x = IXBRL(io.StringIO(content))
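Content fetched over the network often arrives as bytes rather than a string. The sketch below shows one way to handle both cases; `to_text_handle` is a hypothetical helper, not part of the library, and it assumes the bytes are UTF-8 encoded unless you say otherwise.

```python
import io


def to_text_handle(content, encoding="utf8"):
    """Hypothetical helper: wrap str or bytes content in an object with .read()."""
    if isinstance(content, bytes):
        # Assumption: the document is encoded as UTF-8 unless stated otherwise.
        content = content.decode(encoding)
    return io.StringIO(content)


handle = to_text_handle(b"<html><body>caf\xc3\xa9</body></html>")
text = handle.read()
handle.seek(0)  # rewind so the handle can be read again by a parser
```

After rewinding, the handle can be passed to the class in the same way as above, for example `IXBRL(handle)`.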

Get the contexts and units used in the data

These are held in the object. The contexts are stored as a dictionary with the context id as the key and an ixbrlContext object as the value.

print(x.contexts)
# {
#    "cfwd_2018_03_31": ixbrlContext(
#       id="cfwd_2018_03_31",
#       entity="0123456", # company number
#       segments=[], # used for hypercubes
#       instant="2018-03-31",
#       startdate=None, # used for periods
#       enddate=None, # used for periods
#    ),
#    ....
# }
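A common first step is to separate point-in-time contexts from period contexts. The sketch below uses a stand-in class with the attributes shown above so it runs without the library. The second context id and the use of plain strings for dates are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StandInContext:
    """Stand-in with the attributes shown above; not the library's own class."""
    id: str
    entity: str
    segments: List = field(default_factory=list)
    instant: Optional[str] = None
    startdate: Optional[str] = None
    enddate: Optional[str] = None


def split_contexts(contexts):
    """Return (instants, periods) as two dicts keyed by context id."""
    instants = {k: c for k, c in contexts.items() if c.instant is not None}
    periods = {
        k: c for k, c in contexts.items()
        if c.instant is None and c.startdate is not None
    }
    return instants, periods


contexts = {
    "cfwd_2018_03_31": StandInContext(
        id="cfwd_2018_03_31", entity="0123456", instant="2018-03-31"
    ),
    # Made-up period context for illustration.
    "dur_2017_2018": StandInContext(
        id="dur_2017_2018", entity="0123456",
        startdate="2017-04-01", enddate="2018-03-31",
    ),
}
instants, periods = split_contexts(contexts)
print(sorted(instants), sorted(periods))
```

With a parsed document, the same function could be applied to `x.contexts`, provided the context objects expose the attributes listed above.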

The units are stored as a dictionary with the unit id as the key and the unit's measure as the value:

print(x.units)
# {
#    "GBP": "ISO4217:GBP",
#    "shares": "shares"
# }
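If you only need the bare currency code, the prefix can be stripped from the unit value. This is a small sketch on a plain dictionary shaped like the output above; `unit_code` is a hypothetical helper, not part of the library.

```python
def unit_code(unit_value):
    """Return the part after the last colon, or the whole value if there is none."""
    return unit_value.rsplit(":", 1)[-1]


# Plain dictionary shaped like x.units.
units = {"GBP": "ISO4217:GBP", "shares": "shares"}
codes = {key: unit_code(value) for key, value in units.items()}
print(codes)
```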

Get financial facts

Numeric facts are stored in x.numeric as a list of ixbrlNumeric objects. The ixbrlNumeric.value attribute contains the value as a parsed Python number (after the sign and scale formatting values have been applied).

ixbrlNumeric.context holds the context object relating to this value. The .name and .schema values give the key of this value, according to the applied schema.

Non-numeric facts are stored in x.nonnumeric as a list of ixbrlNonnumeric objects, with similar .value, .context, .name and .schema values. The value of .value will be a string for non-numeric facts.
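To load facts into a table or CSV file, it can help to flatten each fact into a dictionary. The sketch below does this using only the .value, .context, .name and .schema attributes described above. The facts are stand-in objects built with SimpleNamespace, and the names, schema URL and values are made up for illustration.

```python
from types import SimpleNamespace


def numeric_rows(numeric_facts):
    """Flatten fact-like objects into dictionaries, one per fact."""
    rows = []
    for fact in numeric_facts:
        ctx = fact.context
        rows.append({
            "schema": fact.schema,
            "name": fact.name,
            "value": fact.value,
            # Use the instant if there is one, otherwise the end of the period.
            "date": ctx.instant if ctx.instant is not None else ctx.enddate,
        })
    return rows


# Stand-in data; with a parsed document you would pass x.numeric instead.
instant_ctx = SimpleNamespace(instant="2018-03-31", startdate=None, enddate=None)
period_ctx = SimpleNamespace(instant=None, startdate="2017-04-01", enddate="2018-03-31")
facts = [
    SimpleNamespace(schema="http://example.com/schema", name="Creditors",
                    value=1234.0, context=instant_ctx),
    SimpleNamespace(schema="http://example.com/schema", name="Turnover",
                    value=-50.0, context=period_ctx),
]
rows = numeric_rows(facts)
for row in rows:
    print(row)
```

The resulting list of dictionaries can be written out with `csv.DictWriter` or loaded into a dataframe.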

Check for any parsing errors

By default, the parser will throw an exception if it encounters an error when processing the document.

You can pass raise_on_error=False when creating the object to suppress these exceptions. You can then access a list of the errors, and the elements that caused them, through the .errors attribute. For example:

with open('sample_ixbrl.html', encoding="utf8") as a:
  x = IXBRL(a, raise_on_error=False)
  print(x.errors) # populated with any exceptions found
  # [ eg...
  #   {
  #     "error": <NotImplementedError>,
  #     "element": <BeautifulSoupElement>
  #   }
  # ]

Note that error catching is only available when parsing the .nonnumeric and .numeric items in the document. Any other parsing errors will be thrown as normal, whatever raise_on_error is set to.
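When processing many files, a count of errors by type is often more useful than the raw list. The sketch below assumes each entry is a dictionary with "error" and "element" keys, as in the example output above. The entries are stand-ins with made-up messages.

```python
from collections import Counter


def summarise_errors(errors):
    """Count entries in an .errors-style list by exception class name."""
    return Counter(type(entry["error"]).__name__ for entry in errors)


# Stand-in data; with a parsed document you would pass x.errors instead.
errors = [
    {"error": NotImplementedError("made-up message"), "element": "<element 1>"},
    {"error": NotImplementedError("made-up message"), "element": "<element 2>"},
    {"error": ValueError("made-up message"), "element": "<element 3>"},
]
summary = summarise_errors(errors)
for name, count in summary.most_common():
    print(name, count)
```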

Code checks

Run tests

Tests can be run with pytest:

pip install -e . # install the package
pytest tests

Test coverage

coverage run -m pytest tests
coverage html
python -m http.server -d htmlcov

Run typing checks

mypy ixbrlparse tests

Linting

Black and isort should be run before committing any changes.

isort ixbrlparse tests
black ixbrlparse tests

Run all checks at once

black . && isort . && mypy ixbrlparse tests && coverage run -m pytest tests && coverage html --fail-under=100

Publish to pypi

python -m build
twine upload dist/*
git tag v<VERSION_NUMBER>
git push origin v<VERSION_NUMBER>

Install development version

The development requirements are installed using pip install -r dev-requirements.txt.

Any additional requirements for the module itself must be added to install_requires in setup.py. You should then generate a new requirements.txt using pip-tools (pip-compile). You can then run pip-sync to install the requirements.

Any additional development requirements must be added to dev-requirements.in and then the dev-requirements.txt should be generated using pip-compile dev-requirements.in. You can then install the development requirements using pip-sync dev-requirements.txt.

Acknowledgements

Originally developed for a project with Power to Change looking at how to extract data from financial documents of community businesses.

