ixbrl-parse
A Python module for getting useful data out of iXBRL files. The library is at an early stage, so feedback and improvements are very welcome.
Changelog
New in version 0.6.0: Switch to use the hatch build and development system.
New in version 0.5.4: Added backreferences to BeautifulSoup objects - thanks to @avyfain for PR.
New in version 0.5.3: Support for exclude and continuation elements within XBRL documents. Thanks to @wcollinscw for adding support for exclude elements.
New in version 0.5: Support for Python 3.11 has been added. I've had some problems with Python 3.11 and Windows as lxml binaries aren't yet available. Also new in version 0.5 is type checking - the whole library now has types added.
New in version 0.4: I've added initial support for pure XBRL files as well as tagged HTML iXBRL files. Feedback on this feature is welcome - particularly around getting values out of numeric items.
Requirements
The module requires BeautifulSoup and lxml to parse the documents.
word2number is used to process numeric items with the numsenwords format.
How to install
You can install from PyPI using pip:
pip install ixbrlparse
How to use
Run the Python module
You can run the module directly to extract data from an iXBRL file.
ixbrlparse example_file.html
# or
python -m ixbrlparse example_file.html
The available options are listed with:
python -m ixbrlparse -h
# optional arguments:
#   -h, --help            show this help message and exit
#   --outfile OUTFILE     Where to output the file
#   --format {csv,json,jsonlines,jsonl}
#                         format of the output
#   --fields {numeric,nonnumeric,all}
#                         Which fields to output
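The flags listed in the help text can also be combined from a script. The following is a minimal sketch that only assembles the command line; `build_command` is a hypothetical helper, not part of ixbrlparse, and the choices it checks are copied from the help output above. It always passes `--format` and `--fields` explicitly, so it does not rely on the tool's defaults.

```python
import sys

# Choices copied from the --help output; the helper itself is hypothetical.
FORMATS = {"csv", "json", "jsonlines", "jsonl"}
FIELDS = {"numeric", "nonnumeric", "all"}


def build_command(infile, outfile=None, fmt="csv", fields="all"):
    """Assemble an argument list for `python -m ixbrlparse`."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if fields not in FIELDS:
        raise ValueError(f"unknown fields option: {fields!r}")
    cmd = [sys.executable, "-m", "ixbrlparse", "--format", fmt, "--fields", fields]
    if outfile is not None:
        cmd += ["--outfile", outfile]
    cmd.append(infile)
    return cmd


cmd = build_command("example_file.html", outfile="facts.csv", fields="numeric")
print(cmd[1:])
```

With ixbrlparse installed, the resulting list could be handed to `subprocess.run(cmd, check=True)` to run the export.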
Use as a python module
An example of usage is shown in test.py.
Import the IXBRL class, which parses the file.
from ixbrlparse import IXBRL
Initialise an object and parse the file
You need to pass a file handle or other object with a .read() method.
with open('sample_ixbrl.html', encoding="utf8") as a:
    x = IXBRL(a)
If your iXBRL data comes as a string, use an io.StringIO wrapper to pass it to the class:
import io
from ixbrlparse import IXBRL
content = '''<some ixbrl content>'''
x = IXBRL(io.StringIO(content))
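If the content arrives as bytes (for example from a download) rather than a string, one option is to decode it before wrapping. This is a sketch only: `to_filelike` is a hypothetical helper, not part of ixbrlparse, and it assumes the data is UTF-8 encoded.

```python
import io


def to_filelike(content):
    """Wrap str or bytes in an object whose .read() returns text."""
    if isinstance(content, bytes):
        # Assumption: the document is UTF-8, as in the open() call above.
        content = content.decode("utf-8")
    return io.StringIO(content)


handle = to_filelike(b"<html>caf\xc3\xa9</html>")
text = handle.read()
print(text)
```

The wrapped object could then be passed on as `IXBRL(to_filelike(raw))`.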
Get the contexts and units used in the data
These are held in the object. The contexts are stored as a dictionary with the context id as the key, and an ixbrlContext object as the value.
print(x.contexts)
# {
#     "cfwd_2018_03_31": ixbrlContext(
#         id="cfwd_2018_03_31",
#         entity="0123456",  # company number
#         segments=[],  # used for hypercubes
#         instant="2018-03-31",
#         startdate=None,  # used for periods
#         enddate=None,  # used for periods
#     ),
#     ....
# }
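Because a context carries either an instant or a start and end date, a small helper can turn each one into a readable label. The sketch below uses `SimpleNamespace` stand-ins that carry the attributes shown above, so it runs without a real file; `describe_period` and the second context id are hypothetical, and the dates are formatted with an f-string so the code does not depend on whether the library returns strings or date objects.

```python
from types import SimpleNamespace


def describe_period(context):
    """Return a short label for the date or period a context covers."""
    if context.instant is not None:
        return f"at {context.instant}"
    return f"{context.startdate} to {context.enddate}"


# Stand-ins with the documented attributes; real ones come from x.contexts.
contexts = {
    "cfwd_2018_03_31": SimpleNamespace(
        id="cfwd_2018_03_31", entity="0123456", segments=[],
        instant="2018-03-31", startdate=None, enddate=None,
    ),
    "dur_2018": SimpleNamespace(  # made-up id for a period context
        id="dur_2018", entity="0123456", segments=[],
        instant=None, startdate="2017-04-01", enddate="2018-03-31",
    ),
}

labels = {cid: describe_period(ctx) for cid, ctx in contexts.items()}
print(labels)
```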
The units are stored as key:value dictionary entries:
print(x.units)
# {
#     "GBP": "ISO4217:GBP",
#     "shares": "shares"
# }
Get financial facts
Numeric facts are stored in x.numeric as a list of ixbrlNumeric objects.
The ixbrlNumeric.value attribute contains the value as a parsed Python number (after the sign and scale formatting values have been applied).
ixbrlNumeric.context holds the context object relating to this value.
The .name and .schema values give the key of this value, according to the applied schema.
Non-numeric facts are stored in x.nonnumeric as a list of ixbrlNonnumeric objects, with similar .value, .context, .name and .schema values.
The value of .value will be a string for non-numeric facts.
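Since both kinds of fact share the .value, .context, .name and .schema attributes, they can be flattened into plain dictionaries for further processing. The following is a sketch under stated assumptions: `fact_to_row` is a hypothetical helper, the facts are `SimpleNamespace` stand-ins rather than real ixbrlNumeric or ixbrlNonnumeric objects, and the schema and name strings are made up for illustration.

```python
from types import SimpleNamespace


def fact_to_row(fact):
    """Flatten one fact into a dict, using the instant or the period end as its date."""
    ctx = fact.context
    return {
        "schema": fact.schema,
        "name": fact.name,
        "value": fact.value,
        "date": ctx.instant if ctx.instant is not None else ctx.enddate,
    }


# Stand-ins for x.numeric and x.nonnumeric; the names and schemas are invented.
at_year_end = SimpleNamespace(instant="2018-03-31", startdate=None, enddate=None)
over_year = SimpleNamespace(instant=None, startdate="2017-04-01", enddate="2018-03-31")
numeric = [
    SimpleNamespace(schema="example", name="CashAtBank", value=1234.0, context=at_year_end),
    SimpleNamespace(schema="example", name="Turnover", value=5600.0, context=over_year),
]
nonnumeric = [
    SimpleNamespace(schema="example", name="EntityName", value="Example Ltd", context=over_year),
]

rows = [fact_to_row(fact) for fact in numeric + nonnumeric]
for row in rows:
    print(row)
```

With a parsed document, the same comprehension would run over `x.numeric + x.nonnumeric`.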
Check for any parsing errors
By default, the parser will throw an exception if it encounters an error when processing the document.
You can pass raise_on_error=False when creating the object to suppress these exceptions. You can then access a list of the errors (and the elements that caused them) through the .errors attribute. For example:
with open('sample_ixbrl.html', encoding="utf8") as a:
    x = IXBRL(a, raise_on_error=False)
print(x.errors)  # populated with any exceptions found
# [ eg...
#     {
#         "error": <NotImplementedError>,
#         "element": <BeautifulSoupElement>
#     }
# ]
Note that error catching is only available when parsing .nonnumeric and .numeric items in the document. Any other parsing errors will be thrown as normal, no matter what raise_on_error is set to.
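When a document produces many errors, grouping them by exception type gives a quick overview. This is a minimal sketch, assuming each entry in .errors is a dictionary with "error" holding an exception instance and "element" holding the offending element, as in the example output above; `count_errors` is a hypothetical helper and the list below is stand-in data.

```python
from collections import Counter


def count_errors(errors):
    """Count collected errors by exception class name."""
    return Counter(type(item["error"]).__name__ for item in errors)


# Stand-in for x.errors; real entries hold BeautifulSoup elements.
errors = [
    {"error": NotImplementedError("unsupported format"), "element": "<stand-in element 1>"},
    {"error": ValueError("could not parse number"), "element": "<stand-in element 2>"},
    {"error": NotImplementedError("unsupported format"), "element": "<stand-in element 3>"},
]

summary = count_errors(errors)
for name, count in summary.most_common():
    print(f"{name}: {count}")
```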
Code checks
The module is set up for development using hatch.
Run tests
Tests can be run with pytest:
hatch run test
Test coverage
Run tests then report on coverage
hatch run cov
Run tests then run a server showing where coverage is missing
hatch run cov-html
Run typing checks
hatch run lint:typing
Linting
Black and ruff should be run before committing any changes.
To check for any changes needed:
hatch run lint:style
To run any autoformatting possible:
hatch run lint:fmt
Run all checks at once
hatch run lint:all
Publish to PyPI
hatch build
hatch publish
git tag v<VERSION_NUMBER>
git push origin v<VERSION_NUMBER>
Acknowledgements
Originally developed for a project with Power to Change looking at how to extract data from financial documents of community businesses.