Project description

ibson

BSON (Binary JSON) parsing library.

Usage

This library is designed to provide basic BSON support through an interface similar to Python's built-in json module. In particular, the expected usage is:

import uuid

import ibson


obj = {
    "a": {
        "b": [1, 2, 3],
        "uuid": uuid.uuid1()
    }
}

buffer = ibson.dumps(obj)
new_obj = ibson.loads(buffer)

# Evaluates to True
new_obj == obj

This mimics the existing bson library for Python, but also permits reading from and writing to (seekable) streams and files:

with open('file.bson', 'wb') as stm:
    ibson.dump(obj, stm)

# Elsewhere
with open('file.bson', 'rb') as stm:
    new_obj = ibson.load(stm)

# Should evaluate to True
new_obj == obj

NOTE: The file must be opened in binary mode ('wb' / 'rb'), not text mode!

Under the hood, this library is designed much like a SAX-style, event-driven parser: it avoids explicit recursion wherever possible, and it provides calls for iterating over the contents with generators, through an interface that even permits skipping keys/fields altogether. Since the parsing stack is maintained separately, it can also be used to verify a document and attempt to fix some issues in it.
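As a rough illustration of the event-driven, non-recursive approach described above, here is a minimal sketch in plain Python. It is not ibson's actual code or API: the names iter_events, _read_exact and _read_cstring, and the shape of the event tuples, are hypothetical, and only a handful of BSON types (double, string, embedded document, array, boolean, null, int32, int64) are handled. Nesting lives on an explicit list used as a stack, and because each open container's expected end offset is kept on that stack, declared lengths can be checked as the parse proceeds.

```python
import io
import struct

# Fixed-width BSON scalar types handled here: double, int32, int64.
_FIXED_FORMATS = {0x01: "<d", 0x10: "<i", 0x12: "<q"}


def _read_exact(stm, n):
    data = stm.read(n)
    if len(data) != n:
        raise EOFError("truncated BSON input")
    return data


def _read_cstring(stm):
    # BSON element names are NUL-terminated UTF-8 strings.
    out = bytearray()
    while (b := _read_exact(stm, 1)) != b"\x00":
        out += b
    return out.decode("utf-8")


def iter_events(stm):
    """Hypothetical sketch: yield ("start", key), ("value", key, value) and
    ("end", key) events for a BSON document, tracking nesting on an explicit
    stack (no recursion). The top-level document uses key None.
    Accepts a binary stream or a bytes object."""
    if isinstance(stm, (bytes, bytearray)):
        stm = io.BytesIO(stm)
    start = stm.tell()
    (length,) = struct.unpack("<i", _read_exact(stm, 4))
    # Each entry: (key of the open container, offset where it must end).
    stack = [(None, start + length)]
    yield ("start", None)
    while stack:
        type_byte = _read_exact(stm, 1)[0]
        if type_byte == 0x00:  # terminator of the innermost open container
            key, end = stack.pop()
            # The separate stack lets us verify each declared length.
            if stm.tell() != end:
                raise ValueError(f"length mismatch in container {key!r}")
            yield ("end", key)
            continue
        key = _read_cstring(stm)
        if type_byte in (0x03, 0x04):  # embedded document or array
            start = stm.tell()
            (length,) = struct.unpack("<i", _read_exact(stm, 4))
            stack.append((key, start + length))
            yield ("start", key)
        elif type_byte == 0x02:  # string: int32 byte count (incl. NUL) + bytes
            (n,) = struct.unpack("<i", _read_exact(stm, 4))
            yield ("value", key, _read_exact(stm, n)[:-1].decode("utf-8"))
        elif type_byte == 0x08:  # boolean
            yield ("value", key, _read_exact(stm, 1) == b"\x01")
        elif type_byte == 0x0A:  # null
            yield ("value", key, None)
        elif type_byte in _FIXED_FORMATS:
            fmt = _FIXED_FORMATS[type_byte]
            (value,) = struct.unpack(fmt, _read_exact(stm, struct.calcsize(fmt)))
            yield ("value", key, value)
        else:
            raise ValueError(f"unsupported BSON type 0x{type_byte:02x}")
```

A consumer can simply ignore events for keys it does not care about; the memory used grows with nesting depth rather than with document size.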

How It Works

This library relies on the observation that the byte offsets needed in a few places to (de)serialize BSON are already tracked implicitly by seekable streams via fp.tell(), which removes the need to count bytes directly. Where these byte counts are not directly accessible, the caller is likely already loading the content into a bytearray or binary stream, which can be made seekable anyway. When a field is needed before its value is available (e.g. a document's length, which must be written before the document itself), the encoder records the position where the length belongs, writes a placeholder value (0), and retroactively writes the real lengths once they are known; hence the writable stream must also be seekable. (As a slight optimization, these lengths are sorted and written from the start to the end of the file once the encoder is done, effectively making two sequential passes instead of an arbitrary number of random-access passes.)
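The placeholder-and-backpatch scheme can be sketched as follows. This is an illustrative, hypothetical encoder (encode and encode_to_bytes are not ibson's API) that supports only dicts whose values are int32 integers, strings, or nested dicts. It writes a zero placeholder for each document length, records an (offset, length) pair once the document's terminator has been written, and patches all lengths in ascending offset order at the end, mirroring the sorted second pass described above. It uses an explicit stack rather than recursion.

```python
import io
import struct

_PLACEHOLDER = b"\x00\x00\x00\x00"


def encode(obj, stm):
    """Hypothetical sketch: write `obj` (str keys -> int32 / str / dict values)
    as BSON to a seekable binary stream, back-patching document lengths."""
    patches = []  # (offset of a length field, final document length)
    stack = []  # (offset of a length field, iterator over remaining items)

    def open_document(doc):
        offset = stm.tell()  # the stream itself tracks the byte offset
        stm.write(_PLACEHOLDER)  # real length unknown until the doc is closed
        stack.append((offset, iter(doc.items())))

    open_document(obj)
    while stack:
        offset, items = stack[-1]
        item = next(items, None)
        if item is None:  # all elements written: close this document
            stm.write(b"\x00")
            # BSON lengths include the 4-byte prefix and the terminator.
            patches.append((offset, stm.tell() - offset))
            stack.pop()
            continue
        key, value = item
        name = key.encode("utf-8") + b"\x00"  # element names are cstrings
        if isinstance(value, dict):
            stm.write(b"\x03" + name)
            open_document(value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            stm.write(b"\x02" + name + struct.pack("<i", len(data)) + data)
        elif isinstance(value, int) and not isinstance(value, bool):
            stm.write(b"\x10" + name + struct.pack("<i", value))  # int32 only
        else:
            raise TypeError(f"unsupported value type: {type(value).__name__}")
    end = stm.tell()
    # Second, front-to-back pass: fill in every placeholder in offset order.
    for offset, length in sorted(patches):
        stm.seek(offset)
        stm.write(struct.pack("<i", length))
    stm.seek(end)  # leave the stream positioned after the document


def encode_to_bytes(obj):
    buf = io.BytesIO()
    encode(obj, buf)
    return buf.getvalue()
```

Inner documents close first, so their patches are recorded before their parents'; sorting by offset turns the fix-ups into a single forward sweep.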

This library also strives to keep memory consumption as low as is reasonable by using an iterative parser that intentionally avoids recursion where possible. The parser tracks its stack on the heap and stores various fields internally, so that merely traversing a document does not load everything parsed into memory, much like SAX-style parsers for XML. (Decoding and storing the document as a Python dict will, of course, hold it in memory.)
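To illustrate how traversal can avoid loading values into memory, the hypothetical helper below (find_top_level is not part of ibson) looks up one key in the top-level document of a seekable stream. It reads only the type byte and name of each element, then uses the length prefixes of strings, embedded documents and arrays, or the fixed sizes of scalar types, to seek past every value it does not need. It assumes only the listed BSON types appear at the top level.

```python
import io
import os
import struct

# Byte sizes of the fixed-width types handled here: double, bool, null, int32, int64.
_FIXED_SIZES = {0x01: 8, 0x08: 1, 0x0A: 0, 0x10: 4, 0x12: 8}
_FIXED_FORMATS = {0x01: "<d", 0x10: "<i", 0x12: "<q"}


def _read_exact(stm, n):
    data = stm.read(n)
    if len(data) != n:
        raise EOFError("truncated BSON input")
    return data


def _read_cstring(stm):
    out = bytearray()
    while (b := _read_exact(stm, 1)) != b"\x00":
        out += b
    return out.decode("utf-8")


def find_top_level(stm, wanted):
    """Hypothetical sketch: return the value under `wanted` in a top-level BSON
    document, seeking past every other value instead of reading it. Embedded
    documents/arrays are returned as raw BSON bytes. Accepts a stream or bytes."""
    if isinstance(stm, (bytes, bytearray)):
        stm = io.BytesIO(stm)
    start = stm.tell()
    (length,) = struct.unpack("<i", _read_exact(stm, 4))
    last = start + length - 1  # offset of the document's trailing 0x00
    while stm.tell() < last:
        type_byte = _read_exact(stm, 1)[0]
        key = _read_cstring(stm)
        if type_byte in (0x02, 0x03, 0x04):  # length-prefixed values
            (n,) = struct.unpack("<i", _read_exact(stm, 4))
            if key != wanted:
                # A string's length excludes its 4-byte prefix; a document's includes it.
                stm.seek(n if type_byte == 0x02 else n - 4, os.SEEK_CUR)
            elif type_byte == 0x02:
                return _read_exact(stm, n)[:-1].decode("utf-8")
            else:
                return struct.pack("<i", n) + _read_exact(stm, n - 4)
        elif type_byte in _FIXED_SIZES:
            size = _FIXED_SIZES[type_byte]
            if key != wanted:
                stm.seek(size, os.SEEK_CUR)
            elif type_byte == 0x08:
                return _read_exact(stm, 1) == b"\x01"
            elif type_byte == 0x0A:
                return None
            else:
                return struct.unpack(_FIXED_FORMATS[type_byte], _read_exact(stm, size))[0]
        else:
            raise ValueError(f"unsupported BSON type 0x{type_byte:02x}")
    raise KeyError(wanted)
```

With a file opened in 'rb' mode, the skipped values are never read from disk; only the element headers are.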

Project details

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License

Release history

- 0.0.3 (this version): Mar 19, 2022
- 0.0.2: Jan 28, 2022

Download files

Source Distribution

- ibson-0.0.3.tar.gz (19.1 kB), uploaded Mar 19, 2022

Built Distribution

- ibson-0.0.3-py3-none-any.whl (22.9 kB), uploaded Mar 19, 2022 (Python 3)

Hashes for ibson-0.0.3.tar.gz

Algorithm	Hash digest
SHA256	`ebaed828104d69b3e418c200fb9a01814a05705342e7895cbbc5afc3d2865c22`
MD5	`3d07d8bc465e9a848429400ba7ddf1ea`
BLAKE2b-256	`4666019ee3ca82fa4aad8c30f9b9be0551dea560a918c8dc4c56f4ca00126aa9`

Hashes for ibson-0.0.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`c457c2855f24685232deabab3375f12c5414428d637b4e24fe75d0365a8f437e`
MD5	`7fbe72618d5b33aa063d91c696ca49f5`
BLAKE2b-256	`4ce94d6c56f642a51d4488552f480f1bbe4248e75d544d8740d6ccef493cac39`