Python wrappers around streamson

Python streamson

Python bindings for streamson, a memory-efficient JSON splitter written in Rust. The project is still in an early phase, but it appears to work.

Installation

pip install streamson-python

Usage

Select users

>>> import streamson
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.SimpleMatcher('{"users"}[]')
>>> extracted = streamson.extract_iter((e for e in data), matcher)
>>> for path, parsed in extracted:
...     path, parsed
...
('{"users"}[0]', 'john')
('{"users"}[1]', 'carl')
('{"users"}[2]', 'bob')
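The example above passes an iterable of bytes objects to extract_iter. For input that does not fit comfortably in memory, a generator along the following lines could supply that iterable. This is a minimal sketch: iter_chunks is a hypothetical helper, not part of streamson, and it assumes extract_iter accepts input split at arbitrary byte boundaries, which this README does not state explicitly.

```python
import io


def iter_chunks(stream, chunk_size=4096):
    """Yield successive bytes chunks from a binary file-like object.

    Hypothetical helper for illustration; not part of streamson.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        yield chunk


# Same document as in the example above, read back in 16-byte pieces.
raw = b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}'
chunks = list(iter_chunks(io.BytesIO(raw), chunk_size=16))
```

Under that assumption, iter_chunks(open(path, "rb")) could be passed to extract_iter in place of (e for e in data).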

Select users and groups

>>> import streamson
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.SimpleMatcher('{"users"}[]') | streamson.SimpleMatcher('{"groups"}[]')
>>> extracted = streamson.extract_iter((e for e in data), matcher)
>>> for path, parsed in extracted:
...     path, parsed
...
('{"users"}[0]', 'john')
('{"users"}[1]', 'carl')
('{"users"}[2]', 'bob')
('{"groups"}[0]', 'admins')
('{"groups"}[1]', 'staff')

Select only first level parts

>>> import streamson
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.DepthMatcher(1, 1)
>>> extracted = streamson.extract_iter((e for e in data), matcher)
>>> for path, parsed in extracted:
...     path, parsed
...
('{"users"}', ['john', 'carl', 'bob'])
('{"groups"}', ['admins', 'staff'])
('{"org"}', 'university')
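Judging from the outputs in this README, the depth of a path appears to be the number of elements in it: '{"users"}' has depth 1 and '{"users"}[1]' has depth 2. The sketch below models that reading in plain Python. The names path_depth and depth_matches are hypothetical and are not streamson functions; treating the two DepthMatcher arguments as a minimum and maximum depth is an assumption inferred from the examples.

```python
import re

# One path element is either an object key such as {"users"}
# or an array index such as [1].
_ELEMENT = re.compile(r'\{"(?:[^"\\]|\\.)*"\}|\[\d+\]')


def path_depth(path):
    """Count the elements in a path string like '{"users"}[1]'."""
    return len(_ELEMENT.findall(path))


def depth_matches(path, min_depth, max_depth=None):
    """Return True if the path depth lies within [min_depth, max_depth].

    Assumed semantics, inferred from the README outputs.
    """
    depth = path_depth(path)
    return depth >= min_depth and (max_depth is None or depth <= max_depth)


paths = ['{"users"}', '{"users"}[0]', '{"groups"}', '{"org"}']
first_level = [p for p in paths if depth_matches(p, 1, 1)]
```

With these inputs, first_level keeps the three top-level keys and drops '{"users"}[0]', mirroring the output shown above.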

Select second-level parts, excluding first records

>>> import streamson
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.DepthMatcher(2, 2) & ~streamson.SimpleMatcher('{}[0]')
>>> extracted = streamson.extract_iter((e for e in data), matcher)
>>> for path, parsed in extracted:
...     path, parsed
...
('{"users"}[1]', 'carl')
('{"users"}[2]', 'bob')
('{"groups"}[1]', 'staff')
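The |, & and ~ operators used in these examples combine matchers as logical or, and, and not. The following pure-Python model sketches how such a combination could behave on path strings. Every class here (ModelMatcher, ModelSimple, ModelDepth, Combined) is hypothetical and is not streamson's implementation; it also assumes that a simple pattern must match the whole path, that {} matches any key, and that [] matches any index.

```python
import re

_ELEMENT = re.compile(r'\{(?:"((?:[^"\\]|\\.)*)")?\}|\[(\d*)\]')


def split_path(path):
    """Split a path or pattern into (kind, value) elements; value None is a wildcard."""
    elements = []
    for m in _ELEMENT.finditer(path):
        if m.group(0).startswith("{"):
            elements.append(("key", m.group(1)))  # {} gives None: any key
        else:
            elements.append(("index", m.group(2) or None))  # [] gives None: any index
    return elements


class ModelMatcher:
    """Base class providing the |, & and ~ combinators."""

    def __or__(self, other):
        return Combined(lambda p: self.matches(p) or other.matches(p))

    def __and__(self, other):
        return Combined(lambda p: self.matches(p) and other.matches(p))

    def __invert__(self):
        return Combined(lambda p: not self.matches(p))


class Combined(ModelMatcher):
    def __init__(self, predicate):
        self.matches = predicate


class ModelSimple(ModelMatcher):
    def __init__(self, pattern):
        self.pattern = split_path(pattern)

    def matches(self, path):
        elements = split_path(path)
        if len(elements) != len(self.pattern):
            return False  # assumed: the pattern must cover the whole path
        return all(
            pk == ek and (pv is None or pv == ev)
            for (pk, pv), (ek, ev) in zip(self.pattern, elements)
        )


class ModelDepth(ModelMatcher):
    def __init__(self, min_depth, max_depth=None):
        self.min_depth, self.max_depth = min_depth, max_depth

    def matches(self, path):
        depth = len(split_path(path))
        return depth >= self.min_depth and (
            self.max_depth is None or depth <= self.max_depth
        )


paths = ['{"users"}', '{"users"}[0]', '{"users"}[1]', '{"users"}[2]',
         '{"groups"}', '{"groups"}[0]', '{"groups"}[1]', '{"org"}']
matcher = ModelDepth(2, 2) & ~ModelSimple('{}[0]')
selected = [p for p in paths if matcher.matches(p)]
```

In this model, selected contains the same three paths as the output above: the depth filter keeps the array items, and the negated pattern drops the first item of each array.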

Motivation

This project is meant to be used as a fast JSON splitter. Its main purpose is to split raw binary data rather than parse it. It is designed to be fast and memory efficient.

Developer Docs

Build

Poetry is used to manage Python dev-dependencies. After you install it, you can run:

poetry install

This project also requires the nightly version of Rust. First install rustup, then run:

rustup install nightly

You might also need to set it as the default toolchain.

rustup default nightly

Precommit deployment

To pass the basic lints, you may want to install the pre-commit hook at the pre-push stage, so that CI does not fail at its first step.

poetry run pre-commit install -t pre-push

Download files

Source Distribution

streamson_python-2.0.0.tar.gz (2.3 MB)

Uploaded Source

Built Distributions

streamson_python-2.0.0-cp38-cp38-manylinux1_x86_64.whl (179.3 kB)

Uploaded CPython 3.8

streamson_python-2.0.0-cp37-cp37m-manylinux1_x86_64.whl (179.3 kB)

Uploaded CPython 3.7m
