
Python wrappers around streamson

Project description


Python streamson

Python bindings for streamson, a memory-efficient JSON splitter written in Rust. The project is still at an early stage, but it appears to work.

Installation

pip install streamson-python

Usage

Select users

>>> import streamson
>>> import json
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.SimpleMatcher('{"users"}[]')
>>> extracted = streamson.extract_iter((e for e in data), [(matcher, None)])
>>> for path, data in streamson.Output(extracted).generator():
...     path, data
...
('{"users"}[0]', b'"john"')
('{"users"}[1]', b'"carl"')
('{"users"}[2]', b'"bob"')
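The example above feeds `extract_iter` a one-element list, but the generator expression `(e for e in data)` suggests that any iterable of `bytes` chunks is accepted. Assuming that holds, a large file can be streamed in fixed-size chunks instead of being loaded whole. The helper `iter_chunks` below is a hypothetical illustration, not part of streamson:

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive byte chunks from a binary stream until EOF (hypothetical helper)."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# An in-memory stand-in for a large JSON file on disk.
raw = b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}'
chunks = list(iter_chunks(io.BytesIO(raw), chunk_size=16))
# Chunk boundaries fall at arbitrary byte offsets, even inside strings,
# so the splitter has to carry parsing state from one chunk to the next.
assert b"".join(chunks) == raw
```

With a real file, something like `with open("users.json", "rb") as f:` followed by `streamson.extract_iter(iter_chunks(f), [(matcher, None)])` should work under the same assumption; `users.json` is a placeholder name.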

Select users and groups

>>> import streamson
>>> import json
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.SimpleMatcher('{"users"}[]') | streamson.SimpleMatcher('{"groups"}[]')
>>> extracted = streamson.extract_iter((e for e in data), [(matcher, None)])
>>> for path, data in streamson.Output(extracted):
...     path, data
...
('{"users"}[0]', b'"john"')
('{"users"}[1]', b'"carl"')
('{"users"}[2]', b'"bob"')
('{"groups"}[0]', b'"admins"')
('{"groups"}[1]', b'"staff"')
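To make the path syntax and the `|` combination concrete, here is a pure-Python sketch that evaluates complete path strings such as `{"users"}[0]`. It assumes, based on the patterns used in these examples, that `{}` matches any key and `[]` matches any index. `Matcher`, `simple`, and `elements` are hypothetical names; streamson itself matches while scanning raw bytes, which this sketch does not attempt.

```python
import re
from typing import Callable, List, Optional, Tuple

# One path element: {"key"} or the {} wildcard, [index] or the [] wildcard.
_ELEMENT = re.compile(r'\{("(?:[^"\\]|\\.)*")?\}|\[(\d*)\]')


def elements(path: str) -> List[Tuple[str, Optional[str]]]:
    """Split a path such as '{"users"}[0]' into (kind, value) pairs; None marks a wildcard."""
    result = []
    pos = 0
    while pos < len(path):
        m = _ELEMENT.match(path, pos)
        if m is None:
            raise ValueError(f"unparsable path element at offset {pos} in {path!r}")
        if m.group(0).startswith("{"):
            result.append(("key", m.group(1)))
        else:
            result.append(("index", m.group(2) or None))
        pos = m.end()
    return result


class Matcher:
    """Hypothetical stand-in: a path predicate that can be combined with |."""

    def __init__(self, predicate: Callable[[str], bool]) -> None:
        self.predicate = predicate

    def __call__(self, path: str) -> bool:
        return self.predicate(path)

    def __or__(self, other: "Matcher") -> "Matcher":
        return Matcher(lambda path: self(path) or other(path))


def simple(pattern: str) -> Matcher:
    """Match paths element by element; {} and [] in the pattern match any key or index."""
    expected = elements(pattern)

    def check(path: str) -> bool:
        actual = elements(path)
        return len(actual) == len(expected) and all(
            kind == exp_kind and (exp_value is None or value == exp_value)
            for (exp_kind, exp_value), (kind, value) in zip(expected, actual)
        )

    return Matcher(check)


# Every path that occurs in the example document, in document order.
paths = [
    '{"users"}', '{"users"}[0]', '{"users"}[1]', '{"users"}[2]',
    '{"groups"}', '{"groups"}[0]', '{"groups"}[1]', '{"org"}',
]
users_or_groups = simple('{"users"}[]') | simple('{"groups"}[]')
selected = [p for p in paths if users_or_groups(p)]
# selected == ['{"users"}[0]', '{"users"}[1]', '{"users"}[2]', '{"groups"}[0]', '{"groups"}[1]']
```

Note that `{"users"}` itself is not selected: the pattern `{"users"}[]` has two elements, so only paths one level below `users` match.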

Select only first-level parts

>>> import streamson
>>> import json
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.DepthMatcher(1, 1)
>>> extracted = streamson.extract_iter((e for e in data), [(matcher, None)])
>>> for path, data in streamson.Output(extracted):
...     path, data
...
('{"users"}', b'["john","carl","bob"]')
('{"groups"}', b'["admins", "staff"]')
('{"org"}', b'"university"')

Select second-level parts, excluding the first element of each array

>>> import streamson
>>> import json
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.DepthMatcher(2, 2) & ~streamson.SimpleMatcher('{}[0]')
>>> extracted = streamson.extract_iter((e for e in data), [(matcher, None)])
>>> for path, data in streamson.Output(extracted):
...     path, data
...
('{"users"}[1]', b'"carl"')
('{"users"}[2]', b'"bob"')
('{"groups"}[1]', b'"staff"')
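The depth and negation examples can be mirrored in the same pure-Python style. This sketch assumes that depth is the number of elements in a path, which is consistent with the outputs above: `DepthMatcher(1, 1)` yields `{"users"}`, and `DepthMatcher(2, 2)` yields `{"users"}[1]`. `Matcher`, `depth`, `simple`, and `elements` are hypothetical illustrations of `&` and `~` semantics, not streamson's implementation.

```python
import re
from typing import Callable, List, Optional, Tuple

_ELEMENT = re.compile(r'\{("(?:[^"\\]|\\.)*")?\}|\[(\d*)\]')


def elements(path: str) -> List[Tuple[str, Optional[str]]]:
    """(kind, value) pairs for a well-formed path; None marks a {} or [] wildcard."""
    return [
        ("key", m.group(1)) if m.group(0).startswith("{") else ("index", m.group(2) or None)
        for m in _ELEMENT.finditer(path)
    ]


class Matcher:
    """Hypothetical stand-in: a path predicate supporting & and ~."""

    def __init__(self, predicate: Callable[[str], bool]) -> None:
        self.predicate = predicate

    def __call__(self, path: str) -> bool:
        return self.predicate(path)

    def __and__(self, other: "Matcher") -> "Matcher":
        return Matcher(lambda path: self(path) and other(path))

    def __invert__(self) -> "Matcher":
        return Matcher(lambda path: not self(path))


def depth(min_depth: int, max_depth: int) -> Matcher:
    """Match paths whose number of elements lies in [min_depth, max_depth]."""
    return Matcher(lambda path: min_depth <= len(elements(path)) <= max_depth)


def simple(pattern: str) -> Matcher:
    """Element-wise match; {} and [] in the pattern match any key or index."""
    expected = elements(pattern)

    def check(path: str) -> bool:
        actual = elements(path)
        return len(actual) == len(expected) and all(
            kind == exp_kind and (exp_value is None or value == exp_value)
            for (exp_kind, exp_value), (kind, value) in zip(expected, actual)
        )

    return Matcher(check)


paths = [
    '{"users"}', '{"users"}[0]', '{"users"}[1]', '{"users"}[2]',
    '{"groups"}', '{"groups"}[0]', '{"groups"}[1]', '{"org"}',
]
# Depth 1 alone reproduces the first-level example: users, groups, org.
first_level = [p for p in paths if depth(1, 1)(p)]
# Depth 2, minus the first element of every array.
matcher = depth(2, 2) & ~simple('{}[0]')
selected = [p for p in paths if matcher(p)]
```

`{"org"}` drops out because it sits at depth 1, and `{"users"}[0]` and `{"groups"}[0]` are removed by the negated `{}[0]` pattern.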

Motivation

This project is meant to be used as a fast JSON splitter. Its main purpose is to split raw binary data rather than parse it, which is intended to keep it fast and memory-efficient.

Developer Docs

Build

Poetry is used to manage the Python development dependencies. Once it is installed, run:

poetry install

This project also requires the nightly version of Rust. First install rustup, then install the nightly toolchain:

rustup install nightly

You might also need to set it as the default toolchain:

rustup default nightly

Pre-commit deployment

To pass the basic lints and keep CI from failing at its first step, you may want to install pre-commit as a pre-push hook:

poetry run pre-commit install -t pre-push



Download files


Source Distribution

streamson_python-4.0.0.tar.gz (7.1 MB)


Built Distribution

streamson_python-4.0.0-cp39-cp39-manylinux2010_x86_64.whl (749.8 kB)

CPython 3.9, manylinux (glibc 2.12+), x86-64

File details

Details for the file streamson_python-4.0.0.tar.gz.

File metadata

  • Download URL: streamson_python-4.0.0.tar.gz
  • Size: 7.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/0.9.4

File hashes

Hashes for streamson_python-4.0.0.tar.gz
Algorithm Hash digest
SHA256 42561a6fe630554116e0525fdbf6f06411d79280b112172a550e3290e1f5efe3
MD5 a7908f20a6cac3c7df045890ca9e6304
BLAKE2b-256 b94f827e579f7129378056be138071207ae96122767b3909317514fb3a64fbae
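A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The function names below are illustrative. The file is hashed in chunks, so a large archive never has to fit in memory.

```python
import hashlib
import io
from typing import BinaryIO

# SHA256 digest published above for streamson_python-4.0.0.tar.gz.
EXPECTED_SDIST_SHA256 = "42561a6fe630554116e0525fdbf6f06411d79280b112172a550e3290e1f5efe3"


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream incrementally and return the hex digest."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Return True if the file at `path` has the expected SHA256 digest."""
    with open(path, "rb") as f:
        return sha256_of_stream(f) == expected.lower()


# Sanity check with the standard SHA-256 test vector for b"abc".
assert sha256_of_stream(io.BytesIO(b"abc")) == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)
```

After downloading the source archive, `verify_download("streamson_python-4.0.0.tar.gz")` should return `True`. For the wheel, pass its own SHA256 digest as `expected`.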


File details

Details for the file streamson_python-4.0.0-cp39-cp39-manylinux2010_x86_64.whl.


File hashes

Hashes for streamson_python-4.0.0-cp39-cp39-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f22cfd41ea1f3c8710fcd301e3431dfd736df6113255c58f2ef1c56512f30062
MD5 5d7d14c513f42152e60d3f60997b4e29
BLAKE2b-256 e7216b6342417fd5de503c436fc0c4e3ab67df8d732b98f4084a6f817b49e9fb

