Python wrappers around streamson
Project description
Python streamson
Python bindings for streamson, a memory-efficient JSON splitter written in Rust. The project is still in an early phase, but it seems to be working.
Installation
pip install streamson
Usage
Select users
>>> import streamson
>>> import json
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.SimpleMatcher('{"users"}[]')
>>> extracted = streamson.extract_iter((e for e in data), [(matcher, None)])
>>> for path, data in streamson.Output(extracted).generator():
... path, data
...
('{"users"}[0]', b'"john"')
('{"users"}[1]', b'"carl"')
('{"users"}[2]', b'"bob"')
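Each extracted value is a raw slice of the input, so it is still JSON-encoded bytes (note the quotes inside b'"john"'). A minimal sketch of turning those values into Python objects follows; it hard-codes the tuples printed above instead of calling streamson, so it runs without the extension installed, and the variable names are illustrative only.

```python
import json

# (path, raw JSON bytes) pairs, copied from the session output above.
extracted = [
    ('{"users"}[0]', b'"john"'),
    ('{"users"}[1]', b'"carl"'),
    ('{"users"}[2]', b'"bob"'),
]

# json.loads accepts bytes directly, so no explicit .decode() is needed.
users = [json.loads(raw) for _path, raw in extracted]
print(users)
```

Decoding is left to the caller, so only the parts that were actually selected are ever parsed.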
Select users and groups
>>> import streamson
>>> import json
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.SimpleMatcher('{"users"}[]') | streamson.SimpleMatcher('{"groups"}[]')
>>> extracted = streamson.extract_iter((e for e in data), [(matcher, None)])
>>> for path, data in streamson.Output(extracted):
... path, data
...
('{"users"}[0]', b'"john"')
('{"users"}[1]', b'"carl"')
('{"users"}[2]', b'"bob"')
('{"groups"}[0]', b'"admins"')
('{"groups"}[1]', b'"staff"')
Select only first level parts
>>> import streamson
>>> import json
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.DepthMatcher(1, 1)
>>> extracted = streamson.extract_iter((e for e in data), [(matcher, None)])
>>> for path, data in streamson.Output(extracted):
... path, data
...
('{"users"}', b'["john","carl","bob"]')
('{"groups"}', b'["admins", "staff"]')
('{"org"}', b'"university"')
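With first-level parts in hand, the original document can be rebuilt one key at a time. The sketch below is an illustration, not part of streamson: `top_level_key` is a hypothetical helper that assumes every path has the single-key form '{"name"}' shown above, and the tuples are hard-coded so the snippet runs without the extension.

```python
import json

# (path, raw JSON bytes) pairs of the shape produced by the session above.
parts = [
    ('{"users"}', b'["john","carl","bob"]'),
    ('{"groups"}', b'["admins", "staff"]'),
    ('{"org"}', b'"university"'),
]


def top_level_key(path):
    """Hypothetical helper: turn '{"users"}' into 'users'."""
    if not (path.startswith("{") and path.endswith("}")):
        raise ValueError(f"not a single-key path: {path!r}")
    # The text between the braces is itself a JSON string literal.
    return json.loads(path[1:-1])


document = {top_level_key(path): json.loads(raw) for path, raw in parts}
print(document)
```

In practice each part would more likely be processed and discarded as it arrives, which keeps memory use bounded by the largest single part.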
Select second level parts, excluding first records
>>> import streamson
>>> import json
>>> data = [b'{"users": ["john","carl","bob"], "groups": ["admins", "staff"], "org": "university"}']
>>> matcher = streamson.DepthMatcher(2, 2) & ~streamson.SimpleMatcher('{}[0]')
>>> extracted = streamson.extract_iter((e for e in data), [(matcher, None)])
>>> for path, data in streamson.Output(extracted):
... path, data
...
('{"users"}[1]', b'"carl"')
('{"users"}[2]', b'"bob"')
('{"groups"}[1]', b'"staff"')
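The `|`, `&` and `~` operators used above combine matchers as union, intersection and negation. The toy model below is pure Python and is not streamson's implementation; `ToyMatcher`, `simple`, `depth` and `split_path` are hypothetical names, and the wildcard rules ('{}' for any key, '[]' for any index) are inferred from the outputs shown in this README.

```python
import re

# A path segment is either {...} (object key) or [...] (array index).
# Toy assumption: keys contain no '}' characters.
_SEGMENT = re.compile(r"\{[^}]*\}|\[[^\]]*\]")


def split_path(text):
    """Split '{"users"}[0]' into ['{"users"}', '[0]']."""
    return _SEGMENT.findall(text)


class ToyMatcher:
    """Wraps a predicate over paths and supports |, & and ~."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __call__(self, path):
        return self.predicate(path)

    def __or__(self, other):
        return ToyMatcher(lambda p: self(p) or other(p))

    def __and__(self, other):
        return ToyMatcher(lambda p: self(p) and other(p))

    def __invert__(self):
        return ToyMatcher(lambda p: not self(p))


def _segment_matches(wanted, got):
    if wanted == "{}":  # wildcard: any object key
        return got.startswith("{")
    if wanted == "[]":  # wildcard: any array index
        return got.startswith("[")
    return wanted == got


def simple(pattern):
    """Match paths with the same number of segments as the pattern."""
    wanted = split_path(pattern)

    def predicate(path):
        got = split_path(path)
        return len(got) == len(wanted) and all(
            _segment_matches(w, g) for w, g in zip(wanted, got)
        )

    return ToyMatcher(predicate)


def depth(minimum, maximum):
    """Match paths whose segment count lies in [minimum, maximum]."""
    return ToyMatcher(lambda p: minimum <= len(split_path(p)) <= maximum)


paths = [
    '{"users"}', '{"users"}[0]', '{"users"}[1]', '{"users"}[2]',
    '{"groups"}', '{"groups"}[0]', '{"groups"}[1]', '{"org"}',
]
matcher = depth(2, 2) & ~simple("{}[0]")
selected = [p for p in paths if matcher(p)]
print(selected)
```

Filtering the paths of the example document this way yields the same three paths as the session above: every second-level element except the first item of each array.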
Motivation
This project is meant to be used as a fast JSON splitter. Its main purpose is to split raw binary data instead of parsing it, which is what makes it fast and memory efficient.
Developer Docs
Build
Poetry is used to manage the Python dev dependencies. After installing it, run:
poetry install
This project also requires the nightly version of Rust. First install rustup, then use it to install the nightly toolchain:
rustup install nightly
You might also need to set it as the default toolchain:
rustup default nightly
Precommit deployment
To make sure the basic lints pass before CI runs them, you may want to install pre-commit as a pre-push hook:
poetry run pre-commit install -t pre-push
Project details
Download files
File details
Details for the file streamson_python-4.0.0.tar.gz.
File metadata
- Download URL: streamson_python-4.0.0.tar.gz
- Upload date:
- Size: 7.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.9.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 42561a6fe630554116e0525fdbf6f06411d79280b112172a550e3290e1f5efe3
MD5 | a7908f20a6cac3c7df045890ca9e6304
BLAKE2b-256 | b94f827e579f7129378056be138071207ae96122767b3909317514fb3a64fbae
File details
Details for the file streamson_python-4.0.0-cp39-cp39-manylinux2010_x86_64.whl.
File metadata
- Download URL: streamson_python-4.0.0-cp39-cp39-manylinux2010_x86_64.whl
- Upload date:
- Size: 749.8 kB
- Tags: CPython 3.9, manylinux: glibc 2.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.9.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | f22cfd41ea1f3c8710fcd301e3431dfd736df6113255c58f2ef1c56512f30062
MD5 | 5d7d14c513f42152e60d3f60997b4e29
BLAKE2b-256 | e7216b6342417fd5de503c436fc0c4e3ab67df8d732b98f4084a6f817b49e9fb