JSkiner
This is a Python JSON schema inference engine with a Rust core. Its inference speed is about 10 times that of its pure-Python counterpart (jsonschema-inference).
Installation
pip install jskiner
Usage
Checking the JSON Schema of a Large .jsonl File
jskiner \
--in <path_to_jsonl> \
--verbose <false/true> \
--out <output_file_path> \
--nworkers <number_of_cpu_core> \
--split <number_of_split_batch_size> \
--split-path <path_to_store_the_split_files>
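To try the command above, a small .jsonl file (one JSON document per line) can be generated with the standard library. This is only a sketch: the helper `write_jsonl`, the file name `sample.jsonl`, and the sample records are hypothetical and not part of jskiner.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one compact JSON document per line (the .jsonl convention)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


# Hypothetical sample data with mixed shapes.
records = [{"a": 1}, {"a": 2, "b": "x"}, {"a": None}]

# Write into a fresh temporary directory to avoid clobbering existing files.
sample_path = write_jsonl(records, Path(tempfile.mkdtemp()) / "sample.jsonl")
print(sample_path)
```

The printed path can then be passed to the command as `--in`.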
Checking the JSON Schema for a Folder of JSON Files
jskiner \
--in <path_to_jsons> \
--verbose <false/true> \
--out <output_file_path> \
--nworkers <number_of_cpu_core> \
--batch-size <batch_size_for_inferencing> \
--cuckoo-path <path_to_store_the_cuckoo_filter> \
--cuckoo-size <approximated_size_of_the_cuckoo_filter (Recommend using 10X of current json count)> \
--cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>
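The recommended value for `--cuckoo-size` (about 10 times the current JSON count) can be computed before calling the command. A minimal sketch, assuming the input folder holds files with a `.json` suffix directly inside it; the helper name `recommended_cuckoo_size` is hypothetical and not part of jskiner.

```python
from pathlib import Path


def recommended_cuckoo_size(folder, factor=10):
    """Return factor times the number of *.json files directly inside folder."""
    json_count = sum(1 for _ in Path(folder).glob("*.json"))
    return factor * json_count
```

For a folder with 3 JSON files, this returns 30, which would be the value to pass as `--cuckoo-size`.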
Inferring the Schema in Python
from jskiner import InferenceEngine
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema
Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})
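The example above passes JSON strings, not Python objects. When the data is already loaded in Python, the strings can be produced with `json.dumps` from the standard library. A small sketch; the helper name `to_json_strings` is hypothetical and not part of jskiner.

```python
import json


def to_json_strings(objects):
    """Serialize each Python object to its JSON string form."""
    return [json.dumps(obj) for obj in objects]


# Same values as the example above, starting from Python objects.
json_string_list = to_json_strings([1, 1.2, None, {"a": 1}])
print(json_string_list)
```

The resulting list has the same shape as the `json_string_list` used in the example.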
Calculating the Union of a List of Schemas
from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema
Optional(Atomic(Int()))
Using the | Operator between Two Schemas
from jskiner import Atomic, Int, Non
schema = Atomic(Int()) | Atomic(Non())
schema
Optional(Atomic(Int()))
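The union behaviour shown in the examples (an atomic type combined with `Non` becomes `Optional`, and distinct atomic types form a `Union`) can be illustrated with a toy model in pure Python. This is not jskiner's implementation: the classes `ToyAtomic`, `ToyOptional`, and `ToyUnion` are hypothetical, and the sketch only handles the union of two atomic schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAtomic:
    name: str  # e.g. "Int", "Float", "Non"

    def __or__(self, other):
        # Delegate `a | b` to the union rule below.
        return toy_union(self, other)


@dataclass(frozen=True)
class ToyOptional:
    inner: ToyAtomic


@dataclass(frozen=True)
class ToyUnion:
    members: frozenset


NON = ToyAtomic("Non")


def toy_union(left, right):
    """Combine two atomic schemas following the pattern in the examples."""
    if left == right:
        return left  # identical schemas collapse to one
    if left == NON:
        return ToyOptional(right)  # Non | T -> Optional(T)
    if right == NON:
        return ToyOptional(left)  # T | Non -> Optional(T)
    return ToyUnion(frozenset({left, right}))


result = ToyAtomic("Int") | ToyAtomic("Non")
print(result)
```

Using frozen dataclasses makes the toy schemas hashable and comparable by value, so the order of operands does not affect the result.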
TODO:
- Enable inference from a folder of json files
- Enable ignoring of existing json files using cuckoo filter
- Enable adding a starting schema file
- Enable batch-by-batch processing of large jsonl files
- FIX: make sure repr escapes special characters.
- Auto Formatting Using Black
- Enable Regex to represent patterned FieldSet
- Use borrowing to increase efficiency