JSkiner
JSkiner is a Python JSON Schema inference engine with a Rust core. Its inference speed is about 10 times that of its pure-Python counterpart (jsonschema-inference).
Installation
pip install jskiner
Usage
Checking the JSON Schema of a Large .jsonl File
jskiner \
    --in <path_to_jsonl> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --split <number_of_split_batch_size> \
    --split-path <path_to_store_the_split_files>
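To try the command above on a small input, a .jsonl file can be generated with the standard library. This is a minimal sketch: the helper name `write_jsonl`, the sample records, and the output location are illustrative assumptions, not part of jskiner.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON document per line, the layout a .jsonl file uses."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


# Hypothetical sample records; any JSON-serializable values would do.
sample_records = [{"a": 1}, {"a": None}, {"a": 2, "b": "x"}]
jsonl_path = write_jsonl(sample_records, Path(tempfile.gettempdir()) / "sample.jsonl")
print(jsonl_path)
```

The printed path can then be passed as the `--in` argument.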
Checking the JSON Schema for a Folder of JSON Files
jskiner \
    --in <path_to_jsons> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --batch-size <batch_size_for_inferencing> \
    --cuckoo-path <path_to_store_the_cuckoo_filter> \
    --cuckoo-size <approximate_size_of_the_cuckoo_filter> \
    --cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>
For --cuckoo-size, a value of about 10X the current JSON file count is recommended.
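The 10X recommendation for `--cuckoo-size` can be computed from the folder itself. The sketch below assumes the JSON files sit directly in the folder and carry a `.json` extension; `recommended_cuckoo_size` is a hypothetical helper, not a jskiner function.

```python
import tempfile
from pathlib import Path


def recommended_cuckoo_size(json_dir, factor=10):
    """Return `factor` times the number of .json files directly inside json_dir."""
    json_count = sum(1 for _ in Path(json_dir).glob("*.json"))
    return factor * json_count


# Demonstration on a throwaway folder holding three small JSON files.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(3):
        (Path(demo_dir) / f"doc_{i}.json").write_text("{}", encoding="utf-8")
    demo_size = recommended_cuckoo_size(demo_dir)

print(demo_size)  # 3 files * 10 = 30
```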
Inferring the Schema in Python
from jskiner import InferenceEngine
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema
Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})
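The example above passes JSON strings to `engine.run`. When the data already lives in Python objects, one way to build such a list is to serialize each value with `json.dumps`. This sketch leaves out the `InferenceEngine` call so it runs without jskiner installed; `python_values` is an illustrative name.

```python
import json

# Python values mirroring the JSON strings used in the example above.
python_values = [1, 1.2, None, {"a": 1}]

# Serialize each value to its own JSON string.
json_string_list = [json.dumps(value) for value in python_values]
print(json_string_list)
```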
Calculating the Union of a List of Schemas
from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema
Optional(Atomic(Int()))
Using the | Operator between Two Schemas
from jskiner import Atomic, Int, Non
schema = Atomic(Int()) | Atomic(Non())
schema
Optional(Atomic(Int()))
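The outputs shown in this README suggest a simple rule: a type joined with only `Non` is reported as `Optional`, while larger mixes stay a `Union`. The toy model below reproduces that rule with plain strings. It is an illustration under that assumption and not jskiner's implementation; the `union` function and the string type names are hypothetical.

```python
def union(*types):
    """Toy union over type names; a sketch, not jskiner's algorithm."""
    members = set(types)
    # Assumption: joining a type with itself leaves it unchanged.
    if len(members) == 1:
        return next(iter(members))
    # Exactly one type plus Non collapses to Optional, as in the examples above.
    if "Non" in members and len(members) == 2:
        (other,) = members - {"Non"}
        return f"Optional({other})"
    # Anything else stays a Union; members are sorted for a stable display.
    return "Union({" + ", ".join(sorted(members)) + "})"


print(union("Int", "Non"))
print(union("Float", "Int", "Non"))
```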
TODO:
- Enable inference from a folder of json files
- Enable ignoring of existing json files using cuckoo filter
- Enable adding a starting schema file
- Enable batch-by-batch processing of large jsonl files
- FIX: make sure repr escapes special characters.
- Auto-formatting using Black
- Enable sampling of json files
- Debug: show the input that causes a panic. (alter panic str / alter reduce.py exception logging)
- Fix: add UnionRecord schema object
- Enable direct inference from an online API (to avoid repeated downloads of json)
- Enable Regex to represent patterned FieldSet