JSkiner

This is a Python JSON schema inference engine with a Rust core. Its inference speed is about 10 times that of its pure-Python counterpart (jsonschema-inference).

Installation

pip install jskiner

Usage

Checking the JSON Schema of a large .jsonl file

jskiner \
    --in <path_to_jsonl> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --split <number_of_split_batch_size> \
    --split-path <path_to_store_the_split_files>
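
If your data is not yet in .jsonl form, the sketch below shows one way to produce such a file with the standard library. It assumes the usual JSON Lines layout (one JSON document per line); `write_jsonl` and `sample.jsonl` are hypothetical names used for illustration and are not part of jskiner.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write each record as one JSON document per line (hypothetical helper)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # json.dumps never emits raw newlines, so each record stays on one line.
            f.write(json.dumps(record) + "\n")
    return path


records = [{"a": 1}, {"a": None}, {"a": 1.5, "b": "x"}]
out_path = write_jsonl(records, Path(tempfile.mkdtemp()) / "sample.jsonl")
lines = out_path.read_text(encoding="utf-8").splitlines()
print(out_path, len(lines))
```

The resulting path can then be passed to `--in` in the command above.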

Checking the JSON Schema for a folder of JSON files

jskiner \
    --in <path_to_jsons> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --batch-size <batch_size_for_inferencing> \
    --cuckoo-path <path_to_store_the_cuckoo_filter> \
    --cuckoo-size <approximate_size_of_the_cuckoo_filter> \
    --cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>

For --cuckoo-size, a value about 10 times the current JSON file count is recommended.
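
The 10X sizing recommendation for the cuckoo filter can be computed from the folder itself. The following is a minimal sketch using only the standard library; `recommended_cuckoo_size` is a hypothetical helper, and it assumes the JSON files sit directly in the folder with a `.json` extension.

```python
import tempfile
from pathlib import Path


def recommended_cuckoo_size(json_dir, factor=10):
    """Return `factor` times the number of *.json files directly inside json_dir."""
    n_files = sum(1 for _ in Path(json_dir).glob("*.json"))
    return factor * n_files


# Demo on a throwaway folder holding three small JSON files.
demo_dir = Path(tempfile.mkdtemp())
for i in range(3):
    (demo_dir / f"{i}.json").write_text('{"a": 1}', encoding="utf-8")

size = recommended_cuckoo_size(demo_dir)
print(size)
```

The printed number is a candidate value for `--cuckoo-size`.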

Inferring the Schema in Python

from jskiner import InferenceEngine
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema

Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})
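
To make the shape of this result easier to read, here is a toy, pure-Python sketch of the same idea: parse each string, describe its type, and collect the distinct descriptions. It is not jskiner's implementation and does not use its classes; `describe` and `toy_infer` are hypothetical, and the names `Bool`, `Str`, and `Array` are made up for this sketch.

```python
import json


def describe(value):
    """Return a schema-like string for one parsed JSON value (toy naming)."""
    if value is None:
        return "Non"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Bool"  # hypothetical name
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "Str"  # hypothetical name
    if isinstance(value, list):
        inner = sorted({describe(v) for v in value})
        return "Array(" + " | ".join(inner) + ")"  # hypothetical name
    # Anything else parsed from JSON is an object (dict).
    fields = ", ".join(f'"{k}": {describe(v)}' for k, v in sorted(value.items()))
    return "Record({" + fields + "})"


def toy_infer(json_strings):
    """Collect the distinct type descriptions seen across all inputs."""
    return sorted({describe(json.loads(s)) for s in json_strings})


toy_schema = toy_infer(["1", "1.2", "null", "{\"a\": 1}"])
print(toy_schema)
```

Each distinct description corresponds to one member of the `Union` shown above.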

Calculating the Union of a List of Schemas

from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema

Optional(Atomic(Int()))

Using the | Operator between Two Schemas

from jskiner import Atomic, Int, Non
schema = Atomic(Int()) | Atomic(Non())
schema

Optional(Atomic(Int()))
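
For readers unfamiliar with how `|` can work on custom objects, the sketch below shows the general Python mechanism (`__or__`) together with the display rule seen above, where a union of one type with `Non` is shown as `Optional`. `ToySchema` is a hypothetical class written for this illustration and does not reflect jskiner's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True, repr=False)
class ToySchema:
    """A set of type names; `a | b` merges the two sets (toy model only)."""

    types: frozenset

    def __or__(self, other):
        return ToySchema(self.types | other.types)

    def __repr__(self):
        non_null = sorted(self.types - {"Non"})
        # One real type plus Non is displayed as Optional(...).
        if "Non" in self.types and len(non_null) == 1:
            return f"Optional({non_null[0]})"
        if len(self.types) == 1:
            return next(iter(self.types))
        return "Union({" + ", ".join(sorted(self.types)) + "})"


toy_int = ToySchema(frozenset({"Int"}))
toy_non = ToySchema(frozenset({"Non"}))
toy_float = ToySchema(frozenset({"Float"}))

merged = toy_int | toy_non
print(repr(merged))
print(repr(toy_int | toy_float | toy_non))
```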

TODO:

  • Enable inference from a folder of JSON files
  • Enable ignoring existing JSON files using a cuckoo filter
  • Enable adding a starting schema file
  • Enable batch-by-batch processing of large .jsonl files
  • Fix: make sure repr escapes special characters
  • Auto-formatting using Black
  • Enable sampling of JSON files
  • Debug: show the input that causes a panic (alter panic str / alter reduce.py exception logging)
  • Fix: add UnionRecord schema object
  • Enable direct inference from an online API (to avoid repeated downloads of JSON)
  • Enable regex to represent patterned FieldSet

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

  • jskiner-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • jskiner-0.1.1-cp310-cp310-macosx_11_0_arm64.whl (390.2 kB): CPython 3.10, macOS 11.0+ ARM64
  • jskiner-0.1.1-cp310-cp310-macosx_10_12_x86_64.whl (399.9 kB): CPython 3.10, macOS 10.12+ x86-64
  • jskiner-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • jskiner-0.1.1-cp39-cp39-macosx_11_0_arm64.whl (391.1 kB): CPython 3.9, macOS 11.0+ ARM64
  • jskiner-0.1.1-cp39-cp39-macosx_10_12_x86_64.whl (423.6 kB): CPython 3.9, macOS 10.12+ x86-64
  • jskiner-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • jskiner-0.1.1-cp38-cp38-macosx_11_0_arm64.whl (418.9 kB): CPython 3.8, macOS 11.0+ ARM64
  • jskiner-0.1.1-cp38-cp38-macosx_10_12_x86_64.whl (423.7 kB): CPython 3.8, macOS 10.12+ x86-64
  • jskiner-0.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • jskiner-0.1.1-cp37-cp37m-macosx_10_12_x86_64.whl (401.3 kB): CPython 3.7m, macOS 10.12+ x86-64
