JSkiner

JSkiner is a Python JSON schema inference engine with a Rust core. Its inference speed is about 10 times that of its pure-Python counterpart (jsonschema-inference).

Installation

pip install jskiner

Usage

Checking the JSON Schema of a Large .jsonl File

jskiner \
    --in <path_to_jsonl> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --split <number_of_split_batch_size> \
    --split-path <path_to_store_the_split_files>
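
The `--in` argument above expects a .jsonl file, which is assumed here to follow the usual JSON Lines convention of one JSON document per line. A minimal sketch of producing such a file with the standard library follows; the helper name `write_jsonl` is hypothetical and not part of jskiner.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write each record as one compact JSON document per line.

    Hypothetical helper for illustration; not part of jskiner.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes newlines inside strings and, without
            # `indent`, emits no raw line breaks, so each record stays
            # on a single line.
            f.write(json.dumps(record) + "\n")
    return path
```

The resulting path can then be passed to `--in`.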

Checking the JSON Schema for a Folder of JSON Files

jskiner \
    --in <path_to_jsons> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --batch-size <batch_size_for_inferencing> \
    --cuckoo-path <path_to_store_the_cuckoo_filter> \
    --cuckoo-size <approximated_size_of_the_cuckoo_filter (Recommend using 10X of current json count)> \
    --cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>
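
The recommendation above, roughly 10 times the current JSON count for `--cuckoo-size`, can be computed before invoking the CLI. The sketch below is only an illustration: `suggest_cuckoo_size` is a hypothetical helper, and it assumes the files sit directly in the folder with a `.json` extension.

```python
from pathlib import Path


def suggest_cuckoo_size(json_dir, factor=10):
    """Return `factor` times the number of .json files directly in json_dir.

    Hypothetical helper for illustration; not part of jskiner.
    """
    # Non-recursive count; whether jskiner itself scans subfolders is
    # not stated in this document, so this is an assumption.
    n_files = sum(1 for _ in Path(json_dir).glob("*.json"))
    # Keep the result at least 1 so an empty folder still yields a usable size.
    return max(1, n_files * factor)
```

The returned number is meant to be passed as the value of `--cuckoo-size`.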

Inferring the Schema in Python

from jskiner import InferenceEngine
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema

Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})
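
To make the result above easier to read, here is a simplified pure-Python sketch of the idea: parse each JSON string, map the value to a type description, and collect the distinct descriptions. It is not jskiner's implementation and does not use its Rust core. The function names and the `Bool`, `Str`, and `Array` labels are assumptions made for illustration; only `Int`, `Float`, `Non`, and `Record` appear in the output shown above.

```python
import json


def infer_type(value):
    """Map one parsed JSON value to a simplified schema string."""
    if value is None:
        return "Non"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Bool"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "Str"
    if isinstance(value, list):
        # Describe an array by the distinct types of its elements.
        inner = sorted({infer_type(v) for v in value})
        return "Array(" + " | ".join(inner) + ")"
    # Anything else produced by json.loads is a dict (a JSON object).
    fields = ", ".join(
        f'"{key}": {infer_type(val)}' for key, val in sorted(value.items())
    )
    return "Record({" + fields + "})"


def infer_union(json_strings):
    """Return the set of distinct simplified schemas across all inputs."""
    return {infer_type(json.loads(s)) for s in json_strings}
```

Calling `infer_union` on the same four strings as above yields one entry each for the integer, the float, the null, and the record, mirroring the four members of the `Union` printed by the engine.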

Calculating the Union of a List of Schemas

from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema

Optional(Atomic(Int()))

Using the | Operator between Two Schemas

from jskiner import Atomic, Int, Non
schema = Atomic(Int()) | Atomic(Non())
schema

Optional(Atomic(Int()))
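
The `|` behaviour above can be pictured as a set union over type names, with the pair of a single type and `Non` displayed as `Optional`. The toy class below sketches that reading in pure Python. `ToySchema` is hypothetical and is not how jskiner implements its schema objects.

```python
from functools import reduce


class ToySchema:
    """Toy model: a schema is a frozenset of atomic type names."""

    def __init__(self, *names):
        self.names = frozenset(names)

    def __or__(self, other):
        # Set union is commutative, associative, and idempotent, so the
        # order in which schemas are merged does not change the result.
        return ToySchema(*(self.names | other.names))

    def __eq__(self, other):
        return isinstance(other, ToySchema) and self.names == other.names

    def __hash__(self):
        return hash(self.names)

    def __repr__(self):
        non_null = sorted(self.names - {"Non"})
        if "Non" in self.names and len(non_null) == 1:
            # Exactly one type plus null is displayed as Optional.
            return f"Optional({non_null[0]})"
        if len(self.names) == 1:
            return next(iter(self.names))
        return "Union({" + ", ".join(sorted(self.names)) + "})"


def union_all(schemas):
    """Fold a non-empty list of schemas together with the | operator."""
    return reduce(lambda a, b: a | b, schemas)
```

Because the merge is associative, a list of schemas can be folded in any grouping, which is one reason this kind of reduction lends itself to being split across workers.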

TODO:

  • Enable inference from a folder of JSON files
  • Enable ignoring of existing JSON files using a cuckoo filter
  • Enable adding a starting schema file
  • Enable batch-by-batch processing of a large .jsonl file
  • Fix: make sure repr escapes special characters
  • Auto-format using Black
  • Enable sampling of JSON files
  • Debug: show the input that causes a panic (alter panic str / alter reduce.py exception logging)
  • Fix: add a UnionRecord schema object
  • Enable direct inference from an online API (to avoid repeated downloads of JSON)
  • Enable regex to represent a patterned FieldSet


Built Distributions

  • jskiner-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • jskiner-0.1.1-cp310-cp310-macosx_11_0_arm64.whl (390.2 kB): CPython 3.10, macOS 11.0+, ARM64
  • jskiner-0.1.1-cp310-cp310-macosx_10_12_x86_64.whl (399.9 kB): CPython 3.10, macOS 10.12+, x86-64
  • jskiner-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • jskiner-0.1.1-cp39-cp39-macosx_11_0_arm64.whl (391.1 kB): CPython 3.9, macOS 11.0+, ARM64
  • jskiner-0.1.1-cp39-cp39-macosx_10_12_x86_64.whl (423.6 kB): CPython 3.9, macOS 10.12+, x86-64
  • jskiner-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64
  • jskiner-0.1.1-cp38-cp38-macosx_11_0_arm64.whl (418.9 kB): CPython 3.8, macOS 11.0+, ARM64
  • jskiner-0.1.1-cp38-cp38-macosx_10_12_x86_64.whl (423.7 kB): CPython 3.8, macOS 10.12+, x86-64
  • jskiner-0.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.7m, manylinux (glibc 2.17+), x86-64
  • jskiner-0.1.1-cp37-cp37m-macosx_10_12_x86_64.whl (401.3 kB): CPython 3.7m, macOS 10.12+, x86-64
