Python binding port for JSONSki

JSONSki_python

JSONSki_python is the Python binding for JSONSki.

JSONSki is a streaming JSONPath processor with fast-forward functionality. While streaming, it automatically fast-forwards over JSON substructures that are irrelevant to the query evaluation, without parsing them in detail. To make fast-forwarding efficient, JSONSki implements its fast-forward APIs with a highly bit-parallel approach that relies heavily on the bitwise and SIMD operations prevalent on modern CPUs.
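To make the idea of fast-forwarding concrete, here is a minimal scalar sketch in Python. It is not JSONSki's implementation, which is bit-parallel and SIMD-based; it only illustrates the principle of jumping past a whole object or array by tracking nesting depth and string boundaries, without building any values. The function name skip_container and the sample record are hypothetical.

```python
def skip_container(text: str, start: int) -> int:
    """Return the index just past the object or array that opens at text[start].

    Illustrative only: a character-by-character scan, assuming well-formed JSON.
    """
    if text[start] not in "{[":
        raise ValueError("start must point at '{' or '['")
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == '"':
                in_string = False   # closing quote
        elif ch == '"':
            in_string = True        # brackets inside strings must be ignored
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                return i + 1        # matching close found
        i += 1
    raise ValueError("unterminated container")


# Hypothetical record: the value of "skip" is irrelevant to a query on "keep".
record = '{"skip": {"a": [1, 2, "}"], "b": {}}, "keep": 7}'
start = record.index("{", 1)        # opening brace of the "skip" value
end = skip_container(record, start)
print(record[start:end])            # the substructure that was skipped
print(record[end:])                 # where processing would resume
```

The scan never converts numbers or allocates nested objects, which is the saving that fast-forwarding aims for; JSONSki obtains the same effect over many bytes at once with bitwise operations.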

GitHub

The source code is available on GitHub: https://github.com/AutomataLab/JSONSki_python

Installation

pip install JSONSki

Quick Start

import jsonski as jski
print(jski.loadSingleRecord("$.features[150].actor.login", "datasets/test.json"))
  • The package exposes the following method:
jski.loadSingleRecord(args1, args2)    # args1: String (query), args2: String (file location)

Requirements

Hardware requirements

  • CPUs: 64-bit ALU instructions, 256-bit SIMD instruction set, and the carry-less multiplication instruction (pclmulqdq)
  • Operating System: Linux, macOS (Intel chips only)
  • C++ Compiler: g++ (7.4.0 or higher)

Software requirements

Before using the JSONSki API, ensure that you have the following prerequisites:

Getting Started with Querying using JSONSki

JSONPath

JSONPath is the basic query language for JSON data. It refers to substructures of JSON data much as XPath queries do for XML data. For the details of JSONPath syntax, please refer to Stefan Goessner's article.

JSONSki Query Operators

Operator      Description
$             root object
.             child object
[]            child array
*             wildcard, all objects or array members
[index]       array index
[start:end]   array slice operator
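As an illustration of how these operators compose, the sketch below splits a path string into a list of steps. It is a simplified, hypothetical parser written for this README, not part of the JSONSki API, and it assumes keys made of letters, digits, and underscores only.

```python
import re

# One alternative per operator form; the names are illustrative.
STEP = re.compile(
    r"\.(?P<key>[A-Za-z_][A-Za-z0-9_]*)"   # .name        child object
    r"|\[(?P<start>\d+):(?P<end>\d+)\]"    # [start:end]  array slice
    r"|\[(?P<index>\d+)\]"                 # [index]      array index
    r"|\[(?P<wild>\*)\]"                   # [*]          all array members
    r"|\.(?P<anykey>\*)"                   # .*           all child objects
)


def parse_path(path):
    """Split a path such as '$.a.b[0:2]' into a list of step tuples."""
    if not path.startswith("$"):
        raise ValueError("a path must start with the root operator '$'")
    steps, pos = [], 1
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {path[pos:]!r}")
        if m.group("key") is not None:
            steps.append(("child", m.group("key")))
        elif m.group("start") is not None:
            steps.append(("slice", int(m.group("start")), int(m.group("end"))))
        elif m.group("index") is not None:
            steps.append(("index", int(m.group("index"))))
        else:
            steps.append(("wildcard",))
        pos = m.end()
    return steps


print(parse_path("$.place.bounding_box.pos[0:2]"))
print(parse_path("$[*].entities.urls[*].url"))
```

Each step corresponds to one row of the operator table, so a query is simply the root followed by a sequence of child, index, slice, and wildcard steps.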

Path Examples

Consider a geo-referenced tweet in JSON:

{
    "coordinates": [
        40.74118764, -73.9998279
    ],
    "user": {
        "id": 6253282
    },
    "place": {
        "name": "Manhattan",
        "bounding_box": {
            "type": "Polygon",
            "pos": [
                [-74.026675, 40.683935],
                ......
            ]
        }
    }
}

JSONPath                        Result
$.coordinates[*]                all coordinates
$.place.name                    place name
$.place.bounding_box.pos[0]     first position of the bounding box in place
$.place.bounding_box.pos[0:2]   first two positions of the bounding box in place
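For readers more familiar with Python indexing than with JSONPath, the sketch below shows what each path selects using only the standard json module; it does not call JSONSki. The positions elided in the sample are left out, so the slice returns only the one position that is shown.

```python
import json

# The sample tweet, with the elided bounding-box positions omitted.
tweet = json.loads("""
{
    "coordinates": [40.74118764, -73.9998279],
    "user": {"id": 6253282},
    "place": {
        "name": "Manhattan",
        "bounding_box": {
            "type": "Polygon",
            "pos": [[-74.026675, 40.683935]]
        }
    }
}
""")

# Plain-Python equivalent of each path.
selected = {
    "$.coordinates[*]": tweet["coordinates"][:],
    "$.place.name": tweet["place"]["name"],
    "$.place.bounding_box.pos[0]": tweet["place"]["bounding_box"]["pos"][0],
    "$.place.bounding_box.pos[0:2]": tweet["place"]["bounding_box"]["pos"][0:2],
}

for path, value in selected.items():
    print(path, "->", value)
```

Note that the slice end is exclusive, as in Python: [0:2] covers indices 0 and 1.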

APIs

Records Loading

JSONSki API: Simplifying Data Handling

The JSONSki API evaluates a JSONPath query directly over a JSON file. It provides two functions, jski.loadSingleRecord and jski.loadRecords, which differ in how they split the input file into records:

  • loadSingleRecord(args1, args2), where args1 is a String (query) and args2 is a String (file location): loads the whole input file as one single record (newlines are allowed in strings and other legal places).

  • loadRecords(args1, args2), where args1 is a String (query) and args2 is a String (file location): loads multiple records from the input file. All newlines are treated as record delimiters, so no newlines are allowed within a record (except for the escape sequences \n and \r in JSON strings).
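The difference between the two input layouts can be sketched with the standard json module alone. This does not call JSONSki; the two sample texts and the helper names parse_single and parse_records are hypothetical stand-ins for file contents and for the two loading modes.

```python
import json

# A pretty-printed file: one record that spans several lines.
single_record_text = '{\n  "id": 1,\n  "tags": ["a", "b"]\n}\n'

# A newline-delimited file: one complete record per line.
multi_record_text = '{"id": 1, "tags": ["a"]}\n{"id": 2, "tags": ["b", "c"]}\n'


def parse_single(text):
    """Treat the whole text as one record (the loadSingleRecord layout)."""
    return json.loads(text)


def parse_records(text):
    """Treat every newline as a record delimiter (the loadRecords layout)."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


print(parse_single(single_record_text))
print(parse_records(multi_record_text))

# Splitting a pretty-printed record on newlines yields fragments such as '{',
# which are not valid JSON on their own.
try:
    parse_records(single_record_text)
    line_mode_ok = True
except json.JSONDecodeError:
    line_mode_ok = False
print("pretty-printed file works in line mode:", line_mode_ok)
```

In practice this means a pretty-printed file belongs with loadSingleRecord, while a file with one compact record per line belongs with loadRecords.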

Performance Comparison with Python Parsing

The following example uses the JSONSki pip package and times it against Python's built-in json module on the same file.

# JSONSki
import jsonski as jski
import time

start_time = time.time()
print(jski.loadSingleRecord("$[*].entities.urls[*].url", "./JSONSki/dataset/twitter_sample_large_record.json"))
end_time = time.time()
elapsed_time = end_time - start_time
print("Elapsed jsonski time:", elapsed_time, "seconds")


# Python's built-in JSON parser
import json as j

def parse_json_file(file_path):
    # Parse the whole file into Python objects.
    with open(file_path, 'r') as file:
        return j.load(file)

json_file_path = './JSONSki/dataset/twitter_sample_large_record.json'
start_time = time.time()
print(parse_json_file(json_file_path))
end_time = time.time()
elapsed_time = end_time - start_time
print("Elapsed default_python_json time:", elapsed_time, "seconds")
  • Note: The code snippet above compares the running time of a JSONSki query with that of a full parse by Python's built-in json module.
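One possible refinement of such a measurement, sketched here as an assumption rather than as the method behind any reported numbers, is to time only the call itself with time.perf_counter and print the result afterwards, so that console output does not count toward the elapsed time. The helper name timed is hypothetical, and the demonstration uses an in-memory string with the standard json module instead of the dataset file.

```python
import json
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; with the package installed, the same helper could wrap
# jski.loadSingleRecord(query, file_location) as used above.
sample_text = json.dumps([{"url": f"https://example.com/{i}"} for i in range(1000)])

parsed, seconds = timed(json.loads, sample_text)
print("records parsed:", len(parsed))
print("Elapsed time:", seconds, "seconds")
```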

Publication

[1] Lin Jiang and Zhijia Zhao. JSONSki: Streaming Semi-structured Data with Bit-Parallel Fast-Forwarding. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022.

@inproceedings{jsonski,
  title={JSONSki: Streaming Semi-structured Data with Bit-Parallel Fast-Forwarding},
  author={Lin Jiang and Zhijia Zhao},
  booktitle={Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
  year={2022}
}
