
A table data format based on JSON


Build and Test Status

This package is tested using pytest on travis-ci. The current build-status is:

https://travis-ci.org/daknuett/ljson.svg?branch=master

The code is reviewed automatically on codacy:

https://api.codacy.com/project/badge/Grade/530345cc30dc44539e921eb63be461dd https://api.codacy.com/project/badge/Coverage/530345cc30dc44539e921eb63be461dd

What is ljson?

ljson is an attempt to create a database model that suits the needs of modern data processing. It is designed to work faster than plain JSON while keeping JSON's simple yet elegant object representation.

ljson can be used instead of pure json to increase the performance when accessing a large set of data.

Why ljson?

There are a lot of data storage formats out there: XML, JSON, CSV, SQL, NoSQL, binary packed, GNU-DB, …

Some of them are designed to store complete databases (SQL, NoSQL, …) and some are designed to store tables. And of course there are JSON and XML: they can store more complex objects, are human-readable, and keep their data in just one file.

But they suffer from one problem: to alter the data in the file, one has to read the complete file and hold all of its data in RAM. This is slow, may be impossible for very large data sets (Big Data), and is unsafe: if the process cannot complete the operation properly, all of the data might be corrupted.

ljson tries to bypass this by using a mixture of CSV (line based) and JSON (object based):

Every line is one object. To add another object, one just opens the file in append mode and adds one line. If one line is corrupted, the rest of the file is still valid.

Operating on large sets of objects is also possible by reading the file line by line.

Asynchronous operations in particular are easy to perform, because the main part of the file stays untouched. The exception is altering existing objects: in that case the file has to be rewritten.
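The append-and-stream pattern described above can be sketched with nothing but the standard library. This is a minimal illustration, not part of the ljson module: the helper names `append_row` and `iter_rows` and the file name `demo.ljson` are made up for this example. It also assumes that every line is terminated by a newline, so appending to a file whose last line has no trailing newline would need a newline written first.

```python
import json
import os
import tempfile


def append_row(path, row):
    """Append one object as a single line; existing lines are not rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")


def iter_rows(path):
    """Yield rows one at a time, skipping lines that fail to parse."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # A damaged line does not invalidate its neighbours.
                continue


path = os.path.join(tempfile.mkdtemp(), "demo.ljson")
append_row(path, {"id": 1, "name": "foo"})
with open(path, "a", encoding="utf-8") as f:
    f.write('{"id": 2, "name": \n')  # simulate a corrupted line
append_row(path, {"id": 3, "name": "bar"})

rows = list(iter_rows(path))
print(rows)  # [{'id': 1, 'name': 'foo'}, {'id': 3, 'name': 'bar'}]
```

Only one row is held in memory at a time while iterating, which is what makes this approach usable for files that do not fit into RAM.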

Design

ljson is designed to be stored in files. An ljson file is defined as:

<ljson_file> = [<header>\n]<ljson_content>
<ljson_content> = <json_object>{\n<json_object>}

<json_object> can be any json object, as described on json.org.
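As a rough sketch of how this grammar can be read back, the function below splits ljson text into an optional header and the content objects. It assumes that a header is recognisable by a `"__type__": "header"` entry in the first object, which matches the saved output shown later in this README but is not stated by the grammar itself. The name `split_ljson` is hypothetical.

```python
import json


def split_ljson(text):
    """Return (header, rows) for ljson text; header is None when absent."""
    objects = [json.loads(line) for line in text.split("\n") if line.strip()]
    # Assumption: a header is the first object and carries "__type__": "header".
    first = objects[0] if objects else None
    if isinstance(first, dict) and first.get("__type__") == "header":
        return first, objects[1:]
    return None, objects


with_header = (
    '{"__type__": "header", "id": {"type": "int", "modifiers": ["unique"]}}\n'
    '{"id": 1}\n'
    '{"id": 2}'
)
header, rows = split_ljson(with_header)
print(header is not None, rows)  # True [{'id': 1}, {'id': 2}]

header, rows = split_ljson('{"id": 1}\n{"id": 2}')
print(header, rows)  # None [{'id': 1}, {'id': 2}]
```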

Datatypes

If you use ljson you are restricted to the following python data types (and their ljson types):

  • int: "int"

  • str: "str"

  • bool: "bool"

  • float: "float"

  • bytes: "bytes"

  • dict: "json"

  • list: "json"

Because any other data type can be converted to one of these, it is possible to store any kind of data.
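What such a conversion might look like is sketched below. The helper `to_supported` is hypothetical and not part of ljson, and the choice of target types (dates and decimals as "str", tuples and sets as lists) is only one possible convention. The conversion is shallow: values nested inside lists or dicts are not touched.

```python
import json
from datetime import date
from decimal import Decimal


def to_supported(value):
    """Map a value onto one of the types listed above (hypothetical helper)."""
    if isinstance(value, (bool, int, float, str, bytes, list, dict)):
        return value                 # already a supported type
    if isinstance(value, date):      # also covers datetime, a subclass of date
        return value.isoformat()     # stored as "str"
    if isinstance(value, Decimal):
        return str(value)            # stored as "str", keeps the exact digits
    if isinstance(value, tuple):
        return list(value)           # stored as "json"
    if isinstance(value, (set, frozenset)):
        return sorted(value)         # stored as "json", in a stable order
    raise TypeError("no conversion defined for {!r}".format(type(value)))


row = {
    "id": 7,
    "born": date(1990, 5, 17),
    "tags": ("a", "b"),
    "price": Decimal("9.99"),
}
converted = {key: to_supported(value) for key, value in row.items()}
line = json.dumps(converted, sort_keys=True)
print(line)
# {"born": "1990-05-17", "id": 7, "price": "9.99", "tags": ["a", "b"]}
```

The reverse conversion has to be done by the reader of the file, since the stored row only records the supported type.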

Usage

Without a Python Module

ljson is designed to work without any third-party Python modules. One can read ljson data with Python's built-in json module:

>>> import json
>>> ljson = '{"id": 1, "name": "foo"}\n{"id": 2, "name": "bar"}'
>>> for line in ljson.split("\n"):
...     print(json.loads(line))
...
{'name': 'foo', 'id': 1}
{'name': 'bar', 'id': 2}

This should always be the preferred way to access ljson data if all of the data is required.

If one wants to access specific fields, it is better to use the ljson Python module.

With the ljson Module

Using the ljson module is simple and efficient if one wants to access just some fields rather than the complete file.

There are two base implementations. The first, ljson.base.mem, loads the file content into RAM. It is considerably faster, supports files without a header, and makes it possible to construct a Table without a file.

The second implementation is ljson.base.disk. This implementation does not load any data into RAM. If you are accessing huge sets you should use this implementation.

Creating a table is simple (at least for the memory tables):

>>> import ljson
>>> header = ljson.Header({"id": {"type": "int", "modifiers":["unique"]}, "name": {"type": "str", "modifiers": []}})
>>> table = ljson.Table(header, [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}, {"id": 3, "name": "bar"}])

One can access items with subscript syntax, which is backed by Python's __getitem__ and __setitem__ methods:

>>> table[{"id": 1}]["name"]
['foo']
>>> list(table[{"id": 1}])
[{'name': 'foo', 'id': 1}]

The table “index” must be a dict. This makes it possible to select non-unique rows, like this:

>>> list(table[{"name":"bar"}])
[{'id': 2, 'name': 'bar'}, {'id': 3, 'name': 'bar'}]
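To make the dict-as-index idea concrete, the following plain-Python sketch selects every row whose fields equal all key/value pairs of the query. It only illustrates the matching semantics suggested by the examples above and is not how the ljson module is implemented; the function name `select` is hypothetical.

```python
def select(rows, query):
    """Yield every row whose fields equal all key/value pairs in query."""
    for row in rows:
        if all(key in row and row[key] == value for key, value in query.items()):
            yield row


rows = [
    {"id": 1, "name": "foo"},
    {"id": 2, "name": "bar"},
    {"id": 3, "name": "bar"},
]

print(list(select(rows, {"name": "bar"})))
# [{'id': 2, 'name': 'bar'}, {'id': 3, 'name': 'bar'}]
print(list(select(rows, {"id": 3, "name": "bar"})))
# [{'id': 3, 'name': 'bar'}]
```

A query on a unique field yields at most one row, while a query on a non-unique field can yield several, which is why the result is iterated rather than returned as a single object.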

Using ljson to store data

Using ljson to store data is pretty simple:

>>> from io import StringIO
>>> fout = StringIO()
>>> table.save(fout)
>>> fout.seek(0)
0
>>> fout.read()
'{"name": {"type": "str", "modifiers": []}, "__type__": "header", "id": {"type": "int", "modifiers": ["unique"]}}\n{"name": "foo", "id": 1}\n{"name": "bar", "id": 2}\n{"name": "bar", "id": 3}'
>>> fout.seek(0)
0
>>> table2 = ljson.Table.from_file(fout)
>>> list(table2)
[{'id': 1, 'name': 'foo'}, {'id': 2, 'name': 'bar'}, {'id': 3, 'name': 'bar'}]

Reading and writing CSV files is pretty simple, too:

>>> from ljson.convert.csv import table2csv, csv2table
>>> fout = StringIO()
>>> table2csv(table, fout)
>>> fout.seek(0)
0
>>> fout.read()
'id,name\r\n1,foo\r\n2,bar\r\n3,bar\r\n'
>>> fout.seek(0)
0
>>> table2 = csv2table(fout, types = {"id": "int", "name":"str"})
>>> list(table2)
[{'id': 1, 'name': 'foo'}, {'id': 2, 'name': 'bar'}, {'id': 3, 'name': 'bar'}]
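For comparison, a similar CSV-to-ljson conversion can be sketched with only the standard library. This is an illustration under assumptions, not the implementation behind csv2table: the function `csv_to_ljson_lines` and the `CASTS` table are made up for this example, and only the "int", "float" and "str" type names are handled.

```python
import csv
import json
from io import StringIO

# Hypothetical mapping from ljson type names to Python callables.
CASTS = {"int": int, "float": float, "str": str}


def csv_to_ljson_lines(fin, types):
    """Yield one JSON line per CSV record, casting each column by its type name."""
    for record in csv.DictReader(fin):
        row = {name: CASTS[types[name]](text) for name, text in record.items()}
        yield json.dumps(row)


source = StringIO("id,name\r\n1,foo\r\n2,bar\r\n3,bar\r\n")
lines = list(csv_to_ljson_lines(source, {"id": "int", "name": "str"}))
print("\n".join(lines))
```

Each yielded string is one line of ljson content, so the result can be written to a file line by line without building the whole table in memory.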

Todos

  • store bytes as b64

  • fix the sql bytes representation
