convert JSON data to space efficient format

Project description

compress-json-python

Store JSON data in space efficient manner.

This library compresses JSON values into a compact format, which can save network bandwidth and disk space.

Features

  • Supports all JSON types
  • Object key order is preserved
  • Repeated values are stored only once
  • Numbers are encoded in base62 format (0-9A-Za-z)
  • Supports custom backends for the memory store and cache
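As an illustration of the base62 number encoding listed above, here is a minimal sketch for non-negative integers. The helper name `int_to_base62` is hypothetical and is not part of this package's API; only the alphabet order (0-9A-Za-z) comes from the feature list.

```python
# Alphabet order taken from the feature list: digits, upper case, lower case.
ALPHABET = "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "abcdefghijklmnopqrstuvwxyz"


def int_to_base62(n):
    """Encode a non-negative integer as a base62 string (hypothetical helper)."""
    if n < 0:
        raise ValueError("this sketch only covers non-negative integers")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


print(int_to_base62(42))   # g
print(int_to_base62(123))  # 1z
```

These two outputs match the `n|g` and `n|1z` entries in the compression format sample further down.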

Multi Language Implementation

This package is a Python implementation of compress-json. It is fully compatible with the npm package, so data compressed by either implementation can be decompressed by the other.

All Implementations

Installation

pip install compress-json-python

Usage Example

# Import functions from the Python package
from compress_json import compress, decompress

data = {
  'user': 'Alice',
  # more fields of any JSON value type (string, number, array, object, etc.)
}

compressed = compress(data) # the result is a list (array)

import requests
requests.post('https://example.com/submit', json=compressed) # used as json value

import json
with open("data.json", "w") as fd:
	fd.write(json.dumps(compressed)) # convert into string if needed

restored = decompress(compressed)  # avoid shadowing the built-in reversed()
assert data == restored  # the round trip returns the original data

For more detailed examples, see the demo in cli.py and the tests in core_test.py.

Compression Format

Sample data:

long_str = 'A very very long string, that is repeated'
data = {
  'int': 42,
  'float': 12.34,
  'str': 'Alice',
  'long_str': long_str,
  'longNum': 9876543210.123455,
  'bool': True,
  'bool2': False,
  'arr': [42, long_str],
  'arr2': [42, long_str], # identical values are deduplicated, including arrays and objects
  'obj': { # nested values are supported
    'id': 123,
    'name': 'Alice',
    'role': [ 'Admin', 'User', 'Guest' ],
    'long_str': 'A very very long string, that is repeated',
    'longNum': 9876543210.123455
  },
  'escape': [ 's|str', 'n|123', 'o|1', 'a|1', 'b|T', 'b|F' ]
}

Compressed data:

# [ encoded value array, root value index ]
compressed = [
  [  # encoded value array
    'int', # string
    'float',
    'str',
    'long_str',
    'longNum',
    'bool',
    'bool2',
    'arr',
    'arr2',
    'obj',
    'escape',
    'a|0|1|2|3|4|5|6|7|8|9|A',
    'n|g', # number (integer) (base62-encoded)
    'n|C.h', # number (float) (integer part and decimals are base62-encoded separately)
    'Alice',
    'A very very long string, that is repeated',
    'n|AmOy42.2KCf',
    'b|T', # boolean (True)
    'b|F', # boolean (False)
    'a|C|F', # array
    'id',
    'name',
    'role',
    'a|K|L|M|3|4',
    'n|1z',
    'Admin',
    'User',
    'Guest',
    'a|P|Q|R',
    'o|N|O|E|S|F|G', # object
    's|s|str', # escaped string
    's|n|123', # escaped number
    's|o|1',
    's|a|1',
    's|b|T', # escaped boolean
    's|b|F',
    'a|U|V|W|X|Y|Z',
    'o|B|C|D|E|F|G|H|I|J|J|T|a'
  ],
  'b' # root value index
]
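To make the layout above concrete, here is a minimal decoder sketch inferred only from the samples in this document. It is not the library's implementation, and the helper names (`base62_to_int`, `decode_number`, `decode`) are hypothetical. It assumes that every reference is a base62 index into the value array, that an object entry is `o|<index of key array>|<value indices>`, and that the decimal part of a number is stored as its digits reversed and then base62-encoded (which is what `n|C.h` for 12.34 suggests, since `h` is 43). Types and edge cases not shown in the samples (for example null, empty arrays, or exponent notation) are not handled.

```python
ALPHABET = "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "abcdefghijklmnopqrstuvwxyz"


def base62_to_int(text):
    """Decode a base62 string (0-9A-Za-z) into an integer."""
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    return number


def decode_number(body):
    """Decode the part after 'n|', e.g. 'g' -> 42 or 'C.h' -> 12.34."""
    sign = -1 if body.startswith("-") else 1
    body = body.lstrip("-")
    if "." not in body:
        return sign * base62_to_int(body)
    int_part, dec_part = body.split(".")
    # Assumption: decimal digits are stored reversed, so reverse them back.
    decimals = str(base62_to_int(dec_part))[::-1]
    return sign * float(f"{base62_to_int(int_part)}.{decimals}")


def decode(values, key):
    """Resolve the entry referenced by a base62 index into a Python value."""
    item = values[base62_to_int(key)]
    prefix, body = item[:2], item[2:]
    if prefix == "s|":  # escaped string: drop the escape marker
        return body
    if prefix == "n|":
        return decode_number(body)
    if prefix == "b|":
        return body == "T"
    if prefix == "a|":
        return [decode(values, k) for k in body.split("|")]
    if prefix == "o|":
        keys_ref, *value_refs = body.split("|")
        keys = decode(values, keys_ref)  # the keys are stored as an array entry
        return dict(zip(keys, (decode(values, k) for k in value_refs)))
    return item  # anything without a type prefix is a plain string


# Hand-written sample that follows the layout shown above.
values = ["int", "arr", "a|0|1", "n|g", "Alice", "a|3|4", "o|2|3|5"]
print(decode(values, "6"))  # {'int': 42, 'arr': [42, 'Alice']}
```

In real code, use the package's own `decompress` function; the sketch is only meant to help when reading the encoded array by hand.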

Example structure for efficient compression

Original JSON data: (749 characters, excluding whitespace)

{
  "count": 5,
  "names": ["New York", "London", "Paris", "Beijing", "Moscow"],
  "cities": [
    {
      "id": 1,
      "name": "New York",
      "countryName": "USA",
      "location": { "latitude": 40.714606, "longitude": -74.0028 },
      "localityType": "BIG_CITY"
    },
    {
      "id": 2,
      "name": "London",
      "countryName": "UK",
      "location": { "latitude": 51.507351, "longitude": -0.127696 },
      "localityType": "COUNTRY_CAPITAL"
    },
    {
      "id": 3,
      "name": "Paris",
      "countryName": "France",
      "location": { "latitude": 48.856663, "longitude": 2.351556 },
      "localityType": "COUNTRY_CAPITAL"
    },
    {
      "id": 4,
      "name": "Beijing",
      "countryName": "China",
      "location": { "latitude": 39.90185, "longitude": 116.391441 },
      "localityType": "COUNTRY_CAPITAL"
    },
    {
      "id": 5,
      "name": "Moscow",
      "countryName": "Russia",
      "location": { "latitude": 55.755864, "longitude": 37.617698 },
      "localityType": "COUNTRY_CAPITAL"
    }
  ]
}

Compressed JSON: (562 characters, excluding whitespace)

[["count", "names", "cities", "a|0|1|2", "n|5", "New York", "London", "Paris", "Beijing", "Moscow", "a|5|6|7|8|9", "id", "name", "countryName", "location", "localityType", "a|B|C|D|E|F", "n|1", "USA", "latitude", "longitude", "a|J|K", "n|e.2Xkv", "n|-1C.28G", "o|L|M|N", "BIG_CITY", "o|G|H|5|I|O|P", "n|2", "UK", "n|p.dz7", "n|-0.2vFR", "o|L|T|U", "COUNTRY_CAPITAL", "o|G|R|6|S|V|W", "n|3", "France", "n|m.1XNq", "n|2.2kQz", "o|L|a|b", "o|G|Y|7|Z|c|W", "n|4", "China", "n|d.F7F", "n|1s.bVh", "o|L|g|h", "o|G|e|8|f|i|W", "Russia", "n|t.1xtN", "n|b.3lHA", "o|L|l|m", "o|G|4|9|k|n|W", "a|Q|X|d|j|o", "o|3|4|A|p"], "q"]

In this example, compression saves about 25% of the characters (749 down to 562). The more repetitive the structure, the more characters can be saved.
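The saving quoted above can be checked with a few lines of arithmetic. The helpers below (`saving_ratio`, `compact_length`) are hypothetical names used for illustration and are not part of this package. `compact_length` measures a value serialized without separator whitespace, so its count may differ slightly from the figures above, which also exclude spaces inside strings.

```python
import json


def saving_ratio(original_chars, compressed_chars):
    """Fraction of characters saved by compression."""
    return (original_chars - compressed_chars) / original_chars


def compact_length(value):
    """Length of a value serialized as JSON without separator whitespace."""
    return len(json.dumps(value, separators=(",", ":")))


# Character counts taken from the example above.
print(f"{saving_ratio(749, 562):.1%}")  # 25.0%
```

To measure your own data, pass `compact_length(data)` and `compact_length(compressed)` to `saving_ratio`.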

License

This project is licensed under the BSD-2-Clause license.

This is free, libre, and open-source software. It comes down to four essential freedoms [ref]:

  • The freedom to run the program as you wish, for any purpose
  • The freedom to study how the program works, and change it so it does your computing as you wish
  • The freedom to redistribute copies so you can help others
  • The freedom to distribute copies of your modified versions to others
