Convert JSON data to a space-efficient format

compress-json-python

Store JSON data in a space-efficient manner.

This library compresses JSON values into a compact format, which can save network bandwidth and disk space.

Features

  • Supports all JSON types
  • Object key order is preserved
  • Repeated values are stored only once
  • Numbers are encoded in base62 format (0-9A-Za-z)
  • Supports custom backends for the memory store and cache
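
As a rough illustration of the base62 number encoding listed above, here is a minimal sketch for non-negative integers. The function names `int_to_base62` and `base62_to_int` are hypothetical and are not the library's API. Only the alphabet order (0-9, then A-Z, then a-z) is taken from this document.

```python
import string

# Alphabet order from the feature list: 0-9, then A-Z, then a-z.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def int_to_base62(n: int) -> str:
    """Encode a non-negative integer with the 0-9A-Za-z alphabet (hypothetical helper)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def base62_to_int(s: str) -> int:
    """Decode a base62 string back to an integer (hypothetical helper)."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(int_to_base62(42))   # g
print(int_to_base62(123))  # 1z
```

These two outputs match the `n|g` and `n|1z` entries in the compression format sample in this document.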

Multi Language Implementation

This package is a Python implementation of compress-json. It is fully compatible with the npm package, so data compressed by either implementation can be decompressed by the other.

All Implementations

Installation

pip install compress-json-python

Usage Example

# Import functions from the Python package
from compress_json import compress, decompress

data = {
  'user': 'Alice',
  # more fields of any JSON value (string, number, array, object, etc.)
}

compressed = compress(data) # the result is a list (array)

import requests
requests.post('https://example.com/submit', json=compressed) # used as json value

import json
with open("data.json", "w") as fd:
    fd.write(json.dumps(compressed)) # convert into string if needed

restored = decompress(compressed)
assert data == restored # the round trip reproduces the original data

For more detailed examples, see the demo in cli.py and the tests in core_test.py.

Compression Format

Sample data:

long_str = 'A very very long string, that is repeated'
data = {
  'int': 42,
  'float': 12.34,
  'str': 'Alice',
  'long_str': long_str,
  'longNum': 9876543210.123455,
  'bool': True,
  'bool2': False,
  'arr': [42, long_str],
  'arr2': [42, long_str], # identical values are deduplicated, including arrays and objects
  'obj': { # nested values are supported
    'id': 123,
    'name': 'Alice',
    'role': [ 'Admin', 'User', 'Guest' ],
    'long_str': 'A very very long string, that is repeated',
    'longNum': 9876543210.123455
  },
  'escape': [ 's|str', 'n|123', 'o|1', 'a|1', 'b|T', 'b|F' ]
}

Compressed data:

# [ encoded value array, root value index ]
compressed = [
  [  # encoded value array
    'int', # string
    'float',
    'str',
    'long_str',
    'longNum',
    'bool',
    'bool2',
    'arr',
    'arr2',
    'obj',
    'escape',
    'a|0|1|2|3|4|5|6|7|8|9|A',
    'n|g', # number (integer) (base62-encoded)
    'n|C.h', # number (float) (integer part and decimals are base62-encoded separately)
    'Alice',
    'A very very long string, that is repeated',
    'n|AmOy42.2KCf',
    'b|T', # boolean (True)
    'b|F', # boolean (False)
    'a|C|F', # array
    'id',
    'name',
    'role',
    'a|K|L|M|3|4',
    'n|1z',
    'Admin',
    'User',
    'Guest',
    'a|P|Q|R',
    'o|N|O|E|S|F|G', # object
    's|s|str', # escaped string
    's|n|123', # escaped number
    's|o|1',
    's|a|1',
    's|b|T', # escaped boolean
    's|b|F',
    'a|U|V|W|X|Y|Z',
    'o|B|C|D|E|F|G|H|I|J|J|T|a'
  ],
  'b' # root value index
]
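
To make the format above easier to follow, here is a minimal decoding sketch. It is not the library's implementation: the names `b62`, `decode_number`, and `decode` are hypothetical, and the rules are inferred only from the sample values in this document (for instance, the decimal digits of a float appear to be reversed before base62 encoding, so `.34` is stored as `h`, which is 43). Cases not shown in the sample, such as null values, are not handled.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def b62(s: str) -> int:
    """Decode a base62 string (0-9A-Za-z) to an integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def decode_number(body: str):
    """Decode the text after 'n|', e.g. 'g' -> 42 or 'C.h' -> 12.34."""
    sign = "-" if body.startswith("-") else ""
    body = body.lstrip("-")
    if "." not in body:
        return int(sign + str(b62(body)))
    int_part, dec_part = body.split(".")
    # Assumption inferred from the samples: decimal digits are stored reversed.
    decimals = str(b62(dec_part))[::-1]
    return float(f"{sign}{b62(int_part)}.{decimals}")


def decode(values: list, key: str):
    """Rebuild the value stored at base62 index `key` in the value array."""
    item = values[b62(key)]
    tag, body = item[:2], item[2:]
    if tag == "s|":  # escaped string: drop the 's|' prefix
        return body
    if tag == "n|":
        return decode_number(body)
    if tag == "b|":
        return body == "T"
    if tag == "a|":  # array of indexes (empty body assumed to mean [])
        return [decode(values, k) for k in body.split("|")] if body else []
    if tag == "o|":  # first index points at the key array, the rest at values
        keys_key, *value_keys = body.split("|")
        keys = decode(values, keys_key)
        return {k: decode(values, v) for k, v in zip(keys, value_keys)}
    return item  # anything without a type prefix is a plain string
```

With the `compressed` list shown above, `decode(compressed[0], compressed[1])` would rebuild the sample `data` under these assumptions.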

Example structure for efficient compression

Original JSON data: (749 characters without whitespace)

{
  "count": 5,
  "names": ["New York", "London", "Paris", "Beijing", "Moscow"],
  "cities": [
    {
      "id": 1,
      "name": "New York",
      "countryName": "USA",
      "location": { "latitude": 40.714606, "longitude": -74.0028 },
      "localityType": "BIG_CITY"
    },
    {
      "id": 2,
      "name": "London",
      "countryName": "UK",
      "location": { "latitude": 51.507351, "longitude": -0.127696 },
      "localityType": "COUNTRY_CAPITAL"
    },
    {
      "id": 3,
      "name": "Paris",
      "countryName": "France",
      "location": { "latitude": 48.856663, "longitude": 2.351556 },
      "localityType": "COUNTRY_CAPITAL"
    },
    {
      "id": 4,
      "name": "Beijing",
      "countryName": "China",
      "location": { "latitude": 39.90185, "longitude": 116.391441 },
      "localityType": "COUNTRY_CAPITAL"
    },
    {
      "id": 5,
      "name": "Moscow",
      "countryName": "Russia",
      "location": { "latitude": 55.755864, "longitude": 37.617698 },
      "localityType": "COUNTRY_CAPITAL"
    }
  ]
}

Compressed JSON: (562 characters without whitespace)

[["count", "names", "cities", "a|0|1|2", "n|5", "New York", "London", "Paris", "Beijing", "Moscow", "a|5|6|7|8|9", "id", "name", "countryName", "location", "localityType", "a|B|C|D|E|F", "n|1", "USA", "latitude", "longitude", "a|J|K", "n|e.2Xkv", "n|-1C.28G", "o|L|M|N", "BIG_CITY", "o|G|H|5|I|O|P", "n|2", "UK", "n|p.dz7", "n|-0.2vFR", "o|L|T|U", "COUNTRY_CAPITAL", "o|G|R|6|S|V|W", "n|3", "France", "n|m.1XNq", "n|2.2kQz", "o|L|a|b", "o|G|Y|7|Z|c|W", "n|4", "China", "n|d.F7F", "n|1s.bVh", "o|L|g|h", "o|G|e|8|f|i|W", "Russia", "n|t.1xtN", "n|b.3lHA", "o|L|l|m", "o|G|4|9|k|n|W", "a|Q|X|d|j|o", "o|3|4|A|p"], "q"]

In this example, compression saves about 25% of the characters (749 down to 562). The more repetitive the structure, the more characters can be saved.
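
The saving can be checked with a little arithmetic. The sketch below uses hypothetical helper names (`saving`, `compact_length`) and only the standard library. `compact_length` is one possible way to measure your own data and is not guaranteed to reproduce the exact character counts quoted above, since it keeps spaces inside strings.

```python
import json


def compact_length(value) -> int:
    """Length of the JSON text with no whitespace between tokens."""
    return len(json.dumps(value, separators=(",", ":"), ensure_ascii=False))


def saving(original_len: int, compressed_len: int) -> float:
    """Fraction of characters saved, e.g. 0.25 for 25%."""
    return 1 - compressed_len / original_len


# Character counts quoted in this document: 749 before, 562 after.
print(f"{saving(749, 562):.1%}")  # 25.0%
```

To measure your own payload, compare `compact_length(data)` with `compact_length(compressed)`, where `compressed` is the output of `compress(data)`.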

License

This project is licensed under the BSD-2-Clause license.

This is free, libre, and open-source software. It comes down to four essential freedoms [ref]:

  • The freedom to run the program as you wish, for any purpose
  • The freedom to study how the program works, and change it so it does your computing as you wish
  • The freedom to redistribute copies so you can help others
  • The freedom to distribute copies of your modified versions to others
