Serialize data as a zip file of json and other formats

Downfile

Downfile can be used to serialize any data from Python in a controlled manner. The data is stored as a set of components in a ZIP file. The format of each component is some standard format, such as JSON (for dictionaries etc) or Feather (for Pandas DataFrames).

To serialize or deserialize new types, methods can be registered using setuptools entry_points.

Since a (de)serializer has to be written manually for each type, Downfile does not have the same security and compatibility issues that Pickle has, but it comes with a slightly higher development overhead.

Example usage:

>>> import pandas as pd
>>> import downfile
>>> data = {"bar": pd.DataFrame({"foo": [1, 2, 3]}), "fie": "hello"}
>>> downfile.dump(data, "test.down")

>>> data2 = downfile.parse("test.down")
>>> data2
{'bar':    foo
 0    1
 1    2
 2    3,
 'fie': 'hello'}

Builtin support for datatypes

  • Python base types: int, float, bool, str, dict, list
  • Python builtin exceptions
  • Pandas DataFrames
  • Numpy arrays

How to add data types / formats

In setup.py in your own package (say mypippackage) add:

entry_points = {
    'downfile.dumpers': [
        'somepackage.somemodule.DataType=mypackage.mymodule:dumper',
    ],
    'downfile.parsers': [
        'mypippackage.myformat=mypackage.mymodule:parser',
    ]}

Then, in mypackage.mymodule, provide the following two functions:

def dumper(file, obj):
    # Here `mypippackage.myformat` is the filename extension.
    # If the file format has a standard extension, such as `.png`, `.csv` etc,
    # you might want to use that here instead.
    name = file.new_file("mypippackage.myformat")
    with file.open_buffered(name, "w") as f:
        someFunctionToWriteObjToFile(obj, f)
    # Here `mypippackage.myformat` is the key used to find `parser` in `setup.py` later.
    return {"__jsonclass__": ["mypippackage.myformat", [name]]}

def parser(file, obj):
    name = obj["__jsonclass__"][1][0]
    with file.open_buffered(name, "r") as f:
        return someFunctionToReadObjFromFile(f)

mypippackage.myformat can be any string that is reasonably unique, typically the file extension used by the file format you're using for serialization. However, it is good practice to include the pip package name for your package, so that people can easily find out which packages are missing when they fail to parse a file!

If you're familiar with JSON RPC class hinting, you're probably wondering whether dumper really has to write a file, or whether it could just return some JSON-serializable data. The answer is that it doesn't need to write a file. If you're curious about serializing small objects this way, check out the datetime handler.
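As an illustration of a handler that writes no file, here is a sketch of a dumper/parser pair for `fractions.Fraction`. The format name `mypippackage.fraction` and the function names are made up for this example; the functions only follow the `dumper(file, obj)` / `parser(file, obj)` shape described above and are not taken from downfile itself.

```python
import json
from fractions import Fraction


def dump_fraction(file, obj):
    # No file is written: the whole value fits in the class hint arguments.
    # "mypippackage.fraction" is a hypothetical format name.
    return {"__jsonclass__": ["mypippackage.fraction",
                              [obj.numerator, obj.denominator]]}


def parse_fraction(file, obj):
    # The class hint arguments are exactly what dump_fraction stored.
    numerator, denominator = obj["__jsonclass__"][1]
    return Fraction(numerator, denominator)


# The `file` argument is unused here, so None is enough for a quick check.
hint = dump_fraction(None, Fraction(3, 4))
restored = parse_fraction(None, json.loads(json.dumps(hint)))
```

Because the hint contains only integers and strings, it survives a plain JSON round trip unchanged.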

To recursively encode some component value of the data you're encoding, you can use downfile.formats.format_json.to_json_string(downfile, v). This will encode the value v to a JSON string and return it. The returned JSON will use the same class hinting structure used in the main JSON file to serialize any complex type to external files.

Downfile instances

The file argument to dumper/parser above is an instance of downfile.Downfile, which is a subclass of zipfile.ZipFile that implements a few extra methods:

  • new_file(extension) returns a new unique filename.
  • open_buffered(filename, mode="r"|"w") works like open(), but uses a temporary file so that multiple files can be opened concurrently (zipfile.ZipFile.open() does not support this).
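To make the idea behind these two methods concrete, the following is a hypothetical stand-in written with only the standard library. It is not downfile's actual implementation: the class name `BufferedZip`, the counter-based naming scheme, and the use of text mode with UTF-8 are all assumptions made for this sketch.

```python
import contextlib
import os
import tempfile
import zipfile


class BufferedZip(zipfile.ZipFile):
    """Hypothetical stand-in, NOT downfile's actual implementation."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._counter = 0

    def new_file(self, extension):
        # Assumed naming scheme: 1.ext, 2.ext, ... so names never collide.
        self._counter += 1
        return f"{self._counter}.{extension}"

    @contextlib.contextmanager
    def open_buffered(self, filename, mode="r"):
        # Stage the member in a temporary file on disk, so several members
        # can be open at once without holding a handle into the archive.
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            if mode == "r":
                with open(path, "wb") as tmp:
                    tmp.write(self.read(filename))
                with open(path, "r", encoding="utf-8") as f:
                    yield f
            else:
                with open(path, "w", encoding="utf-8") as f:
                    yield f
                # Copy the finished temporary file into the archive.
                self.write(path, arcname=filename)
        finally:
            os.remove(path)


# Two members are open for writing at the same time.
archive_path = os.path.join(tempfile.mkdtemp(), "demo.zip")
with BufferedZip(archive_path, "w") as z:
    a = z.new_file("txt")
    b = z.new_file("txt")
    with z.open_buffered(a, "w") as fa, z.open_buffered(b, "w") as fb:
        fa.write("first")
        fb.write("second")

with BufferedZip(archive_path) as z:
    with z.open_buffered(a) as fa:
        text = fa.read()
```

Each member only touches the archive when its context manager exits, which is why the two writers above do not conflict.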

Data format details

  • A Downfile is a zip file
  • A Downfile must contain a JSON file named 0.json
    • This JSON file must contain an object with a key root
    • The content of the root key is considered the content of the entire Downfile.
  • Any file inside a Downfile can reference additional files inside the Downfile using relative paths
  • Any JSON file inside a Downfile can use JSON RPC 1.0 class hinting
  • A class hint of {"__jsonclass__": ["mypippackage.myformat", ["filename.ext"]]} must be used for data that is stored in a separate file inside the Downfile
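The rules above can be sketched with only `zipfile` and `json`. This does not use downfile itself; the helper names `write_minimal`, `read_root` and `class_hints` are hypothetical, and the example only covers values that need no external file.

```python
import io
import json
import zipfile


def write_minimal(target, value):
    """Write a zip whose 0.json holds `value` under the key "root"."""
    with zipfile.ZipFile(target, "w") as z:
        z.writestr("0.json", json.dumps({"root": value}))


def read_root(source):
    """Return the content of the "root" key of 0.json, without decoding hints."""
    with zipfile.ZipFile(source) as z:
        return json.loads(z.read("0.json"))["root"]


def class_hints(node):
    """Yield every class hint object found inside a decoded JSON value."""
    if isinstance(node, dict):
        if "__jsonclass__" in node:
            yield node
        for child in node.values():
            yield from class_hints(child)
    elif isinstance(node, list):
        for child in node:
            yield from class_hints(child)


buffer = io.BytesIO()
write_minimal(buffer, {
    "fie": "hello",
    # A hint that needs no external file (see the date storage rules).
    "day": {"__jsonclass__": ["datetime.date", ["2024-01-31"]]},
})
root = read_root(buffer)
hints = list(class_hints(root))
```

A real parser would replace each hint object with the decoded value, looking up the first element of `__jsonclass__` among the registered parsers.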

Storage of datetimes and dates

  • Uses the class hints {"__jsonclass__":["datetime.datetime", ["%Y-%m-%d %H:%M:%S"]]} and {"__jsonclass__": ["datetime.date", ["%Y-%m-%d"]]}
  • Does not store any external file
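A minimal sketch of these two hints, assuming the single hint argument is the value rendered with the pattern shown above. The function names `to_hint` and `from_hint` are hypothetical and do not come from downfile.

```python
import datetime

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"
DATE_FORMAT = "%Y-%m-%d"


def to_hint(value):
    # datetime.datetime is a subclass of datetime.date, so test it first.
    if isinstance(value, datetime.datetime):
        return {"__jsonclass__": ["datetime.datetime",
                                  [value.strftime(DATETIME_FORMAT)]]}
    if isinstance(value, datetime.date):
        return {"__jsonclass__": ["datetime.date",
                                  [value.strftime(DATE_FORMAT)]]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def from_hint(obj):
    name, (text,) = obj["__jsonclass__"]
    if name == "datetime.datetime":
        return datetime.datetime.strptime(text, DATETIME_FORMAT)
    if name == "datetime.date":
        return datetime.datetime.strptime(text, DATE_FORMAT).date()
    raise ValueError(f"unknown class hint: {name}")


moment = datetime.datetime(2024, 1, 31, 12, 30, 45)
hint = to_hint(moment)
restored = from_hint(hint)
```

With these patterns, microseconds and timezone information are not part of the stored string, so they would not survive a round trip in this sketch.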

Storage of exceptions

  • Uses the class hint {"__jsonclass__":["exception"]}
  • args property of the class hint object holds exception arguments
  • type property of the class hint object holds a list of string names for all classes in the inheritance list of the exception, most specific first. The names are prefixed with their respective module name.
  • Does not store any external file
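The structure described above could be produced and consumed roughly as follows. This is a sketch under assumptions, not downfile's code: it takes the inheritance list from `type(exc).__mro__` (whether downfile also includes `object` is not stated), and the decoder only resolves names from `builtins`, falling back along the list to the most specific class it recognizes.

```python
import builtins


def exception_to_hint(exc):
    # Most specific class first, each name prefixed with its module name.
    return {
        "__jsonclass__": ["exception"],
        "type": [f"{cls.__module__}.{cls.__name__}"
                 for cls in type(exc).__mro__],
        "args": list(exc.args),
    }


def hint_to_exception(obj):
    # Assumption for this sketch: only builtin exception classes are
    # resolved, so decoding never imports a module named in the file.
    for qualified in obj["type"]:
        module_name, _, class_name = qualified.rpartition(".")
        if module_name != "builtins":
            continue
        cls = getattr(builtins, class_name, None)
        if isinstance(cls, type) and issubclass(cls, BaseException):
            return cls(*obj["args"])
    return Exception(*obj["args"])


hint = exception_to_hint(ValueError("bad value", 3))
restored = hint_to_exception(hint)
```

Storing the whole inheritance list is what makes such a fallback possible: a custom subclass of KeyError that is unknown to the reader can still be restored as a KeyError.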

Storage of Pandas DataFrames

  • Uses the class hint {"__jsonclass__": ["feather", [name]]}
  • Stored as a Feather file
  • Any object column will have its cell values encoded as JSON with the same class hinting used for the main JSON file.
  • To allow for more complex columns and indices (e.g. multilevel, numeric columns etc.) not supported by the Feather format, columns and index can optionally be converted to dataframes and stored separately (using the same method)
    • The index is stored in a property index on the class hint object, and its name in a property index_name
    • The columns are stored in a property columns on the class hint object, and its name in a property columns_name

Storage of Numpy arrays

  • Uses the class hint {"__jsonclass__": ["npy", [name]]}
  • Stored as an NPY file
  • If dtype is object, values are encoded as JSON with the same class hinting used for the main JSON file, meaning that the pickle-based encoder of Numpy is never triggered.
