Simple input/output using MessagePack and LZ4 for Python
python-inpout
Simple library for input/output using MessagePack and LZ4 compression in Python.
Installation
You can use pip (or any PyPI-compatible package manager) for installation:
pip install inpout
or, if you prefer a local user installation:
pip install --user inpout
On Microsoft Windows, you might need to run pip through the Python interpreter:
python -m pip install inpout
Note: Visual C++ 14.0 is required for Windows installation. Get it with Microsoft Visual C++ Build Tools: https://visualstudio.microsoft.com/downloads/
Usage
To use the functionality of this library, simply import it in your Python programs:
import inpout
High-Level Functions
For saving/loading data using MessagePack and LZ4 compression, the following high-level convenience functions are provided in the root namespace:
- load_obj(path, **kwargs): return a single object loaded from a file on disk. See data_unpacker() for details on the keyword arguments.
- load_iter(path, **kwargs): return an iterator of objects loaded from a file on disk. See data_unpacker() for details on the keyword arguments.
- save_obj(obj, path, **kwargs): save a single object obj to a file on disk. See data_pack() for details on the keyword arguments.
- save_iter(iterable, path, **kwargs): save an iterable of objects iterable to a file on disk. See data_pack() for details on the keyword arguments.
Context Manager Functions
For more flexibility, the following context manager functions are provided in the root namespace:
- data_unpacker(path, compression=True, **kwargs): create a data unpacker (MessagePack) context manager with optional compression (LZ4) support, to be used as an iterable unpacker.
  - path: path to the file on disk containing the data to read.
  - compression: boolean flag for using LZ4 compression.
  - kwargs: keyword arguments passed directly to the MessagePack unpacker. See below.
- data_pack(path, compression=True, level=None, append=False, **kwargs): create a data pack (MessagePack) context manager with optional compression (LZ4) support and file appending, to be used as a packing function.
  - path: path to the file on disk that will contain the written data.
  - compression: boolean flag for using LZ4 compression.
  - level: the compression level for the LZ4 compressor. See compressor() for details.
  - append: boolean flag for opening the file on disk in appending mode.
  - kwargs: keyword arguments passed directly to the MessagePack packer. See below.
Packing Functions
For packing/unpacking data with MessagePack directly without compression, the following functions are provided in inpout.packing:
- pack(obj, stream, **kwargs): pack a single object using MessagePack (with extended types support) to a stream of bytes.
  - obj: the object to pack.
  - stream: the bytes stream to use for writing data.
  - kwargs: keyword arguments passed directly to the MessagePack packer. See below.
- packb(obj, **kwargs): pack a single object using MessagePack (with extended types support) and return packed bytes.
  - obj: the object to pack.
  - kwargs: keyword arguments passed directly to the MessagePack packer. See below.
- unpack(stream, **kwargs): unpack a stream of packed bytes using MessagePack (with extended types support) and return a single unpacked object.
  - stream: the bytes stream to use for reading data.
  - kwargs: keyword arguments passed directly to the MessagePack unpacker. See below.
- unpackb(packed, **kwargs): unpack packed bytes using MessagePack (with extended types support) and return a single unpacked object.
  - packed: the packed bytes to unpack.
  - kwargs: keyword arguments passed directly to the MessagePack unpacker. See below.
Compressing Functions
For compressing/decompressing arbitrary data with LZ4 directly without packing, the following context manager functions are provided in inpout.compression:
- decompressor(path): create a data decompressing context manager to be used as a reader.
  - path: path to the file on disk containing the compressed data.
- compressor(path, level=None, append=False): create a data compressing context manager to be used as a writer.
  - path: path to the file on disk that will contain the compressed data.
  - level: compression level to use. Defaults to LZ4F_COMPRESSION_MAX if None. Values lower than 3 (including negative ones) use fast compression. The recommended range for hc-type compression is between 4 and 9. More information can be found in the LZ4 documentation.
  - append: boolean flag for opening the file on disk in appending mode.
Keyword Arguments for MessagePack
Functions involving data packing with MessagePack support optional keyword arguments kwargs to be passed directly to the MessagePack packer and unpacker. Useful options are described below:
- use_list: can be True (default) or False. With True, MessagePack arrays are unpacked as Python lists; with False, they are unpacked as tuples. Tuples are lighter than lists, so use_list=False can help when unpacking performance matters. Lists are not hashable and cannot be used as dict keys or set elements, therefore use_list=False is required for unpacking data containing tuples as keys.
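The hashability point above can be checked with plain Python, without touching MessagePack at all. The snippet below is only an illustration: the two variables stand in for what an unpacker would hand back with use_list=True (a list) and use_list=False (a tuple).

```python
# Stand-ins for an unpacked array: a list (use_list=True) and a tuple
# (use_list=False). No unpacking is performed here.
key_as_list = [1, 2]
key_as_tuple = (1, 2)

# A list is not hashable, so building a dict keyed by it fails.
try:
    mapping = {key_as_list: "tuple_key"}
except TypeError as error:
    failure = str(error)

# A tuple is hashable, so the same dict can be built.
mapping = {key_as_tuple: "tuple_key"}
print(failure)
print(mapping)
```

This is why a file containing a dict with tuple keys needs to be loaded with use_list=False.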
Examples
Below is example code of how to use the main convenience functions of this library.
from datetime import datetime
import inpout
# create some Python objects to test, set and datetime are supported out of the box
obj1 = [1,2,3,4,5]
obj2 = ("test", 1234)
obj3 = {"test": 1234, "test2": 5678}
obj4 = {"a", "b", "c", 5, 6, 7, 8}
obj5 = datetime.now()
obj6 = {(1,2): "tuple_key"}
# save all the above objects as a single tuple to disk
inpout.save_obj((obj1, obj2, obj3, obj4, obj5), "test1.mp.lz4")
# save all the above objects in order to disk one by one (iterator)
iterator = (o for o in (obj1, obj2, obj3, obj4, obj5))
inpout.save_iter(iterator, "test2.mp.lz4")
# append more data to the same test file (save_obj and save_iter can be mixed)
inpout.save_obj(obj1, "test2.mp.lz4", append=True)
inpout.save_iter((obj2, obj3), "test2.mp.lz4", append=True)
# save an object with a tuple as key to demonstrate 'use_list=False'
inpout.save_obj(obj6, "test3.mp.lz4")
# load the first test file
data = inpout.load_obj("test1.mp.lz4")
print("DATA=%r" % (data,))
# load the second test file (iterator)
for obj in inpout.load_iter("test2.mp.lz4"):
    print("OBJ=%r" % (obj,))
# load the third test file using tuple types, otherwise it fails
data = inpout.load_obj("test3.mp.lz4", use_list=False)
print("DATA=%r" % (data,))
# demonstrate the data pack function
with inpout.data_pack("test4.mp.lz4") as pack:
    for obj in (obj1, obj2, obj3, obj4, obj5, obj6):
        pack(obj)
# demonstrate the data unpacker function
with inpout.data_unpacker("test4.mp.lz4", use_list=False) as unpacker:
    for obj in unpacker:
        print("OBJ=%r" % (obj,))
# demonstrate the data pack function (no compression)
with inpout.data_pack("test4.mp", compression=False) as pack:
    for obj in (obj1, obj2, obj3, obj4, obj5, obj6):
        pack(obj)
# demonstrate the data unpacker function (no compression)
with inpout.data_unpacker("test4.mp", compression=False, use_list=False) as unpacker:
    for obj in unpacker:
        print("OBJ=%r" % (obj,))
MessagePack Extended Types
This library supports MessagePack extended types and includes encoders/decoders for two standard Python objects: set (typecode 127) and datetime (typecode 126). These are automatically registered upon importing the library.
set objects are serialised as tuples containing their elements and reconstructed from these stored tuples.
datetime objects are serialised as a tuple of two integers (seconds, microseconds) representing the number of seconds and microseconds since the UNIX epoch (00:00:00 Thursday, 1 January 1970). Timezone information is used for the conversion but not stored, therefore datetime objects are reconstructed as naive, i.e. without timezone.
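As a rough illustration of the (seconds, microseconds) representation described above, the following standard-library sketch converts a datetime to such a pair and back. The helper names encode_datetime and decode_datetime are hypothetical, and two choices here are assumptions rather than documented behaviour: naive inputs are treated as UTC, and the restored value is a naive datetime in UTC. The library's actual encoder may differ.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_datetime(dt):
    """Return (seconds, microseconds) since the UNIX epoch (hypothetical helper)."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime is interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    return delta.days * 86400 + delta.seconds, delta.microseconds

def decode_datetime(seconds, microseconds):
    """Rebuild a naive datetime from the stored pair (assumed to be UTC)."""
    restored = EPOCH + timedelta(seconds=seconds, microseconds=microseconds)
    return restored.replace(tzinfo=None)

# The timezone offset is used for the conversion, then dropped.
aware = datetime(2020, 1, 2, 3, 4, 5, 678, tzinfo=timezone(timedelta(hours=2)))
pair = encode_datetime(aware)
restored = decode_datetime(*pair)
print(pair, restored)
```

Under these assumptions the restored value carries no tzinfo, so code that needs timezone-aware values has to attach the timezone again after loading.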
You can also easily create your own encoders/decoders for Python objects and register them for this library to be used during serialisation/deserialisation:
import inpout
class MyType(object):
    def __init__(self, data1, data2):
        self.data1 = data1
        self.data2 = data2
# define a representation for your type (encoder)
# we will assign '50' as the typecode for this type
def encode_mytype(obj, packb, ext_type):
    return ext_type(50, packb((obj.data1, obj.data2)))
# define how to create your type from your representation (decoder)
def decode_mytype(data, unpackb):
    data1, data2 = unpackb(data)
    return MyType(data1, data2)
# register custom encoder/decoders for your type
inpout.packing.register_ext_type_encoder(MyType, encode_mytype)
inpout.packing.register_ext_type_decoder(50, decode_mytype)
# test saving/loading your type
obj = MyType("test", 1234)
inpout.save_obj(obj, "test.mp.lz4")
obj2 = inpout.load_obj("test.mp.lz4")
print(obj2.data1, obj2.data2)
You can use any typecode for your own extended types, however it must be between 0 and 125 (inclusive).
More information about MessagePack extended types can be found in the MessagePack specification.
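If you want to catch a bad typecode before registering an encoder/decoder pair, a small guard like the one below may help. It is a hypothetical helper for illustration only and is not part of inpout; it simply encodes the 0 to 125 range and the two typecodes (126 and 127) that this document says are used by the built-in datetime and set support.

```python
# Typecodes taken by the built-in encoders, as stated in this document.
RESERVED_TYPECODES = {126: "datetime", 127: "set"}

def check_typecode(typecode):
    """Return typecode if it is usable for a custom type, else raise.

    Hypothetical helper for illustration; it is not part of inpout.
    """
    if isinstance(typecode, bool) or not isinstance(typecode, int):
        raise TypeError("typecode must be an integer")
    if not 0 <= typecode <= 125:
        owner = RESERVED_TYPECODES.get(typecode)
        reason = "reserved for %s" % owner if owner else "outside 0..125"
        raise ValueError("typecode %r is %s" % (typecode, reason))
    return typecode

print(check_typecode(50))
```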
Command-line Tools
The library includes the following command-line tools that are installed automatically by pip:
- inpout-pprint: iterate and pretty-print data files generated by this library. This tool is based on the load_iter function with the use_list=False keyword argument. Compression is activated if filenames end with .lz4 (case insensitive). Optionally, the NUMBER of objects to process from each input file can also be provided. Usage:
  $ inpout-pprint [-n NUMBER] FILENAME [FILENAME ...]
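The two behaviours described for inpout-pprint (detecting compression from the filename and limiting the number of objects) can be approximated in your own scripts with the standard library. This is a minimal sketch, not the tool's source code; the function names uses_compression and take are hypothetical.

```python
from itertools import islice

def uses_compression(filename):
    """True if the filename ends with .lz4, ignoring case."""
    return filename.lower().endswith(".lz4")

def take(objects, number=None):
    """Yield at most `number` objects, or all of them if number is None."""
    return islice(objects, number)

print(uses_compression("DATA.MP.LZ4"), uses_compression("data.mp"))
print(list(take(iter(range(5)), 2)))
```

In a script, the result of uses_compression(path) could be passed as the compression argument of data_unpacker, and take could wrap the unpacker inside the with block.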
License
This software is under the Apache License 2.0.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.