Byte Stream Representation of Piecewise-constant Array
Purpose of PCA-B-Stream
In a Few Words
The PCA-B-Stream project generates a printable byte stream representation of a piecewise-constant Numpy array and re-creates the array from that byte stream, similarly to what the COCO API offers.
Illustration
From Python:
>>> import pca_b_stream as pcas
>>> import numpy as nmpy
>>> # --- Array creation
>>> array = nmpy.zeros((10, 10), dtype=nmpy.uint8)
>>> array[1, 1] = 1
>>> # --- Array -> Byte stream -> Array
>>> stream = pcas.PCA2BStream(array)
>>> decoding = pcas.BStream2PCA(stream)
>>> # --- Check and print
>>> assert nmpy.array_equal(decoding, array)
>>> print(stream)
b'FnmHoFain+3jtU'
From command line:
pca2bstream some_image_file # Prints the corresponding byte stream
bstream2pca a_byte_stream a_filename # Creates an image from the byte stream and stores it
Motivations
The motivations for developing an alternative to existing solutions are:
Arrays can be of any dimension (i.e., not just 2-dimensional)
Their dtype can be of kind boolean, integer, or float
They can contain more than 2 distinct values (i.e., non-binary arrays) as long as the values are integers (potentially stored in a floating-point format though)
The byte stream representation is self-contained; In particular, there is no need to keep track of the array shape externally
The byte stream representation contains everything needed to re-create the array exactly as it was instantiated (dtype, endianness, C or Fortran ordering); See note though
Installation
The PCA-B-Stream project is published on the Python Package Index (PyPI) at: https://pypi.org/project/pca-b-stream. It requires Python 3.8 or newer. It should be installable from Python distribution platforms or Integrated Development Environments (IDEs). Otherwise, it can be installed from a command-line console:
- For all users, after acquiring administrative rights:
First installation: pip3 install pca-b-stream
Installation update: pip3 install --upgrade pca-b-stream
- For the current user (no administrative rights required):
First installation: pip3 install --user pca-b-stream
Installation update: pip3 install --user --upgrade pca-b-stream
Documentation
Functions
The pca_b_stream module defines the following functions:
- PCA2BStream
Generates the byte stream representation of an array; Does not check the array validity (see PCAIsValid)
Input: a Numpy ndarray
Output: an object of type bytes
- BStream2PCA
Re-creates the array from its byte stream representation; Does not check the stream format validity
Input: an object of type bytes, as generated by PCA2BStream
Output: a Numpy ndarray
- PCAIsValid
Checks whether an array is a valid input for stream representation generation; It is meant to be used before calling PCA2BStream
Input: a Numpy ndarray
Output: a tuple (validity, issue) where validity is a boolean and issue is None if validity is True, or a string describing why the array is considered invalid otherwise.
Additional information about what constitutes a valid piecewise-constant array is provided in the section “Motivations”.
- BStreamDetails
Extracts details from a byte stream representation; See section “Byte Stream Format”
- Inputs:
a byte stream generated by PCA2BStream
details: a string where each character corresponds to a detail to extract, or “+” to extract all of the available details; Default: “+”; Available details are:
m=maximum value in array (also number of sub-streams)
c=compression indicators (string of zeros and ones, one per sub-stream)
e=endianness
t=dtype type code
T=dtype name
o=enumeration order
v=first value (0 for 0 or False, 1 for non-zero or True)
d=array dimension
l=array lengths per dimension
should_print: a boolean indicating whether the extracted details should be printed to console; Default: True
should_return: a boolean indicating whether the extracted details should be returned (see Output); Default: False
- Output: one of:
None if should_return is False
a dictionary of all of the available details if the details parameter is “+”
a tuple of the requested details in the same order as in the details parameter
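To illustrate where these details live in a stream, here is a minimal, standard-library-only sketch that reads them from the stream printed in the section “Illustration”, following the layout described in the section “Byte Stream Format”. It is not the implementation of BStreamDetails: the function names ReadDetails and LEB128Integers are hypothetical, the use of the detail letters as dictionary keys is an assumption, and the sketch only handles an uncompressed first sub-stream.

```python
import base64


def LEB128Integers(data: bytes) -> list:
    """Decodes a run of unsigned LEB128 integers (hypothetical helper)."""
    integers, value, shift = [], 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift  # The 7 low bits carry the payload
        if byte & 0x80:  # High bit set: the integer continues
            shift += 7
        else:
            integers.append(value)
            value, shift = 0, 0
    return integers


def ReadDetails(stream: bytes) -> dict:
    """Reads header details from the first sub-stream (hypothetical helper)."""
    sub_streams = stream.split(b"\n")
    decoded = base64.b85decode(sub_streams[0])
    if decoded[:1] != b"0":
        raise ValueError("This sketch only handles uncompressed streams.")
    endianness, type_code, order = decoded[1:4].decode("ascii")
    integers = LEB128Integers(decoded[5:])
    dimension = integers[0]
    return {
        "m": len(sub_streams),  # Number of sub-streams
        "e": endianness,
        "t": type_code,
        "o": order,
        "v": decoded[4] - ord("0"),  # First value: 0 or 1
        "d": dimension,
        "l": tuple(integers[1 : 1 + dimension]),
    }


print(ReadDetails(b"FnmHoFain+3jtU"))
```

For the illustration stream, this reports a 2-dimensional array of lengths 10 and 10, type code “B” (uint8), C-ordering, and a first value of 0.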
Test Script
The test module defines a function Main that checks the validity of the encoding-decoding chain with hardcoded simple arrays and with general arrays created randomly in terms of dimension, size, contents, dtype… This function is made available by the installation process (see section “Installation”) as a command-line script test_pca_b_stream. It takes an optional integer argument setting the number of random arrays to test (default: 1000).
Byte Stream Format
A byte stream is a sequence of base85-encoded (sub-)streams joined with newline characters b'\n'.
For a boolean array or an array containing only 0’s (zeros) and 1’s (ones), there is only one such encoded stream. Once decoded, it has the following format (listed in order of appearance; all characters are in bytes format):
0 or 1: indicates whether the remainder of the stream is in uncompressed or ZLIB compressed format; See note on compression; The rest of the description applies to the stream in the uncompressed “space”
- 3 characters “{E}{T}{O}”:
E: endianness among “|”, “<” and “>”
T: dtype character code among: “?” + numpy.typecodes[“AllInteger”] + numpy.typecodes[“Float”]
O: enumeration order among “C” (C-ordering) and “F” (Fortran-ordering)
0 or 1: whether the first value in the array is zero (or False) or one (or True)
- characters resulting from the unsigned LEB128 encoding of some integers using the leb128 project; These integers are:
one integer for the dimension of the array (1 for vectors, 2 for matrices, 3 for volumes…)
one integer per dimension giving the length of the array in that dimension
integers of the run-length representation of the array read in its proper enumeration order
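The layout above can be sketched with the standard library only. The block below is an illustration under stated assumptions, not the project's code: the names UnsignedLEB128, RunLengths, and BinaryStream are hypothetical, the array is passed as a flat sequence of 0’s and 1’s already in enumeration order, the LEB128 encoder is hand-written instead of relying on the leb128 project, and the stream is always left uncompressed.

```python
import base64
import itertools


def UnsignedLEB128(value: int) -> bytes:
    """Encodes a non-negative integer in unsigned LEB128 (hypothetical helper)."""
    encoded = bytearray()
    while True:
        byte = value & 0x7F  # Take the 7 low bits
        value >>= 7
        if value:
            encoded.append(byte | 0x80)  # More bytes follow
        else:
            encoded.append(byte)
            return bytes(encoded)


def RunLengths(values) -> list:
    """Lengths of the successive runs of identical values."""
    return [len(list(run)) for _, run in itertools.groupby(values)]


def BinaryStream(values, shape, header: bytes = b"|BC") -> bytes:
    """Builds an uncompressed stream for a flat sequence of 0's and 1's.

    header is the {E}{T}{O} triplet; b"|BC" stands for a uint8 array in
    C-ordering (assumption of this sketch: the caller supplies it).
    """
    payload = b"0" + header + (b"1" if values[0] else b"0")
    for integer in (len(shape), *shape, *RunLengths(values)):
        payload += UnsignedLEB128(integer)
    return base64.b85encode(payload)


# Same array as in the section "Illustration": 10x10 zeros with array[1, 1] = 1
flat = [0] * 100
flat[1 * 10 + 1] = 1
stream = BinaryStream(flat, (10, 10))
print(stream)  # b'FnmHoFain+3jtU'
```

The run lengths here are 11, 1, and 88. Together with the dimension (2) and the lengths (10, 10), each integer fits in a single LEB128 byte.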
For arrays containing 3 distinct integer values or more (or if the maximum value is higher than 1 regardless of the number of distinct values), there is one encoded stream per value between 1 and the maximum value in the array. The first encoded stream format is identical to the binary case above. The format of the remaining streams is a version of the above format where information already known has been removed: the 3 characters “{E}{T}{O}” and the integers of the array dimension and the length per dimension.
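As a rough sketch of how a multi-valued stream could be taken apart, the block below splits a byte stream into its sub-streams and undoes the base85 encoding and the optional compression of each one. The function name UncompressedSubStreams is hypothetical. The sketch also assumes that the compressed flag is the character “1” and that the bytes after it are plain zlib output; the hand-made payloads used for the demonstration are arbitrary and were not produced by PCA2BStream.

```python
import base64
import zlib


def UncompressedSubStreams(stream: bytes) -> list:
    """Returns the uncompressed payload of each sub-stream (hypothetical helper)."""
    payloads = []
    for encoded in stream.split(b"\n"):
        decoded = base64.b85decode(encoded)
        flag, remainder = decoded[:1], decoded[1:]
        if flag == b"1":  # Assumption: "1" means zlib-compressed remainder
            remainder = zlib.decompress(remainder)
        elif flag != b"0":
            raise ValueError(f"Unexpected compression flag: {flag!r}")
        payloads.append(remainder)
    return payloads


# Hand-made demonstration: a full first payload, then a shorter one
# without the {E}{T}{O} characters, the dimension, and the lengths
first = b"|BC0" + bytes([2, 10, 10, 11, 1, 88])
second = b"0" + bytes([50, 2, 48])
made_up = b"\n".join(
    [
        base64.b85encode(b"0" + first),
        base64.b85encode(b"1" + zlib.compress(second)),
    ]
)
recovered = UncompressedSubStreams(made_up)
print(len(recovered), recovered == [first, second])
```

Splitting on newlines before decoding is safe because the newline character is not part of the base85 alphabet used by base64.b85encode.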
Thanks
The project is developed with PyCharm Community.
The code is formatted by Black, The Uncompromising Code Formatter.
The imports are ordered by isort… your imports, so you don’t have to.
Hashes for pca_b_stream-2021.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 85d7defdd0e2656922ab64e457cef0b58a9a24f157eb54b14b10172ce3c43915
MD5 | 69dc376d9ccbc5ea98f707e9826f2c78
BLAKE2b-256 | 2140f0d0c96853a273c3aeaa34c103502e48a18d90b40342807793d99df13f86