Byte Stream Representation of Piecewise-constant Array

Installation

This project is published on the Python Package Index (PyPI) at: https://pypi.org/project/pca-b-stream/. It should be installable from Python distribution platforms or Integrated Development Environments (IDEs). Otherwise, it can be installed from a command console:

  • For all users, after acquiring administrative rights:
    • First installation: pip install pca-b-stream

    • Installation update: pip install --upgrade pca-b-stream

  • For the current user (no administrative rights required):
    • First installation: pip install --user pca-b-stream

    • Installation update: pip install --user --upgrade pca-b-stream

Brief Description

In a Few Words

The PCA-B-Stream project generates a printable byte stream representation of a piecewise-constant NumPy array and re-creates the array from that byte stream, similar to what is available as part of the COCO API.

Illustration

In Python:

>>> import pca_b_stream as pcas
>>> import numpy as nmpy
>>> # --- Array creation
>>> array = nmpy.zeros((10, 10), dtype=nmpy.uint8)
>>> array[1, 1] = 1
>>> # --- Array -> Byte stream -> Array
>>> stream = pcas.PCA2BStream(array)
>>> decoding = pcas.BStream2PCA(stream)
>>> # --- Check and print
>>> assert nmpy.array_equal(decoding, array)
>>> print(stream)
b'FnmHoFain+3jtU'

From command line:

pca2bstream some_image_file           # Prints the corresponding byte stream
bstream2pca a_byte_stream a_filename  # Creates an image from the byte stream and stores it

Motivations

The motivations for developing an alternative to existing solutions are:

  • Arrays can be of any dimension (i.e., not just 2-dimensional)

  • Their dtype can be of boolean, integer, or float

  • They can contain more than 2 distinct values (i.e., non-binary arrays) as long as the values are integers (potentially stored in a floating-point format though)

  • The byte stream representation is self-contained; in particular, there is no need to keep track of the array shape externally

  • The byte stream representation contains everything needed to re-create the array exactly as it was instantiated (dtype, endianness, C or Fortran ordering); see note though

Documentation

Functions

The pca_b_stream module defines the following functions:

  • PCA2BStream
    • Generates the byte stream representation of an array; does not check the array validity (see PCAIsValid)

    • Input: a Numpy ndarray

    • Output: an object of type bytes

  • BStream2PCA
    • Re-creates the array from its byte stream representation; does not check the stream format validity

    • Input/Output: input and output of PCA2BStream swapped

  • PCAIsValid
    • Checks whether an array is a valid input for stream representation generation; it is meant to be used before calling PCA2BStream

    • Input: a Numpy ndarray

    • Output: a tuple (validity, issue) where validity is a boolean and issue is None if validity is True, or a string describing why the array is considered invalid otherwise.

    • Additional information about what constitutes a valid piecewise-constant array is provided in the section “Motivations”.

  • BStreamDetails
    • Extracts details from a byte stream representation; see section “Byte Stream Format”

    • Inputs:
      • a byte stream generated by PCA2BStream

      • details: a string where each character corresponds to a detail to extract, or “+” to extract all of the available details; default: “+”; available details are:
        • m=maximum value in array (also number of sub-streams)

        • c=compression indicators (string of zeros and ones, one per sub-stream)

        • e=endianness

        • t=dtype type code

        • T=dtype name

        • o=enumeration order

        • v=first value (0 for 0 or False, 1 for non-zero or True)

        • d=array dimension

        • l=array lengths per dimension

      • should_print: a boolean indicating whether the extracted details should be printed to the console; default: True

      • should_return: a boolean indicating whether the extracted details should be returned (see Output); default: False

    • Output: one of:
      • None if should_return is False

      • a dictionary of all of the available details if the details parameter is “+”

      • a tuple of the requested details in the same order as in the details parameter
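
The functions above can be combined as in the following minimal sketch. EncodeIfValid is a hypothetical helper, not part of the project; it only relies on the PCAIsValid and PCA2BStream behaviors described above, and it receives the imported pca_b_stream module as an argument so that the block has no hard dependency.

```python
from typing import Any


def EncodeIfValid(array: Any, pcas: Any) -> bytes:
    """
    Hypothetical helper (not part of pca_b_stream).
    pcas: the imported pca_b_stream module.
    """
    # PCA2BStream does not validate its input, so check the array first.
    is_valid, issue = pcas.PCAIsValid(array)
    if not is_valid:
        # issue is a string describing why the array is considered invalid.
        raise ValueError(f"Array cannot be encoded: {issue}")
    return pcas.PCA2BStream(array)


# Possible usage, assuming pca-b-stream and NumPy are installed:
#     import numpy as nmpy
#     import pca_b_stream as pcas
#     stream = EncodeIfValid(nmpy.zeros((10, 10), dtype=nmpy.uint8), pcas)
#     dimension, lengths = pcas.BStreamDetails(
#         stream, details="dl", should_print=False, should_return=True
#     )
```

Since should_return defaults to False, BStreamDetails returns None unless should_return=True is passed explicitly.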

Command Line Scripts

The PCA-B-Stream project defines two command line scripts: pca2bstream and bstream2pca. The former takes a path to an image file as argument, and prints the corresponding byte stream (without the “b” string type prefix). The latter takes a character string and a filename as arguments, in that order, and creates an image file with this name that corresponds to the string interpreted as a byte stream. The file must not already exist.

Byte Stream Format

A byte stream is a sequence of base85-encoded (sub-)streams joined with newline characters b'\n'.
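
As a rough sketch of this outer layer, the snippet below joins and splits sub-streams using only the standard library. It assumes the base85 variant is the one implemented by Python's base64.b85encode, which is consistent with the example stream of the “Illustration” section; JoinSubStreams and SplitSubStreams are hypothetical names, not functions of the project.

```python
import base64


def JoinSubStreams(sub_streams: list) -> bytes:
    """Base85-encode each raw sub-stream and join the results with newlines."""
    return b"\n".join(base64.b85encode(sub_stream) for sub_stream in sub_streams)


def SplitSubStreams(stream: bytes) -> list:
    """Split on newlines, then base85-decode each piece."""
    return [base64.b85decode(encoded) for encoded in stream.split(b"\n")]


# The stream printed in the "Illustration" section has a single sub-stream.
print(SplitSubStreams(b"FnmHoFain+3jtU"))
# → [b'0|BC0\x02\n\n\x0b\x01X']
```

The newline is a safe separator because it is not part of the base85 alphabet: the decoded sub-stream above contains newline bytes, but its encoded form does not.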

For a boolean array or an array containing only 0’s (zeros) and 1’s (ones), there is only one such encoded stream. Once decoded, it has the following format (in lexicographical order; all characters are in bytes format):

  • 0 or 1: indicates whether the remainder of the stream is in uncompressed or ZLIB compressed format; see note on compression; the rest of the description applies to the stream in uncompressed format

  • 3 characters “{E}{T}{O}”:
    • E: endianness among “|”, “<” and “>”

    • T: dtype character code among: “?” + numpy.typecodes[“AllInteger”] + numpy.typecodes[“Float”]

    • O: enumeration order among “C” (C-ordering) and “F” (Fortran-ordering)

  • 0 or 1: whether the first value in the array is zero (or False) or one (or True)

  • characters resulting from the unsigned LEB128 encoding of some integers using the leb128 project; these integers are:
    • one integer for the dimension of the array (1 for vectors, 2 for matrices, 3 for volumes…)

    • one integer per dimension giving the length of the array in that dimension

    • integers of the run-length representation of the array read in its proper enumeration order
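
The layout above can be checked against the stream printed in the “Illustration” section. The following is a minimal sketch, not the project's implementation: ParseBinaryStream, DecodeUnsignedLEB128 and ExpandRuns are hypothetical names, the LEB128 decoding is hand-written instead of relying on the leb128 project, Python's base64.b85decode is assumed for the base85 step, and the compressed case assumes that everything after the first character is passed to zlib as a whole.

```python
import base64
import zlib


def DecodeUnsignedLEB128(data: bytes) -> list:
    """Decode a concatenation of unsigned LEB128 integers."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift  # 7 payload bits per byte
        if byte & 0x80:
            shift += 7  # Continuation bit set: more bytes follow
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def ParseBinaryStream(stream: bytes) -> dict:
    """Parse a single (binary-case) sub-stream into its fields."""
    decoded = base64.b85decode(stream)
    body = decoded[1:]
    if decoded[:1] == b"1":
        body = zlib.decompress(body)  # Assumption: whole remainder is compressed
    endianness, type_code, order = (chr(code) for code in body[:3])
    integers = DecodeUnsignedLEB128(body[4:])
    dimension = integers[0]
    return {
        "endianness": endianness,
        "type_code": type_code,
        "order": order,
        "first_value": int(chr(body[3])),
        "dimension": dimension,
        "lengths": tuple(integers[1 : dimension + 1]),
        "runs": integers[dimension + 1 :],
    }


def ExpandRuns(first_value: int, runs: list) -> list:
    """Rebuild the flat 0/1 sequence from alternating run lengths."""
    flat, value = [], first_value
    for run in runs:
        flat.extend(run * [value])
        value = 1 - value
    return flat


details = ParseBinaryStream(b"FnmHoFain+3jtU")
print(details["lengths"], details["runs"])
# → (10, 10) [11, 1, 88]
```

The runs 11, 1, 88 match the 10x10 array of the illustration read in C order: 11 zeros, then the single one at position [1, 1], then 88 zeros.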

For arrays containing 3 distinct integer values or more (or if the maximum value is higher than 1 regardless of the number of distinct values), there is one encoded stream per value between 1 and the maximum value in the array. The first encoded stream format is identical to the binary case above. The format of the remaining streams is a version of the above format where information already known has been removed: the 3 characters “{E}{T}{O}”, the integers of the array dimension, and the length per dimension.
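
The description does not spell out what each per-value sub-stream encodes. One plausible reading, assumed in the sketch below, is that the sub-stream for value k describes the binary mask “array equals k” by its run lengths. All function names are hypothetical, and flat Python lists stand in for NumPy arrays to keep the example self-contained.

```python
def MasksPerValue(flat: list) -> list:
    """One 0/1 mask per value between 1 and the maximum (assumed decomposition)."""
    return [[int(value == k) for value in flat] for k in range(1, max(flat) + 1)]


def RunLengths(mask: list) -> tuple:
    """Return (first value, run lengths) of a non-empty 0/1 sequence."""
    runs, previous, count = [], mask[0], 0
    for value in mask:
        if value == previous:
            count += 1
        else:
            runs.append(count)
            previous, count = value, 1
    runs.append(count)
    return mask[0], runs


def Recombine(masks: list) -> list:
    """Rebuild the multi-valued sequence from its per-value masks."""
    flat = len(masks[0]) * [0]
    for k, mask in enumerate(masks, start=1):
        for index, bit in enumerate(mask):
            if bit:
                flat[index] = k
    return flat


flat = [0, 0, 2, 2, 1, 0, 3, 3, 3, 0]
masks = MasksPerValue(flat)
print([RunLengths(mask) for mask in masks])
# → [(0, [4, 1, 5]), (0, [2, 2, 6]), (0, [6, 3, 1])]
assert Recombine(masks) == flat
```

Under this reading, the number of sub-streams equals the maximum value in the array, which agrees with the “m” detail of BStreamDetails.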

Acknowledgments

The project is developed with PyCharm Community.

The development relies on several open-source packages (see install_requires in setup.py, if present; otherwise import statements should be searched for).

The code is formatted by Black, The Uncompromising Code Formatter.

The imports are ordered by isort (“isort your imports, so you don’t have to”).
