Universal errorsanderasures Reed Solomon codec (error correcting code) in pure Python with extensive documentation
Project description
UniReedSolomon is a purePython universal ReedSolomon error correction codec with fully documented code and mathematical nomenclatura, compatible with Python 2.7 up to 3.7 and also PyPy 2 and 3.
Compatible also with Python 3.8 but requires installing Cython even if not used (this will be fixed in the future).
Table of contents
Installation
pip install upgrade unireedsolomon
Quickstart
>>> import unireedsolomon as rs >>> coder = rs.RSCoder(20,13) >>> c = coder.encode("Hello, world!") >>> print repr(c) 'Hello, world!\x8d\x13\xf4\xf9C\x10\xe5' >>> >>> r = "\0"*3 + c[3:] >>> print repr(r) '\x00\x00\x00lo, world!\x8d\x13\xf4\xf9C\x10\xe5' >>> >>> coder.decode(r) 'Hello, world!'
Description
This library implements a purePython documented universal ReedSolomon error correction codec with a mathematical nomenclatura, compatible with Python 2.7 up to 3.7+ and also with PyPy 2 and PyPy 3.
The project aims to keep a well commented and organized code with an extensive documentation and mathematical clarity of the various arithmetic operations.
How does this library differs from other ReedSolomon libraries?
 universal: compatibility with (almost) any other ReedSolomon codec. This means that you can choose the parameters so that you can either encode data and decode it with another RS codec, or on the opposite encode data with another RS codec and decode this data with this library.
 errorsanderasures decoding allows to decode both erasures (where you know the position of the corrupted characters) and errors (where you don’t know where are the corrupted characters) at the same time (because you can decode more erasures than errors, so if you can provide even a few know corrupted characters positions, this can help a lot the decoder to repair the message).
 documented: following literate programming guidelines, you should understand everything you need about RS by reading the code and the comments.
 mathematical nomenclatura: for example, contrary to most other RS libraries, you will see a clear distinction between the different mathematical constructs, such as the Galois Fields numbers are clearly separated from the generic Polynomial objects, and both are separated from the ReedSolomon algorithm, which makes use of both of those constructs. For this purpose, objectoriented programming was chosen to design the architecture of the library, although obviously at the expense of a bit of performance. However, this library favors mathematical clarity and documentation over performance (even if performance is optimized whenever possible).
 purePython means that there are no dependencies whatsoever apart from the Python interpreter. This means that this library should be resilient in the future (since it doesn’t depend on external libraries who can become broken with time, see software rot), and you can use it on any system where Python can be installed (including online cloud services).
The authors tried their best to extensively document the algorithms. However, a lot of the math involved is nontrivial and we can’t explain it all in the comments. To learn more about the algorithms, see these resources:
 http://en.wikipedia.org/wiki/Reedâ€“Solomon_error_correction
 http://www.cs.duke.edu/courses/spring10/cps296.3/rs_scribe.pdf
 http://www.cs.duke.edu/courses/spring10/cps296.3/decoding_rs_scribe.pdf
 http://www.cs.cmu.edu/afs/cs.cmu.edu/project/pscicoguyb/realworld/www/reed_solomon.ps
 http://www.cs.cmu.edu/afs/cs.cmu.edu/project/pscicoguyb/realworld/www/rs_decode.ps
Also, here’s a copy of the presentation one of the authors gave to the class Spring 2010 on his experience implementing this library. The LaTeX source is in the presentation directory.
http://www.cs.duke.edu/courses/spring10/cps296.3/decoding_rs.pdf
The code was lately updated to support errorsanderasures decoding (both at the same time), and to be universal (you can supply the parameters to be compatible with almost any other RS codec).
The codec has decent performances if you use PyPy with the fast methods (~1 MB/s), but it would be faster if we drop the objectoriented design (implementing everything in functions), but this would be at the expense of mathematical clarity. If you are interested, see the reedsolo library by Tomer Filiba, which is exactly the same implementation but only functional without objects (results in about 5x speedup).
Files
 rs.py
 Holds the ReedSolomon Encoder/Decoder object
 polynomial.py
 Contains the Polynomial object (purepython)
 ff.py
 Contains the GF2int object representing an element of the GF(2^p) field, with p being 8 by default (purepython)
 polynomial.pyx
 Cython implementation of polynomial.py with equivalent functions (optional)
 ff.pyx
 Cython implementation of ff.py with equivalent functions (optional)
Documentation
 unireedsolomon.rs.RSCoder(n, k, generator=3, prim=0x11b, fcr=1, c_exp=8)
Creates a new ReedSolomon Encoder/Decoder object configured with the given n and k values. n is the length of a codeword, must be less than 256 k is the length of the message, must be less than n generator, prim and fcr parametrize the Galois Field that will be built c_exp is the Galois Field range (ie, 8 means GF(2^8) = GF(256)), which is both the limit for one symbol’s value, and the maximum length of a message+ecc.
The code will have error correcting power (ie, maximum number of repairable symbols) of 2*e+v <= (nk), where e is the number of errors and v the number of erasures.
The typical RSCoder is RSCoder(255, 223)
 RSCoder.encode(message, poly=False, k=None)
Encode a given string with reedsolomon encoding. Returns a byte string with the k message bytes and nk parity bytes at the end.
If a message is < k bytes long, it is assumed to be padded at the front with null bytes (ie, a shortened ReedSolomon code).
The sequence returned is always n bytes long.
If poly is not False, returns the encoded Polynomial object instead of the polynomial translated back to a string (useful for debugging)
You can change the length (number) of parity/ecc bytes at encoding by setting k to any value between [1, n1]. This allows to create only one RSCoder and then use it with a variable redundancy rate.
 RSCoder.encode_fast(message, poly=False, k=None)
 Same as encode() but using faster algorithms and optimization tricks.
 RSCoder.decode(message_ecc, nostrip=False, k=None, erasures_pos=None, only_erasures=False):
Given a received string or byte array message_ecc (composed of a message string + ecc symbols at the end), attempts to decode it. If it’s a valid codeword, or if there are no more than 2*e+v <= (nk) erratas (called the Singleton bound), the message is returned.
You can provide the erasures positions as a list to erasures_pos. For example, if you have “hella warld” and you know that a is an erasure, you can provide the list erasures_pos=[4, 7]. You can correct twice as many erasures than errors, and if some provided erasures are wrong (they are correct symbols), then there’s no problem, they will be repaired just fine (but will count towards the Singleton bound). You can also specify that you are sure there are only erasures and no errors at all by setting only_erasures=True.
A message always has k bytes, if a message contained less it is left padded with null bytes (punctured RS code). When decoded, these leading null bytes are stripped, but that can cause problems if decoding binary data. When nostrip is True, messages returned are always k bytes long. This is useful to make sure no data is lost when decoding binary data.
Note that RS can correct errors both in the message and the ecc symbols.
 RSCoder.decode_fast(message_ecc, nostrip=False, k=None, erasures_pos=None, only_erasures=False):
 Same as decode() but using faster algorithms and optimization tricks.
 RSCoder.check(message_ecc, k=None)
 Verifies the codeword (message + ecc symbols at the end) is valid by testing that the code as a polynomial code divides g, or that the syndrome is all 0 coefficients. The result is not foolproof: if it’s False, you’re sure the message was corrupted (or that you used the wrong RS parameters), but if it’s True, it’s either that the message is correct, or that there are too many errors (ie, more than the Singleton bound) for RS to do anything about it. returns True/False
 RSCoder.check_fast(message_ecc, k=None)
 Same as check() but using faster algorithms and optimization tricks.
 unireedsolomon.ff.find_prime_polynomials(generator=2, c_exp=8, fast_primes=False, single=False)
 Compute the list of prime polynomials for the given generator and galois field characteristic exponent. You can then use this prime polynomial to specify the mandatory “prim” parameter, particularly if you are using a larger Galois Field (eg, 2^16).
Internal API
Besides the main RSCoder object, two other objects are used in this implementation: Polynomial and GF2int. Their use is not specifically tied to the coder or even to the ReedSolomon algorithm, they are just generic mathematical constructs respectively representing polynomials and Galois field’s number of base 2.
You do not need to know about the internal API to use the RS codec, this is just left as a documentation for the reader interested into dwelling inside the mathematical constructs.
 polynomial.Polynomial(coefficients=[], **sparse)
There are three ways to initialize a Polynomial object. 1) With a list, tuple, or other iterable, creates a polynomial using the items as coefficients in order of decreasing power
2) With keyword arguments such as for example x3=5, sets the coefficient of x^3 to be 5
3) With no arguments, creates an empty polynomial, equivalent to Polynomial([0])
>>> print Polynomial([5, 0, 0, 0, 0, 0]) 5x^5
>>> print Polynomial(x32=5, x64=8) 8x^64 + 5x^32
>>> print Polynomial(x5=5, x9=4, x0=2) 4x^9 + 5x^5 + 2
Polynomial objects export the following standard functions that perform the expected operations using polynomial arithmetic. Arithmetic of the coefficients is determined by the type passed in, so integers or GF2int objects could be used, the Polynomial class is agnostic to the type of the coefficients.
__add__ __divmod__ __eq__ __floordiv__ __hash__ __len__ __mod__ __mul__ __ne__ __neg__ __sub__ evaluate(x) degree() Returns the degree of the polynomial get_coefficient(degree) Returns the coefficient of the specified term
 ff.GF2int(value)
Instances of this object are elements of the field GF(2^p) and instances are integers in the range 0 to (2^p)1. By default, the field is GF(2^8) and instances are integers in the range 0 to 255 and is defined using the irreducable polynomial 0x11b or in binary form: x^8 + x^4 + x^3 + x + 1 and using 3 as the generator for the exponent table and log table.
You can however use other parameters for the Galois Field, using the init_lut() function.
 ff.find_prime_polynomials(generator=2, c_exp=8, fast_primes=False, single=False)
 Find the list of prime polynomials to use to generate the lookup tables for your field.
 ff.init_lut(generator=3, prim=0x11b, c_exp=8)
 Generate the lookup tables given the parameters. This effectively parametrize your Galois Field (ie, generator=2, prim=0x1002d, c_exp=16) will generate a GF(2^16) field.
The GF2int class inherits from int and supports all the usual integer operations. The following methods are overridden for arithmetic in the finite field GF(2^p)
__add__ __div__ __mul__ __neg__ __pow__ __radd__ __rdiv__ __rmul__ __rsub__ __sub__ inverse() Multiplicative inverse in GF(2^p)
Example implementations
Image Encoder
imageencode.py is an example script that encodes codewords as rows in an image. It requires PIL to run.
Usage: python imageencode.py [d] <image file>
Without the d flag, imageencode.py will encode text from standard in and output it to the image file. With d, imageencode.py will read in the data from the image and output to standard out the decoded text.
An example is included: exampleimage.png. Try decoding it asis, then open it up in an image editor and paint some vertical stripes on it. As long as no more than 16 pixels per row are disturbed, the text will be decoded correctly. Then draw more stripes such that more than 16 pixels per row are disturbed and verify that the message is decoded improperly.
Notice how the parity data looks different–the last 32 pixels of each row are colored differently. That’s because this particular image contains encoded ASCII text, which generally only has bytes from a small range (the alphabet and printable punctuation). The parity data, however, is binary and contains bytes from the full range 0255. Also note that either the data area or the parity area (or both!) can be disturbed as long as no more than 16 bytes per row are disturbed.
Cython implementation
If either a C compiler or Cython is found, rs.py will automatically load the Cython implementations (the *.pyx files). These are provided as optimized versions of the purepython implementations, with equivalent functionalities. The goal was to get a speedup, which is the case, but using PyPy on the purepython implementation provides a significantly higher speedup than the Cython implementation. The Cython implementations are still provided for the interested reader, but the casual user is not advised to use them. If you want to encode and decode fast, use PyPy.
Project details
Release history Release notifications  RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Filename, size  File type  Python version  Upload date  Hashes 

Filename, size unireedsolomon1.0.2cp37cp37mwin_amd64.whl (210.6 kB)  File type Wheel  Python version cp37  Upload date  Hashes View 
Filename, size unireedsolomon1.0.2.tar.gz (295.9 kB)  File type Source  Python version None  Upload date  Hashes View 
Filename, size unireedsolomon1.0.2.winamd64py3.7.exe (814.2 kB)  File type Windows Installer  Python version 3.7  Upload date  Hashes View 
Hashes for unireedsolomon1.0.2cp37cp37mwin_amd64.whl
Algorithm  Hash digest  

SHA256  be422b7e8ed94e6420fc563788ea291d034475f2eafea947d524b85394bc904f 

MD5  c41f9f14d95eed84f0f1bf72c28cc7c4 

BLAKE2256  244a9aaa4115a7e3c9af509bc47321fb4569f450fdae1a7ca3f263846dd34e03 
Hashes for unireedsolomon1.0.2.winamd64py3.7.exe
Algorithm  Hash digest  

SHA256  c49d9a537332408ac516b344ced8e42c204a9df38072ccdb60a4fe01e1fff143 

MD5  c74eeadef96c5596c326ba49da1f7fa6 

BLAKE2256  1a770b25559e71f0a6cf871022a484ab278a313bab78d44f9de7fab8f822e8e6 