Universal errorsanderasures Reed Solomon codec (error correcting code) in pure Python with extensive documentation and an objectoriented design.
Project description
UniReedSolomon is a purePython universal ReedSolomon error correction codec with fully documented code and mathematical nomenclatura, compatible with Python 3.7+ and also PyPy 3.
If you are just starting with ReedSolomon error correction codes, please see the Wikiversity tutorial.
Please also check out the more mature and faster Reedsolo module, a sibling project using a functional oriented approach instead of object oriented, and maintained by one of the coauthors of this project.
Installation
For Python 3.7+:
pip install upgrade unireedsolomon
For Python 2.7 and <= 3.6:
pip install upgrade unireedsolomon==1.0.5
To install and compile the Cython speedoptimized modules cff.pyx and cpolynomial.pyx, which provide a ~4x speed boost during encoding, install Cython 0.29.x and then type:
pip install upgrade unireedsolomon configsetting="buildoption=cythonize"
Quickstart
>>> import unireedsolomon as rs >>> coder = rs.RSCoder(20,13) >>> c = coder.encode("Hello, world!") >>> print repr(c) 'Hello, world!\x8d\x13\xf4\xf9C\x10\xe5' >>> >>> r = "\0"*3 + c[3:] >>> print repr(r) '\x00\x00\x00lo, world!\x8d\x13\xf4\xf9C\x10\xe5' >>> >>> coder.decode(r) 'Hello, world!'
Description
This library implements a purePython documented universal ReedSolomon error correction codec with a mathematical nomenclatura, compatible with Python 3.7+ and also with PyPy 3.
The project aims to keep a well commented and organized code with an extensive documentation and mathematical clarity of the various arithmetic operations.
How does this library differs from other ReedSolomon libraries?
universal: compatibility with (almost) any other ReedSolomon codec. This means that you can choose the parameters so that you can either encode data and decode it with another RS codec, or on the opposite encode data with another RS codec and decode this data with this library.
errorsanderasures decoding allows to decode both erasures (where you know the position of the corrupted characters) and errors (where you don’t know where are the corrupted characters) at the same time (because you can decode more erasures than errors, so if you can provide even a few know corrupted characters positions, this can help a lot the decoder to repair the message).
documented: following literate programming guidelines, you should understand everything you need about RS by reading the code and the comments.
mathematical nomenclatura: for example, contrary to most other RS libraries, you will see a clear distinction between the different mathematical constructs, such as the Galois Fields numbers are clearly separated from the generic Polynomial objects, and both are separated from the ReedSolomon algorithm, which makes use of both of those constructs. For this purpose, objectoriented programming was chosen to design the architecture of the library, although obviously at the expense of a bit of performance. However, this library favors mathematical clarity and documentation over performance (even if performance is optimized whenever possible).
purePython means that there are no dependencies whatsoever apart from the Python interpreter. This means that this library should be resilient in the future (since it doesn’t depend on external libraries who can become broken with time, see software rot), and you can use it on any system where Python can be installed (including online cloud services).
The authors tried their best to extensively document the algorithms. However, a lot of the math involved is nontrivial and we can’t explain it all in the comments. To learn more about the algorithms, see these resources:
http://www.cs.duke.edu/courses/spring10/cps296.3/rs_scribe.pdf
http://www.cs.duke.edu/courses/spring10/cps296.3/decoding_rs_scribe.pdf
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/pscicoguyb/realworld/www/reed_solomon.ps
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/pscicoguyb/realworld/www/rs_decode.ps
Also, here’s a copy of the presentation one of the authors gave to the class Spring 2010 on his experience implementing this library. The LaTeX source is in the presentation directory.
http://www.cs.duke.edu/courses/spring10/cps296.3/decoding_rs.pdf
The code was lately updated to support errorsanderasures decoding (both at the same time), and to be universal (you can supply the parameters to be compatible with almost any other RS codec).
The codec has decent performances if you use PyPy with the fast methods (~1 MB/s), but it would be faster if we drop the objectoriented design (implementing everything in functions), but this would be at the expense of mathematical clarity. If you are interested, see the reedsolo library by Tomer Filiba, which is exactly the same implementation but only functional without objects (results in about 5x speedup).
Files
 rs.py
Holds the ReedSolomon Encoder/Decoder object
 polynomial.py
Contains the Polynomial object (purepython)
 ff.py
Contains the GF2int object representing an element of the GF(2^p) field, with p being 8 by default (purepython)
 polynomial.pyx
Cython implementation of polynomial.py with equivalent functions (optional)
 ff.pyx
Cython implementation of ff.py with equivalent functions (optional)
Documentation
 unireedsolomon.rs.RSCoder(n, k, generator=3, prim=0x11b, fcr=1, c_exp=8)
Creates a new ReedSolomon Encoder/Decoder object configured with the given n and k values. n is the length of a codeword, must be less than 256 k is the length of the message, must be less than n generator, prim and fcr parametrize the Galois Field that will be built c_exp is the Galois Field range (ie, 8 means GF(2^8) = GF(256)), which is both the limit for one symbol’s value, and the maximum length of a message+ecc.
The code will have error correcting power (ie, maximum number of repairable symbols) of 2*e+v <= (nk), where e is the number of errors and v the number of erasures.
The typical RSCoder is RSCoder(255, 223)
 RSCoder.encode(message, poly=False, k=None)
Encode a given string with reedsolomon encoding. Returns a byte string with the k message bytes and nk parity bytes at the end.
If a message is < k bytes long, it is assumed to be padded at the front with null bytes (ie, a shortened ReedSolomon code).
The sequence returned is always n bytes long.
If poly is not False, returns the encoded Polynomial object instead of the polynomial translated back to a string (useful for debugging)
You can change the length (number) of parity/ecc bytes at encoding by setting k to any value between [1, n1]. This allows to create only one RSCoder and then use it with a variable redundancy rate.
 RSCoder.encode_fast(message, poly=False, k=None)
Same as encode() but using faster algorithms and optimization tricks.
 RSCoder.decode(message_ecc, nostrip=False, k=None, erasures_pos=None, only_erasures=False):
Given a received string or byte array message_ecc (composed of a message string + ecc symbols at the end), attempts to decode it. If it’s a valid codeword, or if there are no more than 2*e+v <= (nk) erratas (called the Singleton bound), the message is returned.
You can provide the erasures positions as a list to erasures_pos. For example, if you have “hella warld” and you know that a is an erasure, you can provide the list erasures_pos=[4, 7]. You can correct twice as many erasures than errors, and if some provided erasures are wrong (they are correct symbols), then there’s no problem, they will be repaired just fine (but will count towards the Singleton bound). You can also specify that you are sure there are only erasures and no errors at all by setting only_erasures=True.
A message always has k bytes, if a message contained less it is left padded with null bytes (punctured RS code). When decoded, these leading null bytes are stripped, but that can cause problems if decoding binary data. When nostrip is True, messages returned are always k bytes long. This is useful to make sure no data is lost when decoding binary data.
Note that RS can correct errors both in the message and the ecc symbols.
 RSCoder.decode_fast(message_ecc, nostrip=False, k=None, erasures_pos=None, only_erasures=False):
Same as decode() but using faster algorithms and optimization tricks.
 RSCoder.check(message_ecc, k=None)
Verifies the codeword (message + ecc symbols at the end) is valid by testing that the code as a polynomial code divides g, or that the syndrome is all 0 coefficients. The result is not foolproof: if it’s False, you’re sure the message was corrupted (or that you used the wrong RS parameters), but if it’s True, it’s either that the message is correct, or that there are too many errors (ie, more than the Singleton bound) for RS to do anything about it. returns True/False
 RSCoder.check_fast(message_ecc, k=None)
Same as check() but using faster algorithms and optimization tricks.
 unireedsolomon.ff.find_prime_polynomials(generator=2, c_exp=8, fast_primes=False, single=False)
Compute the list of prime polynomials for the given generator and galois field characteristic exponent. You can then use this prime polynomial to specify the mandatory “prim” parameter, particularly if you are using a larger Galois Field (eg, 2^16).
Internal API
Besides the main RSCoder object, two other objects are used in this implementation: Polynomial and GF2int. Their use is not specifically tied to the coder or even to the ReedSolomon algorithm, they are just generic mathematical constructs respectively representing polynomials and Galois field’s number of base 2.
You do not need to know about the internal API to use the RS codec, this is just left as a documentation for the reader interested into dwelling inside the mathematical constructs.
 polynomial.Polynomial(coefficients=[], **sparse)
There are three ways to initialize a Polynomial object. 1) With a list, tuple, or other iterable, creates a polynomial using the items as coefficients in order of decreasing power
2) With keyword arguments such as for example x3=5, sets the coefficient of x^3 to be 5
3) With no arguments, creates an empty polynomial, equivalent to Polynomial([0])
>>> print Polynomial([5, 0, 0, 0, 0, 0]) 5x^5
>>> print Polynomial(x32=5, x64=8) 8x^64 + 5x^32
>>> print Polynomial(x5=5, x9=4, x0=2) 4x^9 + 5x^5 + 2
Polynomial objects export the following standard functions that perform the expected operations using polynomial arithmetic. Arithmetic of the coefficients is determined by the type passed in, so integers or GF2int objects could be used, the Polynomial class is agnostic to the type of the coefficients.
__add__ __divmod__ __eq__ __floordiv__ __hash__ __len__ __mod__ __mul__ __ne__ __neg__ __sub__ evaluate(x) degree() Returns the degree of the polynomial get_coefficient(degree) Returns the coefficient of the specified term
 ff.GF2int(value)
Instances of this object are elements of the field GF(2^p) and instances are integers in the range 0 to (2^p)1. By default, the field is GF(2^8) and instances are integers in the range 0 to 255 and is defined using the irreducable polynomial 0x11b or in binary form: x^8 + x^4 + x^3 + x + 1 and using 3 as the generator for the exponent table and log table.
You can however use other parameters for the Galois Field, using the init_lut() function.
 ff.find_prime_polynomials(generator=2, c_exp=8, fast_primes=False, single=False)
Find the list of prime polynomials to use to generate the lookup tables for your field.
 ff.init_lut(generator=3, prim=0x11b, c_exp=8)
Generate the lookup tables given the parameters. This effectively parametrize your Galois Field (ie, generator=2, prim=0x1002d, c_exp=16) will generate a GF(2^16) field.
The GF2int class inherits from int and supports all the usual integer operations. The following methods are overridden for arithmetic in the finite field GF(2^p)
__add__ __div__ __mul__ __neg__ __pow__ __radd__ __rdiv__ __rmul__ __rsub__ __sub__ inverse() Multiplicative inverse in GF(2^p)
Edge cases
Although sanity checks are implemented whenever possible and when they are not too much resource consuming, there are a few cases where messages will not be decoded correctly without raising an exception:
If an incorrect erasure location is provided, the decoding algorithm will just trust the provided locations and create a syndrome that will be wrong, resulting in an incorrect decoded message. In case reliability is critical, always use the check() method after decoding to check the decoding did not go wrong.
ReedSolomon algorithm is limited by the Singleton Bound, which limits not only its capacity to correct errors and erasures relatively to the number of error correction symbols, but also its ability to check if the message can be decoded or not. Indeed, if the number of errors and erasures are greater than the Singleton Bound, the decoder has no way to mathematically know for sure whether there is an error at all, it may very well be a valid message (although not the message you expect, but mathematically valid nevertheless). Hence, when the message is tampered beyond the Singleton Bound, the decoder may raise an exception, but it may also return a mathematically valid but still tampered message. Using the check() method cannot fix that either. To work around this issue, a solution is to use parity or hashing functions in parallel to the ReedSolomon codec: use the ReedSolomon codec to repair messages, use the parity or hashing function to check if there is any error. Due to how parity and hashing functions work, they are much less likely to produce a false negative than the ReedSolomon algorithm. This is a general rule: error correction codes are efficient at correcting messages but not at detecting errors, hashing and parity functions are the adequate tool for this purpose.
Example implementations
Image Encoder
imageencode.py is an example script that encodes codewords as rows in an image. It requires PIL to run.
Usage: python imageencode.py [d] <image file>
Without the d flag, imageencode.py will encode text from standard in and output it to the image file. With d, imageencode.py will read in the data from the image and output to standard out the decoded text.
An example is included: exampleimage.png. Try decoding it asis, then open it up in an image editor and paint some vertical stripes on it. As long as no more than 16 pixels per row are disturbed, the text will be decoded correctly. Then draw more stripes such that more than 16 pixels per row are disturbed and verify that the message is decoded improperly.
Notice how the parity data looks different–the last 32 pixels of each row are colored differently. That’s because this particular image contains encoded ASCII text, which generally only has bytes from a small range (the alphabet and printable punctuation). The parity data, however, is binary and contains bytes from the full range 0255. Also note that either the data area or the parity area (or both!) can be disturbed as long as no more than 16 bytes per row are disturbed.
Cython implementation
If either a C compiler or Cython is found, rs.py will automatically load the Cython implementations (the *.pyx files). These are provided as optimized versions of the purepython implementations, with equivalent functionalities. The goal was to get a speedup, which is the case, but using PyPy on the purepython implementation provides a significantly higher speedup than the Cython implementation. The Cython implementations are still provided for the interested reader, but the casual user is not advised to use them. If you want to encode and decode fast, use PyPy.
Projects using unireedsolomon
Several academic publications used unireedsolomon, see this list.
Recommended reading
“ReedSolomon codes for coders”, free practical beginner’s tutorial with Python code examples on WikiVersity. Partially written by one of the authors of the present software.
“Algebraic codes for data transmission”, Blahut, Richard E., 2003, Cambridge university press. Readable online on Google Books. This book was pivotal in helping to understand the intricacies of the universal BerlekampMassey algorithm (see figures 7.5 and 7.10).
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for unireedsolomon1.0.6cp310cp310win_amd64.whl
Algorithm  Hash digest  

SHA256  9a35fa2485da57f534233fa9487f420b9da53e96c2837a8d5c0ea75ea8346887 

MD5  5d2d67bc3f82071e103ecffd8cd3b004 

BLAKE2b256  40a2ada744caf68778592cdde25606b9eceb504f9d68146f98ca1ad2119b8445 