This library provides a simple Python interface for implementing erasure codes. To obtain the best possible performance, the underlying erasure code algorithms are written in C.
Project description
This is v1.0 of PyECLib. This library provides a simple Python interface for
implementing erasure codes and is known to work with Python v2.6, 2.7 and 3.x.
To obtain the best possible performance, the library utilizes liberasurecode,
which is a C based erasure code library. Please let us know if you have any
other issues building or installing (email: kmgreen2@gmail.com or
tusharsg@gmail.com).
This library makes use of Jesasure for Reed-Solomon as implemented by the
liberasurecode library and provides its' own flat XOR-based erasure code
encoder and decoder. Currently, it implements a specific class of HD
Combination Codes (see "Flat XOR-based erasure codes in storage systems:
Constructions, efficient recovery, and tradeoffs" in IEEE MSST 2010). These
codes are well-suited to archival use-cases, have a simple construction and
require a minimum number of participating disks during single-disk
reconstruction (think XOR-based LRC code).
Examples of using this library are provided in "tools" directory:
Command-line encoder::
tools/pyeclib_encode.py
Command-line decoder::
tools/pyeclib_decode.py
Utility to determine what is needed to reconstruct missing fragments::
tools/pyeclib_fragments_needed.py
PyEClib initialization::
ec_driver = ECDriver(k=<num_encoded_data_fragments>,
m=<num_encoded_parity_fragments>,
ec_type=<ec_scheme>))
Supported ``ec_type`` values:
* ``jerasure_rs_vand`` => Vandermonde Reed-Solomon encoding, based on Jerasure [1]
* ``jerasure_rs_cauchy`` => Cauchy Reed-Solomon encoding (Jerasure variant), based on Jerasure [2]
* ``flat_xor_hd_3``, ``flat_xor_hd_4`` => Flat-XOR based HD combination codes, liberasurecode [3]
* ``isa_l_rs_vand`` => Intel Storage Acceleration Library (ISA-L) - SIMD accelerated Erasure Coding backends [4]
* ``shss`` => NTT Lab Japan's Erasure Coding Library
A configuration utility is provided to help compare available EC schemes in
terms of performance and redundancy:: tools/pyeclib_conf_tool.py
The Python API supports the following functions:
- EC Encode
Encode N bytes of a data object into k (data) + m (parity) fragments::
def encode(self, data_bytes)
input: data_bytes - input data object (bytes)
returns: list of fragments (bytes)
- EC Decode
Decode between k and k+m fragments into original object::
def decode(self, fragment_payloads)
input: list of fragment_payloads (bytes)
returns: decoded object (bytes)
*Note*: ``bytes`` is a synonym to ``str`` in Python 2.6, 2.7.
In Python 3.x, ``bytes`` and ``str`` types are non-interchangeable and care
needs to be taken when handling input to and output from the ``encode()`` and
``decode()`` routines.
- EC Reconstruct
Reconstruct "missing_fragment_indexes" using "available_fragment_payloads"::
def reconstruct(self, available_fragment_payloads, missing_fragment_indexes)
- Minimum parity fragments needed for durability gurantees::
def min_parity_fragments_needed(self)
- Fragments needed for EC Reconstruct
Return the indexes of fragments needed to reconstruct "missing_fragment_indexes"::
def fragments_needed(self, missing_fragment_indexes)
- Get EC Metadata
Return an opaque buffer known by the underlying library::
def get_metadata(self, fragment)
- Verify EC Stripe Consistency
Use opaque buffers from get_metadata() to verify a the consistency of a stripe::
def verify_stripe_metadata(self, fragment_metadata_list)
- Get EC Segment Info
Return a dict with the keys - segment_size, last_segment_size, fragment_size, last_fragment_size and num_segments::
def get_segment_info(self, data_len, segment_size)
- Get EC Segment Info given a data length and segment size::
def get_segment_info_byterange(self, ranges, data_len, segment_size)
Assume a range request is given for an object with segment size 3K and
a 1 MB file:
Ranges = (0, 1), (1, 12), (10, 1000), (0, segment_size-1),
(1, segment_size+1), (segment_size-1, 2*segment_size)
This will return a map keyed on the ranges, where there is a recipe
given for each range:
{
(0, 1): {0: (0, 1)},
(10, 1000): {0: (10, 1000)},
(1, 12): {0: (1, 12)},
(0, 3071): {0: (0, 3071)},
(3071, 6144): {0: (3071, 3071), 1: (0, 3071), 2: (0, 0)},
(1, 3073): {0: (1, 3071), 1: (0,0)}
}
Quick Start:
Standard stuff to install::
Python 2.6, 2.7 or 3.x (including development packages), argparse, liberasurecode, gf-complete and Jerasure.
As mentioned above, PyECLib depends on the installation of the liberasurecde library (liberasurecode
can be found at https://bitbucket.org/tsg-/liberasurecode)
Install PyECLib::
$ sudo python setup.py install
Run test suite included::
$ ./.unittests
If all of this works, then you should be good to go. If not, send us an email!
If the test suite fails because it cannot find any of the shared libraries,
then you probably need to add /usr/local/lib to the path searched when loading
libraries. The best way to do this (on Linux) is to add '/usr/local/lib' to::
/etc/ld.so.conf
and then run::
$ ldconfig
References
[1] Jerasure, C library that supports erasure coding in storage applications, http://jerasure.org
[2] Greenan, Kevin M et al, "Flat XOR-based erasure codes in storage systems", http://www.kaymgee.com/Kevin_Greenan/Publications_files/greenan-msst10.pdf
[3] liberasurecode, C API abstraction layer for erasure coding backends, https://bitbucket.org/tsg-/liberasurecode
[4] Intel(R) Storage Acceleration Library (Open Source Version), https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
[5] Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>, Ryuta Kon <kon.ryuta@po.ntts.co.jp>, "NTT SHSS Erasure Coding backend"
--
1.0
implementing erasure codes and is known to work with Python v2.6, 2.7 and 3.x.
To obtain the best possible performance, the library utilizes liberasurecode,
which is a C based erasure code library. Please let us know if you have any
other issues building or installing (email: kmgreen2@gmail.com or
tusharsg@gmail.com).
This library makes use of Jesasure for Reed-Solomon as implemented by the
liberasurecode library and provides its' own flat XOR-based erasure code
encoder and decoder. Currently, it implements a specific class of HD
Combination Codes (see "Flat XOR-based erasure codes in storage systems:
Constructions, efficient recovery, and tradeoffs" in IEEE MSST 2010). These
codes are well-suited to archival use-cases, have a simple construction and
require a minimum number of participating disks during single-disk
reconstruction (think XOR-based LRC code).
Examples of using this library are provided in "tools" directory:
Command-line encoder::
tools/pyeclib_encode.py
Command-line decoder::
tools/pyeclib_decode.py
Utility to determine what is needed to reconstruct missing fragments::
tools/pyeclib_fragments_needed.py
PyEClib initialization::
ec_driver = ECDriver(k=<num_encoded_data_fragments>,
m=<num_encoded_parity_fragments>,
ec_type=<ec_scheme>))
Supported ``ec_type`` values:
* ``jerasure_rs_vand`` => Vandermonde Reed-Solomon encoding, based on Jerasure [1]
* ``jerasure_rs_cauchy`` => Cauchy Reed-Solomon encoding (Jerasure variant), based on Jerasure [2]
* ``flat_xor_hd_3``, ``flat_xor_hd_4`` => Flat-XOR based HD combination codes, liberasurecode [3]
* ``isa_l_rs_vand`` => Intel Storage Acceleration Library (ISA-L) - SIMD accelerated Erasure Coding backends [4]
* ``shss`` => NTT Lab Japan's Erasure Coding Library
A configuration utility is provided to help compare available EC schemes in
terms of performance and redundancy:: tools/pyeclib_conf_tool.py
The Python API supports the following functions:
- EC Encode
Encode N bytes of a data object into k (data) + m (parity) fragments::
def encode(self, data_bytes)
input: data_bytes - input data object (bytes)
returns: list of fragments (bytes)
- EC Decode
Decode between k and k+m fragments into original object::
def decode(self, fragment_payloads)
input: list of fragment_payloads (bytes)
returns: decoded object (bytes)
*Note*: ``bytes`` is a synonym to ``str`` in Python 2.6, 2.7.
In Python 3.x, ``bytes`` and ``str`` types are non-interchangeable and care
needs to be taken when handling input to and output from the ``encode()`` and
``decode()`` routines.
- EC Reconstruct
Reconstruct "missing_fragment_indexes" using "available_fragment_payloads"::
def reconstruct(self, available_fragment_payloads, missing_fragment_indexes)
- Minimum parity fragments needed for durability gurantees::
def min_parity_fragments_needed(self)
- Fragments needed for EC Reconstruct
Return the indexes of fragments needed to reconstruct "missing_fragment_indexes"::
def fragments_needed(self, missing_fragment_indexes)
- Get EC Metadata
Return an opaque buffer known by the underlying library::
def get_metadata(self, fragment)
- Verify EC Stripe Consistency
Use opaque buffers from get_metadata() to verify a the consistency of a stripe::
def verify_stripe_metadata(self, fragment_metadata_list)
- Get EC Segment Info
Return a dict with the keys - segment_size, last_segment_size, fragment_size, last_fragment_size and num_segments::
def get_segment_info(self, data_len, segment_size)
- Get EC Segment Info given a data length and segment size::
def get_segment_info_byterange(self, ranges, data_len, segment_size)
Assume a range request is given for an object with segment size 3K and
a 1 MB file:
Ranges = (0, 1), (1, 12), (10, 1000), (0, segment_size-1),
(1, segment_size+1), (segment_size-1, 2*segment_size)
This will return a map keyed on the ranges, where there is a recipe
given for each range:
{
(0, 1): {0: (0, 1)},
(10, 1000): {0: (10, 1000)},
(1, 12): {0: (1, 12)},
(0, 3071): {0: (0, 3071)},
(3071, 6144): {0: (3071, 3071), 1: (0, 3071), 2: (0, 0)},
(1, 3073): {0: (1, 3071), 1: (0,0)}
}
Quick Start:
Standard stuff to install::
Python 2.6, 2.7 or 3.x (including development packages), argparse, liberasurecode, gf-complete and Jerasure.
As mentioned above, PyECLib depends on the installation of the liberasurecde library (liberasurecode
can be found at https://bitbucket.org/tsg-/liberasurecode)
Install PyECLib::
$ sudo python setup.py install
Run test suite included::
$ ./.unittests
If all of this works, then you should be good to go. If not, send us an email!
If the test suite fails because it cannot find any of the shared libraries,
then you probably need to add /usr/local/lib to the path searched when loading
libraries. The best way to do this (on Linux) is to add '/usr/local/lib' to::
/etc/ld.so.conf
and then run::
$ ldconfig
References
[1] Jerasure, C library that supports erasure coding in storage applications, http://jerasure.org
[2] Greenan, Kevin M et al, "Flat XOR-based erasure codes in storage systems", http://www.kaymgee.com/Kevin_Greenan/Publications_files/greenan-msst10.pdf
[3] liberasurecode, C API abstraction layer for erasure coding backends, https://bitbucket.org/tsg-/liberasurecode
[4] Intel(R) Storage Acceleration Library (Open Source Version), https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
[5] Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>, Ryuta Kon <kon.ryuta@po.ntts.co.jp>, "NTT SHSS Erasure Coding backend"
--
1.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
PyECLib-1.0.1.tar.gz
(8.4 MB
view details)
File details
Details for the file PyECLib-1.0.1.tar.gz
.
File metadata
- Download URL: PyECLib-1.0.1.tar.gz
- Upload date:
- Size: 8.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | bb1761005f9590018b5be1a0ca36e00cc9b3a7f7fba0f345d4ae428ec1ea4351 |
|
MD5 | 81fc1ae82c211522008373d9ad40b367 |
|
BLAKE2b-256 | f82885623770b756fd29d6ca8d7ef56be678b86d52576da236909fc5e5739443 |