A Python interface to odc for encoding/decoding ODB-2 files.
pyodc
The package contains two different implementations of the same library:
- pyodc is a pure-Python encoder and decoder for ODB-2 data, which encodes data from, and decodes it into, pandas data frames.
- codc is an implementation of the same API as pyodc that depends on the ECMWF odc library, and comes with much better performance.
Both libraries are installed by running pip install pyodc. Since version 1.6.0, a pre-built wheel of odc is installed automatically, so codc can be used without any additional steps.
Dependencies
Required
- Python 3.x
Optional
For codc to work, the odc library must be compiled, installed on the system and made available to Python. Typically this happens automatically, as described above, through the dependency on odclib, which bundles a precompiled version of odc as a wheel. If for some reason this doesn't work, there are several other ways to make the library visible to pyodc:
- It can be installed as a system library.
- The installation prefix can be passed in the odc_DIR or ODC_DIR environment variables.
- The library directory can be included in LD_LIBRARY_PATH.
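As a minimal sketch of the environment-variable route, the variable can also be set from Python itself, provided this happens before codc is imported. The helper name point_to_odc and the prefix /opt/odc are hypothetical and only serve as illustration; whether setting the variable at this point is early enough for a given setup is an assumption worth checking.

```python
import os


def point_to_odc(prefix, env=os.environ):
    """Record a custom odc installation prefix in the ODC_DIR variable.

    `point_to_odc` is a hypothetical helper, not part of pyodc. `env`
    defaults to the real process environment but can be any mapping.
    """
    env["ODC_DIR"] = str(prefix)
    return env["ODC_DIR"]


# Hypothetical usage (the path is a placeholder for a real odc prefix):
#   point_to_odc("/opt/odc")
#   import codc as odc   # import only after the variable has been set
```

Note that LD_LIBRARY_PATH is read by the dynamic loader when the process starts, so that variable generally has to be set in the shell before Python is launched.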
Installation
pip install pyodc
Check if the module was installed correctly:
python
>>> import pyodc as odc # pure python
>>> import codc as odc # faster
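Since both modules expose the same API, a script can prefer codc and fall back to pyodc when the odc library is unavailable. The following is a sketch under assumptions: load_backend is a hypothetical helper, and catching a broad Exception is a guess at how a failed codc import surfaces, since the exact exception type is not stated here.

```python
import importlib


def load_backend(candidates=("codc", "pyodc")):
    """Return (name, module) for the first candidate that imports cleanly.

    `load_backend` is a hypothetical helper, not part of pyodc.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except Exception:
            # Assumption: a missing module or a missing odc shared library
            # both show up as an exception at import time.
            continue
    raise ImportError("none of %r could be imported" % (tuple(candidates),))


# Hypothetical usage:
#   backend_name, odc = load_backend()
#   print("using", backend_name)
```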
Usage
An introductory Jupyter Notebook with helpful usage examples is provided in the root of this repository:
git clone git@github.com:ecmwf/pyodc.git
cd pyodc
jupyter notebook Introduction.ipynb
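For a quick impression without opening the notebook, the sketch below writes a small data frame to an ODB-2 file and reads it back. It uses encode_odb and read_odb(..., single=True), both of which are mentioned in the changelog; the column names, the file name example.odb and the wrapper function roundtrip are made up for illustration.

```python
def roundtrip(path="example.odb"):
    """Encode a small data frame to an ODB-2 file and decode it again."""
    # Imported here so that defining the function does not itself require
    # pandas or pyodc to be installed.
    import pandas as pd
    import pyodc as odc  # or: import codc as odc

    df = pd.DataFrame(
        {
            "station": ["alpha", "beta", "gamma"],  # string column
            "level": [1, 2, 3],                     # integer column
            "temperature": [271.5, 280.25, 290.0],  # float column
        }
    )
    odc.encode_odb(df, path)                # write the frame to disk
    return odc.read_odb(path, single=True)  # read it back as one data frame


# Hypothetical usage:
#   print(roundtrip())
```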
Note that codc is not thread safe, so care should be taken when using it with dask. You can set dask to use processes rather than threads by doing:
with dask.config.set(scheduler='processes'):
    dask.compute(...)
Development
Run Unit Tests
To run the unit tests, make sure that the pytest module is installed first:
python -m pytest
Run Unit Tests across multiple python versions with Tox
Tox is a useful tool to quickly run pytest across multiple Python versions by managing a set of Python environments for you. A tox.ini file is provided that targets Python 3.8 to 3.12. Note that this will also install older versions of libraries like numpy, which helps to catch incompatibilities with older versions of those libraries too.
To run tox, install it and modify the ODC_HOME = ../build line in tox.ini to point to a build of odc; this build will be reused for all the tests. Then run
tox
The first run will take a while because tox has to install all the environments, but after that it's very fast.
Build Documentation
To build the documentation locally, please install the Python dependencies first:
cd docs
pip install -r requirements.txt
make html
The built HTML documentation will be available under the docs/_build/html/index.html path.
License
This software is licensed under the terms of the Apache Licence Version 2.0 which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
In applying this licence, ECMWF does not waive the privileges and immunities granted to it by virtue of its status as an intergovernmental organisation nor does it submit to any jurisdiction.
Changelog for pyodc
1.6.0
- pip install pyodc will now install the C++ backend so codc will work immediately.
  - The C++ backend is now installable with pip from odclib.
  - Added findlibs and odclib as dependencies.
  - To force the use of a different odc shared library, set the environment variable ODC_DIR to the directory containing the shared library. See the findlibs documentation for more information.
1.5.0
- Add a new LongConstantString codec which permits encoding constant columns where the constant is a string > 8 characters in length.
  - This saves 1 byte per row compared to the previous way these columns were encoded.
  - A C++ implementation has been added to ODC at the same time, version 1.6.0
  - Bumped required ODC version to 1.6.0 for feature parity.
  - Decoding data using this codec will work straight away.
  - Encoding data with the new codec is disabled by default and can be enabled with the environment variable "ODC_ENABLE_WRITING_LONG_STRING_CODEC=1".
  - At some point in a future release, encoding will be enabled by default.
- Accept various new datatypes and tighten datatype selection logic (fixes ODB-559):
  - Unsigned Integers: uint8 - uint32 (note uint64 is not supported).
  - Signed Integers: int8 - int64.
  - Float32 in addition to float64.
  - Fixed the selection logic for ShortReal2 and ShortReal codecs so the smallest positive normal float32 number struct.unpack("<f", b"\x00\x00\x80\x00")[0] can now be used in data.
- Converted to a pyproject.toml based package.
- Fix various warnings:
  - Pandas Deprecation warning about df.dtypes[0] needing to become df.dtypes.iloc[0].
  - Pandas Deprecation warning about implicitly converting dataframe column dtype.
  - Pandas Future Warning about concatenation with empty or all-NA dataframes.
  - "pkg_resources is deprecated as an API."
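For reference, the float32 value named in the ShortReal fix above can be inspected with the standard library alone. The sketch below only illustrates the number itself (the bit pattern 0x00800000, equal to 2**-126); it does not show how the codecs treat it, and the variable names are chosen for illustration.

```python
import struct

# Little-endian bytes 00 00 80 00 form the bit pattern 0x00800000:
# exponent field 1, mantissa 0, i.e. the smallest positive normal float32.
smallest_normal = struct.unpack("<f", b"\x00\x00\x80\x00")[0]

# One bit pattern lower (0x007fffff) is the largest subnormal float32.
largest_subnormal = struct.unpack("<f", b"\xff\xff\x7f\x00")[0]

print(smallest_normal == 2.0 ** -126)       # the value is exactly 2**-126
print(largest_subnormal < smallest_normal)  # subnormals lie just below it

# Packing the value back into 4 bytes reproduces the original bit pattern.
print(struct.pack("<f", smallest_normal) == b"\x00\x00\x80\x00")
```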
1.4.1
- Use findlibs instead of custom finder for odc
- Support constant bitfields
- Correct encoding with constant strings > 8 characters in length
- Support pandas native string type
- Fix access to exploded bitfield columns
1.1.3
- Improved github/ci integration
1.1.2
- Fixed #6: pip install breaks codc
1.1.1
- Fixed ODB-534: PyPI package is missing CHANGELOG
1.1.0
- Fixed ODB-533: Decode data starting with missing values correctly
- Fixed ODB-530: Bitfield column inspection returns incomplete data in pure-Python implementation
- Bumped up required odc version number to 1.4.0
- Added missing frame properties accessor to codc interface
- Fixed ODB-525: Setting odc prefix variable (odc_DIR) does not work as expected on macOS
- Fixed ODB-524: Keys and values in decoded frame properties are switched on older Python version
- Added test flag to skip codc tests on demand (PYODC_SKIP_CODC)
- Fixed ODB-523: Additional properties parameter is omitted in encode_odb() when string is passed as file
- Fixed package setup metadata
- Added documentation
1.0.4
- Correct support for constant codecs
- Decoding by column short name
1.0.3
- Specify odc library location with odc/ODC_DIR
- Correct setup.py dependencies to include pandas
- Support missing ConstantString values encoded from ODB1 using the odb_migrator
1.0.2
- String missing values should be None not NaN
- Refactor oneshot behaviour (read_odb_oneshot --> read_odb(..., single=True))
- Raise correct error on odc not found
- Split codb.py into a full codc module
- Fix miscellaneous bugs
1.0.1
- Fixed automatic selection of integral codecs
1.0.0
- Initial version