pyodc

A Python interface to odc for encoding/decoding ODB-2 files.

The package contains two different implementations of the same library:

  • pyodc is a pure-Python encoder and decoder for ODB-2 data, which encodes data from, and decodes data into, pandas data frames.
  • codc is an implementation of the same API as pyodc that depends on the ECMWF odc library, and comes with much better performance.

Both libraries are installed by running pip install pyodc. Since version 1.6.0, a pre-built wheel of odc is installed automatically, so codc can be used without any additional steps.

Dependencies

Required

  • Python 3.x

Optional

For codc to work, the odc library must be compiled and installed on the system and made available to Python. Typically this happens automatically, as described above, through the dependency on odclib, which bundles a precompiled version of odc as a wheel. If for some reason this doesn't work, there are several other ways to make the library visible to pyodc:

  • It can be installed as a system library.
  • The installation prefix can be passed in the odc_DIR or ODC_DIR environment variables.
  • The library directory can be included in LD_LIBRARY_PATH.
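
When the automatic lookup fails, it can help to see what the environment variables above point at. The following is a minimal diagnostic sketch using only the standard library. It is not the lookup that pyodc itself performs (that is delegated to findlibs). The helper names, the search order, the lib/ and lib64/ subdirectories and the libodc* file pattern are all assumptions made for illustration.

```python
import os
from pathlib import Path


def candidate_dirs(env):
    """Return directories to inspect, in an assumed (not authoritative) order."""
    dirs = []
    for var in ("odc_DIR", "ODC_DIR"):
        prefix = env.get(var)
        if prefix:
            root = Path(prefix)
            # Assumption: libraries live in the prefix itself or under lib/ or lib64/.
            dirs.extend([root, root / "lib", root / "lib64"])
    for entry in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if entry:
            dirs.append(Path(entry))
    return dirs


def find_odc_library(env=None, pattern="libodc*"):
    """Return the first file matching `pattern` in the candidate directories, or None."""
    env = os.environ if env is None else env
    for directory in candidate_dirs(env):
        if directory.is_dir():
            matches = sorted(p for p in directory.glob(pattern) if p.is_file())
            if matches:
                return matches[0]
    return None


if __name__ == "__main__":
    found = find_odc_library()
    print(found or "no odc library found via odc_DIR, ODC_DIR or LD_LIBRARY_PATH")
```

A result of None only means that none of the three variables leads to a matching file. A system-wide installation or the odclib wheel may still be picked up by the real lookup.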

Installation

pip install pyodc

Check if the module was installed correctly:

python
>>> import pyodc as odc # pure python
>>> import codc as odc # faster
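
Because both modules expose the same API, a script can prefer codc and fall back to pyodc when the odc library cannot be loaded. The sketch below shows one way to do that. The helper name load_backend is hypothetical, and the broad exception handling is an assumption, since the exact error raised when odc is missing is not stated here.

```python
import importlib


def load_backend(candidates=("codc", "pyodc")):
    """Return the first module in `candidates` that imports successfully."""
    errors = {}
    for name in candidates:
        try:
            return importlib.import_module(name)
        except Exception as exc:
            # Broad on purpose: the failure type for a missing odc library
            # is an assumption, so any import-time error triggers the fallback.
            errors[name] = exc
    raise ImportError(f"none of {candidates} could be imported: {errors}")


# Typical use (left commented out so the sketch runs without either module installed):
# odc = load_backend()   # codc if usable, otherwise pyodc
```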

Usage

An introductory Jupyter Notebook with helpful usage examples is provided in the root of this repository:

git clone git@github.com:ecmwf/pyodc.git
cd pyodc
jupyter notebook Introduction.ipynb

Note that codc is not thread-safe, so take care when using it with dask. You can make dask use processes rather than threads as follows:

import dask

with dask.config.set(scheduler='processes'):
    dask.compute(...)

Development

Run Unit Tests

The unit tests require the pytest module. Once it is installed, run:

python -m pytest

Run Unit Tests across multiple python versions with Tox

Tox runs pytest across multiple Python versions by managing a set of Python environments for you. The provided tox.ini targets Python 3.8 to 3.12. It also installs older versions of libraries such as numpy, which helps catch incompatibilities with those older versions.

To run tox, install it and edit the ODC_HOME = ../build line in tox.ini so that it points to a build of odc; this build is reused for all the tests. Then run:

tox

The first run takes a while because tox has to create all the environments; later runs reuse them and are much faster.

Build Documentation

To build the documentation locally, please install the Python dependencies first:

cd docs
pip install -r requirements.txt
make html

The built HTML documentation will be available under the docs/_build/html/index.html path.

License

This software is licensed under the terms of the Apache Licence Version 2.0 which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.

In applying this licence, ECMWF does not waive the privileges and immunities granted to it by virtue of its status as an intergovernmental organisation nor does it submit to any jurisdiction.

Changelog for pyodc

1.6.0

  • pip install pyodc will now install the C++ backend so codc will work immediately.
    • The C++ backend is now installable with pip from odclib.
    • Added findlibs and odclib as dependencies.
    • To force the use of a different odc shared library, set the environment variable ODC_DIR to the directory containing the shared library. See the findlibs documentation for more information.

1.5.0

  • Add a new LongConstantString codec which permits encoding constant columns where the constant is a string > 8 characters in length.

    • This saves 1 byte per row compared to the previous way these columns were encoded.
    • A C++ implementation has been added to ODC at the same time, in version 1.6.0.
    • Bumped required ODC version to 1.6.0 for feature parity.
    • Decoding data using this codec will work straight away.
    • Encoding data with the new codec is disabled by default and can be enabled with the environment variable "ODC_ENABLE_WRITING_LONG_STRING_CODEC=1".
    • At some point in a future release, encoding will be enabled by default.
  • Accept various new datatypes and tighten datatype selection logic (fixes ODB-559):

    • Unsigned Integers: uint8 - uint32 (note uint64 is not supported).
    • Signed Integers: int8 - int64.
    • Float32 in addition to float64.
    • Fixed the selection logic for ShortReal2 and ShortReal codecs so the smallest positive normal float32 number struct.unpack("<f", b"\x00\x00\x80\x00")[0] can now be used in data.
  • Converted to a pyproject.toml based package.

  • Fix various warnings:

    • Pandas Deprecation warning about df.dtypes[0] needing to become df.dtypes.iloc[0].
    • Pandas Deprecation warning about implicitly converting dataframe column dtype.
    • Pandas Future Warning about concatenation with empty or all-NA dataframes.
    • "pkg_resources is deprecated as an API."
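
The float32 boundary value mentioned in the ShortReal entry above can be checked with the standard library alone. This is an illustrative check of the number itself, not of pyodc's codec selection. The variable names are chosen here for the example.

```python
import struct

# Little-endian bytes for bit pattern 0x00800000: exponent field 1, mantissa 0.
smallest_normal = struct.unpack("<f", b"\x00\x00\x80\x00")[0]

# One bit pattern lower (0x007FFFFF) is the largest subnormal float32.
largest_subnormal = struct.unpack("<f", b"\xff\xff\x7f\x00")[0]

print(smallest_normal == 2.0 ** -126)        # the value is exactly 2**-126
print(largest_subnormal < smallest_normal)   # subnormals lie just below it
```

Both comparisons print True; the value is roughly 1.18e-38.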

1.4.1

  • Use findlibs instead of custom finder for odc
  • Support constant bitfields
  • Correct encoding with constant strings > 8 characters in length
  • Support pandas native string type
  • Fix access to exploded bitfield columns

1.1.3

  • Improved github/ci integration

1.1.2

  • Fixed #6: pip install breaks codc

1.1.1

  • Fixed ODB-534: PyPI package is missing CHANGELOG

1.1.0

  • Fixed ODB-533: Decode data starting with missing values correctly
  • Fixed ODB-530: Bitfield column inspection returns incomplete data in pure-Python implementation
  • Bumped up required odc version number to 1.4.0
  • Added missing frame properties accessor to codc interface
  • Fixed ODB-525: Setting odc prefix variable (odc_DIR) does not work as expected on macOS
  • Fixed ODB-524: Keys and values in decoded frame properties are switched on older Python version
  • Added test flag to skip codc tests on demand (PYODC_SKIP_CODC)
  • Fixed ODB-523: Additional properties parameter is omitted in encode_odb() when string is passed as file
  • Fixed package setup metadata
  • Added documentation

1.0.4

  • Correct support for constant codecs
  • Decoding by column short name

1.0.3

  • Specify odc library location with odc_DIR/ODC_DIR
  • Correct setup.py dependencies to include pandas
  • Support missing ConstantString values encoded from ODB1 using the odb_migrator

1.0.2

  • String missing values should be None not NaN
  • Refactor oneshot behaviour (read_odb_oneshot --> read_odb(..., single=True))
  • Raise correct error on odc not found
  • Split codb.py into a full codc module
  • Fix miscellaneous bugs

1.0.1

  • Fixed automatic selection of integral codecs

1.0.0

  • Initial version
