A Python interface to odc for encoding/decoding ODB-2 files.
pyodc
The package contains two different implementations of the same library:
- pyodc is a pure-Python encoder and decoder for ODB-2 data, which encodes data from, and decodes data into, pandas data frames.
- codc is an implementation of the same API as pyodc that depends on the ECMWF odc library, and comes with much better performance.
Both libraries are installed by running pip install pyodc. Since version 1.6.0, a pre-built wheel of odc is installed automatically, so codc can be used without any additional steps.
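To confirm which pieces were actually installed (for example, whether the odclib wheel that bundles odc came along with pyodc), the installed distribution versions can be queried with the standard library. This is a minimal sketch; the helper name installed_version is hypothetical and not part of pyodc, and the distribution names are the ones mentioned on this page.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent.

    installed_version is a hypothetical helper for illustration, not part of pyodc.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names taken from the installation notes and the changelog
for dist in ("pyodc", "odclib", "findlibs"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```

If odclib shows as not installed, codc will need one of the alternative lookup mechanisms described under Dependencies.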
Dependencies
Required
- Python 3.x
Optional
For codc to work, the odc library must be compiled, installed on the system and made available to Python. Typically this happens automatically, as described above, through the dependency on odclib, which bundles a precompiled version of odc as a wheel. If for some reason this doesn't work, there are several other ways to make the library visible to pyodc:
- It can be installed as a system library.
- The installation prefix can be passed in the odc_DIR or ODC_DIR environment variable.
- The library directory can be included in LD_LIBRARY_PATH.
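Because the lookup options above are plain environment variables, it can help to check from Python which of them are set before debugging a failed codc import. A minimal sketch: the helper odc_lookup_hints is hypothetical, "/opt/odc" is a placeholder prefix, and setting ODC_DIR before the first import of codc is shown as the safe order on the assumption that the library is located when codc is imported.

```python
import os

# Environment variables that, per the notes above, can make the odc library visible
ODC_LOOKUP_VARS = ("odc_DIR", "ODC_DIR", "LD_LIBRARY_PATH")


def odc_lookup_hints(environ=None) -> dict:
    """Return the value (or None) of each documented odc lookup variable.

    odc_lookup_hints is a hypothetical helper for illustration, not part of pyodc.
    """
    if environ is None:
        environ = os.environ
    return {name: environ.get(name) for name in ODC_LOOKUP_VARS}


print(odc_lookup_hints())

# To point codc at a specific installation prefix, set ODC_DIR before importing codc.
# "/opt/odc" is a placeholder; substitute your own install location:
#   os.environ["ODC_DIR"] = "/opt/odc"
#   import codc as odc
```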
Installation
pip install pyodc
Check if the module was installed correctly:
python
>>> import pyodc as odc # pure python
>>> import codc as odc # faster
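Since both modules expose the same API, code can prefer the faster codc and fall back to pyodc when the odc shared library cannot be loaded. A minimal sketch: it catches a broad Exception on the codc import because the exact error raised for a missing library is an assumption here, and it relies only on the shared module-level API (such as read_odb, which the changelog refers to).

```python
try:
    import codc as odc  # C++-backed implementation; needs the odc shared library
except Exception:  # broad on purpose: a missing library may not raise ImportError
    try:
        import pyodc as odc  # pure-Python implementation with the same API
    except ImportError:
        odc = None  # neither implementation is installed

if odc is None:
    print("pyodc is not installed; run: pip install pyodc")
else:
    print(f"Using {odc.__name__}")
```

Because the fallback is otherwise silent, printing odc.__name__ is a cheap way to confirm which backend is in use.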
Usage
An introductory Jupyter Notebook with helpful usage examples is provided in the root of this repository:
git clone git@github.com:ecmwf/pyodc.git
cd pyodc
jupyter notebook Introduction.ipynb
Note that codc is not thread-safe, so take care when using it with dask. You can make dask use processes rather than threads with:
import dask
with dask.config.set(scheduler='processes'):
    dask.compute(...)  # pass your dask collections or delayed objects here
Development
Run Unit Tests
To run the unit tests, make sure that the pytest module is installed first:
python -m pytest
Run Unit Tests across multiple Python versions with Tox
Tox runs pytest across multiple Python versions by managing a set of Python environments for you. The provided tox.ini targets Python 3.8 to 3.12. It also installs older versions of libraries such as numpy, which helps catch incompatibilities with those older versions.
To run tox, install it and change the ODC_HOME = ../build line in tox.ini to point to a build of odc; this build is reused for all the test environments. Then run:
tox
The first run takes a while because it installs all the environments, but subsequent runs are very fast.
Build Documentation
To build the documentation locally, please install the Python dependencies first:
cd docs
pip install -r requirements.txt
make html
The built HTML documentation will be available at docs/_build/html/index.html.
License
This software is licensed under the terms of the Apache Licence Version 2.0 which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
In applying this licence, ECMWF does not waive the privileges and immunities granted to it by virtue of its status as an intergovernmental organisation nor does it submit to any jurisdiction.
Changelog for pyodc
1.6.0
- pip install pyodc will now install the C++ backend, so codc will work immediately.
  - The C++ backend is now installable with pip from odclib.
  - Added findlibs and odclib as dependencies.
  - To force the use of a different odc shared library, set the environment variable ODC_DIR to the directory containing the shared library. See the findlibs documentation for more information.
1.5.0
- Add a new LongConstantString codec which permits encoding constant columns where the constant is a string longer than 8 characters.
  - This saves 1 byte per row compared with the previous encoding of these columns.
  - A C++ implementation was added to ODC at the same time, in version 1.6.0.
  - Bumped the required ODC version to 1.6.0 for feature parity.
  - Decoding data using this codec will work straight away.
  - Encoding data with the new codec is disabled by default and can be enabled with the environment variable "ODC_ENABLE_WRITING_LONG_STRING_CODEC=1".
  - At some point in a future release, encoding will be enabled by default.
- Accept various new datatypes and tighten datatype selection logic (fixes ODB-559):
  - Unsigned integers: uint8-uint32 (note uint64 is not supported).
  - Signed integers: int8-int64.
  - Float32 in addition to float64.
  - Fixed the selection logic for the ShortReal2 and ShortReal codecs so the smallest positive normal float32 number (struct.unpack("<f", b"\x00\x00\x80\x00")[0]) can now be used in data.
- Converted to a pyproject.toml-based package.
- Fix various warnings:
  - Pandas deprecation warning about df.dtypes[0] needing to become df.dtypes.iloc[0].
  - Pandas deprecation warning about implicitly converting dataframe column dtype.
  - Pandas FutureWarning about concatenation with empty or all-NA dataframes.
  - "pkg_resources is deprecated as an API."
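Because uint64 is the one integer type the 1.5.0 notes exclude, a pre-flight check on a DataFrame's dtypes can catch it before encoding. A minimal sketch, assuming numpy; the helper name uint64_columns is hypothetical and not part of pyodc, and it takes (name, dtype) pairs such as those yielded by DataFrame.dtypes.items().

```python
import numpy as np


def uint64_columns(dtypes):
    """Return the names of columns whose dtype is uint64.

    uint64_columns is a hypothetical helper; per the 1.5.0 notes, pyodc accepts
    uint8-uint32 but not uint64. `dtypes` is an iterable of (name, dtype) pairs.
    """
    return [
        name
        for name, dtype in dtypes
        # Non-numpy dtypes (e.g. pandas extension dtypes) are skipped, not flagged
        if isinstance(dtype, np.dtype) and dtype == np.uint64
    ]


example = [
    ("obs", np.dtype("uint64")),
    ("flags", np.dtype("uint32")),
    ("value", np.dtype("float32")),
]
print(uint64_columns(example))  # ['obs']
```

With pandas, call uint64_columns(df.dtypes.items()). Casting a flagged column to int64 is only safe if every value is at most 2**63 - 1, so that decision is left to the caller.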
1.4.1
- Use findlibs instead of custom finder for odc
- Support constant bitfields
- Correct encoding with constant strings > 8 characters in length
- Support pandas native string type
- Fix access to exploded bitfield columns
1.1.3
- Improved github/ci integration
1.1.2
- Fixed #6: pip install breaks codc
1.1.1
- Fixed ODB-534: PyPI package is missing CHANGELOG
1.1.0
- Fixed ODB-533: Decode data starting with missing values correctly
- Fixed ODB-530: Bitfield column inspection returns incomplete data in pure-Python implementation
- Bumped up required odc version number to 1.4.0
- Added missing frame properties accessor to codc interface
- Fixed ODB-525: Setting odc prefix variable (odc_DIR) does not work as expected on macOS
- Fixed ODB-524: Keys and values in decoded frame properties are switched on older Python version
- Added test flag to skip codc tests on demand (PYODC_SKIP_CODC)
- Fixed ODB-523: Additional properties parameter is omitted in encode_odb() when string is passed as file
- Fixed package setup metadata
- Added documentation
1.0.4
- Correct support for constant codecs
- Decoding by column short name
1.0.3
- Specify odc library location with odc/ODC_DIR
- Correct setup.py dependencies to include pandas
- Support missing ConstantString values encoded from ODB1 using the odb_migrator
1.0.2
- String missing values should be None, not NaN
- Refactor oneshot behaviour (read_odb_oneshot --> read_odb(..., single=True))
- Raise correct error on odc not found
- Split codb.py into a full codc module
- Fix miscellaneous bugs
1.0.1
- Fixed automatic selection of integral codecs
1.0.0
- Initial version
File details
Details for the file pyodc-1.6.0.tar.gz.
File metadata
- Download URL: pyodc-1.6.0.tar.gz
- Upload date:
- Size: 41.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 702ce63f91e5a0ac75812c1b7a298c7bc95e7287531155f8a6e10ca31cd345aa
MD5 | f279f64629139062e070c2e61d063f0e
BLAKE2b-256 | 078ba2f8ab8a1c6bca623e5781d8bfaf06a8ecbbe988e2db4adf6025268104f1
File details
Details for the file pyodc-1.6.0-py3-none-any.whl.
File metadata
- Download URL: pyodc-1.6.0-py3-none-any.whl
- Upload date:
- Size: 32.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | af3e083273285ff9e29020ae18e21523132d438173526e18078da27ed1810420
MD5 | fdc43aca05c1ec81ae669117ad2b1756
BLAKE2b-256 | a1477c905f0a7bee5694d71de3dc46904aa4c808e108a463fc49f855b690ca1c