Pipeline to Aggregate Data for Optimised Cloud Capabilities

Project description

PADOCC Package

Padocc (Pipeline to Aggregate Data for Optimal Cloud Capabilities) is a data aggregation pipeline for creating Kerchunk (or alternative) files that represent datasets stored in various original formats. Currently, the pipeline supports writing JSON/Parquet Kerchunk files for input NetCDF/HDF files. Further developments will allow GeoTIFF, GRIB and possibly Met Office (.pp) files to be represented, as well as using the Pangeo Rechunker tool to create Zarr stores for Kerchunk-incompatible datasets.

Example Notebooks at this link

Documentation hosted at this link

Kerchunk Pipeline

Release 1.4.4

Release date: 22nd January 2026

See the release notes for details.

This package acknowledges contributions by Matt Brown as a pre-release tester.

Installation

To install this package from source, clone the repository using git clone, then run the following commands from the repository root to create a virtual environment and install the package and its dependencies with Poetry.

python -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install

Alternatively, install from PyPI with:

pip install padocc==1.4.4

Example Basic Usage

Once the package is installed, set a working directory environment variable. The PADOCC pipeline creates all of its files under this location.

export WORKDIR=path/to/my/area

Note: You may also want to set the LOTUS_CFG environment variable, which must point to a LOTUS configuration file used for parallel job deployment. See https://cedadev.github.io/padocc/detailed/parallel.html#lotus-2-configurations for more details.

Assemble the initialisation files

For each dataset, you will need a text file listing the paths of all the files you wish to aggregate. (If each file contains a single variable, you will need one text file per variable.) Alternatively, if all the files can be described by a simple wildcard pattern, e.g. path/to/files/*.nc, you may use the pattern instead. Each dataset then goes on its own row of a CSV file, formatted as below:

name_of_dataset,<path_to_text_file_OR_pattern>,,

Add a new row for each dataset/variable described by a set of input files.
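A minimal bash sketch of preparing these inputs, assuming find and sort are available. The helper names write_file_list and add_csv_row are hypothetical and not part of padocc; only the row format (dataset name, then the text file or pattern, then two empty fields) comes from the instructions above. Restricting the list to *.nc files is also an assumption for illustration.

```shell
#!/usr/bin/env bash
# Sketch: prepare the inputs for `padocc init -i <path_to_csv_file>`.
# write_file_list and add_csv_row are hypothetical helpers, not part of padocc.
set -euo pipefail

# Write the sorted paths of every NetCDF file under src_dir to list_file,
# one path per line (the per-dataset text file described above).
write_file_list() {
    local src_dir=$1 list_file=$2
    find "$src_dir" -type f -name '*.nc' | sort > "$list_file"
}

# Append one dataset row to the group CSV: the dataset name, then the text
# file or wildcard pattern, then the two empty trailing fields.
add_csv_row() {
    local csv_file=$1 name=$2 input=$3
    printf '%s,%s,,\n' "$name" "$input" >> "$csv_file"
}
```

For example (hypothetical dataset names and paths):

write_file_list /path/to/dataset_a dataset_a_files.txt
add_csv_row my_group.csv dataset_a "$PWD/dataset_a_files.txt"
add_csv_row my_group.csv dataset_b '/path/to/dataset_b/*.nc'

Quoting the pattern stops the shell from expanding the wildcard before it is written to the CSV.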

Run the following commands in order (if you have more than 5 datasets in your CSV group, you may want to look into parallelisation).

padocc init -G <group_name> -i <path_to_csv_file> -v
padocc scan -G <group_name> -v
padocc compute -G <group_name> -v
padocc validate -G <group_name> -v
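The four commands above can be wrapped in a small bash function that runs them in order for one group and stops at the first command that reports failure. This is a sketch, not part of padocc: run_group is a hypothetical name, and it assumes padocc exits with a non-zero status when a phase fails, which may not hold for errors in individual datasets. It therefore finishes by printing padocc status for the group, which should be checked either way.

```shell
#!/usr/bin/env bash
# Sketch: run the documented padocc phases for one group, in order.
# Assumes padocc is on PATH, WORKDIR is exported, and a failing phase
# exits non-zero (an assumption; per-dataset errors may not do so).
set -u

run_group() {
    local group=$1 csv_file=$2 phase

    # Initialise the group from the CSV of datasets.
    padocc init -G "$group" -i "$csv_file" -v || {
        echo "padocc init failed for group $group" >&2
        return 1
    }

    # Scan, compute and validate, stopping at the first failure.
    for phase in scan compute validate; do
        echo "Running padocc $phase for group $group" >&2
        padocc "$phase" -G "$group" -v || {
            echo "padocc $phase failed for group $group" >&2
            return 1
        }
    done

    # Show per-dataset progress before deciding whether to complete.
    padocc status -G "$group"
}
```

Usage, with the same placeholders as above:

run_group <group_name> <path_to_csv_file>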

If there are problems in the scan or compute phase, please refer to the list of known errors at https://cedadev.github.io/padocc/detailed/features.html#custom-pipeline-errors. If the validate phase ends with Fatal errors, you may need to recompute with an alternative aggregator (V or K). Please try each combination to see whether any aggregation works, passing --aggregator V or --aggregator K to the compute phase, with -n to increment the version number.

Validations that result in Success or Warnings are acceptable and can proceed to completion. By default, the validation report is saved to the completion directory.

Note: Only run the completion step once all groups have finished validation. Check this with padocc status -G <group_name>.

padocc complete -G <group_name> --completion_dir path/to/outputs

If the data is NOT in the CEDA archive, you will need to supply custom --sub and --replace values to change the local file paths of your input files to remote paths (wherever the files can be downloaded from).

For all other queries, please contact Daniel Westwood (daniel.westwood@stfc.ac.uk).

Project details

Release history

Version	Release date	Notes
1.4.6	Feb 23, 2026	This version
1.4.5	Jan 29, 2026
1.4.4	Jan 22, 2026
1.4.3	Jan 19, 2026
1.4.2	Nov 18, 2025	Yanked: dependency incompatibility
1.4.1	Nov 13, 2025
1.4.0	Oct 27, 2025
1.4.0a0	Sep 8, 2025	Pre-release
1.3.5	Apr 17, 2025
1.3.4	Mar 10, 2025	Yanked: critical bugs
1.3.4a0	Mar 10, 2025	Pre-release
1.3.3	Mar 7, 2025
1.3.2	Mar 3, 2025
1.3.1	Feb 13, 2025
1.3.0	Feb 5, 2025
1.3.0b0	Jan 20, 2025	Pre-release
1.3.0a0	Jan 8, 2025	Pre-release

Download files

Source Distribution

padocc-1.4.6.tar.gz (9.9 MB)

Uploaded Feb 23, 2026 Source

Built Distribution

padocc-1.4.6-py3-none-any.whl (10.0 MB)

Uploaded Feb 23, 2026 Python 3

File details

Details for the file padocc-1.4.6.tar.gz.

File metadata

Download URL: padocc-1.4.6.tar.gz
Upload date: Feb 23, 2026
Size: 9.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.11.9 Linux/5.14.0-611.27.1.el9_7.x86_64

File hashes

Hashes for padocc-1.4.6.tar.gz
Algorithm	Hash digest
SHA256	`5154f27a7c4b5f98b3b290db8ec35de10f1f7af0d1c0d2789e7758091c48a51e`
MD5	`25f813e0a8be5700fbdfcb944da3c6ea`
BLAKE2b-256	`eec873dc6476b40757050e8176c2c3a744a2bc55662888857d91525a8ada9170`

File details

Details for the file padocc-1.4.6-py3-none-any.whl.

File metadata

Download URL: padocc-1.4.6-py3-none-any.whl
Upload date: Feb 23, 2026
Size: 10.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.11.9 Linux/5.14.0-611.27.1.el9_7.x86_64

File hashes

Hashes for padocc-1.4.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9d428772e5693abd95740dc07e6a4048dea2d438c9442c78c30747a48aaf1169`
MD5	`4e857a4f1a7ba765ecf778332b3738fe`
BLAKE2b-256	`5cc1d4d94c1fa870b121ade33beeacfbd778b14455ed7fa1c609acf832d82454`

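A downloaded distribution file can be checked against the SHA256 digests listed above before installing. The sketch below defines a hypothetical helper, verify_sha256, which is not part of padocc or pip; it uses sha256sum from GNU coreutils where available and otherwise falls back to shasum -a 256 (commonly present on macOS).

```shell
#!/usr/bin/env bash
# Sketch: compare a file's SHA256 digest with an expected hex digest.
# verify_sha256 is a hypothetical helper, not part of padocc or pip.
set -euo pipefail

verify_sha256() {
    local file=$1 expected=$2 actual
    # Hash from stdin so the output is "<digest>  -" whatever the filename.
    if command -v sha256sum > /dev/null 2>&1; then
        actual=$(sha256sum < "$file" | cut -d ' ' -f 1)
    else
        actual=$(shasum -a 256 < "$file" | cut -d ' ' -f 1)
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

Usage with the published digests for this release:

verify_sha256 padocc-1.4.6.tar.gz 5154f27a7c4b5f98b3b290db8ec35de10f1f7af0d1c0d2789e7758091c48a51e
verify_sha256 padocc-1.4.6-py3-none-any.whl 9d428772e5693abd95740dc07e6a4048dea2d438c9442c78c30747a48aaf1169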
