
zarrio

A modern, clean library for converting scientific data formats to Zarr format.

Overview

zarrio is a rewrite of the original onzarr library with a focus on simplicity, performance, and maintainability. It leverages modern xarray and zarr capabilities to convert NetCDF and other scientific data formats to Zarr efficiently.

Features

  • Simple API: Clean, intuitive interfaces for common operations
  • Efficient Conversion: Fast conversion of NetCDF to Zarr format
  • Data Packing: Compress data using fixed-scale offset encoding (see the sketch after this list)
  • Intelligent Chunking: Automatic chunking recommendations based on access patterns (temporal, spatial, balanced), with chunk sizes computed from the full archive dimensions when building parallel archives
  • Compression: Support for various compression algorithms
  • Time Series Handling: Efficient handling of time-series data
  • Data Appending: Append new data to existing Zarr archives
  • Parallel Writing: Create a template archive, then write regions to it from parallel processes
  • Metadata Preservation: Maintain dataset metadata during conversion
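
Fixed-scale offset encoding, used by the packing feature, maps each float onto an n-bit integer grid via a per-variable scale and offset, which is what lets the packed data compress so well. The following is a minimal sketch of the general technique, not zarrio's internal implementation; in particular, reserving one integer code for a fill value is an assumption:

import numpy as np

# Pack floats in [vmin, vmax] onto a 16-bit integer grid.
def pack(values, vmin, vmax, bits=16):
    # Reserve one code so a fill value can mark missing data
    # (a common convention; zarrio's actual handling may differ).
    scale = (vmax - vmin) / (2**bits - 2)
    return np.round((values - vmin) / scale).astype(np.uint16), scale, vmin

def unpack(packed, scale, offset):
    return packed.astype(np.float64) * scale + offset

temps = np.array([-49.8, 0.0, 23.5, 49.9])
packed, scale, offset = pack(temps, vmin=-50, vmax=50)
restored = unpack(packed, scale, offset)  # accurate to within ~scale/2

With 16 bits over a 100-unit range, the quantization step is about 0.0015, well below typical sensor precision, while halving the bytes per value relative to float32 before compression even starts.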

Installation

pip install zarrio

Usage

Command Line Interface

# Convert NetCDF to Zarr
zarrio convert input.nc output.zarr

# Convert with chunking
zarrio convert input.nc output.zarr --chunking "time:100,lat:50,lon:100"

# Convert with compression
zarrio convert input.nc output.zarr --compression "blosc:zstd:3"

# Convert with data packing
zarrio convert input.nc output.zarr --packing --packing-bits 16

# Convert with manual packing ranges
zarrio convert input.nc output.zarr --packing \
    --packing-manual-ranges '{"temperature": {"min": -50, "max": 50}}'

# Analyze NetCDF file for optimization recommendations
zarrio analyze input.nc

# Analyze with theoretical performance testing
zarrio analyze input.nc --test-performance

# Analyze with actual performance testing
zarrio analyze input.nc --run-tests

# Analyze with interactive configuration setup
zarrio analyze input.nc --interactive

# Create template for parallel writing
zarrio create-template template.nc archive.zarr --global-start 2023-01-01 --global-end 2023-12-31

# Create template with intelligent chunking
zarrio create-template template.nc archive.zarr --global-start 2023-01-01 --global-end 2023-12-31 --intelligent-chunking --access-pattern temporal

# Write region to existing archive
zarrio write-region data.nc archive.zarr

# Append to existing Zarr store
zarrio append new_data.nc existing.zarr

Python API

from zarrio import convert_to_zarr, append_to_zarr, ZarrConverter

# Simple conversion
convert_to_zarr("input.nc", "output.zarr")

# Conversion with options
convert_to_zarr(
    "input.nc",
    "output.zarr",
    chunking={"time": 100, "lat": 50, "lon": 100},
    compression="blosc:zstd:3",
    packing=True,
    packing_bits=16,
    packing_manual_ranges={
        "temperature": {"min": -50, "max": 50}
    },
    packing_auto_buffer_factor=0.05
)

# Using the class-based interface
converter = ZarrConverter(
    chunking={"time": 100, "lat": 50, "lon": 100},
    compression="blosc:zstd:3",
    packing=True,
    packing_manual_ranges={
        "temperature": {"min": -50, "max": 50}
    }
)
converter.convert("input.nc", "output.zarr")

# Parallel writing workflow
# 1. Create template archive
converter.create_template(
    template_dataset=template_ds,
    output_path="archive.zarr",
    global_start="2023-01-01",
    global_end="2023-12-31",
    compute=False  # Metadata only
)

# 2. Write regions in parallel (in separate processes)
converter.write_region("data1.nc", "archive.zarr")
converter.write_region("data2.nc", "archive.zarr")
converter.write_region("data3.nc", "archive.zarr")

# Append to existing Zarr store
append_to_zarr("new_data.nc", "existing.zarr")
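
A quick way to sanity-check any of these conversions is to open the result with xarray; this uses plain xarray rather than a zarrio API:

import xarray as xr

ds = xr.open_zarr("output.zarr")
print(ds)  # dimensions, coordinates and attributes should match the source NetCDF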

Parallel Writing

One of the key features of zarrio is support for parallel writing of large datasets:

# Step 1: Create template archive with intelligent chunking
converter = ZarrConverter(
    chunking={"time": 100, "lat": 50, "lon": 100},
    access_pattern="temporal"  # Optimize for time series analysis
)
converter.create_template(
    template_dataset=template_dataset,
    output_path="large_archive.zarr",
    global_start="2020-01-01",
    global_end="2023-12-31",
    compute=False,  # Metadata only, no data computation
    intelligent_chunking=True,  # Enable intelligent chunking based on full archive dimensions
    access_pattern="temporal"   # Optimize for time series analysis
)

# Step 2: Write regions in parallel processes
# Process 1: converter.write_region("file1.nc", "large_archive.zarr")
# Process 2: converter.write_region("file2.nc", "large_archive.zarr")
# Process 3: converter.write_region("file3.nc", "large_archive.zarr")

This approach is well suited to converting large numbers of NetCDF files into a single Zarr archive in parallel. With intelligent chunking enabled, chunk sizes are derived from the full archive dimensions rather than from the template dataset alone.
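
The commented-out processes above can also be driven from a single script with the standard library. A minimal sketch, assuming write_region is safe to call from independent worker processes (as the workflow implies) and that ZarrConverter can be constructed with defaults once the template has fixed the chunking:

from concurrent.futures import ProcessPoolExecutor

from zarrio import ZarrConverter

def write_one(path):
    # One converter per worker process; the template created in step 1
    # already fixed the archive layout, so defaults are assumed to suffice.
    ZarrConverter().write_region(path, "large_archive.zarr")

if __name__ == "__main__":
    files = ["file1.nc", "file2.nc", "file3.nc"]
    with ProcessPoolExecutor(max_workers=3) as pool:
        list(pool.map(write_one, files))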

Configuration

You can also use configuration files (YAML or JSON):

# config.yaml
chunking:
  time: 100
  lat: 50
  lon: 100
compression: "blosc:zstd:3"
packing:
  enabled: true
  bits: 16
  manual_ranges:
    temperature:
      min: -50
      max: 50
  auto_buffer_factor: 0.05
variables:
  - temperature
  - pressure
drop_variables:
  - unused_var

Then use it with the CLI:

zarrio convert input.nc output.zarr --config config.yaml
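
The same file can also be loaded manually and mapped onto the keyword arguments shown in the Python API section. A sketch assuming PyYAML and the keyword names from that section (zarrio may ship its own config loader, which is not shown here; the variables and drop_variables keys are omitted):

import yaml  # PyYAML

from zarrio import convert_to_zarr

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

packing = cfg.get("packing") or {}
convert_to_zarr(
    "input.nc",
    "output.zarr",
    chunking=cfg.get("chunking"),
    compression=cfg.get("compression"),
    packing=packing.get("enabled", False),
    packing_bits=packing.get("bits", 16),
    packing_manual_ranges=packing.get("manual_ranges"),
    packing_auto_buffer_factor=packing.get("auto_buffer_factor", 0.05),
)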

Development

Installation

git clone https://github.com/oceanum/zarrio.git
cd zarrio
pip install -e .

Running Tests

pip install -e ".[dev]"
pytest

Code Quality

# Format code
black .

# Check code style
flake8

# Type checking
mypy zarrio

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
