
Python bindings to libcozip, the reference writer for the Cloud-Optimized ZIP (cozip) format.


cozip



Open a ZIP like a table. Still a ZIP, now queryable.

cozip glues a Parquet manifest onto an ordinary ZIP. The manifest has one row per entry (name, offset, size, plus any columns you tag onto it). Fetch the index, fetch the manifest, query it locally, then range-request just the bytes you actually want. A 20 GB archive becomes a queryable dataset in two reads.
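The last step, fetching only the bytes you want, comes down to an HTTP Range request built from a manifest row. Here is a minimal sketch using only the standard library. It assumes the row's offset points at the first byte of the entry's data and size is its length in bytes. range_header and fetch_entry are hypothetical helpers, not part of the cozip API.

```python
import urllib.request


def range_header(offset: int, size: int) -> str:
    """Build an HTTP Range header value covering `size` bytes from `offset`."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + size - 1}"


def fetch_entry(url: str, offset: int, size: int) -> bytes:
    """Download one entry's bytes without fetching the rest of the archive."""
    request = urllib.request.Request(url, headers={"Range": range_header(offset, size)})
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the range.
        if response.status != 206:
            raise RuntimeError(f"server ignored the Range header (status {response.status})")
        return response.read()
```

The server has to support range requests for this to pay off; one that ignores the header would return the whole archive, which is why the sketch checks for status 206.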


It works because nothing about the ZIP changes. unzip works. zipfile.ZipFile works. Your OS preview pane works. The manifest is just the first entry, and any conforming ZIP reader walks right past it.
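A quick way to see this with the standard library is to build a small ZIP whose first entry stands in for the manifest, then read it back with zipfile. The entry name manifest.parquet and its placeholder bytes are assumptions for illustration only; the real name and layout are defined in SPEC.md.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # Hypothetical stand-in for the manifest entry; not real Parquet data.
    archive.writestr("manifest.parquet", b"placeholder")
    archive.writestr("tile_001.tif", b"first tile")
    archive.writestr("tile_002.tif", b"second tile")

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    # An ordinary reader treats the leading entry like any other file.
    data_files = names[1:]
    first_tile = archive.read("tile_001.tif")

print(names)
print(first_tile)
```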

Two functions

write packs files plus metadata into a cozip. read returns the manifest. That is the whole surface area in every binding.

The table passed to write reserves two column names. path is where each file lives on disk; it is consumed at write time and dropped from the manifest. name is how the file is stored inside the archive. write then adds two more columns to the manifest, offset and size, holding the byte offset and length of each file in the ZIP.
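To make offset and size concrete, the sketch below derives both for each entry of an uncompressed ZIP using only the standard library, then slices the raw archive bytes to recover a file. It assumes offset means the position of the entry's first data byte and that entries are stored without compression; check SPEC.md for the definition cozip actually uses. entry_spans is a hypothetical helper, not part of cozip.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed part of a ZIP local file header


def entry_spans(raw: bytes) -> dict[str, tuple[int, int]]:
    """Map each entry name to (data offset, stored size) in the raw archive."""
    spans = {}
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        for info in archive.infolist():
            header = raw[info.header_offset : info.header_offset + LOCAL_HEADER_SIZE]
            # Bytes 26-29 hold the lengths of the file name and extra field.
            name_len, extra_len = struct.unpack("<HH", header[26:30])
            data_offset = info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
            spans[info.filename] = (data_offset, info.compress_size)
    return spans


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("tile_001.tif", b"first tile")
    archive.writestr("tile_002.tif", b"second tile")
raw = buffer.getvalue()

spans = entry_spans(raw)
offset, size = spans["tile_002.tif"]
# With stored entries, a plain slice of the archive is the file content.
recovered = raw[offset : offset + size]
```

The same slice, expressed as a byte range against a remote URL, is what lets a reader pull one file out of a large archive.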

Everything else rides along and is queryable on read. Local file or remote URL, same call.

Python

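import cozip
import pyarrow as pa
import pyarrow.compute as pc  # compute functions live in their own submodule

# One row per file: "path" and "name" are reserved, the rest is your metadata.
table = pa.table({
    "path":  ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    "name":  ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    "split": ["train", "val", "train"],
    "label": ["cloud", "water", "forest"],
})
cozip.write("dataset.zip", table)

# Read the manifest (local path or remote URL) and filter it locally.
manifest = cozip.read("https://example.com/dataset.zip")
train = manifest.filter(pc.equal(manifest["split"], "train"))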

R

library(cozip)
library(arrow)

tbl <- arrow_table(
  path  = c("local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"),
  name  = c("tile_001.tif", "tile_002.tif", "tile_003.tif"),
  split = c("train", "val", "train"),
  label = c("cloud", "water", "forest")
)
cozip_write("dataset.zip", tbl)

manifest <- cozip_read("https://example.com/dataset.zip")
train <- manifest |> dplyr::filter(split == "train")

Julia

using Cozip
using DataFrames

df = DataFrame(
    path  = ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    name  = ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    split = ["train", "val", "train"],
    label = ["cloud", "water", "forest"],
)
Cozip.write("dataset.zip", df)

manifest = Cozip.read("https://example.com/dataset.zip")
train = filter(:split => ==("train"), manifest)

Bindings

Language | Read | Write | Install
Python   | yes  | yes   | pip install cozip
R        | yes  | yes   | install.packages("cozip", repos = "https://asterisk-labs.r-universe.dev")
Julia    | yes  | yes   | Pkg.Registry.add("https://github.com/asterisk-labs/AsteriskRegistry"); Pkg.add("Cozip")

Every binding wraps the same C core, so a cozip written by R reads byte for byte identically in Julia, in Python, in C. The high-level API is uniform across runtimes. Python and R speak Apache Arrow tables; Julia speaks Tables.jl-compatible DataFrames.

Spec

See SPEC.md. The format is short and stable. Any conforming reader handles any conforming writer.

License

MIT.


Made with ♥ by Asterisk Labs



Download files

Source Distribution

cozip-2026.5.16.tar.gz (15.1 kB)

Built Distributions

cozip-2026.5.16-py3-none-win_amd64.whl (98.0 kB): Python 3, Windows x86-64
cozip-2026.5.16-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (123.2 kB): Python 3, manylinux (glibc 2.17+ and 2.28+), x86-64
cozip-2026.5.16-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (123.7 kB): Python 3, manylinux (glibc 2.17+ and 2.28+), ARM64
cozip-2026.5.16-py3-none-macosx_11_0_universal2.whl (182.6 kB): Python 3, macOS 11.0+ universal2 (ARM64, x86-64)

