Python bindings to libcozip, the reference writer for the Cloud-Optimized ZIP (cozip) format.

cozip


Open a ZIP like a table. Still a ZIP, now queryable.

cozip attaches a Parquet manifest to an ordinary ZIP. The manifest has one row per entry (name, offset, size, plus any columns you add). Fetch the index, fetch the manifest, query it locally, then range-request only the bytes you actually want. A 20 GB archive becomes a queryable dataset in two reads: one for the index and one for the manifest.
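The last step of that flow, range-requesting only the wanted bytes, can be sketched with the standard library. This is a minimal illustration and not cozip's own API: the manifest rows are hard-coded stand-ins with invented offsets and sizes, `range_request` is a hypothetical helper, and the sketch assumes that offset marks the first byte of an entry's data.

```python
from urllib.request import Request

# Stand-in for manifest rows after a local query; offsets and sizes are invented.
rows = [
    {"name": "tile_001.tif", "offset": 4096, "size": 1000, "split": "train"},
    {"name": "tile_002.tif", "offset": 5096, "size": 2000, "split": "val"},
    {"name": "tile_003.tif", "offset": 7096, "size": 1500, "split": "train"},
]


def range_request(url: str, offset: int, size: int) -> Request:
    """Build (but do not send) a GET for bytes [offset, offset + size)."""
    # HTTP byte ranges are inclusive at both ends, hence the -1.
    last = offset + size - 1
    return Request(url, headers={"Range": f"bytes={offset}-{last}"})


url = "https://example.com/dataset.zip"

# Query the manifest locally, then prepare one ranged request per wanted entry.
train = [r for r in rows if r["split"] == "train"]
pending = {r["name"]: range_request(url, r["offset"], r["size"]) for r in train}
```

Passing each prepared request to `urllib.request.urlopen` would download only that slice, provided the server honors the `Range` header.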

It works because nothing about the ZIP changes. unzip works. zipfile.ZipFile works. Your OS preview pane works. The manifest is just the first entry, and any conforming ZIP reader walks right past it.
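To illustrate why ordinary readers are unaffected, the sketch below builds a plain ZIP whose first entry is a stand-in manifest and opens it with `zipfile`. The entry name `manifest.parquet` and its placeholder bytes are assumptions made for the example; the real manifest name and contents are defined by the spec.

```python
import io
import zipfile

# Hypothetical manifest entry name; the real one is defined by the spec.
MANIFEST_NAME = "manifest.parquet"

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Placeholder bytes stand in for a real Parquet file.
    zf.writestr(MANIFEST_NAME, b"placeholder, not real Parquet")
    zf.writestr("tile_001.tif", b"fake tile one")
    zf.writestr("tile_002.tif", b"fake tile two")

# A plain ZIP reader sees the manifest as just another entry.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    names = zf.namelist()          # entries in the order they were written
    tile = zf.read("tile_001.tif")
    ok = zf.testzip() is None      # None means every entry passed its CRC check
```

A reader that knows nothing about the manifest still lists and extracts every file; one that does know about it can find it in the first position.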

Two functions

write packs files plus metadata into a cozip. read returns the manifest. That is the whole surface area in every binding.

The table you pass to write reserves two columns. path is where each file lives on disk; it is consumed at write time and dropped from the manifest. name is how the file is stored inside the archive. write then adds two more columns to the manifest, offset and size, which hold the byte offset and length of each file in the ZIP.
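What offset and size make possible can be shown on a plain ZIP with the standard library alone. The sketch below does not use cozip. It assumes that offset points at the first data byte of an entry (past its local file header) and that entries are stored uncompressed; neither is stated here, so treat SPEC.md as the authority. `data_offset` and `read_entry` are hypothetical helpers.

```python
import io
import struct
import zipfile


def data_offset(archive: bytes, info: zipfile.ZipInfo) -> int:
    """Offset of an entry's first data byte, past its local file header."""
    # The local header has 30 fixed bytes; the name and extra-field lengths
    # are little-endian uint16 values at bytes 26 and 28 of that header.
    name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
    return info.header_offset + 30 + name_len + extra_len


payloads = {"tile_001.tif": b"fake tile one", "tile_002.tif": b"fake tile two!"}

# Write an ordinary ZIP with stored (uncompressed) entries.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w", compression=zipfile.ZIP_STORED) as zf:
    for name, data in payloads.items():
        zf.writestr(name, data)
archive = raw.getvalue()

# Build manifest-like rows: name, offset, size.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    rows = [
        {"name": i.filename, "offset": data_offset(archive, i), "size": i.file_size}
        for i in zf.infolist()
    ]


def read_entry(archive: bytes, row: dict) -> bytes:
    """Slice one entry out of the archive without parsing the ZIP again."""
    return archive[row["offset"] : row["offset"] + row["size"]]
```

With rows like these, reading a file is a single slice. Against a remote archive, the same slice corresponds to an HTTP request with `Range: bytes=offset-(offset + size - 1)`.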

Everything else rides along and is queryable on read. Local file or remote URL, same call.

Python

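import cozip
import pyarrow as pa
import pyarrow.compute as pc  # import the compute submodule explicitly

# One row per file: "path" is read from disk at write time, "name" is the
# name stored in the archive, and every other column rides along as metadata.
table = pa.table({
    "path":  ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    "name":  ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    "split": ["train", "val", "train"],
    "label": ["cloud", "water", "forest"],
})
cozip.write("dataset.zip", table)

# read returns the manifest; a local path or a remote URL both work.
manifest = cozip.read("https://example.com/dataset.zip")
train = manifest.filter(pc.equal(manifest["split"], "train"))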

R

library(cozip)
library(arrow)

# One row per file: path is read from disk, name is stored in the archive,
# and the remaining columns ride along as metadata.
tbl <- arrow_table(
  path  = c("local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"),
  name  = c("tile_001.tif", "tile_002.tif", "tile_003.tif"),
  split = c("train", "val", "train"),
  label = c("cloud", "water", "forest")
)
cozip_write("dataset.zip", tbl)

# cozip_read returns the manifest; filter it like any other table.
manifest <- cozip_read("https://example.com/dataset.zip")
train <- manifest |> dplyr::filter(split == "train")

Julia

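using Cozip
using DataFrames

# One row per file: path is read from disk, name is stored in the archive,
# and the remaining columns ride along as metadata.
df = DataFrame(
    path  = ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    name  = ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    split = ["train", "val", "train"],
    label = ["cloud", "water", "forest"],
)
Cozip.write("dataset.zip", df)

# Cozip.read returns the manifest; filter it like any other DataFrame.
manifest = Cozip.read("https://example.com/dataset.zip")
train = filter(:split => ==("train"), manifest)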

Bindings

Language | Read | Write | Install
Python   | yes  | yes   | pip install cozip
R        | yes  | yes   | install.packages("cozip", repos = "https://asterisk-labs.r-universe.dev")
Julia    | yes  | yes   | Pkg.Registry.add("https://github.com/asterisk-labs/AsteriskRegistry"); Pkg.add("Cozip")

Every binding wraps the same C core, so a cozip written by R reads byte-for-byte identically in Julia, in Python, and in C. The high-level API is uniform across runtimes. Python and R speak Apache Arrow tables; Julia speaks Tables.jl-compatible DataFrames.

Spec

See SPEC.md. The format is short and stable. Any conforming reader handles any conforming writer.

License

MIT.


Made with ♥ by

Asterisk Labs
