Python bindings to libcozip, the reference writer for the Cloud-Optimized ZIP (cozip) format.
Open a ZIP like a table. Still a ZIP, now queryable.
cozip glues a Parquet manifest onto an ordinary ZIP. The manifest has one row per entry (name, offset, size, plus any columns you tag onto it). Fetch the index, fetch the manifest, query it locally, then range-request just the bytes you actually want. A 20 GB archive becomes a queryable dataset in two reads.
It works because nothing about the ZIP changes. unzip works. zipfile.ZipFile works. Your OS preview pane works. The manifest is just the first entry, and any conforming ZIP reader walks right past it.
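The compatibility claim can be illustrated with the standard library alone. The sketch below does not use cozip. It builds an ordinary ZIP whose first entry is a stand-in for the manifest and shows that `zipfile.ZipFile` lists and extracts the other entries as usual. The entry name `manifest.parquet` and its placeholder bytes are assumptions for illustration; SPEC.md defines the real layout.

```python
import io
import zipfile

# Hypothetical entry name; the real manifest name is defined in SPEC.md.
MANIFEST_NAME = "manifest.parquet"


def build_demo_archive() -> bytes:
    """Build an in-memory ZIP whose first entry stands in for the manifest."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(MANIFEST_NAME, b"placeholder, not real Parquet")
        zf.writestr("tile_001.tif", b"first tile bytes")
        zf.writestr("tile_002.tif", b"second tile bytes")
    return buf.getvalue()


archive = build_demo_archive()

# A plain ZIP reader sees the manifest as just another entry.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = zf.namelist()
    tile = zf.read("tile_001.tif")

print(names)
print(tile)
```

A reader that knows nothing about the manifest can skip or extract it like any other file, which is the property the format relies on.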
Two functions
write packs files plus metadata into a cozip. read returns the manifest. That is the whole surface area in every binding.
The table you pass to write reserves two columns. path is where each file lives on disk; it is consumed at write time and dropped from the stored manifest. name is the entry's name inside the archive. write then adds two more columns to the manifest, offset and size, holding the byte offset and length of each file in the ZIP.
Everything else rides along and is queryable on read. Local file or remote URL, same call.
Python
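```python
import cozip
import pyarrow as pa
import pyarrow.compute as pc

# One row per file: `path` is the source on disk, `name` is the entry name in the archive.
table = pa.table({
    "path": ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    "name": ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    "split": ["train", "val", "train"],
    "label": ["cloud", "water", "forest"],
})

# Pack the files and their metadata into a cozip.
cozip.write("dataset.zip", table)

# Read the manifest back; a local path or a remote URL both work.
manifest = cozip.read("https://example.com/dataset.zip")

# Keep only the rows tagged as training data.
train = manifest.filter(pc.equal(manifest["split"], "train"))
```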
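Once a row is selected, its offset and size are enough to fetch just that file. The sketch below is a minimal illustration under stated assumptions, not cozip's own reader: it assumes entries are stored uncompressed and that offset points at the first byte of the file's data (SPEC.md is the authority on both). It builds a small stored ZIP with the standard library, derives an offset and size matching that description, and reads one entry with a plain seek. `data_offset` and `range_header` are hypothetical helper names.

```python
import io
import struct
import zipfile


def data_offset(archive: bytes, info: zipfile.ZipInfo) -> int:
    """Return the offset of an entry's data, skipping its local file header."""
    # The local header is 30 fixed bytes followed by the name and extra field;
    # their lengths are little-endian uint16 values at bytes 26-29.
    name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
    return info.header_offset + 30 + name_len + extra_len


def range_header(offset: int, size: int) -> dict:
    """Build the HTTP Range header covering `size` bytes starting at `offset`."""
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + size - 1}"}


# Build a demo archive with uncompressed (stored) entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("tile_001.tif", b"first tile bytes")
    zf.writestr("tile_002.tif", b"second tile bytes")
archive = buf.getvalue()

# Stand-in for one manifest row: name, offset, size.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    info = zf.getinfo("tile_002.tif")
row = {
    "name": info.filename,
    "offset": data_offset(archive, info),
    "size": info.file_size,
}

# Local read: seek to the offset and read exactly `size` bytes.
stream = io.BytesIO(archive)
stream.seek(row["offset"])
payload = stream.read(row["size"])

# Remote read: the same row becomes a Range header for an HTTP GET.
headers = range_header(row["offset"], row["size"])
```

For a remote archive, the same row would translate into a request carrying `headers`, for example `urllib.request.Request(url, headers=headers)`, so only that file's bytes cross the network.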
R
```r
library(cozip)
library(arrow)

tbl <- arrow_table(
  path = c("local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"),
  name = c("tile_001.tif", "tile_002.tif", "tile_003.tif"),
  split = c("train", "val", "train"),
  label = c("cloud", "water", "forest")
)

cozip_write("dataset.zip", tbl)

manifest <- cozip_read("https://example.com/dataset.zip")
train <- manifest |> dplyr::filter(split == "train")
```
Julia
```julia
using Cozip
using DataFrames

df = DataFrame(
    path = ["local/tile_001.tif", "local/tile_002.tif", "local/tile_003.tif"],
    name = ["tile_001.tif", "tile_002.tif", "tile_003.tif"],
    split = ["train", "val", "train"],
    label = ["cloud", "water", "forest"],
)

Cozip.write("dataset.zip", df)

manifest = Cozip.read("https://example.com/dataset.zip")
train = filter(:split => ==("train"), manifest)
```
Bindings
| Language | Read | Write | Install |
|---|---|---|---|
| Python | ✓ | ✓ | pip install cozip |
| R | ✓ | ✓ | install.packages("cozip", repos = "https://asterisk-labs.r-universe.dev") |
| Julia | ✓ | ✓ | Pkg.Registry.add("https://github.com/asterisk-labs/AsteriskRegistry"); Pkg.add("Cozip") |
Every binding wraps the same C core, so a cozip written by R reads byte for byte identically in Julia, in Python, in C. The high-level API is uniform across runtimes. Python and R speak Apache Arrow tables; Julia speaks Tables.jl-compatible DataFrames.
Spec
See SPEC.md. The format is short and stable. Any conforming reader handles any conforming writer.
License
MIT.