Convert compressed JSONL archives (gzip, zstd, bzip2, lz4) to PFC format — local, S3, Azure, GCS
Project description
pfc-migrate — Convert existing JSONL archives to PFC format
You already have logs stored somewhere — gzip, zstd, bzip2, or lz4 — on disk, on S3, in Glacier, on Azure, or on GCS. Switching to PFC-JSONL means converting them once. This tool does that, wherever your files live.
Storage Backends
| Backend | Status | Command |
|---|---|---|
| Local filesystem | ✅ v0.1.0 | pfc-migrate convert |
| Amazon S3 / S3 Glacier | ✅ v0.2.0 | pfc-migrate s3 / pfc-migrate glacier |
| Azure Blob Storage | ✅ v0.3.0 | pfc-migrate azure |
| Google Cloud Storage | ✅ v0.3.0 | pfc-migrate gcs |
Why convert?
Once your archives are in PFC format, DuckDB can query them directly — without decompressing the whole file first:
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;
-- Query just one hour from a 30-day archive
SELECT line->>'$.level' AS level, line->>'$.message' AS message
FROM read_pfc_jsonl(
'/var/log/pfc/app_2026-03-01.pfc',
ts_from = epoch(TIMESTAMPTZ '2026-03-01 14:00:00+00'),
ts_to = epoch(TIMESTAMPTZ '2026-03-01 15:00:00+00')
);
| Tool | 1h query on 30-day archive | Storage savings vs gzip |
|---|---|---|
| gzip | Decompress full 30-day file | — |
| zstd | Decompress full 30-day file | — |
| PFC-JSONL | Decompress ~1/720 of the file | 25% smaller than gzip |
~9% compression ratio on typical JSONL log data (25% smaller than gzip, 37% smaller than zstd).
Zero egress cost
Cloud conversions run in-region: download → convert → upload, without ever routing through your laptop or billing you for egress. For Glacier, paying the one-time retrieval cost to convert is worth it — smaller PFC files mean lower future storage, retrieval, and query costs forever.
Input Formats
| Format | Extension | Extra dependency |
|---|---|---|
| gzip | .jsonl.gz |
stdlib ✅ |
| bzip2 | .jsonl.bz2 |
stdlib ✅ |
| zstd | .jsonl.zst |
pip install pfc-migrate[zstd] |
| lz4 | .jsonl.lz4 |
pip install pfc-migrate[lz4] |
| Plain JSONL | .jsonl |
stdlib ✅ |
Requirements
The pfc_jsonl binary must be installed on the machine doing the conversion:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
-o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
Platform: Linux x86_64. macOS binary coming soon.
Install
pip install pfc-migrate
# With zstd support
pip install pfc-migrate[zstd]
# With S3/Glacier support
pip install pfc-migrate[s3]
# With Azure Blob Storage support
pip install pfc-migrate[azure]
# With Google Cloud Storage support
pip install pfc-migrate[gcs]
# Everything
pip install pfc-migrate[all]
Usage — Local filesystem
# Single file (output auto-named: logs.pfc + logs.pfc.bidx)
pfc-migrate convert logs.jsonl.gz
# Explicit output
pfc-migrate convert logs.jsonl.gz logs.pfc
# Entire directory
pfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/
# Recursive + verbose
pfc-migrate convert --dir /mnt/logs/ -r -v
# Delete originals after successful conversion
pfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/ --delete
Verbose output:
Found 12 file(s) to convert
→ app_2026-03-01.jsonl.gz [gz]
145.3 MB → 13.1 MB (9.0% of original) ✓ app_2026-03-01.pfc
→ app_2026-03-02.jsonl.gz [gz]
138.7 MB → 12.4 MB (8.9% of original) ✓ app_2026-03-02.pfc
...
Done: 12 converted, 0 failed
Input : 1721.4 MB (decompressed)
Output : 154.9 MB
Ratio : 9.0%
Usage — Amazon S3 / S3 Glacier
Conversion happens in-region (download to temp dir → convert → upload). No egress charges.
# Single object
pfc-migrate s3 \
--bucket my-logs \
--key archive/app_2026-03.jsonl.gz \
--out-bucket my-logs-pfc \
--out-prefix converted/
# All objects matching a prefix
pfc-migrate s3 \
--bucket my-logs \
--prefix archive/2026-03/ \
--out-bucket my-logs-pfc \
--out-prefix converted/2026-03/ \
--format gz \
--verbose
# With explicit credentials
pfc-migrate s3 \
--bucket my-logs --prefix archive/ \
--out-bucket my-logs-pfc --out-prefix pfc/ \
--access-key AKIA... --secret-key ... --region us-east-1
# Glacier (Expedited retrieval, converts and re-archives)
pfc-migrate glacier \
--bucket my-glacier-logs \
--prefix 2025/ \
--out-bucket my-glacier-pfc \
--retrieval-tier Expedited
Credentials are read from ~/.aws/credentials, environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), or passed explicitly via --access-key / --secret-key.
Usage — Azure Blob Storage
# Single blob (connection string)
pfc-migrate azure \
--container my-logs \
--blob archive/app_2026-03.jsonl.gz \
--out-container my-logs-pfc \
--out-prefix converted/ \
--connection-string "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"
# All blobs matching a prefix
pfc-migrate azure \
--container my-logs \
--prefix archive/2026-03/ \
--out-container my-logs-pfc \
--out-prefix converted/2026-03/ \
--format gz \
--connection-string "..."
# Using account URL + DefaultAzureCredential (managed identity, az login, etc.)
pfc-migrate azure \
--container my-logs \
--prefix archive/ \
--out-container my-logs-pfc \
--account-url https://myaccount.blob.core.windows.net
--connection-string or --account-url required. With --account-url, DefaultAzureCredential is used automatically (managed identity, az login, environment variables).
Usage — Google Cloud Storage
# Single object
pfc-migrate gcs \
--bucket my-logs \
--blob archive/app_2026-03.jsonl.gz \
--out-bucket my-logs-pfc \
--out-prefix converted/
# All objects matching a prefix
pfc-migrate gcs \
--bucket my-logs \
--prefix archive/2026-03/ \
--out-bucket my-logs-pfc \
--out-prefix converted/2026-03/ \
--format gz \
--verbose
Credentials from GOOGLE_APPLICATION_CREDENTIALS, gcloud auth application-default login, or Workload Identity on GKE.
Lossless guarantee
Every conversion is verified by full decompression and MD5 check before the output is written. If anything doesn't match, the output file is deleted and the error is reported — the original is never modified unless --delete is explicitly passed.
The full test suite verifies 4 backends × 4 formats with:
- Sorted MD5 of all 400,000 lines (identical to original ✅)
- DuckDB random access: 39,743 lines in a time window, MD5 identical to original ✅
Related Projects
| Project | Description |
|---|---|
| pfc-jsonl | Core binary — compress, decompress, query |
| pfc-duckdb | DuckDB Community Extension (INSTALL pfc FROM community) |
| pfc-fluentbit | Fluent Bit → PFC forwarder for live pipelines |
License
MIT — see LICENSE.
Built by ImpossibleForge
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pfc_migrate-0.3.0.tar.gz.
File metadata
- Download URL: pfc_migrate-0.3.0.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c6836039e8fcdb3ac34132982c39951255aa7f06a5db58bfb59b8d28534ad9db
|
|
| MD5 |
605b87e8db87407d744b9d4b908d443f
|
|
| BLAKE2b-256 |
c05fa14e8c9e31a8be8f18ebc8d135f36d51b27c8d2482cd2a76bf2b9396aff9
|
File details
Details for the file pfc_migrate-0.3.0-py3-none-any.whl.
File metadata
- Download URL: pfc_migrate-0.3.0-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b0434953a3aeb128db93c4350c4dd7ec92b863efea94f3bc585707c1e65f90c9
|
|
| MD5 |
44547b7fe8e55bc1a1d49183c6835682
|
|
| BLAKE2b-256 |
3e4b922c6c0dd48063c964a06c8e3ba229a1eedf4dab211cc67d1e44cd7bb46b
|