Boring Catalog
A DuckDB-based Iceberg catalog implementation.
The catalog is stored as a single .duckdb file in S3, making it lightweight and portable.
Why Boring Catalog?
- Eliminates the need to host or maintain a dedicated catalog service
- We can store all our Iceberg metadata in a single DuckDB file, including:
  - Catalog metadata
  - Pointers to Iceberg metadata files (via read_json('s3://...'))
  - References to Iceberg table data (via iceberg_scan('s3://...'))
- Enables easy sharing across teams and environments through simple S3 URLs using ATTACH '<s3_url>'
- We can easily expose a FastAPI REST endpoint to enable writes from Snowflake and other external systems
How It Works
Boring Catalog uses S3 conditional PUT operations to synchronize the catalog across multiple clients, effectively preventing race conditions during concurrent access.
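The idea behind a conditional PUT can be sketched without touching S3 at all. The example below is an illustration only, not boringcatalog's actual code: FakeObjectStore, put_if_match, and commit are hypothetical names, and an integer counter stands in for the ETag that S3 returns. On real S3 the same check is expressed with conditional-write headers such as If-Match and If-None-Match.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class PreconditionFailed(Exception):
    """The stored version no longer matches what the writer expected."""


@dataclass(frozen=True)
class StoredObject:
    body: bytes
    etag: int  # stand-in for an S3 ETag


class FakeObjectStore:
    """In-memory stand-in for a bucket that supports conditional PUT."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[StoredObject]:
        with self._lock:
            return self._objects.get(key)

    def put_if_match(self, key: str, body: bytes, expected_etag: Optional[int]) -> int:
        """Write only if the current version equals expected_etag (None = must not exist)."""
        with self._lock:
            current = self._objects.get(key)
            current_etag = current.etag if current else None
            if current_etag != expected_etag:
                raise PreconditionFailed(key)
            new_etag = (current_etag or 0) + 1
            self._objects[key] = StoredObject(body, new_etag)
            return new_etag


def commit(store: FakeObjectStore, key: str,
           update: Callable[[bytes], bytes], max_retries: int = 10) -> int:
    """Read, modify, and conditionally write back; retry if another writer won."""
    for _ in range(max_retries):
        current = store.get(key)
        old_body = current.body if current else b""
        old_etag = current.etag if current else None
        try:
            return store.put_if_match(key, update(old_body), old_etag)
        except PreconditionFailed:
            continue  # someone else committed first: re-read and try again
    raise RuntimeError(f"could not commit {key!r} after {max_retries} attempts")
```

A writer that holds a stale version gets PreconditionFailed instead of silently overwriting a newer catalog file, so the worst case under contention is a retry rather than a lost update.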
Installation
pip install boringcatalog
Usage
Create namespace and table
from boringcatalog import BoringCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import DecimalType, LongType, NestedField, StringType

# Replace {your-bucket} with the bucket that holds the catalog file.
catalog = BoringCatalog(
    "my_catalog",
    warehouse="s3://{your-bucket}/boringcatalog"
)

# Create the namespace only if it does not exist yet.
if ("my_namespace",) not in catalog.list_namespaces():
    catalog.create_namespace("my_namespace")

# "data" and "amount" are marked optional explicitly, because the default
# for `required` differs between pyiceberg versions.
schema = Schema(
    NestedField(1, "id", LongType(), required=True),
    NestedField(2, "data", StringType(), required=False),
    NestedField(3, "amount", DecimalType(5, 1), required=False)
)

# Create the table only if it does not exist yet.
if ("my_namespace", "my_table_2") not in catalog.list_tables("my_namespace"):
    table = catalog.create_table(
        identifier=("my_namespace", "my_table_2"),
        schema=schema,
        properties={"write.format.default": "parquet"}
    )
Append data
from decimal import Decimal

import pandas as pd
import pyarrow as pa
from boringcatalog import BoringCatalog

catalog = BoringCatalog(
    "my_catalog",
    warehouse="s3://{your-bucket}/boringcatalog"
)
table = catalog.load_table(("my_namespace", "my_table_2"))

# 10,000 dummy rows; amounts are capped at 9999.9 so they fit DecimalType(5, 1).
dummy_data = pd.DataFrame({
    "id": pd.Series(range(1, 10001), dtype="int64"),
    "data": [f"Transaction_{i}" for i in range(1, 10001)],
    "amount": [Decimal(str(min(i * 10.5, 9999.9))).quantize(Decimal('0.1')) for i in range(1, 10001)]
})

# The Arrow schema mirrors the Iceberg schema: a required 64-bit id (LongType),
# plus nullable data and amount columns.
arrow_table = pa.Table.from_pandas(
    dummy_data,
    schema=pa.schema([
        ('id', pa.int64(), False),
        ('data', pa.string(), True),
        ('amount', pa.decimal128(5, 1), True)
    ]),
    safe=True
)
table.append(arrow_table)
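The amount values above are capped at 9999.9 because a decimal with precision 5 and scale 1 holds at most four integer digits and one fractional digit. A minimal stdlib-only sketch of that check follows; fits_decimal is a hypothetical helper written for illustration, not part of boringcatalog, pyiceberg, or pyarrow.

```python
from decimal import Decimal


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Return True if value can be stored exactly as decimal(precision, scale)."""
    _sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        return False  # NaN and Infinity have no numeric exponent
    if -exponent > scale:
        return False  # more fractional digits than the scale allows
    integer_digits = max(len(digits) + exponent, 0)
    return integer_digits <= precision - scale
```

Running such a check over the generated column before calling table.append can surface out-of-range values earlier than the Arrow conversion would.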
Next steps
- [ ] Reflect tables in the catalog (CREATE VIEW AS SELECT * FROM iceberg_scan())
- [ ] Reflect snapshots in a catalog table (CREATE TABLE snapshots AS SELECT * FROM read_json())
- [ ] Improve performance (syncing the .duckdb file from local disk to S3 takes too long)
- [ ] Add FastAPI on top of the catalog to allow writes from Snowflake and other clients
File details
Details for the file boringcatalog-0.1.2.tar.gz.
File metadata
- Download URL: boringcatalog-0.1.2.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.27
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2ecd406bf90a519f251591569ee66bee8dee9bdfa81647975d2b7c72ec44e5f |
| MD5 | cff7459a6151dbefde74bab79eb7b2ee |
| BLAKE2b-256 | ee825a799c0edbf686a72c30831fdc71d0dade4c503637f6abd0d75d5e92dc55 |
File details
Details for the file boringcatalog-0.1.2-py3-none-any.whl.
File metadata
- Download URL: boringcatalog-0.1.2-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.27
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7dd6aa0b7665ef05def3bc487a98b60e9f84ae75d31e3f4a05b3a26aa146da03 |
| MD5 | c3034c0b17da1f9190ffc395651a3141 |
| BLAKE2b-256 | f6363f284b55df7f051284f40571ae87fb87583405add14462ab43ce4ddbb6e2 |