A DuckDB-based Iceberg catalog implementation
boringdata.io — Kickstart your Iceberg journey with our data stack templates.
Boring Catalog
A lightweight, file-based Iceberg catalog implementation using a single JSON file (e.g., on S3, local disk, or any fsspec-compatible storage).
Why Boring Catalog?
- No need to host or maintain a dedicated catalog service
- Easy to use, easy to understand, perfect to get started with Iceberg
- DuckDB CLI to easily explore your Iceberg tables and metadata
How It Works
Boring Catalog stores all Iceberg catalog state in a single JSON file:
- Namespaces and tables are tracked in this file
- When the catalog is stored on S3, conditional writes prevent concurrent modifications from overwriting each other
- The .ice/index file in your project directory stores the configuration for your catalog, including:
  - catalog_uri: the path to your catalog JSON file
  - catalog_name: the logical name of your catalog
  - properties: additional properties (e.g., warehouse location)
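The conditional-write behaviour described above can be illustrated with a minimal sketch. This is not Boring Catalog's actual code: it emulates an S3-style "write only if unchanged" check on a local file, using a content hash as a stand-in for an object ETag. The JSON layout (a `namespaces` key) and the names `read_catalog`, `write_catalog_if_unchanged`, and `CatalogConflict` are hypothetical.

```python
import hashlib
import json
import os
import tempfile


class CatalogConflict(Exception):
    """Raised when the catalog file changed since it was read."""


def _etag(raw: bytes) -> str:
    # Content hash standing in for the ETag an object store would return.
    return hashlib.sha256(raw).hexdigest()


def read_catalog(path: str) -> tuple[dict, str]:
    """Return the parsed catalog and the tag of the bytes that were read."""
    with open(path, "rb") as f:
        raw = f.read()
    return json.loads(raw), _etag(raw)


def write_catalog_if_unchanged(path: str, catalog: dict, expected_tag: str) -> str:
    """Write the catalog only if the file still matches expected_tag.

    Note: this local check-then-write is not atomic. An object store
    performs the equivalent comparison atomically on the server side.
    """
    with open(path, "rb") as f:
        current_tag = _etag(f.read())
    if current_tag != expected_tag:
        raise CatalogConflict("catalog was modified by another writer")
    raw = json.dumps(catalog, indent=2).encode()
    with open(path, "wb") as f:
        f.write(raw)
    return _etag(raw)


def demo() -> list:
    """Two writers start from the same catalog version; the second must retry."""
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "catalog.json")
        with open(path, "w") as f:
            json.dump({"namespaces": {}}, f)

        cat_a, tag_a = read_catalog(path)
        cat_b, tag_b = read_catalog(path)

        cat_a["namespaces"]["ns_a"] = {}
        write_catalog_if_unchanged(path, cat_a, tag_a)
        events.append("writer A committed")

        cat_b["namespaces"]["ns_b"] = {}
        try:
            write_catalog_if_unchanged(path, cat_b, tag_b)
            events.append("writer B committed")
        except CatalogConflict:
            events.append("writer B rejected")
            # Reload the latest state, reapply the change, and retry.
            cat_b, tag_b = read_catalog(path)
            cat_b["namespaces"]["ns_b"] = {}
            write_catalog_if_unchanged(path, cat_b, tag_b)
            events.append("writer B retried")

        final, _ = read_catalog(path)
        events.append(sorted(final["namespaces"]))
    return events


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the read-modify-write loop: a writer that loses the race reloads the catalog and reapplies its change instead of silently overwriting the other writer's update.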
Installation
pip install boringcatalog
Quickstart
Initialize a Catalog
ice init
That's it! Your catalog is now ready to use.
Two files are created:
- warehouse/catalog/catalog_boring.json: the catalog file
- .ice/index: points to the catalog location (similar to a git index file, but for Iceberg)
Note: You can also specify a remote location for your Iceberg data and catalog file:
ice init -p warehouse=s3://mybucket/mywarehouse
See the Custom Init and Catalog Location section for more details.
Note: If you are using an S3 path (e.g., s3://...) for your catalog file or warehouse, make sure your CLI environment is authenticated with AWS. For example, you can set your AWS profile with:
export AWS_PROFILE=your-profile
You must have valid AWS credentials configured for the CLI to access S3 resources.
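As a quick sanity check before pointing the catalog at S3, the following sketch reports which of the common AWS credential environment variables are set. The helper name `aws_credential_hints` is hypothetical, and the check is only a heuristic: credentials can also come from sources it does not inspect, such as a shared credentials file or an instance role.

```python
import os


def aws_credential_hints(env=None) -> list[str]:
    """List which common AWS credential environment variables are set."""
    env = os.environ if env is None else env
    hints = []
    if env.get("AWS_PROFILE"):
        hints.append("profile")
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        hints.append("static keys")
    return hints


if __name__ == "__main__":
    found = aws_credential_hints()
    print(found if found else "no AWS credential variables found in the environment")
```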
You can then start using the catalog:
Commit a table
# Get some data
curl https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet -o /tmp/yellow_tripdata_2023-01.parquet
# Commit the table
ice commit my_table --source /tmp/yellow_tripdata_2023-01.parquet
Check the commit history:
ice log
Explore your Iceberg (data and metadata) with DuckDB
ice duck
This opens an interactive DuckDB session with pointers to all your tables and namespaces.
Example DuckDB queries:
show; -- show all tables
select * from catalog.namespaces; -- list namespaces
select * from catalog.tables; -- list tables
select * from <namespace>.<table>; -- query iceberg table
Python Usage
import pyarrow.parquet as pq
from boringcatalog import BoringCatalog
# Auto-detects .ice/index in the current working directory
catalog = BoringCatalog()
# Or specify a catalog explicitly
catalog = BoringCatalog(name="mycat", uri="path/to/catalog.json")
# Interact with your Iceberg catalog
catalog.create_namespace("my_namespace")
catalog.create_table("my_namespace", "my_table")
catalog.load_table("my_namespace.my_table")
# Append the Parquet file downloaded in the Quickstart to an existing table
df = pq.read_table("/tmp/yellow_tripdata_2023-01.parquet")
table = catalog.load_table(("ice_default", "my_table"))
table.append(df)
Custom Init and Catalog Location
You can configure your Iceberg catalog in several ways, depending on where you want to store your catalog metadata (the JSON file) and your Iceberg data (the warehouse):
- The warehouse property determines where your Iceberg tables' data will be stored.
- The --catalog option lets you specify the exact path for your catalog JSON file.
- If you use both, the catalog file will be created at the path you specify, and the warehouse will be used for table data.
Examples
| Command Example | Catalog File Location | Warehouse/Data Location | Use Case |
|---|---|---|---|
| ice init | warehouse/catalog/catalog_boring.json | warehouse/ | Local, simple |
| ice init -p warehouse=... | <warehouse>/catalog/catalog_boring.json | <warehouse>/ | Custom warehouse |
| ice init --catalog ... | <custom>.json | (to define when creating a table) | Custom catalog file |
| ice init --catalog ... -p warehouse=... | <custom>.json | <warehouse>/ | Full control |
| ice init --catalog ... --catalog-name ... | <custom>.json | (to define when creating a table) | Custom name & file |
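The rules in the table above can be summarized in a small sketch. It is an illustration rather than the CLI's real logic: the function name `resolve_locations` is hypothetical, and the `catalog_<name>.json` file naming is inferred from the default `catalog_boring.json`.

```python
from typing import Optional


def resolve_locations(
    warehouse: Optional[str] = None,
    catalog: Optional[str] = None,
    catalog_name: str = "boring",
) -> tuple[str, Optional[str]]:
    """Return (catalog_file, warehouse) following the table above."""
    if catalog is not None:
        # An explicit --catalog path wins. The warehouse is whatever was
        # passed; None means it is defined later, when creating a table.
        return catalog, warehouse
    # No explicit catalog path: derive it from the warehouse location,
    # which defaults to a local "warehouse" directory.
    base = (warehouse or "warehouse").rstrip("/")
    return f"{base}/catalog/catalog_{catalog_name}.json", base


if __name__ == "__main__":
    print(resolve_locations())
    print(resolve_locations(warehouse="s3://mybucket/mywarehouse"))
    print(resolve_locations(catalog="custom.json"))
```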
Edge Cases & Manual Editing
- Custom Catalog Name: By default, the catalog is named "boring", but you can set a custom name with --catalog-name. This name is used in the catalog JSON and for file naming if you don't specify a custom path.
- Re-initialization: If you run ice init multiple times in the same directory, the .ice/index file will be overwritten with the new configuration. This is useful if you want to re-point your project to a different catalog, but be aware that it will not migrate or merge any existing data.
- Manual Editing: Advanced users can manually edit .ice/index to point to a different catalog file or change the catalog name. If you do this, make sure the catalog_uri and catalog_name fields are consistent with your actual catalog JSON file. If you set a warehouse property but do not update catalog_uri, Boring Catalog will always use the catalog_uri from the index file.
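After a manual edit, a consistency check along the following lines may help. It is a sketch under explicit assumptions: that .ice/index is a JSON document with `catalog_uri` and `catalog_name` fields, and that the catalog JSON file records its name under a `catalog_name` key. Neither file layout is confirmed here, and `check_index` is a hypothetical helper, not part of the package.

```python
import json
import os
import tempfile


def check_index(index_path: str) -> list[str]:
    """Return a list of problems found in the index file (empty if none)."""
    with open(index_path) as f:
        index = json.load(f)
    problems = []
    for field in ("catalog_uri", "catalog_name"):
        if not index.get(field):
            problems.append(f"missing {field}")
    uri = index.get("catalog_uri")
    if uri and "://" not in uri:
        # Only local paths are checked; remote URIs (e.g. s3://) are skipped.
        if not os.path.exists(uri):
            problems.append(f"catalog file not found: {uri}")
        else:
            with open(uri) as f:
                catalog = json.load(f)
            # Assumed key name in the catalog JSON file.
            if catalog.get("catalog_name") != index.get("catalog_name"):
                problems.append("catalog_name differs from catalog file")
    return problems


def demo_check() -> tuple[list[str], list[str]]:
    """Check a consistent index, then one whose catalog name was changed."""
    with tempfile.TemporaryDirectory() as tmp:
        catalog_path = os.path.join(tmp, "catalog_boring.json")
        index_path = os.path.join(tmp, "index")
        with open(catalog_path, "w") as f:
            json.dump({"catalog_name": "boring"}, f)
        entry = {"catalog_uri": catalog_path, "catalog_name": "boring", "properties": {}}
        with open(index_path, "w") as f:
            json.dump(entry, f)
        consistent = check_index(index_path)
        entry["catalog_name"] = "other"
        with open(index_path, "w") as f:
            json.dump(entry, f)
        renamed = check_index(index_path)
    return consistent, renamed


if __name__ == "__main__":
    print(demo_check())
```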
Roadmap
- Improve CLI to allow MERGE operation, partition spec, etc.
- Improve CLI to get info about table schema / partition spec / etc.
- Expose REST API for integration with AWS, Snowflake, etc.