A convenience wrapper around PyIceberg for simplified data loading into Apache Iceberg tables
Project description
iceberg-loader
A convenience wrapper around PyIceberg that simplifies data loading into Apache Iceberg tables. It is PyArrow-first and handles messy JSON, schema evolution, idempotent replace, upsert, batching, and streaming out of the box.
Status: Actively developed and under testing. PRs are welcome!
Currently tested against Hive Metastore; REST Catalog support is planned.
Why iceberg-loader?
- Messy JSON friendly: auto-serializes dict/list/mixed fields to strings so writes don't fail.
- Schema evolution: adds columns on the fly (opt-in) and preserves field IDs.
- Safe writes: append/overwrite, idempotent replace via replace_filter, upsert.
- Stream friendly: commit intervals, batches, IPC streams.
- Single config: LoaderConfig sets defaults; override per-call if needed.
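To make the "messy JSON friendly" point concrete, here is a minimal plain-Python sketch of the idea: nested dict/list values are JSON-encoded so that a column ends up holding strings only. The helper name `normalize_record` is hypothetical and is not part of iceberg-loader's API; the library's actual serialization rules may differ.

```python
import json
from typing import Any


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of record with dict/list values JSON-encoded as strings.

    Hypothetical helper for illustration only; not iceberg-loader's code.
    """
    normalized: dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            # Nested structures become a JSON string (keys sorted for a
            # stable result) so the column can be stored as a string type.
            normalized[key] = json.dumps(value, sort_keys=True)
        else:
            # Scalars and existing strings pass through unchanged.
            normalized[key] = value
    return normalized


records = [
    {"id": 1, "payload": {"b": 2, "a": 1}},
    {"id": 2, "payload": [1, "two"]},
    {"id": 3, "payload": "already a string"},
]
clean = [normalize_record(r) for r in records]
```

After this step the `payload` field is a string in every record, so a mixed dict/list/string column no longer has conflicting types when it is converted to a PyArrow table.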
Install
pip install "iceberg-loader[all]"
Or with uv:
uv pip install "iceberg-loader[all]"
Quickstart
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from iceberg_loader import LoaderConfig, load_data_to_iceberg
catalog = load_catalog("default")
data = pa.Table.from_pydict({"id": [1, 2], "signup_date": ["2023-01-01", "2023-01-02"]})
config = LoaderConfig(write_mode="append", partition_col="signup_date", schema_evolution=True)
load_data_to_iceberg(data, ("db", "users"), catalog, config=config)
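The quickstart appends rows, so running it twice writes the data twice. The idempotent replace mentioned under "Why iceberg-loader?" avoids that by first removing the rows a filter selects and then writing the new batch. The sketch below models only those semantics on in-memory rows; `replace_where` and `is_jan_2` are hypothetical names, and this is not how iceberg-loader or PyIceberg implement the operation.

```python
from typing import Any, Callable

Row = dict[str, Any]


def replace_where(
    table: list[Row], new_rows: list[Row], matches: Callable[[Row], bool]
) -> list[Row]:
    """Drop every row selected by matches, then append new_rows.

    Hypothetical in-memory model of a filter-scoped replace.
    """
    kept = [row for row in table if not matches(row)]
    return kept + new_rows


def is_jan_2(row: Row) -> bool:
    # Plays the role of the replace filter: selects one signup_date slice.
    return row["signup_date"] == "2023-01-02"


table = [
    {"id": 1, "signup_date": "2023-01-01"},
    {"id": 2, "signup_date": "2023-01-02"},
]
batch = [{"id": 3, "signup_date": "2023-01-02"}]

once = replace_where(table, batch, is_jan_2)
twice = replace_where(once, batch, is_jan_2)  # simulate a retried load

appended_twice = table + batch + batch  # what a retried append would produce
```

Because the filter covers exactly the slice that the batch rewrites, `once` and `twice` are identical, whereas the retried append ends up with a duplicated row.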
Documentation
Full usage guide, API reference, and examples: docs/ or run mkdocs serve locally.
Contributing
We welcome contributions! See CONTRIBUTING.md for setup, coding style, and PR guidelines.
hatch run lint
hatch run test
Contributors
Thanks to all contributors who have helped make this project better!
Made with contrib.rocks.
License
iceberg-loader is distributed under the terms of the MIT license.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file iceberg_loader-0.0.5.tar.gz.
File metadata
- Download URL: iceberg_loader-0.0.5.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fbf3bd9396e11b72ad3ed581fe1a9359b3c126e3980b68d325120f503d1f07b |
| MD5 | d0baee614ba24c6aa32d9f941989002e |
| BLAKE2b-256 | 6ac279bc186f5526e4b43c6787f896455f4cdeedd05d3676a508e3df019d81cb |
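A downloaded file can be checked against the published digest before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify` are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for iceberg_loader-0.0.5.tar.gz.
EXPECTED_SHA256 = "9fbf3bd9396e11b72ad3ed581fe1a9359b3c126e3980b68d325120f503d1f07b"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Assumed location of the downloaded archive; adjust as needed.
archive = Path("iceberg_loader-0.0.5.tar.gz")
if archive.exists():
    print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
```

pip can perform the same check itself when requirements are pinned with `--hash=sha256:...` entries and installed with `--require-hashes`.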
File details
Details for the file iceberg_loader-0.0.5-py3-none-any.whl.
File metadata
- Download URL: iceberg_loader-0.0.5-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 991b11f507277917df1bd48105161dc4d9f555bcf981b329373aa631212881ab |
| MD5 | a0a66b62ffbc7df566408d85b185b8d2 |
| BLAKE2b-256 | 56d00c0802792b9992e495daa1652e58bb8d03255d17ce649ac9182ed0c65ef1 |