A convenience wrapper around PyIceberg for simplified data loading into Apache Iceberg tables
iceberg-loader
A convenience wrapper around PyIceberg that simplifies data loading into Apache Iceberg tables. PyArrow-first, handles messy JSON, schema evolution, idempotent replace, upsert, batching, and streaming out of the box.
Status: Actively developed and under testing. PRs are welcome! Currently tested against Hive Metastore; REST Catalog support is planned.
Why iceberg-loader?
- Messy JSON friendly: auto-serializes dict/list/mixed fields to strings so writes don't fail.
- Schema evolution: add columns on the fly (opt-in), preserves field IDs.
- Safe writes: append/overwrite, idempotent replace via replace_filter, upsert.
- Stream friendly: commit intervals, batches, IPC streams.
- Single config: LoaderConfig sets defaults; override per-call if needed.
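To make the "messy JSON friendly" point concrete, here is a minimal standard-library sketch of the general idea: nested dict/list values are turned into JSON strings so that a column with mixed shapes ends up with one uniform type. This is an illustration only, not iceberg-loader's actual implementation; the helper name `stringify_nested` is hypothetical.

```python
import json
from typing import Any


def stringify_nested(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the rows with dict/list values replaced by JSON strings."""
    cleaned = []
    for row in rows:
        cleaned.append({
            key: json.dumps(value, sort_keys=True) if isinstance(value, (dict, list)) else value
            for key, value in row.items()
        })
    return cleaned


rows = [
    {"id": 1, "complex_field": {"a": 1, "b": "nested"}},
    {"id": 2, "complex_field": [1, 2, 3]},
]
# Both rows now carry a plain string in complex_field; scalars are untouched.
cleaned = stringify_nested(rows)
```

After this step every value in `complex_field` is a string, which is the property that lets a write succeed even when the source records disagree on the field's shape.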
Install
pip install "iceberg-loader[all]"
Or with uv:
uv add "iceberg-loader[all]"
Quickstart
from pyiceberg.catalog import load_catalog
from iceberg_loader import LoaderConfig, load_data_to_iceberg
from iceberg_loader.utils.arrow import create_arrow_table_from_data
catalog = load_catalog("default")
table_id = ("default", "comparison_complex_json")
data = [
{"id": 1, "complex_field": {"a": 1, "b": "nested"}},
{"id": 2, "complex_field": {"a": 2, "b": "another", "c": [1, 2]}},
{"id": 3, "complex_field": [1, 2, 3]},
]
arrow_table = create_arrow_table_from_data(data)
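# Note: partition_col must name a column in the data; the sample rows above would need a signup_date field.
config = LoaderConfig(write_mode="append", partition_col="signup_date", schema_evolution=True)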
load_data_to_iceberg(arrow_table, table_id, catalog, config=config)
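The quickstart uses write_mode="append", while the feature list also mentions idempotent replace and upsert. The difference between these three semantics can be sketched with plain Python lists standing in for a table. This is a conceptual in-memory model, not how iceberg-loader or Iceberg performs writes; all function names here are hypothetical.

```python
from typing import Any, Callable

Row = dict[str, Any]


def append(table: list[Row], new_rows: list[Row]) -> list[Row]:
    """Append: keep everything and add the new rows (re-running duplicates them)."""
    return table + new_rows


def replace_where(table: list[Row], new_rows: list[Row], matches: Callable[[Row], bool]) -> list[Row]:
    """Idempotent replace: drop rows matching the filter, then add the new rows."""
    return [row for row in table if not matches(row)] + new_rows


def upsert(table: list[Row], new_rows: list[Row], key: str) -> list[Row]:
    """Upsert: overwrite rows whose key already exists, insert the rest."""
    merged = {row[key]: row for row in table}
    merged.update({row[key]: row for row in new_rows})
    return list(merged.values())


def is_jan_1(row: Row) -> bool:
    return row["day"] == "2024-01-01"


existing = [
    {"id": 1, "day": "2024-01-01", "v": "old"},
    {"id": 2, "day": "2024-01-02", "v": "keep"},
]
batch = [{"id": 1, "day": "2024-01-01", "v": "new"}]

# Loading the same batch twice: append grows, filtered replace stays stable.
appended_twice = append(append(existing, batch), batch)
replaced_twice = replace_where(replace_where(existing, batch, is_jan_1), batch, is_jan_1)
upserted = upsert(existing, batch, key="id")
```

In this model a retried load is harmless under replace and upsert but produces duplicate rows under append, which is the practical reason to prefer a filtered replace for re-runnable jobs.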
Which function to use?
| Function | Use when... | Input Format |
|---|---|---|
| load_data_to_iceberg | You have a single pa.Table in memory. | pyarrow.Table |
| load_batches_to_iceberg | You have a generator/iterator of batches (memory efficient). | Iterator of pyarrow.RecordBatch |
| load_ipc_stream_to_iceberg | You are reading from an Arrow IPC stream file/socket. | File-like object or path |
Preparing Data
Use helpers to convert Python dictionaries to Arrow format (handling messy types automatically):
from iceberg_loader.utils.arrow import create_arrow_table_from_data, create_record_batches_from_dicts
# 1. Convert list of dicts -> pa.Table
arrow_table = create_arrow_table_from_data(data_list)
# 2. Convert iterator of dicts -> Iterator[pa.RecordBatch]
batches = create_record_batches_from_dicts(data_generator(), batch_size=10000)
Alternatively, use standard PyArrow conversion: pa.Table.from_pylist(data).
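For intuition about what a batch_size argument implies, the sketch below chunks an iterator of dicts lazily with the standard library, so only one chunk is held in memory at a time. It is an assumption-laden illustration of the batching idea, not the code behind create_record_batches_from_dicts; `chunk_dicts` and the sample `data_generator` are hypothetical.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def chunk_dicts(rows: Iterable[dict[str, Any]], batch_size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield lists of at most batch_size rows without materializing the whole input."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    # islice pulls the next batch_size items; an empty list means the input is exhausted.
    while chunk := list(islice(iterator, batch_size)):
        yield chunk


def data_generator() -> Iterator[dict[str, Any]]:
    """Stand-in for a real source such as a file reader or API paginator."""
    for i in range(5):
        yield {"id": i}


chunks = list(chunk_dicts(data_generator(), batch_size=2))
```

Each chunk could then be converted with PyArrow's pa.RecordBatch.from_pylist(chunk) if you prefer to build the batches yourself.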
Contributing
We welcome contributions! See CONTRIBUTING.md for setup, coding style, and PR guidelines.
hatch run lint
hatch run test
Contributors
Thanks to all contributors who have helped make this project better!
License
iceberg-loader is distributed under the terms of the MIT license.