A DataFrame-like library for working with Apache Iceberg tables
IceFrame (Alpha)
A DataFrame-like library for working with Apache Iceberg tables using REST catalogs with local execution.
IceFrame provides a simple, intuitive API for creating, reading, updating, and deleting Iceberg tables, as well as performing maintenance operations and exporting data.
Features
- DataFrame API: Familiar interface for working with tables
- Local Execution: Uses PyIceberg, PyArrow, and Polars for efficient local processing
- Catalog Support: Works with REST catalogs (such as Dremio and Tabular) and supports credential vending
- CRUD Operations: Create, read, update, and delete tables and data
- Maintenance: Expire snapshots, remove orphan files, compact data files
- Export: Export data to Parquet, CSV, and JSON
Documentation
Getting Started
- Creating Tables
- Reading Tables
- Updating Tables
- Deleting Tables
- CLI Usage
- Dependencies
- Environment Variables
Data Ingestion
- Native File Ingestion (CSV, JSON, Parquet, ORC, Avro)
- Optional File Ingestion (Excel, Delta, Google Sheets)
- Advanced File Ingestion (SQL, XML, SAS/SPSS)
- API Ingestion
- HuggingFace Ingestion
- HTML Ingestion
- Clipboard Ingestion
- Folder Ingestion
- Bulk Ingestion
- Incremental Ingestion
Querying & Processing
- Query Builder API
- SQL Support (DataFusion)
- Lazy Reading
- Distributed Processing (Ray)
- Async Operations
- Notebook Integration
- Scalable Updates
Table Management
- Namespace Management
- Schema Evolution
- Partition Management
- Branching & Tagging
- Views
- Catalog Operations
Maintenance & Quality
- Table Maintenance
- Native Maintenance
- Safe Compaction
- Streaming Auto-Compaction
- Data Quality
- Enhanced Data Quality
- Rollback & History
Advanced Features
- Visualization
- Incremental Processing
- Table Statistics
- Scalability Overview
- AI Agent
- MCP Server
- Pydantic Integration
Recipes
Installation
pip install iceframe
For cloud storage support:
pip install "iceframe[aws]" # AWS S3
pip install "iceframe[gcs]" # Google Cloud Storage
pip install "iceframe[azure]" # Azure Data Lake Storage
Quick Start
- Create a .env file with your catalog credentials (see .env.example):
ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_WAREHOUSE=your_warehouse
ICEBERG_CATALOG_TYPE=rest
- Use IceFrame in your code:
from datetime import datetime
import polars as pl
from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env
# Initialize from the ICEBERG_* settings
config = load_catalog_config_from_env()
ice = IceFrame(config)
# Create a table
schema = {
    "id": "long",
    "name": "string",
    "created_at": "timestamp",
}
ice.create_table("my_table", schema)
# Append data
# Use Python datetime values: pl.datetime(...) builds a Polars expression, not a literal
data = pl.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "created_at": [datetime(2024, 1, 1), datetime(2024, 1, 2)],
})
ice.append_to_table("my_table", data)
# Read data
df = ice.read_table("my_table")
print(df)
# Query Builder API
from iceframe.expressions import col
from iceframe.functions import sum
df = (
    ice.query("my_table")
    .select("name", sum(col("id")).alias("total_id"))
    .group_by("name")
    .execute()
)
print(df)
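The Quick Start relies on the four ICEBERG_* settings from the .env file. The following is a minimal sketch of a pre-flight check that reports which of them are unset. It assumes all four are required, which this page does not state, and it only inspects the process environment; whether IceFrame loads the .env file on its own is not covered here. The helper name missing_catalog_vars is hypothetical and not part of IceFrame.

```python
import os

# Variable names copied from the .env example in the Quick Start.
# Assumption: all four are required (the README does not say so explicitly).
REQUIRED_VARS = (
    "ICEBERG_CATALOG_URI",
    "ICEBERG_TOKEN",
    "ICEBERG_WAREHOUSE",
    "ICEBERG_CATALOG_TYPE",
)


def missing_catalog_vars(environ=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_catalog_vars()
    if missing:
        print("Missing catalog settings: " + ", ".join(missing))
    else:
        print("All catalog settings are present.")
```

Running a check like this before constructing the IceFrame object can turn a confusing connection failure into a clear message about configuration.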
Feature Comparison: IceFrame vs PyIceberg
IceFrame builds on top of PyIceberg, adding high-level abstractions and features that PyIceberg does not provide natively.
| Feature | PyIceberg (Native) | IceFrame (Enhanced) |
|---|---|---|
| Table CRUD | Low-level API | Simplified create_table, drop_table |
| Data Writing | Arrow/Pandas integration | Polars integration, Auto-schema inference |
| Branching | Basic support (WIP) | create_branch, fast_forward, WAP Pattern |
| Compaction | rewrite_data_files (limited) | bin_pack, sort strategies (Polars-based) |
| Views | Catalog-dependent | Unified ViewManager abstraction |
| Maintenance | expire_snapshots | GarbageCollector, Native remove_orphan_files |
| SQL Support | None | Fluent Query Builder (select, filter, join) |
| Ingestion | add_files | add_files wrapper + Incremental Ingestion recipes |
| Rollback | manage_snapshots | rollback_to_snapshot, rollback_to_timestamp |
| Async | None | AsyncIceFrame for non-blocking I/O |
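The table lists bin_pack as a compaction strategy. As a rough illustration of the general idea only, the sketch below groups small data files into batches that approach a target size, using a first-fit-decreasing heuristic. This is not IceFrame's implementation, which this page does not show; the function name plan_bin_pack, its parameters, and the choice of heuristic are all assumptions made for the example.

```python
from typing import List


def plan_bin_pack(file_sizes: List[int], target_size: int) -> List[List[int]]:
    """Group file sizes into bins of at most target_size bytes.

    Hypothetical helper: uses first-fit decreasing, so larger files are
    placed first and each file goes into the first bin with enough room.
    A file larger than target_size is placed in a bin of its own.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")

    bins: List[List[int]] = []
    free_space: List[int] = []  # remaining capacity of each bin

    for size in sorted(file_sizes, reverse=True):
        for i, free in enumerate(free_space):
            if size <= free:
                bins[i].append(size)
                free_space[i] -= size
                break
        else:
            # No existing bin can hold this file, so open a new one.
            bins.append([size])
            free_space.append(max(target_size - size, 0))
    return bins


if __name__ == "__main__":
    # Five small files packed toward a 100-unit target.
    print(plan_bin_pack([40, 10, 60, 30, 50], target_size=100))
```

Each resulting group would correspond to one rewritten output file, so fewer and fuller groups mean fewer files for readers to open.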