IceFrame (Alpha)

A DataFrame-like library for working with Apache Iceberg tables using REST catalogs with local execution.

IceFrame provides a simple, intuitive API for creating, reading, updating, and deleting Iceberg tables, as well as performing maintenance operations and exporting data.

Upgrading from 0.11? Version 0.12.0 fixes several long-standing data-correctness bugs, including a silent double-write in every create_table_from_* helper, an orphan-file collector that could delete files still referenced by older snapshots, and a negated (~ / NOT) predicate that could silently return zero rows. See CHANGELOG.md for the full list and behaviour notes.
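
To make the orphan-file hazard concrete, here is a minimal sketch in plain Python. It is not IceFrame's implementation; the function name, snapshot layout, and file paths are hypothetical. It only illustrates the rule that a file counts as an orphan when no retained snapshot references it, rather than when the current snapshot alone does not.

```python
def find_orphan_files(files_in_storage, snapshots):
    """Return the files that no retained snapshot references.

    `snapshots` maps a snapshot id to the set of data-file paths it
    references (a hypothetical, simplified stand-in for table metadata).
    """
    referenced = set()
    for files in snapshots.values():
        referenced |= set(files)  # union over ALL snapshots, not only the latest
    return sorted(set(files_in_storage) - referenced)


# Hypothetical table state: snapshot 1 is older but still retained.
snapshots = {
    1: {"data/a.parquet", "data/b.parquet"},
    2: {"data/b.parquet", "data/c.parquet"},
}
storage = [
    "data/a.parquet",
    "data/b.parquet",
    "data/c.parquet",
    "data/tmp-123.parquet",
]

orphans = find_orphan_files(storage, snapshots)
print(orphans)  # ['data/tmp-123.parquet']
```

A collector that looked only at snapshot 2 would also flag data/a.parquet, which would break time travel to snapshot 1.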

Features

  • DataFrame API: Familiar interface for working with tables
  • Local Execution: Uses PyIceberg, PyArrow, and Polars for efficient local processing
  • Catalog Support: Works with REST catalogs (including Dremio, Tabular, etc.) and supports credential vending
  • CRUD Operations: Create, Read, Update, Delete tables and data
  • Maintenance: Expire snapshots, remove orphan files, compact data files
  • Export: Export data to Parquet, CSV, and JSON

Documentation

Getting Started

Data Ingestion

Querying & Processing

Table Management

Maintenance & Quality

Advanced Features

Recipes

Installation

pip install iceframe

For cloud storage support:

pip install "iceframe[aws]"   # AWS S3
pip install "iceframe[gcs]"   # Google Cloud Storage
pip install "iceframe[azure]" # Azure Data Lake Storage

Quick Start

  1. Create a .env file with your catalog credentials (see .env.example):
ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_WAREHOUSE=your_warehouse
ICEBERG_CATALOG_TYPE=rest
  2. Use IceFrame in your code:
from datetime import datetime

import polars as pl
from iceframe import IceFrame, col, lit, load_catalog_config_from_env

# Initialize
config = load_catalog_config_from_env()
ice = IceFrame(config)

# Create an EMPTY table (note: as of 0.12, create_table never writes data,
# even if you pass a DataFrame as the schema; use append_to_table afterwards)
schema = {
    "id": "long",
    "name": "string",
    "created_at": "timestamp",
}
ice.create_table("my_table", schema)

# Append data. Use Python datetime values here: pl.datetime() builds a
# Polars expression rather than a literal value.
data = pl.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "created_at": [datetime(2024, 1, 1), datetime(2024, 1, 2)],
})
ice.append_to_table("my_table", data)

# Read data
df = ice.read_table("my_table")
print(df)

# Query Builder API — col, lit, IceFrame, QueryBuilder all importable from
# the package root as of 0.12.
from iceframe.functions import sum as ice_sum

df = (ice.query("my_table")
      .select("name", ice_sum(col("id")).alias("total_id"))
      .group_by("name")
      .execute())
print(df)
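
Before running the quick start, it can help to confirm that the .env file from step 1 is complete. The sketch below uses only the standard library and is not part of IceFrame. The helper names are hypothetical, and it assumes all four variables shown in step 1 are required, which the README does not state explicitly.

```python
# Assumption: all four variables from the .env example are required.
REQUIRED_KEYS = (
    "ICEBERG_CATALOG_URI",
    "ICEBERG_TOKEN",
    "ICEBERG_WAREHOUSE",
    "ICEBERG_CATALOG_TYPE",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Sample content with ICEBERG_WAREHOUSE deliberately left out.
sample = """\
ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_CATALOG_TYPE=rest
"""

missing = missing_keys(parse_env_text(sample))
print(missing)  # ['ICEBERG_WAREHOUSE']
```

To check a real file, pass its contents instead of the sample, for example `parse_env_text(open(".env").read())`.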

Feature Comparison: IceFrame vs PyIceberg

IceFrame builds on top of PyIceberg, adding high-level abstractions and features that PyIceberg does not provide natively.

| Feature | PyIceberg (Native) | IceFrame (Enhanced) |
|---|---|---|
| Table CRUD | Low-level API | Simplified create_table, drop_table |
| Data Writing | Arrow/Pandas integration | Polars integration, Auto-schema inference |
| Branching | Basic support (WIP) | create_branch, fast_forward, WAP Pattern |
| Compaction | rewrite_data_files (limited) | bin_pack, sort strategies (Polars-based) |
| Views | Catalog-dependent | Unified ViewManager abstraction |
| Maintenance | expire_snapshots | GarbageCollector, Native remove_orphan_files |
| SQL Support | None | Fluent Query Builder (select, filter, join) |
| Ingestion | add_files | add_files wrapper + Incremental Ingestion recipes |
| Rollback | manage_snapshots | rollback_to_snapshot, rollback_to_timestamp |
| Async | None | AsyncIceFrame for non-blocking I/O |
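
The comparison mentions a bin_pack compaction strategy. As a rough illustration of the general idea only, the sketch below groups small files into rewrite batches up to a target size using first-fit decreasing. The function, sizes, and target are hypothetical, and IceFrame's actual strategy may work differently.

```python
def bin_pack(file_sizes, target_size):
    """Group file sizes into bins of at most target_size (first-fit decreasing).

    A file larger than target_size ends up alone in its own bin.
    """
    bins = []  # each bin is a list of file sizes to rewrite together
    for size in sorted(file_sizes, reverse=True):
        for current in bins:
            if sum(current) + size <= target_size:
                current.append(size)
                break
        else:
            # No existing bin has room, so start a new one.
            bins.append([size])
    return bins


# Hypothetical data-file sizes in MB, compacted towards 128 MB outputs.
sizes_mb = [120, 8, 30, 64, 16, 100, 4]
groups = bin_pack(sizes_mb, target_size=128)
print(groups)  # [[120, 8], [100, 16, 4], [64, 30]]
```

Here seven input files become three rewrite groups, so the table would have fewer, larger files after compaction.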
