Skip to main content

A DataFrame-like library for working with Apache Iceberg tables

Project description

IceFrame (Alpha)

A DataFrame-like library for working with Apache Iceberg tables using REST catalogs with local execution.

IceFrame provides a simple, intuitive API for creating, reading, updating, and deleting Iceberg tables, as well as performing maintenance operations and exporting data.

Features

  • DataFrame API: Familiar interface for working with tables
  • Local Execution: Uses PyIceberg, PyArrow, and Polars for efficient local processing
  • Catalog Support: Works with REST catalogs (including Dremio, Tabular, etc.) and supports credential vending
  • CRUD Operations: Create, Read, Update, Delete tables and data
  • Maintenance: Expire snapshots, remove orphan files, compact data files
  • Export: Export data to Parquet, CSV, and JSON

Documentation

Getting Started

Data Ingestion

Querying & Processing

Table Management

Maintenance & Quality

Advanced Features

Recipes

Installation

pip install iceframe

For cloud storage support:

pip install "iceframe[aws]"   # AWS S3
pip install "iceframe[gcs]"   # Google Cloud Storage
pip install "iceframe[azure]" # Azure Data Lake Storage

Quick Start

  1. Create a .env file with your catalog credentials (see .env.example):
ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_WAREHOUSE=your_warehouse
ICEBERG_CATALOG_TYPE=rest
  1. Use IceFrame in your code:
from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env
import polars as pl

# Initialize
config = load_catalog_config_from_env()
ice = IceFrame(config)

# Create a table
schema = {
    "id": "long",
    "name": "string",
    "created_at": "timestamp"
}
ice.create_table("my_table", schema)

# Append data
data = pl.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "created_at": [pl.datetime(2024, 1, 1), pl.datetime(2024, 1, 2)]
})
ice.append_to_table("my_table", data)

# Read data
df = ice.read_table("my_table")
print(df)

# Query Builder API
from iceframe.expressions import col
from iceframe.functions import sum

df = (ice.query("my_table")
      .select("name", sum(col("id")).alias("total_id"))
      .group_by("name")
      .execute())
print(df)

Feature Comparison: IceFrame vs PyIceberg

IceFrame builds on top of PyIceberg, adding high-level abstractions and missing features.

Feature PyIceberg (Native) IceFrame (Enhanced)
Table CRUD Low-level API Simplified create_table, drop_table
Data Writing Arrow/Pandas integration Polars integration, Auto-schema inference
Branching Basic support (WIP) create_branch, fast_forward, WAP Pattern
Compaction rewrite_data_files (limited) bin_pack, sort strategies (Polars-based)
Views Catalog-dependent Unified ViewManager abstraction
Maintenance expire_snapshots GarbageCollector, Native remove_orphan_files
SQL Support None Fluent Query Builder (select, filter, join)
Ingestion add_files add_files wrapper + Incremental Ingestion recipes
Rollback manage_snapshots rollback_to_snapshot, rollback_to_timestamp
Async None AsyncIceFrame for non-blocking I/O

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iceframe-0.10.0.tar.gz (96.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

iceframe-0.10.0-py3-none-any.whl (89.3 kB view details)

Uploaded Python 3

File details

Details for the file iceframe-0.10.0.tar.gz.

File metadata

  • Download URL: iceframe-0.10.0.tar.gz
  • Upload date:
  • Size: 96.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.1

File hashes

Hashes for iceframe-0.10.0.tar.gz
Algorithm Hash digest
SHA256 77058efb38fd78c4be8c7def7a94372ce87d2f437ae695f9fbfaffb3e7a7e2f4
MD5 bfe82f90793167f2ca6f1498b160c73c
BLAKE2b-256 b0addf9a097fb210f8a6cf68e56291f1f70458d21127f799999db981cc65154a

See more details on using hashes here.

File details

Details for the file iceframe-0.10.0-py3-none-any.whl.

File metadata

  • Download URL: iceframe-0.10.0-py3-none-any.whl
  • Upload date:
  • Size: 89.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.1

File hashes

Hashes for iceframe-0.10.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4b7f038a8f406e9a464ed2dd1a7269a538b6e0b2f1ebc6720996e371a6b37dc8
MD5 032afe21f2577d1a51580aa3f3062643
BLAKE2b-256 54ed682ad43429ebb933c4ed1d26a1b22c8e1376c10b9ea7c5078672dcde6ed6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page