IceFrame (Alpha)

A DataFrame-like library for working with Apache Iceberg tables through REST catalogs, with all processing executed locally.

IceFrame provides a simple, intuitive API for creating, reading, updating, and deleting Iceberg tables, as well as performing maintenance operations and exporting data.

Features

  • DataFrame API: Familiar interface for working with tables
  • Local Execution: Uses PyIceberg, PyArrow, and Polars for efficient local processing
  • Catalog Support: Works with REST catalogs (such as Dremio and Tabular) and supports credential vending
  • CRUD Operations: Create, Read, Update, Delete tables and data
  • Maintenance: Expire snapshots, remove orphan files, compact data files
  • Export: Export data to Parquet, CSV, and JSON
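To make the maintenance bullet more concrete, here is a minimal sketch of how snapshot expiry is commonly decided for an Iceberg table: snapshots older than a cutoff become candidates for removal, while the most recent few are always kept. It is plain Python over a hypothetical `Snapshot` record and is not IceFrame's or PyIceberg's API; the names `select_expired`, `older_than_ms`, and `retain_last` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for an Iceberg snapshot entry."""
    snapshot_id: int
    timestamp_ms: int  # commit time in epoch milliseconds


def select_expired(snapshots, older_than_ms, retain_last=1):
    """Return snapshots older than the cutoff, never touching the newest `retain_last`."""
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    # Sort newest first so the protected snapshots sit at the head of the list.
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    candidates = ordered[retain_last:]
    return [s for s in candidates if s.timestamp_ms < older_than_ms]


history = [
    Snapshot(1, 1_000),
    Snapshot(2, 2_000),
    Snapshot(3, 3_000),
    Snapshot(4, 4_000),
]
expired = select_expired(history, older_than_ms=3_500, retain_last=2)
print([s.snapshot_id for s in expired])
```

This is a simplification: a real table may also need to keep snapshots that are referenced by branches or tags, which the sketch does not model.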

Documentation

  • Getting Started
  • Data Ingestion
  • Querying & Processing
  • Table Management
  • Maintenance & Quality
  • Advanced Features
  • Recipes

Installation

pip install iceframe

For cloud storage support:

pip install "iceframe[aws]"   # AWS S3
pip install "iceframe[gcs]"   # Google Cloud Storage
pip install "iceframe[azure]" # Azure Data Lake Storage

Quick Start

  1. Create a .env file with your catalog credentials (see .env.example):
ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_WAREHOUSE=your_warehouse
ICEBERG_CATALOG_TYPE=rest
  2. Use IceFrame in your code:
from datetime import datetime

import polars as pl

from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env

# Initialize
config = load_catalog_config_from_env()
ice = IceFrame(config)

# Create a table
schema = {
    "id": "long",
    "name": "string",
    "created_at": "timestamp"
}
ice.create_table("my_table", schema)

# Append data (use plain Python datetimes; pl.datetime() builds an expression, not a value)
data = pl.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "created_at": [datetime(2024, 1, 1), datetime(2024, 1, 2)]
})
ice.append_to_table("my_table", data)

# Read data
df = ice.read_table("my_table")
print(df)

# Query Builder API
from iceframe.expressions import col
from iceframe.functions import sum

df = (ice.query("my_table")
      .select("name", sum(col("id")).alias("total_id"))
      .group_by("name")
      .execute())
print(df)
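The Quick Start relies on four environment variables from the .env file. As a rough sketch of how such settings could be gathered and validated with only the standard library, consider the helper below. It assumes the values are already present in the process environment, and it is not the implementation of `load_catalog_config_from_env`; the function name `read_catalog_env` and the keys of the returned dictionary are hypothetical.

```python
import os

REQUIRED_VARS = ("ICEBERG_CATALOG_URI", "ICEBERG_TOKEN", "ICEBERG_WAREHOUSE")


def read_catalog_env(environ=None):
    """Collect catalog settings from environment variables (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Treat unset and empty variables the same way: both count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "uri": env["ICEBERG_CATALOG_URI"],
        "token": env["ICEBERG_TOKEN"],
        "warehouse": env["ICEBERG_WAREHOUSE"],
        # Assumption: fall back to "rest" when the catalog type is not set.
        "type": env.get("ICEBERG_CATALOG_TYPE", "rest"),
    }


example_env = {
    "ICEBERG_CATALOG_URI": "https://catalog.dremio.cloud/api/iceberg",
    "ICEBERG_TOKEN": "your_token",
    "ICEBERG_WAREHOUSE": "your_warehouse",
}
settings = read_catalog_env(example_env)
print(settings["type"])
```

Passing a dictionary instead of reading `os.environ` directly keeps the helper easy to test, and failing early on missing variables gives a clearer error than a rejected catalog request would.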

Feature Comparison: IceFrame vs PyIceberg

IceFrame builds on top of PyIceberg, adding high-level abstractions and features that PyIceberg does not provide natively.

| Feature      | PyIceberg (Native)           | IceFrame (Enhanced)                                 |
|--------------|------------------------------|-----------------------------------------------------|
| Table CRUD   | Low-level API                | Simplified create_table, drop_table                 |
| Data Writing | Arrow/Pandas integration     | Polars integration, Auto-schema inference           |
| Branching    | Basic support (WIP)          | create_branch, fast_forward, WAP Pattern            |
| Compaction   | rewrite_data_files (limited) | bin_pack, sort strategies (Polars-based)            |
| Views        | Catalog-dependent            | Unified ViewManager abstraction                     |
| Maintenance  | expire_snapshots             | GarbageCollector, Native remove_orphan_files        |
| SQL Support  | None                         | Fluent Query Builder (select, filter, join)         |
| Ingestion    | add_files                    | add_files wrapper + Incremental Ingestion recipes   |
| Rollback     | manage_snapshots             | rollback_to_snapshot, rollback_to_timestamp         |
| Async        | None                         | AsyncIceFrame for non-blocking I/O                  |
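
The compaction row mentions a bin_pack strategy. The general idea behind bin packing is to group small data files into bins close to a target size, so that each bin can be rewritten as one larger file. The sketch below illustrates this with a first-fit decreasing heuristic over plain file sizes. It is an illustration of the technique, not IceFrame's implementation; `plan_bin_pack` and `target_size` are hypothetical names.

```python
def plan_bin_pack(file_sizes, target_size):
    """Group file sizes into bins whose totals stay at or under target_size.

    Uses first-fit decreasing: place each file, largest first, into the
    first bin with room, opening a new bin when none fits.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    bins = []  # each bin is a list of file sizes
    for size in sorted(file_sizes, reverse=True):
        for group in bins:
            if sum(group) + size <= target_size:
                group.append(size)
                break
        else:
            # No existing bin has room (or the file alone exceeds the target).
            bins.append([size])
    return bins


sizes_mb = [60, 10, 50, 40, 30, 5]
plan = plan_bin_pack(sizes_mb, target_size=100)
print(plan)
```

Each inner list is one group of files to rewrite together. A file larger than the target ends up alone in its own bin, which a real planner would typically skip or split.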
