
CLI tool for exploring Apache Iceberg table metadata


iceberg-meta

CLI and TUI for exploring Apache Iceberg table metadata. Lightweight, terminal-native, and scriptable -- inspect snapshots, schemas, manifests, data files, partition health, and column-level statistics without spinning up a Spark shell or writing a notebook.

Why iceberg-meta?

Iceberg tables store rich metadata -- schema evolution history, snapshot lineage, manifest-level statistics, column bounds, and more. But accessing any of it usually means writing PySpark code or digging through Avro files by hand.

iceberg-meta gives you instant access to all of it from the terminal:

  • One command to see everything: iceberg-meta health sales.orders shows file sizes, small-file warnings, partition skew, column null rates, column storage distribution, and value bounds -- all at once.
  • Interactive exploration: the TUI lets you browse tables, schemas, and snapshots visually without memorizing flags.
  • Scriptable output: every command supports --output json and --output csv for CI/CD pipelines and alerting.
  • Zero infrastructure: works with any catalog pyiceberg supports (SQL, REST, Glue, Hive, Nessie, Hadoop).

Install

pip install iceberg-meta

# With the interactive TUI (optional; the quotes stop zsh from globbing the brackets)
pip install "iceberg-meta[tui]"

Quick Start

# 1. Configure — picks your catalog type, writes ~/.iceberg-meta.yaml
#    with ${VAR} placeholders (secrets stay in the environment)
iceberg-meta init

# 2. Verify — checks config, env vars, and catalog connectivity
iceberg-meta doctor

# 3. Explore
iceberg-meta list-tables
iceberg-meta summary sales.orders
iceberg-meta health sales.orders
iceberg-meta tree sales.orders

Or launch the interactive TUI to browse everything visually:

iceberg-meta tui

See the quickstart/ folder for a guided walkthrough with Docker.

Commands

Command                       Description
init                          Interactive config setup -- catalog presets, ${VAR} placeholders, connection test
doctor                        Validate config file, environment variables, and catalog connectivity
list-tables                   Discover namespaces and tables
summary <table>               Single-screen dashboard: row counts, file counts, recent operations
health <table>                Comprehensive health report: file sizes, small-file detection, partition skew, column null rates, column sizes, column bounds
table-info <table>            Format version, UUID, location, schema, partition spec, properties
snapshots <table>             All snapshots with timestamps, operations, and summaries (--watch N refreshes every N seconds)
schema <table>                Current schema as a tree (--history for all versions with diffs)
manifests <table>             Manifest files for the current or a specified snapshot
files <table>                 Data files with sizes, row counts, and format
partitions <table>            Partition statistics
snapshot-detail <table> <id>  Deep dive into one snapshot: manifests + files
diff <table> <snap1> <snap2>  What changed between two snapshots
tree <table>                  Full metadata hierarchy as a tree (--all-snapshots)
tui                           Interactive terminal UI -- browse tables and metadata visually

Use Cases

Pre-merge validation in CI/CD

Before merging a pipeline PR, confirm the write actually produced the expected outcome:

iceberg-meta summary staging.orders          # row count, file count, latest snapshot
iceberg-meta diff staging.orders $OLD $NEW   # what changed between two snapshots
iceberg-meta -o csv files staging.orders     # pipe file-level stats into a check script
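A minimal sketch of such a check script in Python, assuming the CSV has a header row and a per-file row-count column holding raw integers. The column name `record_count` and the helper names are hypothetical; inspect the header your version of iceberg-meta actually emits and adjust.

```python
import csv
import io


def find_empty_files(csv_text: str, count_column: str = "record_count") -> list[dict[str, str]]:
    """Return the CSV rows whose row-count column is zero.

    `count_column` is a hypothetical header name, not a documented one.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or count_column not in reader.fieldnames:
        raise ValueError(f"column {count_column!r} not found in CSV header")
    return [row for row in reader if int(row[count_column]) == 0]


def check(csv_text: str) -> int:
    """Return a process exit code: 1 if any empty data file was found, else 0."""
    empty = find_empty_files(csv_text)
    for row in empty:
        print(f"empty data file: {row}")
    return 1 if empty else 0
```

Ending the script with `sys.exit(check(sys.stdin.read()))` lets a CI step pipe the CSV into it and fail on a non-zero exit code.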

Debugging failing writes

A Spark job "succeeded" but downstream dashboards are empty. Quickly narrow the problem:

iceberg-meta snapshots staging.orders        # did the snapshot actually land?
iceberg-meta schema staging.orders --history # did a schema evolution break compatibility?
iceberg-meta files staging.orders            # are the new data files present and non-empty?

Monitoring table health

Spot small-file problems, partition skew, and compaction needs before they impact query performance:

iceberg-meta health warehouse.events         # full health report in one command

The health report includes:

  • File health: min/avg/median/max sizes, small-file warnings (< 32 MB)
  • Delete files: data vs delete manifest counts, compaction recommendations
  • Partition skew: per-partition file counts and row counts with skew detection
  • Column null rates: percentage of nulls per column, color-coded by severity
  • Column sizes: storage distribution with bar charts showing which columns are largest
  • Column bounds: min/max values per column from file-level statistics
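The arithmetic behind the file-health, partition-skew, and null-rate items is small. Here is a minimal sketch, assuming you already have per-file sizes, per-partition row counts, and per-column null/value counts (Iceberg data-file metadata records `null_value_counts` and `value_counts`, and `value_counts` includes nulls). The 32 MB threshold comes from the list above; the helper names and the max-to-mean skew ratio are illustrative assumptions, not iceberg-meta's actual implementation.

```python
import statistics
from dataclasses import dataclass

SMALL_FILE_BYTES = 32 * 1024 * 1024  # "< 32 MB" small-file threshold


@dataclass
class FileSizeStats:
    min_bytes: int
    avg_bytes: float
    median_bytes: float
    max_bytes: int
    small_files: int


def file_size_stats(sizes: list[int]) -> FileSizeStats:
    """Summarize data-file sizes and count files under the small-file threshold."""
    if not sizes:
        raise ValueError("no data files")
    return FileSizeStats(
        min_bytes=min(sizes),
        avg_bytes=statistics.mean(sizes),
        median_bytes=statistics.median(sizes),
        max_bytes=max(sizes),
        small_files=sum(1 for s in sizes if s < SMALL_FILE_BYTES),
    )


def null_rate(null_count: int, value_count: int) -> float:
    """Percentage of nulls in a column; Iceberg's value_counts already include nulls."""
    return 0.0 if value_count == 0 else 100.0 * null_count / value_count


def skew_ratio(rows_per_partition: dict[str, int]) -> float:
    """Largest partition divided by the mean partition (1.0 means perfectly even).

    Hypothetical skew metric, for illustration only.
    """
    counts = list(rows_per_partition.values())
    if not counts:
        return 0.0
    mean = statistics.mean(counts)
    return max(counts) / mean if mean else 0.0
```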

Live monitoring

Watch for new snapshots as a pipeline runs:

iceberg-meta snapshots warehouse.events --watch 5  # refresh every 5 seconds
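Conceptually, watch mode is a polling loop. A minimal sketch, where `fetch_snapshot_ids` is a hypothetical callable standing in for whatever reloads the table metadata (for example, a pyiceberg table lookup); this is not iceberg-meta's actual code.

```python
import time
from typing import Callable, Iterable, Optional


def watch_snapshots(
    fetch_snapshot_ids: Callable[[], Iterable[int]],
    interval_s: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> list[int]:
    """Poll for snapshot IDs and report any that appear between polls.

    Returns every new snapshot ID in the order it was first seen.
    max_polls=None polls forever (stop with Ctrl-C).
    """
    seen = set(fetch_snapshot_ids())
    new_ids: list[int] = []
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval_s)
        polls += 1
        for snapshot_id in fetch_snapshot_ids():
            if snapshot_id not in seen:
                seen.add(snapshot_id)
                new_ids.append(snapshot_id)
                print(f"new snapshot: {snapshot_id}")
    return new_ids
```

Injecting `sleep` keeps the loop testable without real delays.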

Onboarding and knowledge transfer

A new team member needs to understand the data platform. The TUI lets them browse interactively without memorizing commands:

iceberg-meta tui

Incident response

Production data looks wrong. Compare snapshots to find when the issue was introduced:

iceberg-meta snapshots prod.customers        # find the suspicious snapshot IDs
iceberg-meta diff prod.customers 111 222     # compare record counts and file changes
iceberg-meta tree prod.customers             # drill into manifests and data files
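Iceberg stores a string-valued summary map on each snapshot, with keys such as `total-records`, `total-data-files`, and `total-files-size`. A minimal sketch of a record-count and file-count comparison, assuming you already have both summary maps as dicts; it illustrates the idea and is not necessarily how `iceberg-meta diff` computes its output.

```python
DEFAULT_KEYS = ("total-records", "total-data-files", "total-files-size")


def summary_delta(
    old: dict[str, str],
    new: dict[str, str],
    keys: tuple[str, ...] = DEFAULT_KEYS,
) -> dict[str, int]:
    """Return new-minus-old for each numeric summary key present in both snapshots."""
    delta: dict[str, int] = {}
    for key in keys:
        if key in old and key in new:
            delta[key] = int(new[key]) - int(old[key])
    return delta
```

A drop in `total-records` between two snapshots that were supposed to be appends is a quick red flag worth drilling into with `tree`.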

Scripting and automation

Pipe machine-readable output into other tools:

# Alert if file count exceeds threshold
FILE_COUNT=$(iceberg-meta -o json summary db.events | jq '.file_count')
[ "$FILE_COUNT" -gt 1000 ] && echo "High file count: possible small-file problem"

# Export snapshot history to CSV for a report
iceberg-meta -o csv snapshots db.events > snapshots.csv

# Health data as JSON for a monitoring dashboard
iceberg-meta -o json health db.events | jq '.[] | select(.Section == "Column Nulls")'
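The same threshold check can be written in Python when jq is not available. This sketch assumes, as the jq example does, that `summary` emits a JSON object with a top-level `file_count` field; the helper names are hypothetical and the limit is only an example.

```python
import json
import subprocess

FILE_COUNT_LIMIT = 1000  # example threshold, matching the shell snippet


def file_count_from_json(summary_json: str) -> int:
    """Extract the file count from `iceberg-meta -o json summary` output."""
    return int(json.loads(summary_json)["file_count"])


def fetch_summary_json(table: str) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["iceberg-meta", "-o", "json", "summary", table],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def exceeds_limit(summary_json: str, limit: int = FILE_COUNT_LIMIT) -> bool:
    return file_count_from_json(summary_json) > limit
```

Usage: `exceeds_limit(fetch_summary_json("db.events"))`. Parsing is kept separate from the subprocess call so it can be tested on canned output.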

TUI

The interactive TUI (iceberg-meta tui) covers nearly all CLI functionality in a single screen. Press ? inside the TUI for a full keybinding reference.

Key  Tab / Action
1    Summary -- table overview, recent operations, file health indicators
2    Snapshots -- snapshot history with operations and summary
3    Schema -- schema evolution with diffs between versions
4    Files -- data files with size distribution stats (min/avg/median/max)
5    Manifests -- manifest files for current snapshot
6    Health -- file sizes, partition skew, column nulls, column sizes, column bounds
7    Tree -- full metadata hierarchy (snapshot > manifest list > manifests > files)
d    Diff -- compare two snapshots (modal)
s    Detail -- snapshot deep-dive (modal)
r    Refresh all panels
?    Help screen with all keybindings and CLI equivalents
q    Quit

The sidebar always shows the namespace/table tree (equivalent to list-tables).

CLI-only features: init (interactive config setup), snapshots --watch N (live-watch mode), table-info (UUID, properties, partition spec), partitions (basic table view), --output json|csv (machine-readable output).

Configuration

Config file with ${VAR} placeholders

Create ~/.iceberg-meta.yaml. Values wrapped in ${VAR} are resolved from the environment at runtime -- never hard-code credentials:

default_catalog: production

catalogs:
  production:
    type: glue
    warehouse: ${ICEBERG_WAREHOUSE}
    s3.region: ${AWS_REGION}

  staging:
    type: sql
    uri: ${ICEBERG_CATALOG_URI}
    warehouse: ${ICEBERG_WAREHOUSE}
    s3.endpoint: ${S3_ENDPOINT}
    s3.access-key-id: ${AWS_ACCESS_KEY_ID}
    s3.secret-access-key: ${AWS_SECRET_ACCESS_KEY}
    s3.region: ${AWS_REGION}
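Placeholder expansion of this kind takes only a few lines. A minimal sketch, assuming placeholders use exactly the `${NAME}` form shown above and that an unset variable should be an error rather than an empty string; the real logic lives in catalog.py and may differ. The function name is hypothetical.

```python
import os
import re
from typing import Any, Mapping, Optional

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} inside strings, dicts, and lists with env values.

    Raises KeyError naming the variable if it is not set.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        def _sub(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(_sub, value)
    if isinstance(value, dict):
        return {k: expand_placeholders(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, env) for v in value]
    return value
```

In practice you would load the YAML first (for example with PyYAML's `yaml.safe_load`) and pass the resulting dict through this function, so secrets never touch the file on disk.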

See examples/iceberg-meta.yaml for configs covering Glue, REST, Nessie, Hive, and Hadoop catalogs.

Environment variable overrides

These override any config file value without needing ${VAR} syntax:

Variable                    Maps to
ICEBERG_META_CATALOG_URI    uri
ICEBERG_META_WAREHOUSE      warehouse
ICEBERG_META_S3_ENDPOINT    s3.endpoint
ICEBERG_META_S3_ACCESS_KEY  s3.access-key-id
ICEBERG_META_S3_SECRET_KEY  s3.secret-access-key
ICEBERG_META_S3_REGION      s3.region
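These overrides amount to a fixed mapping applied on top of the loaded catalog properties. A minimal sketch, with the variable names and target keys taken from the table; the function name and the choice to ignore empty values are assumptions.

```python
import os
from typing import Mapping, Optional

# ICEBERG_META_* variable -> catalog property key
ENV_OVERRIDES = {
    "ICEBERG_META_CATALOG_URI": "uri",
    "ICEBERG_META_WAREHOUSE": "warehouse",
    "ICEBERG_META_S3_ENDPOINT": "s3.endpoint",
    "ICEBERG_META_S3_ACCESS_KEY": "s3.access-key-id",
    "ICEBERG_META_S3_SECRET_KEY": "s3.secret-access-key",
    "ICEBERG_META_S3_REGION": "s3.region",
}


def apply_env_overrides(
    props: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Return a copy of the catalog properties with any set override variables applied."""
    env = os.environ if env is None else env
    merged = dict(props)
    for var, key in ENV_OVERRIDES.items():
        if env.get(var):  # set and non-empty
            merged[key] = env[var]
    return merged
```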

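Interactive setup

Run the guided setup, which picks a catalog preset, writes ~/.iceberg-meta.yaml with ${VAR} placeholders, and tests the connection:

iceberg-meta init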

Environment variables

iceberg-meta uses python-dotenv to auto-load a .env file from your working directory:

  • No source or export needed -- just place a standard .env file in your project
  • Use any variable names you already have -- reference them in your config with ${MY_VAR}
  • Point to a specific file if your .env is elsewhere: iceberg-meta --env-file path/to/.env
  • Already-exported shell variables take precedence over .env values (standard dotenv behavior)

Data engineers who already have AWS credentials or catalog URIs in their environment don't need a .env file at all -- the ${VAR} placeholders in the config resolve against whatever is already set.
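The precedence rule (exported shell variables win over .env values) can be illustrated with a deliberately simplified loader. This stdlib-only sketch handles plain `KEY=VALUE` lines and `#` comments only; python-dotenv itself also handles quoting, `export` prefixes, and more, and its `load_dotenv(override=False)` default gives the same precedence. The function name is hypothetical.

```python
import os
from typing import MutableMapping, Optional


def load_env_text(text: str, env: Optional[MutableMapping[str, str]] = None) -> None:
    """Apply KEY=VALUE lines to env without overwriting keys that already exist."""
    env = os.environ if env is None else env
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault keeps an already-exported value, like override=False
        env.setdefault(key.strip(), value.strip())
```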

Global Options

Option           Description
--catalog, -c    Catalog name (as defined in config)
--uri            Catalog URI override
--warehouse, -w  Warehouse path override
--output, -o     Output format: table (default), json, csv
--env-file, -e   Path to .env file (auto-loads .env in cwd by default)
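Because --output is a global option, each command can build one list of row dictionaries and hand it to a shared renderer. A minimal stdlib sketch of that idea, assuming row-oriented data; iceberg-meta renders its default table format with Rich, which is only approximated here with aligned plain text. The `render` function is hypothetical.

```python
import csv
import io
import json
from typing import Any


def render(rows: list[dict[str, Any]], fmt: str = "table") -> str:
    """Render row dicts as json, csv, or a plain-text aligned table."""
    if fmt == "json":
        return json.dumps(rows, indent=2, default=str)
    columns = list(rows[0].keys()) if rows else []
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "table":
        # Column width = longest of the header and every cell in that column.
        widths = {c: max(len(c), *(len(str(r.get(c, ""))) for r in rows)) for c in columns}
        lines = ["  ".join(c.ljust(widths[c]) for c in columns)]
        lines += ["  ".join(str(r.get(c, "")).ljust(widths[c]) for c in columns) for r in rows]
        return "\n".join(lines)
    raise ValueError(f"unknown output format: {fmt!r}")
```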

Architecture

┌─────────────────────────────────────────────────────────┐
│                    iceberg-meta CLI                      │
│                      (Typer app)                         │
├──────────────┬──────────────────────┬───────────────────┤
│  catalog.py  │    formatters.py     │     utils.py      │
│              │                      │                   │
│  Config      │  Rich Tables/Trees   │  format_bytes()   │
│  resolution  │  for each command    │  format_time()    │
│  + ${VAR}    │                      │  truncate_path()  │
├──────────────┴──────────────────────┴───────────────────┤
│                    pyiceberg                             │
│          (catalog, table, inspect APIs)                  │
├─────────────────────────────────────────────────────────┤
│         Any Iceberg Catalog (SQL, REST, Glue, Hive)     │
├─────────────────────────────────────────────────────────┤
│              S3 / MinIO / HDFS / Local Storage           │
│          (Parquet data + Avro metadata)                  │
└─────────────────────────────────────────────────────────┘
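utils.py holds small formatting helpers. Hypothetical versions of two of them are sketched below to show the kind of output to expect; the names come from the diagram, but the signatures, the 1024-based units, and the keep-the-tail truncation rule are assumptions, not the package's code.

```python
def format_bytes(num_bytes: int) -> str:
    """Human-readable size using 1024-based units, e.g. 1536 -> '1.5 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if abs(size) < 1024 or unit == "TB":
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    raise AssertionError("unreachable")


def truncate_path(path: str, max_len: int = 40) -> str:
    """Shorten a long path by keeping its tail, where the file name lives."""
    if len(path) <= max_len:
        return path
    return "..." + path[-(max_len - 3):]
```

Keeping the tail rather than the head matters for object-store paths, where the bucket prefix is shared and the file name is what distinguishes rows.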

Project Layout

iceberg-meta/
│
├── src/iceberg_meta/      PyPI package source (what gets published)
│   ├── catalog.py         Config resolution + ${VAR} expansion
│   ├── cli.py             Typer commands
│   ├── formatters.py      Rich table / tree renderers + health analysis
│   ├── output.py          JSON, CSV, Rich table output
│   ├── utils.py           Byte / timestamp formatting helpers
│   └── tui/               Interactive terminal UI (optional)
│
├── dev/                   Development & testing
│   ├── .env.example       Environment template (credentials, endpoints)
│   ├── docker-compose.yml MinIO + seed containers
│   ├── docker/            MinIO & seed Dockerfiles
│   ├── tests/             pytest suite (integration + unit)
│   ├── scripts/           Host-side seed script
│   └── DEMO.md            Step-by-step dev walkthrough
│
├── quickstart/            End-user sandbox ("pip install and go")
│   ├── .env.example       Credentials template
│   ├── docker-compose.yml MinIO only (lightweight)
│   ├── iceberg-meta.yaml  Config with ${VAR} placeholders
│   ├── seed.py            Sample data creator
│   └── README.md          Getting-started guide
│
├── examples/              Sample configs for real catalogs
│   └── iceberg-meta.yaml  Glue, REST, Nessie, Hive, Hadoop templates
│
├── pyproject.toml         Package definition
├── Makefile               Dev commands (make test, make lint, ...)
└── LICENSE                MIT license

Development

Requires uv for dependency management.

# First-time setup
make install
make setup          # copies dev/.env.example → .env

# Start infrastructure and seed data
make infra-up
make seed

# Development workflow
make lint         # ruff check
make format       # ruff format
make typecheck    # mypy
make test         # pytest
make test-cov     # pytest with coverage
make all          # lint + format + typecheck + test

# Build and teardown
make build        # build sdist + wheel
make clean        # remove build artifacts
make infra-down   # stop & remove containers

See dev/README.md for the full contributor guide and dev/DEMO.md for a step-by-step walkthrough.



Download files


Source Distribution

iceberg_meta-0.1.1.tar.gz (46.7 kB)

Built Distribution


iceberg_meta-0.1.1-py3-none-any.whl (45.4 kB)

File details

Details for the file iceberg_meta-0.1.1.tar.gz.

File metadata

  • Download URL: iceberg_meta-0.1.1.tar.gz
  • Upload date:
  • Size: 46.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.7 {"installer":{"name":"uv","version":"0.10.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for iceberg_meta-0.1.1.tar.gz
Algorithm Hash digest
SHA256 b4157d72580858f809874dd1e83885cb8bf64153327c8a026297452718f19dbb
MD5 097ce1daeeeb9f878e47e631a1be232c
BLAKE2b-256 c524f185bafe546c3c6bd3df213e40060b088c467cc048f909c36a2def7bc8eb


File details

Details for the file iceberg_meta-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: iceberg_meta-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 45.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.7 {"installer":{"name":"uv","version":"0.10.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for iceberg_meta-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3c035f5e8414c16ef9abdf39f2bbbffe5e2981a2a4a1cad61082255a6f60705e
MD5 4bbc5094b8265c549f91cbaf6f3624c4
BLAKE2b-256 790ef981ba58e4e60c1641378b56400a8aa3f0718aa3eed52121c6b515049947

