TableSleuth - a Textual TUI for Open Table Format (OTF) forensics with data profiling.

TableSleuth

A terminal-based tool for deep inspection of Parquet files and Apache Iceberg tables. Analyze file structure, metadata, row groups, column statistics, and table evolution through an interactive, keyboard-driven TUI.

Key Features

Parquet Analysis

  • Deep File Inspection - Comprehensive metadata extraction using PyArrow
  • Row Group Analysis - Examine distribution, compression, and statistics
  • Column Profiling - Profile data using GizmoSQL (DuckDB over Arrow Flight SQL)
  • Data Sampling - Preview and filter data with column selection
  • Directory Scanning - Recursively discover and inspect Parquet files
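At the byte level, a Parquet file begins with the 4-byte magic `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again. The footer (Thrift-encoded `FileMetaData`) holds the schema, row groups, and column statistics. A minimal stdlib-only sketch of the first forensic checks plus a recursive scan, assuming local files. `footer_length` and `find_parquet_files` are hypothetical helpers, not TableSleuth's API; the tool itself uses PyArrow for metadata extraction.

```python
import os
import struct
from pathlib import Path

MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes
MIN_SIZE = 12    # leading magic + 4-byte footer length + trailing magic


def footer_length(path: Path) -> int:
    """Return the length in bytes of the Thrift-encoded footer (FileMetaData).

    Validates both magic markers and reads the little-endian uint32 that sits
    just before the trailing "PAR1".
    """
    size = path.stat().st_size
    if size < MIN_SIZE:
        raise ValueError(f"{path}: too small to be a Parquet file ({size} bytes)")
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # last 8 bytes: <footer length><PAR1>
        tail = f.read(8)
    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError(f"{path}: missing PAR1 magic bytes")
    (length,) = struct.unpack("<I", tail[:4])
    if length + MIN_SIZE > size:
        raise ValueError(f"{path}: footer length {length} exceeds file size")
    return length


def find_parquet_files(root: Path) -> list[Path]:
    """Recursively collect *.parquet files under root, sorted for stable output."""
    return sorted(p for p in root.rglob("*.parquet") if p.is_file())
```

Decoding the footer itself needs a Thrift reader; PyArrow exposes the decoded footer as `pyarrow.parquet.ParquetFile(path).metadata`.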

Iceberg Table Analysis

  • Snapshot Navigation - Browse table history and metadata evolution
  • Performance Testing - Compare query performance across snapshots
  • Delete File Inspection - Analyze MOR (Merge-on-Read) delete files
  • Schema Evolution - Track schema changes over time
  • Catalog Support - Local SQLite, AWS Glue, and AWS S3 Tables

Interface

  • Interactive TUI - Keyboard-driven navigation with rich visualizations
  • Multi-Source Support - Local files, S3, and Iceberg catalogs
  • Performance Optimized - Async operations, caching, and lazy loading

Screenshots

Parquet File Inspection

[Screenshot: File Structure & Schema]

[Screenshot: Row Group Analysis]

[Screenshot: Data Sample View]

[Screenshot: Column Profiling]

Iceberg Table Analysis

[Screenshot: Snapshot Overview]

[Screenshot: Performance Testing]

[Screenshot: Delete Files (MOR)]

[Screenshot: Snapshot Comparison]

Quick Start

# From a source checkout, install dependencies with uv (recommended)
uv sync

# Inspect a Parquet file
tablesleuth inspect data/file.parquet

# Inspect a directory (recursive)
tablesleuth inspect data/warehouse/

# Inspect an Iceberg table
tablesleuth inspect db.table --catalog local

# Inspect AWS S3 Tables (using ARN)
tablesleuth inspect "arn:aws:s3tables:us-east-2:123456789012:bucket/my-bucket/table/db.table"
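The S3 Tables target packs the region, account, table bucket, and a `namespace.table` identifier into a single ARN. A minimal parsing sketch, assuming exactly the layout used in the command above (`bucket/<name>/table/<namespace>.<table>`). `parse_s3tables_arn` and `S3TablesTarget` are hypothetical names, and TableSleuth's own parsing may accept other forms.

```python
from typing import NamedTuple


class S3TablesTarget(NamedTuple):
    region: str
    account: str
    bucket: str
    namespace: str
    table: str


def parse_s3tables_arn(arn: str) -> S3TablesTarget:
    """Split an ARN of the form
    arn:<partition>:s3tables:<region>:<account>:bucket/<bucket>/table/<namespace>.<table>
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3tables":
        raise ValueError(f"not an S3 Tables ARN: {arn!r}")
    _, _, _, region, account, resource = parts
    segments = resource.split("/")
    if len(segments) != 4 or segments[0] != "bucket" or segments[2] != "table":
        raise ValueError(f"unexpected resource path: {resource!r}")
    namespace, dot, table = segments[3].partition(".")
    if not (segments[1] and dot and namespace and table):
        raise ValueError(f"expected <namespace>.<table>, got {segments[3]!r}")
    return S3TablesTarget(region, account, segments[1], namespace, table)
```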

Installation

Requirements: Python 3.13+, plus uv for installing from source

# Install from PyPI
pip install tablesleuth

# Or install from source
git clone https://github.com/jamesbconner/TableSleuth
cd TableSleuth
uv sync

# Verify installation
tablesleuth --version

# Initialize configuration files
tablesleuth init

See TABLESLEUTH_SETUP.md for detailed setup including AWS, GizmoSQL, and catalog configuration.

First-Time Setup

# 1. Initialize configuration (first time only)
tablesleuth init

# 2. Edit configuration files
#    - tablesleuth.toml (main config)
#    - .pyiceberg.yaml (catalog config)

# 3. Verify configuration
tablesleuth config-check

# 4. Start inspecting files
tablesleuth inspect data/file.parquet

Configuration

Quick Setup

# Initialize configuration files with interactive prompts
tablesleuth init

# Check configuration and test connections
tablesleuth config-check
tablesleuth config-check -v  # Verbose output

Configuration Files

tablesleuth.toml - Main configuration:

[catalog]
default = "local"  # Default Iceberg catalog

[gizmosql]
uri = "grpc+tls://localhost:31337"
username = "gizmosql_username"
password = "gizmosql_password"
tls_skip_verify = true

Configuration Priority (highest first):

  1. Environment variables (TABLESLEUTH_*)
  2. Local config files (./tablesleuth.toml, ./.pyiceberg.yaml)
  3. Home config files (~/tablesleuth.toml, ~/.pyiceberg.yaml)
  4. Built-in defaults

Iceberg Catalogs

Configure PyIceberg in .pyiceberg.yaml:

catalog:
  local:
    type: sql
    uri: sqlite:////path/to/catalog.db
    warehouse: file:///path/to/warehouse
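The slash counts in this example are easy to get wrong. PyIceberg's SQL catalog uses SQLAlchemy, whose SQLite URLs put the path after `sqlite:///`, so an absolute path yields four slashes in total. `warehouse` is an ordinary `file://` URI, which gives three slashes before an absolute path. A small stdlib sketch that builds both values from absolute POSIX paths; `sql_catalog_uris` is a hypothetical helper and the paths are the placeholders from the example.

```python
from pathlib import PurePosixPath


def sql_catalog_uris(catalog_db: str, warehouse_dir: str) -> tuple[str, str]:
    """Return (uri, warehouse) values for a local SQLite-backed catalog."""
    db = PurePosixPath(catalog_db)
    wh = PurePosixPath(warehouse_dir)
    if not (db.is_absolute() and wh.is_absolute()):
        # Relative paths would make the catalog depend on the working directory.
        raise ValueError("use absolute paths for the catalog and warehouse")
    uri = f"sqlite:///{db}"   # "sqlite:///" + "/path/..." -> four slashes
    warehouse = wh.as_uri()   # "file://" + "/path/..." -> three slashes
    return uri, warehouse
```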

See TABLESLEUTH_SETUP.md for detailed configuration.

Usage

CLI Commands

# Configuration management
tablesleuth init                    # Initialize config files
tablesleuth config-check            # Validate configuration
tablesleuth config-check -v         # Detailed validation

# Inspect Parquet files
tablesleuth inspect file.parquet
tablesleuth inspect directory/
tablesleuth inspect s3://bucket/path/file.parquet

# Inspect Iceberg tables
tablesleuth inspect db.table --catalog local
tablesleuth inspect "arn:aws:s3tables:region:account:bucket/name/table/db.table"

# Launch Iceberg viewer
tablesleuth iceberg --catalog local --table db.table
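`inspect` accepts several kinds of targets: local files, directories, S3 object URIs, catalog identifiers, and S3 Tables ARNs. One plausible way a front end could tell them apart is by prefix and suffix, checked from most to least specific. The sketch below is a hypothetical heuristic; the enum and function are not TableSleuth's code.

```python
from enum import Enum
from pathlib import Path


class Target(Enum):
    S3_TABLES = "s3tables"
    S3_OBJECT = "s3"
    DIRECTORY = "directory"
    PARQUET_FILE = "parquet"
    ICEBERG_TABLE = "iceberg"


def classify_target(target: str) -> Target:
    """Guess what kind of target an `inspect` argument names (hypothetical)."""
    if target.startswith("arn:aws:s3tables:"):
        return Target.S3_TABLES
    if target.startswith("s3://"):
        return Target.S3_OBJECT
    path = Path(target)
    if target.endswith("/") or path.is_dir():
        return Target.DIRECTORY
    if path.suffix == ".parquet":
        return Target.PARQUET_FILE
    namespace, dot, table = target.partition(".")
    if dot and namespace and table and "/" not in target:
        return Target.ICEBERG_TABLE  # e.g. db.table, resolved via --catalog
    raise ValueError(f"cannot classify target: {target!r}")
```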

TUI Navigation

Key      Action
q        Quit
r        Refresh
f        Filter columns
Tab      Switch tabs
↑/↓      Navigate
Enter    Select

See the User Guide for the complete list of keyboard shortcuts and features.

Optional: GizmoSQL Profiling

Enable column profiling and performance testing with GizmoSQL (DuckDB over Arrow Flight SQL).

Quick Setup:

# Install GizmoSQL (macOS ARM64 example)
curl -L -o gizmosql_cli_macos_arm64.zip \
  https://github.com/gizmodata/gizmosql/releases/download/v1.12.10/gizmosql_cli_macos_arm64.zip
sudo unzip -o gizmosql_cli_macos_arm64.zip -d /usr/local/bin

# Start server
gizmosql_server -P password -Q -T ~/.certs/cert0.pem ~/.certs/cert0.key

See the GizmoSQL Deployment Guide for the complete setup, including EC2 deployment.

Architecture

TableSleuth uses a layered architecture:

  • TUI Layer - Textual-based terminal interface with rich visualizations
  • Service Layer - Business logic for file inspection, profiling, and discovery
  • Integration Layer - PyArrow for Parquet, PyIceberg for tables, GizmoSQL for profiling
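One common way to realize this layering is to have the service layer depend on a narrow interface that the integration layer implements, so business logic can be tested without PyArrow, PyIceberg, or a running GizmoSQL server. A minimal sketch of that idea using `typing.Protocol`; every class and function name here is hypothetical and not taken from TableSleuth's source.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class FileSummary:
    path: str
    num_rows: int
    num_row_groups: int


class MetadataReader(Protocol):
    """Integration-layer boundary (a PyArrow-backed reader in practice)."""

    def read_summary(self, path: str) -> FileSummary: ...


class InspectionService:
    """Service layer: business logic with no knowledge of the UI."""

    def __init__(self, reader: MetadataReader) -> None:
        self._reader = reader

    def average_rows_per_group(self, path: str) -> float:
        s = self._reader.read_summary(path)
        return s.num_rows / s.num_row_groups if s.num_row_groups else 0.0


class FakeReader:
    """Stand-in integration adapter for tests; no PyArrow needed."""

    def read_summary(self, path: str) -> FileSummary:
        return FileSummary(path=path, num_rows=1_000, num_row_groups=4)


def render_row_group_line(service: InspectionService, path: str) -> str:
    """TUI-layer stand-in: formats service output, holds no business logic."""
    return f"{path}: {service.average_rows_per_group(path):,.1f} rows/row group"
```

With `FakeReader`, `render_row_group_line(InspectionService(FakeReader()), "x.parquet")` returns `"x.parquet: 250.0 rows/row group"`; swapping in a real reader changes nothing above the integration boundary.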

See the Architecture Guide for detailed technical documentation.

Development

# Install with dev dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Run quality checks
uv run pre-commit run --all-files

# Type checking
uv run mypy src/

See the Development Setup guide for complete environment setup.

What's New

v0.4.0

  • 🎉 Now available on PyPI! Install with pip install tablesleuth
  • 🔄 Package renamed to tablesleuth for consistency
  • 🤖 Automated CI/CD with GitHub Actions
  • 📦 Enhanced PyPI metadata and publishing workflow
  • ✅ All existing features from v0.3.0

v0.3.0

  • ✅ Parquet file inspection (local and S3)
  • ✅ Iceberg snapshot navigation and analysis
  • ✅ Delete file inspection and MOR forensics
  • ✅ Snapshot comparison and performance testing
  • ✅ Column profiling with GizmoSQL
  • ✅ AWS Glue and S3 Tables catalog support
  • ✅ Interactive TUI with rich visualizations

Roadmap

  • Delta Lake and Hudi support
  • Schema evolution visualization
  • Export capabilities (JSON, CSV reports)
  • REST catalog support
  • Automated optimization recommendations

Contributing

Contributions are welcome! See the Developer Guide and the Development Setup guide.

License

MIT License - See LICENSE for details.

Download files

Source Distribution

tablesleuth-0.4.2.post1.tar.gz (6.1 MB)

Uploaded Source

Built Distribution

tablesleuth-0.4.2.post1-py3-none-any.whl (97.6 kB)

Uploaded Python 3

File details

Details for the file tablesleuth-0.4.2.post1.tar.gz.

File metadata

  • Download URL: tablesleuth-0.4.2.post1.tar.gz
  • Size: 6.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tablesleuth-0.4.2.post1.tar.gz
Algorithm Hash digest
SHA256 9a84704563480bfbf73204473afab6ba6182c04f469aa01adc472c45fde5f99e
MD5 6dd021cb744167b709fedbeb1f6cd92c
BLAKE2b-256 c37ced48250a9e1c680b89a02fb7233e127fe08a823d856bd66dfd85aa17307d

Provenance

The following attestation bundles were made for tablesleuth-0.4.2.post1.tar.gz:

Publisher: publish.yml on jamesbconner/TableSleuth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablesleuth-0.4.2.post1-py3-none-any.whl.

File hashes

Hashes for tablesleuth-0.4.2.post1-py3-none-any.whl
Algorithm Hash digest
SHA256 b2c8fdaaae0ff127e950c58c8073999c0af41c697d6e67ab3b1f2b36bceb0f2e
MD5 a9676de9f60238619c22623ccd251134
BLAKE2b-256 6114553800fce3f7b051d73539164623cec87318e473d37c93b11724ba1663af

Provenance

The following attestation bundles were made for tablesleuth-0.4.2.post1-py3-none-any.whl:

Publisher: publish.yml on jamesbconner/TableSleuth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
