Command line AI for customer data

Chuck Data

Chuck is a text-based user interface (TUI) for managing Databricks resources including Unity Catalog, SQL warehouses, models, and volumes. Chuck Data provides an interactive shell environment for customer data engineering tasks with AI-powered assistance.

Check us out at chuckdata.ai.

Join our community on Discord.

Features

  • Interactive TUI for managing Databricks resources
  • AI-powered "agentic" data engineering assistant
  • Identity resolution powered by Amperity's Stitch
  • Use LLMs from your Databricks account via Databricks Model Serving
  • Browse Unity Catalog resources (catalogs, schemas, tables)
  • Profile database tables with automated PII detection (via LLMs)
  • Tag tables in Unity Catalog with semantic tags for PII to power compliance and data governance use cases
  • Command-based interface with both natural language commands and slash commands

Authentication

  • Authenticates with Databricks using personal access tokens
  • Authenticates with Amperity using API keys (/login and /logout commands)
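As a rough illustration of what personal access token authentication looks like on the wire, the sketch below builds (without sending) a Databricks REST API request that carries the token in a Bearer header. This is not Chuck's own client code: the helper name build_catalogs_request, the host, and the token value are placeholders assumed for the example.

```python
import urllib.request


def build_catalogs_request(host: str, token: str) -> urllib.request.Request:
    """Build, but do not send, a request that lists Unity Catalog catalogs.

    `host` is a workspace URL and `token` is a personal access token;
    both are supplied by the caller (hypothetical values in the demo below).
    """
    url = f"{host.rstrip('/')}/api/2.1/unity-catalog/catalogs"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder host and token, for illustration only.
    req = build_catalogs_request("https://example.cloud.databricks.com/", "dapi-XXXX")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; in practice, keep the token out of source code and read it from the environment or a secrets store.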

Installation

pip install chuck-data

Usage

Chuck Data provides an interactive text-based user interface. Run the application using:

chuck 

Or run directly with Python:

python -m chuck_data 

Available Commands

Chuck Data supports a command-based interface with slash commands that can be used within the interactive TUI. Type /help within the application to see all available commands.

Some general commands to be aware of are:

  • /status - Show current connection status and application context
  • /login, /logout - Log in to or out of Amperity; Chuck needs this connection to run Stitch
  • /list-models, /select-model <model_name> - Choose which LLM Chuck uses (pick one designed for tool use; we recommend databricks-claude-3-7-sonnet)
  • /list-warehouses, /select-warehouse <warehouse_name> - Many Chuck tools run SQL, so be sure to select a warehouse

Many of Chuck's tools will use your selected Catalog and Schema so that you don't have to constantly specify them. Use these commands to manage your application context.

Catalog & Schema Management

  • /catalogs, /select-catalog <catalog_name> - Manage Catalog context
  • /schemas, /select-schema <schema_name> - Manage Schema context
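To make the idea of an application context concrete, here is a minimal sketch of how a selected catalog and schema could be used to expand a short table name into Unity Catalog's three-level catalog.schema.table form. The AppContext class and its qualify method are hypothetical names invented for this example and are not taken from Chuck's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppContext:
    """Hypothetical holder for the currently selected catalog and schema."""

    catalog: Optional[str] = None
    schema: Optional[str] = None

    def qualify(self, table: str) -> str:
        """Expand `table` to catalog.schema.table using the selected context."""
        parts = table.split(".")
        if len(parts) == 3:
            return table  # already fully qualified
        if len(parts) == 2:
            if self.catalog is None:
                raise ValueError("no catalog selected")
            return f"{self.catalog}.{table}"
        if len(parts) == 1:
            if self.catalog is None or self.schema is None:
                raise ValueError("select a catalog and schema first")
            return f"{self.catalog}.{self.schema}.{table}"
        raise ValueError(f"too many name parts: {table!r}")


if __name__ == "__main__":
    ctx = AppContext(catalog="main", schema="bronze")
    print(ctx.qualify("customers"))
    print(ctx.qualify("silver.orders"))
```

Names that are already fully qualified pass through unchanged, so a context like this only fills in what the caller left out.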

Known Limitations & Best Practices

Known Limitations

  • Unstructured data - Stitch ignores fields in unsupported formats
  • GCP Support - Only AWS and Azure are formally supported at present; GCP support will be added very soon
  • Stitching across Catalogs - This can work if you create Stitch manifests manually, but Chuck does not handle it well automatically

Best Practices

  • Use models designed for tool use. We recommend databricks-claude-3-7-sonnet, and have also tested extensively with databricks-llama-4-maverick
  • Denormalized data models will work best with Stitch
  • Sample data for trying out Stitch is available on the Databricks Marketplace (use the bronze schema PII datasets)

Amperity Stitch

A key tool Chuck can use is Amperity's Stitch algorithm, an ML-based identity resolution algorithm that has been refined with the world's biggest companies over the last decade.

  • Stitch writes two tables to a schema called stitch_outputs. unified_coalesced holds standardized PII with Amperity IDs. unified_scores holds the "edges" of the identity graph: the links between records, with a confidence score for each match.
  • Each time it runs, Stitch creates a new notebook in your workspace that you can use to understand the results. Be sure to check it out!
  • For a detailed breakdown of how Stitch works, see this great article breaking it down step by step
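One way to get a feel for an edge table like unified_scores is to group records that are connected by links above a chosen confidence threshold. The sketch below does this with a small union-find over (source, target, score) tuples. It is only an exploration aid under stated assumptions: the tuple layout, the record IDs, and the threshold are invented for the example, and this is not how Stitch itself assigns Amperity IDs.

```python
from collections import defaultdict


def cluster_edges(edges, min_score):
    """Group record IDs linked by edges whose score is at least min_score.

    `edges` is an iterable of (source_id, target_id, score) tuples; this
    layout is an assumption, not the actual unified_scores column set.
    Returns a sorted list of sorted ID lists, one per connected group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in edges:
        root_a, root_b = find(a), find(b)  # registers both IDs
        if score >= min_score:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    sample = [("r1", "r2", 0.95), ("r2", "r3", 0.90), ("r3", "r4", 0.40)]
    print(cluster_edges(sample, min_score=0.8))
```

Raising or lowering min_score shows how sensitive the groupings are to the confidence cut-off, which can be a useful sanity check alongside the notebook Stitch generates.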

Support

Chuck is a research preview application that is actively being improved based on your usage and feedback. Always be sure to update to the latest version of Chuck to get the best experience!

Support Options

  1. GitHub Issues
    Report bugs or request features on our GitHub repository:
    https://github.com/amperity/chuck-data/issues

  2. Discord Community
    Join our community to chat with other users and developers:
    https://discord.gg/f3UZwyuQqe
    Or run /discord in the application

  3. Email Support
    Contact our dedicated support team:
    chuck-support@amperity.com

  4. In-app Bug Reports
    Let Chuck submit a bug report automatically with the /bug command

Development

Requirements

  • Python 3.10 or higher
  • uv - Python package installer and resolver (not strictly required, but it makes life easier)

Project Structure

chuck_data/             # Main package
├── __init__.py
├── __main__.py         # CLI entry point
├── commands/           # Command implementations
├── ui/                 # User interface components
├── agent/              # AI agent functionality
├── clients/            # External service clients
├── databricks/         # Databricks utilities
└── ...                 # Other modules

Installation

Install the project with development dependencies:

uv pip install -e ".[dev]"

Testing

Run the test suite:

uv run -m pytest

Run linters and static analysis:

uv run ruff check
uv run black --check --diff chuck_data tests
uv run pyright

For test coverage:

uv run -m pytest --cov=chuck_data

CI/CD

This project uses GitHub Actions for continuous integration:

  • Automated testing on Python 3.10
  • Code linting with flake8
  • Format checking with Black

The CI workflow runs on every push to main and on pull requests. You can also trigger it manually from the Actions tab in GitHub.
