databot
A lightweight, open-source AI assistant for data platform operations.
~4,000 lines of core code -- built for data engineers who need an intelligent assistant for monitoring pipelines, diagnosing data quality issues, and querying infrastructure.
Features
- SQL Queries: Execute read-only queries against Clickzetta, TiDB, Trino, or any SQL database
- Airflow Integration: Check DAG status, view task logs, trigger runs via REST API
- Data Quality: Run DQ checks -- row counts, null rates, freshness, source-target comparison
- Data Lineage: Query upstream/downstream dependencies using NetworkX graphs
- Scheduled Tasks: Cron-based proactive monitoring with Google Chat alerts
- Google Chat: Webhook (send-only) and App (bidirectional) modes
- Shell & Filesystem: Execute commands, read/write files with workspace sandboxing
- Multi-Provider LLM: Anthropic, OpenAI, DeepSeek, Gemini, local vLLM via LiteLLM
- Persistent Memory: SQLite-backed sessions and key-value memory (zero external deps)
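The lineage feature above is built on NetworkX graphs. The core upstream/downstream idea can be sketched dependency-free with a plain adjacency map and BFS (the table names below are hypothetical, for illustration only):

```python
from collections import deque

# Toy lineage graph: an edge source -> target means "target reads from source".
EDGES = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.daily_revenue"],
    "raw.customers": ["marts.daily_revenue"],
}

def downstream(table: str) -> set[str]:
    """All tables that transitively depend on `table` (BFS over EDGES)."""
    seen, queue = set(), deque([table])
    while queue:
        for child in EDGES.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def upstream(table: str) -> set[str]:
    """All tables that `table` transitively depends on (BFS over reversed edges)."""
    reversed_edges: dict[str, list[str]] = {}
    for src, targets in EDGES.items():
        for tgt in targets:
            reversed_edges.setdefault(tgt, []).append(src)
    seen, queue = set(), deque([table])
    while queue:
        for parent in reversed_edges.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Databot answers "what breaks if this table is late?" with exactly this kind of transitive-closure query, just over a NetworkX `DiGraph` instead of a dict.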
Quick Start
Install
```bash
# From PyPI
pip install databot-ai

# From source (recommended for development)
git clone https://github.com/asb108/databot.git
cd databot
pip install -e ".[all]"
```
Initialize
```bash
databot onboard
```
Configure
Edit ~/.databot/config.yaml:
```yaml
providers:
  default: anthropic
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    model: claude-sonnet-4-5-20250929

channels:
  gchat:
    enabled: true
    mode: webhook
    webhook_url: ${GCHAT_WEBHOOK_URL}

tools:
  sql:
    connections:
      clickzetta:
        driver: clickzetta
        host: ${CZ_HOST}
        schema_name: data_warehouse
        virtual_cluster: ${CZ_VC}
        read_only: true
        max_rows: 1000
  airflow:
    base_url: ${AIRFLOW_URL}
    username: ${AIRFLOW_USER}
    password: ${AIRFLOW_PASSWORD}

security:
  restrict_to_workspace: true
  allowed_commands: ["kubectl", "airflow", "trino-cli"]
```
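The `${VAR}` placeholders are resolved from environment variables at load time. A minimal sketch of that kind of interpolation (this is an illustration, not databot's actual config loader):

```python
import os
import re

# Matches ${VAR_NAME} with a valid shell-style identifier inside.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(value: str) -> str:
    """Replace every ${VAR} in `value` with os.environ['VAR'].

    Raises KeyError for unset variables so misconfiguration fails loudly
    instead of silently producing an empty credential.
    """
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return _VAR.sub(_sub, value)
```

Failing fast on unset variables is the safer default here: an empty `api_key` or `password` would otherwise surface much later as a confusing auth error.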
Chat
```bash
# Single message
databot agent -m "How many rows in pricing.rate_cards?"

# Interactive mode
databot agent

# Start gateway (always-on with cron + Google Chat)
databot gateway
```
Architecture
```
databot/
  cli/        # Typer CLI commands
  config/     # Pydantic config schema + YAML loader
  core/       # Agent loop, message bus, context builder
  providers/  # LLM provider abstraction (LiteLLM)
  tools/      # Pluggable tools (SQL, Airflow, DQ, lineage, shell, fs, web, cron)
  channels/   # Messaging channels (Google Chat, CLI)
  session/    # SQLite-backed conversation history
  memory/     # Persistent key-value memory
  cron/       # Scheduled task execution
```
Tools
| Tool | Description |
|---|---|
| `sql` | Execute SQL queries against configured databases |
| `airflow` | Check DAG status, view logs, trigger runs |
| `data_quality` | Row counts, null checks, freshness, source-target comparison |
| `lineage` | Upstream/downstream dependencies, path finding |
| `shell` | Execute shell commands (sandboxed) |
| `read_file` | Read file contents |
| `write_file` | Write/create files |
| `edit_file` | Find-and-replace edits |
| `list_dir` | List directory contents |
| `web_fetch` | Fetch URL content |
| `web_search` | Search the web (Brave API) |
| `cron` | Manage scheduled tasks |
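Most `data_quality` checks ultimately reduce to generated SQL run through the read-only connection. A hypothetical sketch of what two such checks might emit (illustrative only, not databot's actual query builder):

```python
def null_rate_sql(table: str, column: str) -> str:
    """SQL computing the fraction of NULLs in `column`.

    The CAST avoids integer division on engines that truncate.
    """
    return (
        f"SELECT CAST(SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS DOUBLE) "
        f"/ COUNT(*) AS null_rate FROM {table}"
    )

def freshness_sql(table: str, ts_column: str) -> str:
    """SQL returning the newest timestamp, compared against a freshness SLA by the caller."""
    return f"SELECT MAX({ts_column}) AS latest_ts FROM {table}"
```

Keeping checks as plain ANSI SQL is what lets the same DQ tool run against Clickzetta, TiDB, or Trino without per-engine code paths.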
CLI Reference
| Command | Description |
|---|---|
| `databot onboard` | Initialize config and workspace |
| `databot agent -m "..."` | Send a single message |
| `databot agent` | Interactive chat mode |
| `databot gateway` | Start always-on service (API + channels + cron) |
| `databot status` | Show status and configuration |
| `databot cron list` | List scheduled jobs |
| `databot cron add --name "..." --schedule "..." --message "..."` | Add a cron job |
| `databot cron remove --id "..."` | Remove a cron job |
Docker
```bash
# Build
docker build -t databot .

# Initialize (first time)
docker run -v ~/.databot:/root/.databot --rm databot onboard

# Run gateway
docker run -v ~/.databot:/root/.databot -p 18790:18790 databot gateway
```
Kubernetes
See the k8s/ directory for example Kubernetes deployment manifests.
Security
- Read-only SQL by default: Write operations blocked unless explicitly enabled
- Workspace sandboxing: Filesystem and shell restricted to workspace directory
- Command allowlist: Only whitelisted shell commands can execute
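A minimal sketch of how an allowlist check like this can work (not databot's actual implementation): tokenize the command line shell-style and require the first token to be on the allowlist.

```python
import shlex

# Mirrors the allowed_commands setting from the example config.
ALLOWED_COMMANDS = {"kubectl", "airflow", "trino-cli"}

def is_allowed(command_line: str) -> bool:
    """Permit a shell command only if its executable is on the allowlist."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:  # unbalanced quotes and similar malformed input
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS
```

Checking only the first token is deliberately simple; a production gate would also have to reject shell metacharacters (`;`, `&&`, `|`) that could chain a second, unlisted command.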
Plugins
Databot supports plugins via Python entry points. Third-party packages can add custom tools, channels, and LLM providers.
Creating a Plugin
- Create a Python package with your custom tool:

```python
# my_databot_plugin/tools.py
from databot.tools.base import BaseTool

class MyCustomTool(BaseTool):
    @property
    def name(self) -> str:
        return "my_tool"

    @property
    def description(self) -> str:
        return "Description of what this tool does"

    def parameters(self):
        return {
            "type": "object",
            "properties": {
                "param1": {"type": "string", "description": "First parameter"}
            },
            "required": ["param1"],
        }

    async def execute(self, param1: str) -> str:
        return f"Executed with {param1}"
```
- Register it in your `pyproject.toml`:

```toml
[project.entry-points."databot.tools"]
my_tool = "my_databot_plugin.tools:MyCustomTool"
```
- Install your package and databot will auto-discover it.
Entry Point Groups
| Group | Base Class | Description |
|---|---|---|
| `databot.tools` | `BaseTool` | Custom tools for the agent |
| `databot.channels` | `BaseChannel` | Messaging integrations (Slack, Discord, etc.) |
| `databot.providers` | `LLMProvider` | LLM provider adapters |
License
MIT