Open-source talk-to-data agent with DataHub context and pluggable query engines
Natural-language data queries, powered by DataHub + LangGraph
Ask a question. Get SQL, results, and a chart — in one turn.
⚡ Quickstart
Option A — pip / uvx (recommended, no Docker needed)
# Install and launch — no git clone, no repo, no Docker
pip install datahub-analytics-agent
analytics-agent quickstart
# Or with uv (no virtualenv management):
uvx datahub-analytics-agent quickstart
The wizard asks for your LLM provider + API key, optionally connects a data source, then starts the agent at http://localhost:8100. Config and the database are stored in ~/.datahub/analytics-agent/.
Re-running analytics-agent quickstart detects the existing config and offers to start, reconfigure, or cancel — so it doubles as the "just start the agent" command for repeat users.
Other server commands:
analytics-agent start # start from existing config (no wizard)
analytics-agent stop # stop the running server
analytics-agent status # show whether server is running + URL
analytics-agent logs # tail ~/.datahub/analytics-agent/logs/agent.log
analytics-agent config # open config dir in $EDITOR or print its path
Option B — Docker + sample data (full demo)
Requires: Docker, the DataHub CLI (pip install acryl-datahub), uv, and Python 3.11+
git clone https://github.com/datahub-project/analytics-agent.git
cd analytics-agent
bash quickstart.sh
The script starts a local DataHub instance, loads the Olist e-commerce sample dataset and catalog metadata, then builds and launches Analytics Agent at http://localhost:8100. Postgres data is persisted to ~/.datahub/analytics-agent/postgres-data/ so it survives container restarts.
Using AWS Bedrock? Export LLM_PROVIDER=bedrock before running the script. The script will verify your AWS credentials and Bedrock access before starting the container, and mount ~/.aws read-only so boto3 picks up your profiles and SSO cache automatically.
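For example, assuming your AWS profile already has Bedrock access:
# Run the Docker quickstart against Bedrock
export LLM_PROVIDER=bedrock
export AWS_REGION=us-west-2   # optional; defaults to us-west-2
bash quickstart.sh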
What it does
| Feature | Description |
|---|---|
| Plain-English → SQL → Chart | Ask "top 5 categories by revenue" — the agent searches DataHub docs first, writes SQL, runs it, and auto-renders a Vega-Lite chart. |
| Context Quality | A live status bar shows how well your DataHub catalog supported the agent (1–5). Hover for the LLM's reasoning. Improves as you document your data. |
| /improve-context | Type /improve-context after any conversation to get a numbered list of documentation improvements the agent wishes it had — then approve and publish them to DataHub in one click. |
| Multi-turn memory | Follow-ups like "make it a pie chart" or "filter to Q3" work across turns. |
| Collapsible reasoning | Tool calls and agent thinking are shown but collapsed — visible when you want them, out of the way when you don't. |
| 4 themes | DataHub (light/purple), Warm (light/orange), Ocean (dark/blue), Carbon (dark/gray). Switcher in the bottom-left. |
| Multiple connections | Add and manage multiple Snowflake, BigQuery, DuckDB, or SQLAlchemy connections from Settings. Each has its own encrypted credentials. |
Manual setup (for contributors / development)
This section is for hacking on the agent itself. For everyday use, analytics-agent quickstart is simpler.
Also requires: node
1. Clone and install
git clone https://github.com/datahub-project/analytics-agent.git
cd analytics-agent
make install # uv sync + pnpm install
make start # builds frontend, starts backend at :8100
Open http://localhost:8100 — a setup wizard handles the LLM key and connections on first run.
Without make:
uv sync && cd frontend && pnpm install && pnpm build && cd .. && uv run uvicorn analytics_agent.main:app --port 8100
First-time setup
Before the first uvicorn start (or after pulling a release that adds migrations), run:
uv run analytics-agent bootstrap
This applies Alembic migrations, seeds engines and context platforms from config.yaml, and writes first-run setting defaults. The command is idempotent — re-running it on an up-to-date database is a no-op.
For Kubernetes deployments, the Helm chart runs analytics-agent bootstrap automatically as a pre-install/pre-upgrade hook (see helm/analytics-agent/README.md).
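As a sketch, installing via the bundled chart looks like this (release name is illustrative; see the chart README for real values):
helm upgrade --install analytics-agent ./helm/analytics-agent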
Optional: pre-configure via .env
cp .env.example .env # then edit as needed
# LLM — pick one provider (or leave blank and use the wizard)
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
# DataHub (optional — can also be added via Settings → Connections)
DATAHUB_GMS_URL=https://your-instance.acryl.io/gms
DATAHUB_GMS_TOKEN=eyJhbGci...
Useful make targets
| Command | What it does |
|---|---|
| make start | Build frontend if stale, start backend |
| make start-remote | Start + show DataHub connection status |
| make nuke | Wipe the DB and start from scratch |
| make dev | Hot-reload backend (use make dev-full for frontend HMR too) |
| make logs | Tail backend logs |
Development mode (hot reload)
# Terminal 1 — backend (dev)
uv run uvicorn analytics_agent.main:app --reload --port 8101
# Terminal 2 — frontend HMR (http://localhost:5173, proxies /api/* to :8101)
cd frontend && pnpm dev
Connect DataHub
# DataHub Cloud (Acryl)
datahub init --sso --host https://your-instance.acryl.io/gms --token-duration ONE_MONTH
# Self-hosted
datahub init --host http://localhost:8080 --username datahub --password datahub
# Verify the connection
curl -s -X POST http://localhost:8100/api/settings/connections/datahub/test
Connect Snowflake
Option A — Service account via config.yaml (recommended)
# config.yaml
engines:
  - type: snowflake
    name: snowflake
    connection:
      account: "${SNOWFLAKE_ACCOUNT}"
      warehouse: "${SNOWFLAKE_WAREHOUSE}"
      database: "${SNOWFLAKE_DATABASE}"
      schema: "${SNOWFLAKE_SCHEMA}"
      user: "${SNOWFLAKE_USER}"
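The referenced environment variables go in .env (all values below are placeholders):
# .env
SNOWFLAKE_ACCOUNT=xy12345.us-east-1
SNOWFLAKE_WAREHOUSE=ANALYTICS_WH
SNOWFLAKE_DATABASE=ANALYTICS
SNOWFLAKE_SCHEMA=PUBLIC
SNOWFLAKE_USER=ANALYTICS_AGENT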
Option B — Key-pair auth
Generate an RSA key pair, upload the public key to Snowflake, then set SNOWFLAKE_PRIVATE_KEY (base64-encoded PEM) in .env.
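A typical flow, following Snowflake's standard key-pair procedure (the user name and key value below are placeholders; see Snowflake's docs for encrypted keys):
# Generate an unencrypted PKCS#8 key pair
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
# In a Snowflake worksheet, attach the public key (paste the PEM body
# without the BEGIN/END lines):
#   ALTER USER analytics_agent SET RSA_PUBLIC_KEY='MIIBIjANBg...';
# Base64-encode the private key and append it to .env
echo "SNOWFLAKE_PRIVATE_KEY=$(base64 -i rsa_key.p8 | tr -d '\n')" >> .env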
Option C — Personal SSO (Settings UI)
Settings → Connections → Authentication → SSO — opens a browser window for your IdP.
Connect BigQuery
BigQuery authenticates exclusively via a GCP service account. Three credential formats are supported — use whichever fits your deployment:
Option A — JSON key via environment variable (recommended for containers)
Export the raw service-account JSON (single line, no newlines):
export BIGQUERY_CREDENTIALS_JSON='{"type":"service_account","project_id":"my-project",...}'
Or add it to .env:
BIGQUERY_CREDENTIALS_JSON={"type":"service_account","project_id":"my-project",...}
Then reference the project in config.yaml:
# config.yaml
engines:
  - type: bigquery
    name: prod
    connection:
      project: "${BIGQUERY_PROJECT}"
      dataset: "${BIGQUERY_DATASET}"  # optional default dataset
Option B — Base64-encoded JSON key via config.yaml
Encode your key file once:
base64 -i my-service-account.json | tr -d '\n'
Then paste the output into config.yaml:
engines:
  - type: bigquery
    name: prod
    connection:
      project: my-gcp-project
      dataset: my_dataset  # optional
      credentials_base64: "ey..."
Option C — Path to a JSON key file
Useful for local development or when the key file is mounted into the container:
engines:
  - type: bigquery
    name: prod
    connection:
      project: my-gcp-project
      credentials_path: /secrets/sa-key.json
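For the container case, a minimal sketch of the mount, using the image from the Production section (the local key-file name is illustrative):
docker run -p 8100:8100 --env-file .env \
  -v "$PWD/sa-key.json:/secrets/sa-key.json:ro" \
  analytics-agent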
Required IAM roles
The service account needs at minimum:
| Role | Purpose |
|---|---|
| roles/bigquery.dataViewer | Read tables and schemas |
| roles/bigquery.jobUser | Run queries |
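For reference, granting both with gcloud looks like this (project and service-account IDs are placeholders):
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:agent@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:agent@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"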
LLM model routing
Four independently configurable model tiers:
| Task | Env var | Default (Anthropic) |
|---|---|---|
| Main analysis agent | LLM_MODEL | claude-sonnet-4-6 |
| Chart generation | CHART_LLM_MODEL | claude-haiku-4-5-20251001 |
| Context quality scoring | QUALITY_LLM_MODEL | claude-haiku-4-5-20251001 |
| Titles & greeting | DELIGHT_LLM_MODEL | claude-haiku-4-5-20251001 |
LLM_PROVIDER=anthropic
LLM_MODEL=claude-opus-4-7 # upgrade just the agent
QUALITY_LLM_MODEL=claude-sonnet-4-6 # or use a stronger model for quality scoring
AWS Bedrock
Anthropic models can also be run via AWS Bedrock. Set LLM_PROVIDER=bedrock and use the Bedrock inference-profile model IDs (e.g. us.anthropic.claude-sonnet-4-5-20250929-v1:0). Auth falls back to the standard AWS credential chain (env vars, ~/.aws/credentials, IAM role); to override, set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (and optionally AWS_SESSION_TOKEN for STS). AWS_REGION defaults to us-west-2.
LLM_PROVIDER=bedrock
AWS_REGION=us-west-2
LLM_MODEL=us.anthropic.claude-sonnet-4-5-20250929-v1:0
Database
The analytics-agent quickstart path uses SQLite at ~/.datahub/analytics-agent/data/agent.db. The Docker quickstart uses Postgres, with data persisted to ~/.datahub/analytics-agent/postgres-data/. For dev/Helm deployments, set DATABASE_URL explicitly — see .env.example for Postgres and SQLite formats.
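For illustration, these are the standard SQLAlchemy URL shapes (credentials and paths are placeholders; .env.example has the exact driver strings the project expects):
# .env
DATABASE_URL=postgresql://agent:secret@localhost:5432/analytics_agent
# or, for SQLite (absolute path after sqlite:///):
DATABASE_URL=sqlite:////home/me/.datahub/analytics-agent/data/agent.db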
Settings UI
Settings (top-right) manages:
- Connections — test, edit, add, and delete engine connections
- Authentication — per-connection: Password, Private Key, SSO, PAT, OAuth
- Tool toggles — enable/disable individual DataHub or engine tools
- Write-back skills — publish_analysis and save_correction (enabled by default)
- Prompt — customize the system prompt
- Display — app name and logo
Production
Docker
docker build -f docker/Dockerfile -t analytics-agent .
docker run -p 8100:8100 --env-file .env analytics-agent
Single process (no Docker)
cd frontend && pnpm build && cd ..
uv run uvicorn analytics_agent.main:app --host 0.0.0.0 --port 8100
Architecture
analytics-agent/
├── backend/src/analytics_agent/
│   ├── agent/        # LangGraph ReAct graph, streaming, chart generation, analysis
│   ├── api/          # FastAPI routes: conversations, chat (SSE), settings, oauth
│   ├── context/      # DataHub tool loader (datahub_agent_context)
│   ├── db/           # SQLAlchemy models + Alembic migrations
│   │   └── models.py # Conversation, Message, Integration, Setting
│   ├── engines/      # Pluggable query engines (Snowflake, BigQuery, DuckDB, SQLAlchemy)
│   ├── prompts/      # System prompt (system_prompt.md) + chart prompt
│   └── skills/       # Write-back skills: publish-analysis, save-correction,
│                     #   improve-context (/improve-context slash command)
└── frontend/src/
    ├── components/Chat/      # MessageList, MessageInput, ContextStatusBar
    ├── components/Settings/
    ├── api/                  # fetch wrappers for REST + SSE stream reader
    └── store/                # Zustand: conversations, display, theme
SSE event flow:
User message → POST /api/conversations/{id}/messages
→ resolver.py resolves credentials → configured engine
→ LangGraph ReAct agent (DataHub tools + engine tools)
→ astream_events → TEXT / TOOL_CALL / TOOL_RESULT / SQL / CHART / COMPLETE
→ Frontend renders each event type inline
→ Background: context quality scored async, stored on conversation row
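A minimal way to watch the stream from the command line; the request body shape here is an assumption (the real payload is defined in frontend/src/api/), and the conversation id is illustrative:
# Hypothetical payload; -N disables curl buffering so SSE events print live
curl -N -X POST http://localhost:8100/api/conversations/123/messages \
  -H "Content-Type: application/json" \
  -d '{"content": "top 5 categories by revenue"}'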