🦕 The agent-first database
🦕 Dinobase
The data platform for agents.
Dinobase syncs 100+ sources — APIs, databases, files, MCP servers — annotates your data, and makes it SQL-ready for agents.
⭐️ star this repo! Thank you for your support!
Your agents are flying blind. Agent stacks built on per-connector tool calls have a structural gap: agents can't JOIN across APIs, have no semantic context to interpret field values, and receive paginated JSON that fills context windows. Take the question "Which customers churned last quarter with declining usage AND open support tickets?" — it spans three connectors and agents built on tool calls can't answer it reliably. This isn't a model problem. It's an architecture problem.
Dinobase is the data platform that fills this gap. Plug in every source: each connector (SaaS APIs, databases, files, MCP servers) becomes a schema. Agents write one SQL query across all connectors, get back a single result set, and can write data back via SQL mutations with a preview/confirm flow. In benchmarks across 11 LLMs against per-connector MCP tools: 91% accuracy vs 35%, 3x faster, 16-22x cheaper per correct answer.
Quick start
# recommended — installs everything automatically
curl -fsSL https://dinobase.ai/install.sh | bash
# or with uv
uv tool install dinobase
# or with pip
pip install dinobase
# or with pipx
pipx install dinobase
1. Connect your data
dinobase add stripe --api-key sk_test_...
dinobase add hubspot --api-key pat-...
dinobase add linear --api-key lin_api_...
dinobase sync
# Or parquet files (no sync needed)
dinobase add parquet --path ./data/events/ --name analytics
# Or databases
dinobase add postgres --connection-string postgresql://...
# Or any MCP server — auto-discovers read-only tools and syncs them as SQL tables
dinobase connector create posthog_mcp --transport stdio \
--command "npx -y @posthog/mcp-server"
dinobase sync posthog_mcp
dinobase query "SELECT * FROM posthog_mcp.list_projects LIMIT 10"
2. Pick your agent interface
One-liner — installs Dinobase and configures your agent in one step:
curl -fsSL https://dinobase.ai/install.sh | bash -s -- claude-code
curl -fsSL https://dinobase.ai/install.sh | bash -s -- claude-desktop
curl -fsSL https://dinobase.ai/install.sh | bash -s -- cursor
curl -fsSL https://dinobase.ai/install.sh | bash -s -- codex
Already have Dinobase? Run the install subcommand directly:
CLI: for Claude Code, Cursor, Codex, Aider, any agent that runs shell
dinobase install claude-code # Claude Code (~/.claude/CLAUDE.md)
dinobase install cursor # Cursor (./AGENTS.md)
dinobase install codex # Codex (~/.codex/AGENTS.md)
Writes usage instructions to the tool's instructions file. Agents run …
MCP server: for Claude Desktop, any MCP client
dinobase install claude-desktop # Claude Desktop (writes config automatically)
dinobase serve # any other MCP client
3. Ask your agent a cross-connector question
"Which companies have closed-won deals over $100K but their subscription is past due?"
The agent writes the SQL, Dinobase executes it across your connectors, and the answer comes back in seconds.
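To make the shape of such a query concrete, here is a minimal sketch of what an agent might write for the question above. It is an illustration under stated assumptions, not Dinobase's actual schema or engine: SQLite with attached in-memory databases stands in for DuckDB so the example runs with the standard library alone, and the table names, column names, stage value, and sample rows are all hypothetical.

```python
import sqlite3

# Stand-in engine: attached in-memory SQLite databases play the role of
# per-connector schemas. Dinobase itself uses DuckDB; this is illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hubspot")
conn.execute("ATTACH DATABASE ':memory:' AS stripe")

# Hypothetical tables and columns, not the real synced schemas.
conn.execute("CREATE TABLE hubspot.deals (company_email TEXT, stage TEXT, amount REAL)")
conn.execute("CREATE TABLE stripe.subscriptions (customer_email TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO hubspot.deals VALUES (?, ?, ?)",
    [
        ("ops@acme.test", "closedwon", 250000),
        ("it@globex.test", "closedwon", 120000),
        ("cto@initech.test", "closedwon", 40000),
    ],
)
conn.executemany(
    "INSERT INTO stripe.subscriptions VALUES (?, ?)",
    [
        ("ops@acme.test", "past_due"),
        ("it@globex.test", "active"),
        ("cto@initech.test", "past_due"),
    ],
)

# One query spanning both schemas, joined on a shared email column.
QUERY = """
SELECT d.company_email, d.amount, s.status
FROM hubspot.deals AS d
JOIN stripe.subscriptions AS s
  ON s.customer_email = d.company_email
WHERE d.stage = 'closedwon'
  AND d.amount > 100000
  AND s.status = 'past_due'
ORDER BY d.amount DESC
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only the Acme row satisfies all three conditions: Globex is over the threshold but not past due, and Initech is past due but under the threshold. The point of the sketch is that the filter across both connectors is a single statement with a single result set.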
4. Write data back (reverse ETL)
Agents can also mutate upstream data via SQL. Every mutation goes through a preview/confirm flow — nothing executes until confirmed.
dinobase query "UPDATE stripe.customers SET name = 'Acme Inc' WHERE id = 'cus_123'"
# Returns a preview: 1 row affected, will call Stripe API
dinobase confirm <mutation_id>
# ✓ Stripe API called (1/1 succeeded)
# ✓ Data updated
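The preview/confirm flow above can be pictured as a small two-step gate. The sketch below is a toy model and not Dinobase's implementation: `MutationQueue`, `PendingMutation`, and the in-memory `customers` dict are hypothetical names introduced only to show that nothing changes until the mutation id is confirmed.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PendingMutation:
    description: str
    apply: Callable[[], int]  # performs the write and returns rows changed


@dataclass
class MutationQueue:
    """Hypothetical two-step write gate: preview first, execute on confirm."""

    pending: Dict[str, PendingMutation] = field(default_factory=dict)

    def preview(self, description: str, apply: Callable[[], int]) -> str:
        # Record the intended write and hand back an id; nothing runs yet.
        mutation_id = uuid.uuid4().hex[:8]
        self.pending[mutation_id] = PendingMutation(description, apply)
        return mutation_id

    def confirm(self, mutation_id: str) -> int:
        # pop() raises KeyError for unknown or already-confirmed ids,
        # so a mutation can be executed at most once.
        mutation = self.pending.pop(mutation_id)
        return mutation.apply()


# Toy stand-in for upstream data.
customers = {"cus_123": {"name": "Acme"}}


def rename() -> int:
    customers["cus_123"]["name"] = "Acme Inc"
    return 1


queue = MutationQueue()
mutation_id = queue.preview(
    "UPDATE customers SET name = 'Acme Inc' WHERE id = 'cus_123'", rename
)
name_before_confirm = customers["cus_123"]["name"]  # unchanged after preview
rows_changed = queue.confirm(mutation_id)
name_after_confirm = customers["cus_123"]["name"]
```

Separating the two steps lets an agent, or a human reviewing the agent, inspect the affected rows before any upstream API is called.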
5. Use MCP servers as connectors — and call their tools directly
Connect any MCP server as a connector. Dinobase auto-discovers read-only tools and syncs them as SQL tables. Query with SQL for reads, call tools directly for writes or parameterized operations:
# Connect a server
dinobase connector create posthog_mcp --transport stdio \
--command "npx -y @posthog/mcp-server"
dinobase sync posthog_mcp
# Query synced data as SQL
dinobase query "SELECT name, active FROM posthog_mcp.list_feature_flags"
# Browse and call tools directly
dinobase mcp servers --pretty
dinobase mcp search "dashboard"
dinobase mcp call posthog_mcp.dashboard-get '{"id": 1118504}'
Or call tools from Python:
from dinobase.mcp import call, search, servers
result = call("posthog_mcp.dashboard-get", id=1118504)
matches = search("feature flag")
Agents can also run Python code via the exec_code MCP tool — chain multiple tool calls, reshape data, or do anything that plain SQL can't:
# exec_code: chain tool calls and build dynamic queries
from dinobase.mcp import call, search, servers
# fetch the current feature flags
flags = call("posthog_mcp.list-feature-flags")
# chain calls with logic: reactivate every inactive flag
inactive = [flag for flag in flags if not flag["active"]]
for flag in inactive:
    call("posthog_mcp.update-feature-flag", id=flag["id"], active=True)
result = {"reactivated": len(inactive)}
exec_code has full access to dinobase.mcp (call, search, servers, tools, instructions) and dinobase.db for direct database access. Assign your return value to result. See exec_code docs.
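The "assign your return value to result" convention can be illustrated with a few lines of plain Python. This is a sketch of the general pattern only, assuming nothing about how exec_code is actually implemented: `run_snippet` and `fake_list_flags` are hypothetical helpers, and the sketch omits the sandboxing a real tool would need.

```python
from typing import Any, Dict, Optional


def run_snippet(source: str, injected: Optional[Dict[str, Any]] = None) -> Any:
    """Run a code string and return whatever it bound to the name `result`."""
    namespace: Dict[str, Any] = dict(injected or {})
    exec(source, namespace)  # illustration only: no sandboxing here
    return namespace.get("result")  # None if the snippet never set `result`


def fake_list_flags():
    # Hypothetical stand-in for a tool call that returns feature flags.
    return [{"id": 1, "active": True}, {"id": 2, "active": False}]


snippet = """
flags = list_flags()
result = {"inactive": [f["id"] for f in flags if not f["active"]]}
"""

output = run_snippet(snippet, {"list_flags": fake_list_flags})
missing = run_snippet("x = 1")  # snippet that forgets to assign `result`
print(output, missing)
```

A snippet that never assigns `result` yields nothing useful to the caller, which is why the convention matters.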
6. (Optional) Enable the semantic layer
export ANTHROPIC_API_KEY=sk-ant-...
After every sync, Dinobase automatically runs a Claude agent in the background to annotate your data — table descriptions, column docs, PII flags, and relationship graphs. Agents can then describe any table and get full semantic context.
dinobase describe stripe.subscriptions --pretty
# stripe.subscriptions (1,420 rows)
# Description: Active and historical customer subscriptions
#
# customer_id VARCHAR -- References customers.id
# status VARCHAR -- Values: active, past_due, canceled, trialing
# ...
# Related tables:
# stripe.customers (customer_id → id, many_to_one)
Set DINOBASE_AUTO_ANNOTATE=false to disable. See Semantic Layer docs.
Benchmark
We tested Dinobase SQL against per-connector MCP tools across 11 LLMs on 75 questions (same models, same data, same questions):
| Metric | Dinobase (SQL) | Per-Connector MCP |
|---|---|---|
| Accuracy | 91% | 35% |
| Avg latency | 34s | 106s |
| Cost per correct answer | $0.027 | $0.445 |
56pp more accurate, 3x faster, 16-22x cheaper per correct answer — across every model tested.
See benchmarks/ for full results, per-model breakdown, and methodology.
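The headline ratios can be rechecked from the aggregate table with a few lines of arithmetic. The numbers below are copied from the table; only the variable names are introduced here.

```python
# Aggregate figures from the benchmark table above.
dinobase = {"accuracy": 0.91, "latency_s": 34, "cost_per_correct": 0.027}
per_connector = {"accuracy": 0.35, "latency_s": 106, "cost_per_correct": 0.445}

# Accuracy gap in percentage points.
accuracy_gap_pp = round((dinobase["accuracy"] - per_connector["accuracy"]) * 100)
# How many times lower the average latency is.
speedup = per_connector["latency_s"] / dinobase["latency_s"]
# How many times lower the cost per correct answer is.
cost_ratio = per_connector["cost_per_correct"] / dinobase["cost_per_correct"]

print(f"{accuracy_gap_pp} pp, {speedup:.1f}x faster, {cost_ratio:.1f}x cheaper")
```

The aggregate cost ratio comes out at roughly 16.5x, the low end of the quoted 16-22x range; the table alone does not show where the upper end comes from, so see the per-model breakdown in benchmarks/ for that.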
Connectors
101 connectors across every category. Run dinobase sources --available --pretty to list all.
| Category | Connectors |
|---|---|
| CRM & Sales | Salesforce, HubSpot, Pipedrive, Attio, Close, Copper |
| Billing & Payments | Stripe, Paddle, Chargebee, Recurly, Lemon Squeezy |
| Support & Success | Zendesk, Intercom, Freshdesk, HelpScout, Customer.io, Vitally, Gainsight |
| Developer Tools | GitHub, GitLab, Jira, Bitbucket, Sentry, Linear |
| Communication | Slack, Discord, Twilio, SendGrid, Mailchimp, Front |
| E-commerce | Shopify, WooCommerce, BigCommerce, Square |
| Marketing & Analytics | Google Analytics, Google Ads, Facebook Ads, HubSpot Marketing, Mixpanel, PostHog, Segment, Plausible, Matomo, Bing Webmaster |
| HR & Recruiting | Personio, BambooHR, Greenhouse, Lever, Workable, Gusto, Deel |
| Project Management | Asana, ClickUp, Monday, Trello, Todoist |
| Databases | Postgres, MySQL, MariaDB, SQL Server, Oracle, SQLite, Snowflake, BigQuery, Redshift, ClickHouse, CockroachDB, Databricks, Trino, Presto, DuckDB, MongoDB |
| Streaming | Kafka, Kinesis |
| Cloud Storage | S3, GCS, Azure Blob, SFTP |
| Finance | QuickBooks, Xero, Brex, Mercury |
| Productivity | Notion, Airtable, Google Sheets |
| Infrastructure | Datadog, New Relic, PagerDuty, OpsGenie, Statuspage, Cloudflare, Vercel, Netlify |
| Content & CMS | Strapi, Contentful, Sanity, WordPress |
| Design & Video | Figma, Mux |
| Files | Parquet, CSV (local or S3 — read at query time, no sync needed) |
| MCP servers | Any MCP server (stdio, SSE, HTTP) — auto-discovers read-only tools, syncs as SQL tables |
How it works
            Agent (Claude, GPT, etc.)
                      |
            +---------+---------+
            |                   |
        MCP Server             CLI
       (tool calls)      (bash commands)
            |                   |
            +---------+---------+
                      |
                 Query Engine
                 (DuckDB SQL)
                      |
         +------------+------------+
         |            |            |
       crm.*      billing.*   analytics.*
      (synced)    (synced)  (parquet views)
Each connector becomes a schema. Cross-connector joins work via shared columns like email. Data stays in parquet — DuckDB is the query engine and metadata store.
API connectors sync to parquet in ~/.dinobase/data/ (or cloud storage). File connectors are read directly via DuckDB views — nothing is copied.
Cloud storage
Store data in S3, GCS, or Azure instead of local disk:
dinobase init --storage s3://my-bucket/dinobase/
# or
export DINOBASE_STORAGE_URL=s3://my-bucket/dinobase/
Supports Amazon S3, Google Cloud Storage, Azure Blob Storage, and S3-compatible services (MinIO, R2). See Cloud Storage docs.
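As a rough picture of how a storage URL like the one above breaks down, the sketch below parses `DINOBASE_STORAGE_URL` into scheme, bucket, and prefix, and falls back to the local data directory when the variable is unset. The fallback rule and the `resolve_storage` helper are assumptions made for illustration; the source does not describe how Dinobase resolves its storage location internally.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

# Local default mentioned in "How it works".
DEFAULT_LOCAL_DIR = Path.home() / ".dinobase" / "data"


def resolve_storage(env=None):
    """Hypothetical helper: describe where data would live for a given environment."""
    env = os.environ if env is None else env
    url = env.get("DINOBASE_STORAGE_URL")
    if not url:
        # Assumed behaviour: no URL configured means local disk.
        return {"kind": "local", "path": str(DEFAULT_LOCAL_DIR)}
    parsed = urlparse(url)
    return {
        "kind": parsed.scheme,             # e.g. "s3"
        "bucket": parsed.netloc,           # e.g. "my-bucket"
        "prefix": parsed.path.strip("/"),  # e.g. "dinobase"
    }


cloud = resolve_storage({"DINOBASE_STORAGE_URL": "s3://my-bucket/dinobase/"})
local = resolve_storage({})
print(cloud, local)
```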
Integrations
Works with every major agent framework: CrewAI · LangChain / LangGraph · LlamaIndex · Pydantic AI · Vercel AI SDK · Mastra · OpenClaw
Documentation
- Getting Started — Install, connect, and query in 5 minutes
- Connectors — Credentials, naming, sync intervals
- Querying Data — Cross-connector joins, aggregations, DuckDB SQL
- Reverse ETL (Mutations) — Write data back to upstream APIs
- MCP Integration — Agent setup for Claude Desktop, Cursor
- Cloud Storage Backend — Store data in S3, GCS, or Azure
- Schema Annotations — How agents understand the data
- CLI Reference — All commands and flags
- Architecture — DuckDB, dlt, MCP, module structure
Community
Questions, feedback, or want to share what you're building? Come hang out:
- Join our Slack — chat with the team and other users
- Report an issue — bugs and feature requests
Development
git clone https://github.com/DinobaseHQ/dinobase
cd dinobase
pip install -e ".[dev]"
pytest
License
MIT Expat