defog

A comprehensive Python toolkit for AI-powered data operations, from natural language SQL queries to multi-agent orchestration. Defog helps you generate data queries from natural language questions.
Features
- 🤖 Cross-provider LLM operations - Unified interface for OpenAI, Anthropic, Gemini, Grok (xAI), and Together AI
- 📊 SQL Agent - Convert natural language to SQL with automatic table filtering for large databases
- 🔍 Data extraction - Extract structured data from PDFs, images, HTML, text documents, and even images embedded in HTML
- 🛠️ Advanced AI tools - Code interpreter, web search, YouTube transcription, document citations
- 🎭 Agent orchestration - Hierarchical task delegation and multi-agent coordination
- 💾 Memory management - Automatic conversation compactification for long contexts
Installation
```shell
pip install --upgrade defog
```
Quick Start
1. LLM Chat (Cross-Provider)
```python
from defog.llm.utils import chat_async
from defog.llm.llm_providers import LLMProvider

# Works with any provider
response = await chat_async(
    provider=LLMProvider.ANTHROPIC,  # or OPENAI, GEMINI
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.content)
```
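The snippets in this README use top-level `await`, which works in notebooks and inside async applications. In a plain script, drive the coroutine with `asyncio.run`. A minimal sketch, using a hypothetical stub coroutine in place of `chat_async`:

```python
import asyncio

# Stand-in for an async defog call such as chat_async (illustrative stub,
# not part of the defog API)
async def fake_chat(message: str) -> str:
    await asyncio.sleep(0)  # simulate awaiting a network call
    return f"echo: {message}"

def main() -> str:
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop
    return asyncio.run(fake_chat("Hello!"))

if __name__ == "__main__":
    print(main())  # → echo: Hello!
```

The same pattern applies to every `await chat_async(...)` example below.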
OpenAI GPT‑5: Responses API controls
```python
from defog.llm.utils import chat_async
from defog.llm.llm_providers import LLMProvider

response = await chat_async(
    provider=LLMProvider.OPENAI,
    model="gpt-5.1",
    messages=[
        {"role": "system", "content": "You are concise and helpful."},
        {"role": "user", "content": "Summarize the benefits of unit tests."},
    ],
    # Optional Responses API controls for GPT-5.1
    reasoning_effort="none",  # none | low | medium | high
    verbosity="low",          # low | medium | high
)
print(response.content)
```
2. Natural Language to SQL
```python
from defog.llm.sql import sql_answer_tool
from defog.llm.llm_providers import LLMProvider

# Ask questions in natural language
result = await sql_answer_tool(
    question="What are the top 10 customers by total sales?",
    db_type="postgres",
    db_creds={
        "host": "localhost",
        "database": "mydb",
        "user": "postgres",
        "password": "password",
        "port": 5432
    },
    model="claude-sonnet-4-20250514",
    provider=LLMProvider.ANTHROPIC
)
print(f"SQL: {result['query']}")
print(f"Results: {result['results']}")
```
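Hardcoding a password is fine for a demo, but in practice you will likely assemble `db_creds` from environment variables. The variable names below mirror the ones used in the MCP server config later in this README (`DB_HOST`, `DB_NAME`, etc.); the helper itself is illustrative, not a defog API:

```python
import os

def db_creds_from_env(env=None) -> dict:
    """Assemble a Postgres credentials dict from environment variables,
    falling back to local-development defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_HOST", "localhost"),
        "database": env.get("DB_NAME", "mydb"),
        "user": env.get("DB_USER", "postgres"),
        "password": env.get("DB_PASSWORD", ""),
        "port": int(env.get("DB_PORT", "5432")),  # db_creds expects an int port
    }

# Pass the result as db_creds=db_creds_from_env() instead of an inline dict
creds = db_creds_from_env()
```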
3. Extract Data from PDFs
```python
from defog.llm import extract_pdf_data

# Extract structured data from any PDF
data = await extract_pdf_data(
    pdf_url="https://example.com/financial_report.pdf",
    focus_areas=["revenue", "financial metrics"]
)
for datapoint_name, extracted_data in data["data"].items():
    print(f"{datapoint_name}: {extracted_data}")
```
4. Code Interpreter
```python
from defog.llm.code_interp import code_interpreter_tool
from defog.llm.llm_providers import LLMProvider

# Execute Python code with AI assistance
result = await code_interpreter_tool(
    question="Analyze this data and create a visualization",
    csv_string="name,sales\nAlice,100\nBob,150",
    model="gpt-4o",
    provider=LLMProvider.OPENAI
)
print(result["code"])    # Generated Python code
print(result["output"])  # Execution results
```
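`csv_string` is plain CSV text. If your data lives in Python objects rather than a file, the stdlib `csv` module can produce that string; a sketch (the helper is illustrative, not part of defog):

```python
import csv
import io

def rows_to_csv(fieldnames: list, rows: list) -> str:
    """Serialize a list of dicts into a CSV string suitable for csv_string=."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_string = rows_to_csv(
    ["name", "sales"],
    [{"name": "Alice", "sales": 100}, {"name": "Bob", "sales": 150}],
)
# csv_string now holds a header row plus one line per dict
```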
5. Using MCP Servers with chat_async
```python
from defog.llm.utils import chat_async
from defog.llm.llm_providers import LLMProvider

# Use MCP servers for dynamic tool integration
# Works with both local and remote MCP servers
response = await chat_async(
    provider=LLMProvider.OPENAI,
    model="gpt-4.1",
    mcp_servers=["http://localhost:8000/mcp"],  # Can be local or remote
    messages=[
        {"role": "user", "content": "How many users are in the first table?"}
    ]
)
# MCP tools are automatically converted to Python functions
# and made available to the LLM
print(response.content)
```
6. Anthropic Server-Side Tools and Programmatic Tool Calling
`chat_async` exposes Anthropic's first-party server-side tools (`web_search`, `web_fetch`, `code_execution`, `advisor`) and the new programmatic tool calling flow, where Claude writes Python in the code execution sandbox that calls your local tools as `await my_tool(...)`, keeping intermediate results in the sandbox so they never re-enter the model context.
```python
from pydantic import BaseModel
from defog.llm.utils import chat_async

# Server-side web_search
response = await chat_async(
    provider="anthropic",
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "What's the latest defog-python release?"}],
    server_tools=["web_search"],
)
print(response.content)
print(response.server_tool_outputs)  # raw web_search_tool_result blocks
print(response.server_tool_usage)    # {"web_search_requests": 1, ...}

# Programmatic tool calling: Claude calls your tool from inside code execution
class QueryArgs(BaseModel):
    sql: str

async def query_database(input: QueryArgs) -> list:
    """Run a SQL query and return rows as JSON."""
    return [{"customer": "Acme", "revenue": 50_000}]

response = await chat_async(
    provider="anthropic",
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Who is the top customer by revenue?"}],
    tools=[query_database],
    server_tools=["code_execution"],
    programmatic_tool_calling=True,
)
print(response.content)
print(response.container_id)  # reuse via `container_id=` on a follow-up call
```
See docs/llm/anthropic-server-tools.md
for the full reference, including version overrides for Bedrock/Vertex,
container reuse, and the LLMResponse shape additions.
Documentation
📚 Full Documentation - Comprehensive guides and API reference
Quick Links
- LLM Utilities - Chat, function calling, structured output, memory management
- Database Operations - SQL generation, query execution, schema documentation
- Data Extraction - PDF, image, and HTML data extraction tools
- Agent Orchestration - Multi-agent coordination and task delegation
- API Reference - Complete API documentation
Environment Variables
```shell
# API Keys
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GEMINI_API_KEY="your-gemini-key"
```
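Missing keys typically surface as an authentication error mid-request. A quick way to check up front which keys are unset, using plain `os.environ` (nothing defog-specific):

```python
import os

REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY"]

def missing_keys(env=None) -> list:
    """Return the names of required API keys that are not set (or are empty)."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# Warn early instead of failing on the first provider call
for key in missing_keys():
    print(f"warning: {key} is not set")
```

If you only use one provider, trim `REQUIRED_KEYS` to the key that provider needs.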
Advanced Use Cases
For advanced features like:
- Memory compactification for long conversations
- YouTube video transcription and summarization
- Multi-agent orchestration with shared context
- Database schema auto-documentation
- Model Context Protocol (MCP) support
See the full documentation.
Development
Testing and formatting
- Run tests: `python -m pytest tests`
- Format code: `ruff format`
- Update documentation when adding features
Using our MCP Server
- Run `defog serve` once to complete your setup, and `defog db` to update your database credentials
- Add to your MCP client:
  - Claude Code: `claude mcp add defog -- python3 -m defog.mcp_server`. Or, if you do not want to install the defog package globally or set up environment variables, run `claude mcp add dfg -- uv run --directory FULL_PATH_TO_VENV_DIRECTORY --env-file .env -m defog.mcp_server`
  - Claude Desktop: add the config below
```json
{
  "mcpServers": {
    "defog": {
      "command": "python3",
      "args": ["-m", "defog.mcp_server"],
      "env": {
        "OPENAI_API_KEY": "YOUR_OPENAI_KEY",
        "ANTHROPIC_API_KEY": "YOUR_ANTHROPIC_KEY",
        "GEMINI_API_KEY": "YOUR_GEMINI_KEY",
        "DB_TYPE": "YOUR_DB_TYPE",
        "DB_HOST": "YOUR_DB_HOST",
        "DB_PORT": "YOUR_DB_PORT",
        "DB_USER": "YOUR_DB_USER",
        "DB_PASSWORD": "YOUR_DB_PASSWORD",
        "DB_NAME": "YOUR_DB_NAME"
      }
    }
  }
}
```
Available MCP Tools and Resources
The Defog MCP server provides the following capabilities:
Tools (actions the AI can perform):
- `text_to_sql_tool` - Execute natural language queries against your database
- `list_database_schema` - List all tables and their schemas
- `youtube_video_summary` - Get transcript/summary of YouTube videos (requires Gemini API key)
- `extract_pdf_data` - Extract structured data from PDFs
- `extract_html_data` - Extract structured data from HTML pages
- `extract_text_data` - Extract structured data from text files
Resources (read-only data the AI can access):
- `schema://tables` - Get list of all tables in the database
- `schema://table/{table_name}` - Get detailed schema for a specific table
- `stats://table/{table_name}` - Get statistics and metadata for a table (row count, column statistics)
- `sample://table/{table_name}` - Get sample data (10 rows) from a table
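The `{table_name}` placeholders in the resource URIs are filled in by the MCP client before the resource is read. A sketch of building concrete URIs with plain `str.format` (the URI schemes come from the list above; the helper itself is illustrative):

```python
# Templates taken from the resource list above
RESOURCE_TEMPLATES = {
    "schema": "schema://table/{table_name}",
    "stats": "stats://table/{table_name}",
    "sample": "sample://table/{table_name}",
}

def resource_uri(kind: str, table_name: str) -> str:
    """Build a concrete resource URI for one table."""
    return RESOURCE_TEMPLATES[kind].format(table_name=table_name)

print(resource_uri("stats", "customers"))  # → stats://table/customers
```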
License
MIT License - see LICENSE file for details.
File details
Details for the file defog-1.5.3b2.tar.gz.

File metadata
- Download URL: defog-1.5.3b2.tar.gz
- Size: 279.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | f0e89584cfbea9407be089f966e97f11521ae2ff137634001779d9e92bcaf6f4 |
| MD5 | 7af126e80b7744badbc791898bc4579c |
| BLAKE2b-256 | 415777b2c84ed9524ef7309ee34711ca9a478762329d57c4db453d0bed41fa4c |
File details
Details for the file defog-1.5.3b2-py3-none-any.whl.

File metadata
- Download URL: defog-1.5.3b2-py3-none-any.whl
- Size: 223.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.6

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | dde18601d6e1dbfd37766f83940f16ebf4cb82c342a7905a4e1849d77b8bb5aa |
| MD5 | f0042b8b2359fb03f2daa6658df669e1 |
| BLAKE2b-256 | 995d3aa555ac3d5474ec0e9f6439b188b34b3106505d248df395f94137e69bf7 |