Skip to main content

Generate NoSQL queries from natural language

Project description

๐Ÿฅญ Mango โ€” MongoDB AI Agent

Natural language โ†’ MQL โ†’ Answers. The open-source AI agent for MongoDB.

Python PyPI License Docs Open in Colab

Ask your MongoDB database anything in plain language. Mango translates your question into an MQL query, runs it, and gives you a clear answer โ€” with memory that improves over time.

Mango demo

๐Ÿ“– Full documentation

Install

pip install mango-ai[anthropic]   # Claude
pip install mango-ai[openai]      # GPT
pip install mango-ai[gemini]      # Gemini
pip install mango-ai[ollama]      # Ollama
pip install mango-ai[all]         # all providers

Quickstart

from mango import MangoAgent
from mango.tools import (
    ToolRegistry,
    ListCollectionsTool,
    SearchCollectionsTool,
    DescribeCollectionTool,
    CollectionStatsTool,
    RunMQLTool,
    SearchSavedCorrectToolUsesTool,
    SaveTextMemoryTool,
    DeleteLastMemoryEntryTool,
)
from mango.servers.fastapi import MangoFastAPIServer
from mango.integrations.anthropic import AnthropicLlmService
from mango.integrations.mongodb import MongoRunner
from mango.integrations.chromadb import ChromaAgentMemory

# Configure your LLM
llm = AnthropicLlmService(
    model="claude-sonnet-4-6",
    api_key="YOUR_API_KEY",
)

# Configure your database
db = MongoRunner()
db.connect("mongodb://localhost:27017/mydb")

# Configure your agent memory
agent_memory = ChromaAgentMemory(
    persist_dir="./chroma_db",
)

# Register tools
tools = ToolRegistry()
tools.register(ListCollectionsTool(db))
tools.register(SearchCollectionsTool(db))
tools.register(DescribeCollectionTool(db))
tools.register(CollectionStatsTool(db))
tools.register(RunMQLTool(db))
tools.register(SearchSavedCorrectToolUsesTool(agent_memory))
tools.register(SaveTextMemoryTool(agent_memory))

# Create your agent
agent = MangoAgent(
    llm_service=llm,
    tool_registry=tools,
    db=db,
    agent_memory=agent_memory,
    introspect=False
)

# Wire up the delete tool AFTER agent creation so it can reference agent state
tools.register(DeleteLastMemoryEntryTool(agent_memory, lambda: agent._last_memory_entry_id))

# Run the server
server = MangoFastAPIServer(agent)
server.run()  # http://localhost:8000

Your endpoint is live at POST /api/v1/ask/stream โ€” ready to connect to any frontend.


How It Works

User question
      โ”‚
      โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚               MANGO AGENT                    โ”‚
โ”‚                                              โ”‚
โ”‚  1. Inject training examples (gold-standard) โ”‚
โ”‚  2. Search memory for similar past queries   โ”‚
โ”‚  3. Build system prompt with schema context  โ”‚
โ”‚  4. LLM decides which tools to call          โ”‚
โ”‚  5. Validate MQL before execution            โ”‚
โ”‚  6. Execute tools against MongoDB            โ”‚
โ”‚  7. Auto-retry on fixable errors (max 2x)    โ”‚
โ”‚  8. Stream natural language answer           โ”‚
โ”‚  9. Auto-save successful queries to memory   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
      โ”‚
      โ–ผ
SSE stream โ†’ your frontend

The learning loop: steps 1, 2, and 9 make Mango smarter over time. Training examples are injected first โ€” the LLM uses them directly without exploring the schema, which cuts latency and improves accuracy. Auto-saved examples accumulate during use. A novel question triggers full reasoning; a familiar one is answered in one shot.


Why Mango?

Vanna.ai has solved text-to-SQL elegantly. MongoDB has no equivalent. The challenges are fundamentally different:

  • No explicit schema โ€” collections have no DDL. Mango infers schema by sampling documents.
  • Nested documents โ€” queries must navigate arrays, subdocuments, and dotted paths.
  • Aggregation pipelines โ€” complex analytics require multi-stage JSON pipelines, not flat SQL strings.
  • No JOINs โ€” relationships are handled via $lookup or application-level references.

Mango is the first production-grade framework for natural language interaction with MongoDB.


SSE Streaming API

Every question is streamed via Server-Sent Events. Each data: line is a JSON event:

POST /api/v1/ask/stream
{"question": "How many orders were placed last week?"}
data: {"type": "session",     "session_id": "abc123"}
data: {"type": "tool_call",   "tool_name": "list_collections", "tool_args": {}}
data: {"type": "tool_result", "tool_name": "list_collections", "success": true, "preview": "orders, customers, products..."}
data: {"type": "tool_call",   "tool_name": "run_mql", "tool_args": {"operation": "aggregate", "collection": "orders", ...}}
data: {"type": "tool_result", "tool_name": "run_mql", "success": true, "preview": "[{\"total\": 1247}]"}
data: {"type": "answer",      "text": "1,247 orders were placed in the last 7 days."}
data: {"type": "done",        "iterations": 2, "input_tokens": 1820, "output_tokens": 94}

Multi-turn conversations are supported โ€” pass the same session_id to continue a thread.


Agent Memory

Mango learns from every successful interaction using ChromaDB as a vector store. Memory is organized in three layers:

Layer How it's populated Priority
Training You load verified examples explicitly Injected first โ€” LLM uses directly
Auto-save Populated automatically during use Injected as few-shot context
Text notes You add domain knowledge manually Retrieved when relevant

Load training data (bulk, from a JSONL file):

mango train --file knowledge.jsonl

Pre-load domain knowledge:

await memory.save_text("'active customer' means placed an order in the last 90 days")
await memory.save_text("'revenue' always refers to the total_amount field in orders")
await memory.save_text("the 'status' field uses: 1=pending, 2=shipped, 3=delivered, 4=cancelled")

Correct mistakes in chat โ€” no UI needed:

User: that query was wrong, delete it
Mango: Done โ€” removed from memory โœ“

โ†’ Requires DeleteLastMemoryEntryTool registered after agent creation (see Quickstart).


Available Tools

Tool Description
list_collections List all collections. Grouped view for large databases (100+ collections).
search_collections Search collections by name pattern (supports glob: order*, *_log).
describe_collection Full schema for a collection: field types, frequencies, indexes, references.
collection_stats Document count and storage size for a collection.
run_mql Execute a read-only MongoDB query: find, aggregate, count, distinct. Includes automatic MQL validation before execution.
search_saved_correct_tool_uses Search memory for similar past interactions.
save_text_memory Save free-form knowledge about the database for future queries.
explain_query (opt-in) Explain a query step-by-step in plain language + MongoDB execution stats.
delete_last_memory_entry (opt-in) Remove the last auto-saved entry when the user says a result was wrong.

Read-only by design. run_mql only accepts find, aggregate, count, distinct. Write operations are rejected at the tool level.

MQL validation. Every run_mql call is validated before hitting the database โ€” collection names, field names, and operators are checked against the live schema. Errors come back with hints (did you mean 'order_total'?) so the LLM can self-correct.


Pluggable Architecture

Mango is built on abstract interfaces โ€” swap any component without touching your agent code.

LLM Providers

from mango.integrations.anthropic import AnthropicLlmService
from mango.integrations.openai import OpenAILlmService
from mango.integrations.google import GeminiLlmService
from mango.integrations.ollama import OllamaLlmService

llm = AnthropicLlmService(model="claude-sonnet-4-6")
llm = OpenAILlmService(model="gpt-5.4")
llm = GeminiLlmService(model="gemini-3.1-pro-preview")
llm = OllamaLlmService(model="qwen3.5:9b")  # fully local, no API key needed

Custom Tools

from mango.tools.base import Tool, ToolResult
from mango.llm import ToolDef, ToolParam

class MyCustomTool(Tool):
    @property
    def definition(self) -> ToolDef:
        return ToolDef(
            name="my_tool",
            description="Does something useful",
            params=[ToolParam(name="input", type="string", description="...")]
        )

    async def execute(self, **kwargs) -> ToolResult:
        result = do_something(kwargs["input"])
        return ToolResult(success=True, data=result)

tools.register(MyCustomTool())

Memory Backends

# Default: ChromaDB (local, no infrastructure needed)
from mango.integrations.chromadb import ChromaAgentMemory
memory = ChromaAgentMemory(persist_dir="./mango_memory")

# Implement your own: just inherit MemoryService
class MyPineconeMemory(MemoryService):
    async def store(self, entry: MemoryEntry) -> None: ...
    async def retrieve(self, question: str, top_k: int) -> list[MemoryEntry]: ...
    async def save_text(self, content: str) -> str: ...

Large Databases

Mango handles databases with hundreds or thousands of collections without token explosion.

  • Adaptive collection listing โ€” databases with 100+ collections are automatically grouped by name pattern (contest_*, user_*) instead of listing every single one
  • On-demand schema โ€” schema details are only fetched when the LLM needs them, not injected upfront
  • Turn-based conversation pruning โ€” conversation history is automatically trimmed to keep token usage stable across long sessions
  • Auto-save to memory โ€” schema discoveries are persisted so the LLM doesn't re-introspect the same collections repeatedly

Multi-turn Conversations

Mango maintains conversation history across questions in the same session:

User: How many orders were placed last week?
Mango: 1,247 orders were placed in the last 7 days.

User: And how many of those were delivered?
Mango: Of last week's 1,247 orders, 891 (71%) have been delivered.

User: Which customer placed the most?
Mango: Alice Johnson (customer_id: 64a3f...) placed 8 orders last week.

Follow-up questions work naturally โ€” no need to repeat context.


Technology Stack

Component Library
MongoDB driver pymongo 4.x
LLM (primary) anthropic (Claude)
LLM (secondary) openai (GPT-4), google-generativeai (Gemini), ollama (local)
Vector store chromadb 1.x
Server FastAPI + uvicorn
CLI rich + prompt_toolkit
Data pandas
Testing pytest + mongomock

Roadmap

Foundation

  • MongoDB backend with full schema introspection
  • Pluggable LLM providers (Anthropic, OpenAI, Gemini)
  • RunMQLTool โ€” read-only find, aggregate, count, distinct
  • Streaming FastAPI server (SSE)
  • CLI interface

Memory & Learning

  • ChromaDB agent memory with auto-save
  • SaveTextMemoryTool โ€” business glossary and domain knowledge
  • Adaptive collection grouping for large databases
  • Multi-turn conversation with automatic pruning
  • Memory export/import (JSON)
  • mango train โ€” bulk training data pre-loading from JSONL
  • DeleteLastMemoryEntryTool โ€” in-chat memory correction

Polish

  • MQLValidator โ€” pre-execution validation with field/operator hints
  • ExplainQueryTool โ€” pipeline explanation in plain language + execution stats
  • Retry-with-error flow (max 2 retries on fixable query errors)
  • Ollama integration โ€” fully local inference, no API key
  • VisualizeDataTool โ€” charts and tables in CLI

Expansion

  • Atlas Vector Search memory backend
  • Redis backend (experimental)
  • Cassandra backend (experimental)
  • DynamoDB backend (experimental)
  • Memory analytics (most common queries, accuracy tracking)

Contributing

Contributions are welcome. See CONTRIBUTING.md for setup, code standards, and PR process.


License

MIT License โ€” see LICENSE for details.


Built with โค๏ธ by Francesco Bellingeri | Inspired by Vanna.ai

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mango_ai-0.2.0.tar.gz (56.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mango_ai-0.2.0-py3-none-any.whl (64.7 kB view details)

Uploaded Python 3

File details

Details for the file mango_ai-0.2.0.tar.gz.

File metadata

  • Download URL: mango_ai-0.2.0.tar.gz
  • Upload date:
  • Size: 56.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for mango_ai-0.2.0.tar.gz
Algorithm Hash digest
SHA256 f2071bf7ceebb1cfa7074651bf2f371b0ae41015af40c493585dffb7e33ff9c4
MD5 aa079d9efc152b089a8f3965b6574617
BLAKE2b-256 19ef3ed3eee93caa22d24e8e42ccf2113d86c69387036ebab7fea48bae97d244

See more details on using hashes here.

File details

Details for the file mango_ai-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: mango_ai-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 64.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for mango_ai-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c6b4ce0e36ab7ea68c6cb0324343bc61dd13ede38e8f20104cee72ec65646f92
MD5 fe727d0e35ad7f0b36463007b981deaa
BLAKE2b-256 45af571c051a8bcd38cf576bd0670fec69e020d0039b0068104464cc0eeda0e9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page