Enterprise-ready Natural Language to SQL converter with multi-provider support. Built for production scale (1000+ tables) with Clean Architecture.
Project description
nlp2sql
DSL-first natural language to SQL for PostgreSQL and Redshift
nlp2sql turns a natural language question into SQL through a reusable Python DSL:
await connect(...)await nlp.ask(...)- optional few-shot examples
- optional semantic context
- optional validation and repair
The library is designed for both simple schemas and large warehouses, but all public examples in this repository use the local e-commerce domain shipped with the project itself.
Features
- DSL-first API:
connect()returns anNLP2SQLclient withask(),validate(),explain(), andsuggest() - Business-aware generation: optional
SemanticContextadds canonical tables, metrics, dimensions, rules, and mappings - Execution modes: generate only, generate plus validate, and generate plus validate plus repair
- Few-shot examples: pass example lists directly or use an example repository implementation
- Large schema support: FAISS plus TF-IDF hybrid retrieval, schema filters, and disk-backed caches
- Multiple providers: OpenAI, Anthropic, and Gemini
- Database support: PostgreSQL and Amazon Redshift
- Async by default: built for services, APIs, notebooks, and workers
Documentation
| Document | Description |
|---|---|
| Architecture | Runtime flow, services, ports, and diagrams |
| API Reference | Python API, CLI, hooks, and metadata reference |
| Configuration | Environment variables, examples, semantic context, cache behavior |
| Enterprise Guide | Governed usage, scale, and deployment patterns |
| Redshift Support | Redshift-specific notes using public examples |
| Examples | Safe public examples based on the local e-commerce domain |
| Contributing | Contribution guidelines |
Installation
# With UV (recommended)
uv add nlp2sql
# With pip
pip install nlp2sql
# With specific providers
pip install nlp2sql[anthropic,gemini]
pip install nlp2sql[all-providers]
# With embeddings
pip install nlp2sql[embeddings-local]
pip install nlp2sql[embeddings-openai]
Quick Start
1. Set a Provider Key
export OPENAI_API_KEY="your-openai-key"
# or ANTHROPIC_API_KEY / GOOGLE_API_KEY
2. Use the DSL
import asyncio
import nlp2sql
from nlp2sql import ProviderConfig
async def main():
nlp = await nlp2sql.connect(
"postgresql://testuser:testpass@localhost:5432/testdb",
provider=ProviderConfig(provider="openai", api_key="sk-..."),
)
result = await nlp.ask("Show active users by region")
print(result.sql)
print(result.confidence)
print(result.metadata["sql_intent_plan"])
asyncio.run(main())
connect() loads the schema, initializes retrieval indexes, and returns a reusable NLP2SQL client. ask() returns a typed QueryResult.
3. Add Few-Shot Examples
Pass examples directly to connect(). The library handles indexing for you.
nlp = await nlp2sql.connect(
"postgresql://testuser:testpass@localhost:5432/testdb",
provider=ProviderConfig(provider="openai", api_key="sk-..."),
examples=[
{
"question": "Show revenue by source category for the flagship store",
"sql": (
"SELECT d.metric_date, mc.source_category, SUM(d.revenue) AS revenue "
"FROM daily_channel_metrics d "
"JOIN stores s ON d.store_id = s.id "
"JOIN marketing_channels mc ON d.channel_id = mc.id "
"WHERE s.code = 'na_flagship' "
"GROUP BY d.metric_date, mc.source_category"
),
"database_type": "postgres",
}
],
)
4. Add In-Memory Semantic Context
Use semantic context when the same question could map to multiple plausible tables or dimensions.
from nlp2sql import (
DimensionDefinition,
DomainRule,
MetricDefinition,
SemanticContext,
SemanticEntityMapping,
)
semantic_context = SemanticContext(
domain="ecommerce_channel_performance",
canonical_tables=["daily_channel_metrics"],
required_filters=["s.code = 'na_flagship'", "s.region = 'North America'"],
entity_mappings=[
SemanticEntityMapping(
source_term="North America flagship store",
target="store_scope",
resolved_value="na_flagship / North America",
filter_expression="s.code = 'na_flagship' AND s.region = 'North America'",
)
],
metric_definitions=[
MetricDefinition(name="revenue", description="Revenue by day and source category."),
MetricDefinition(name="orders_count", description="Orders by day and source category."),
],
dimension_definitions=[
DimensionDefinition(name="metric_date", description="Daily grain."),
DimensionDefinition(name="source_category", description="Channel grouping."),
],
rules=[
DomainRule(
name="preserve_source_breakdown",
description="Keep source_category when the question asks for a source breakdown.",
required_dimensions=["source_category"],
preferred_tables=["daily_channel_metrics"],
)
],
)
result = await nlp.ask(
"Show daily revenue and order count by source category for the North America flagship store",
semantic_context=semantic_context,
)
5. Validate and Repair
ask() supports execution-aware modes directly.
result = await nlp.ask(
"Show revenue by source category for the flagship store",
validate=True,
repair=True,
)
generate_only: generate SQL onlygenerate_and_validate: execute readonly validation when execution is wiredgenerate_validate_repair: retry on semantic or execution failures when repair hooks are available
6. CLI Parity
The CLI exposes the same concepts:
nlp2sql query \
--database-url postgresql://testuser:testpass@localhost:5432/testdb \
--question "Show daily revenue by source category for the North America flagship store" \
--examples-file examples.json \
--semantic-context-file semantic-context.json \
--validate \
--repair \
--show-semantic-context \
--show-sql-intent-plan \
--show-selected-examples
How It Works
flowchart TD
userCode[UserCodeOrCLI] --> dsl[connectAndAskDSL]
dsl --> analysis[QueryAnalysisService]
analysis --> semantic[SemanticResolutionService]
semantic --> retrieval[SchemaRetrievalAndExampleSelection]
retrieval --> intent[SqlIntentPlanningService]
intent --> prompt[PromptAssemblyAndAdapters]
prompt --> llm[LLMGeneration]
llm --> semval[SemanticValidation]
semval --> exec[OptionalExecutionAndRepair]
exec --> result[QueryResultMetadata]
At runtime the library:
- analyzes the question
- optionally resolves and merges semantic context
- retrieves relevant schema and examples
- builds a structured SQL intent plan
- assembles the prompt
- generates SQL
- optionally validates, executes, and repairs
- returns a
QueryResultwith debug metadata
See Architecture for the full breakdown.
Public Example Domain
This repository ships a local e-commerce integration domain used in tests and docs. It includes:
storesmarketing_channelsusersproductsordersorder_itemsdaily_channel_metrics
The public examples intentionally stay inside that domain to avoid leaking any private warehouse schema.
To start it locally:
cd docker
docker compose up -d postgres
The default URL is:
postgresql://testuser:testpass@localhost:5432/testdb
Provider Comparison
| Provider | Default Model | Context Size | Best For |
|---|---|---|---|
| OpenAI | gpt-4o-mini |
128K | Fast general purpose usage |
| Anthropic | claude-sonnet-4-20250514 |
200K | Larger schemas and long prompts |
| Gemini | gemini-2.0-flash |
1M | High-volume and very large contexts |
All models are configurable through ProviderConfig.
Lower-Level API
connect() is the recommended path. Lower-level entry points still exist for advanced wiring:
from nlp2sql import DatabaseType, ProviderConfig, create_and_initialize_service
service = await create_and_initialize_service(
database_url="postgresql://testuser:testpass@localhost:5432/testdb",
provider_config=ProviderConfig(provider="openai", api_key="sk-..."),
database_type=DatabaseType.POSTGRES,
)
result = await service.generate_sql(
"Count active users by region",
database_type=DatabaseType.POSTGRES,
)
print(result["sql"])
Development
git clone https://github.com/luiscarbonel1991/nlp2sql.git
cd nlp2sql
uv sync
# Start the local public e-commerce database
cd docker && docker compose up -d postgres
# Integration tests without llm
cd ..
uv run pytest -m "integration and not llm"
# Optional llm integration tests
uv run pytest -m "integration and llm"
MCP Server
nlp2sql includes a Model Context Protocol server for assistant integration.
{
"mcpServers": {
"nlp2sql": {
"command": "python",
"args": ["/path/to/nlp2sql/mcp_server/server.py"],
"env": {
"OPENAI_API_KEY": "${OPENAI_API_KEY}",
"NLP2SQL_DEFAULT_DB_URL": "postgresql://testuser:testpass@localhost:5432/testdb"
}
}
}
}
See mcp_server/README.md for details.
Contributing
See CONTRIBUTING.md.
License
MIT License. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file nlp2sql-0.2.0rc13.tar.gz.
File metadata
- Download URL: nlp2sql-0.2.0rc13.tar.gz
- Upload date:
- Size: 605.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ee8db97a79fdbcd7a8b11c3e702434b85bbdf5be5736b3e84e55ee944fbd7c67
|
|
| MD5 |
85dccc2ec2b74bde1607cd535604334a
|
|
| BLAKE2b-256 |
ae61ceefa12084be7911f61ce93190e6aabeb72f62124babf6c222e514556056
|
Provenance
The following attestation bundles were made for nlp2sql-0.2.0rc13.tar.gz:
Publisher:
publish-pypi.yml on luiscarbonel1991/nlp2sql
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nlp2sql-0.2.0rc13.tar.gz -
Subject digest:
ee8db97a79fdbcd7a8b11c3e702434b85bbdf5be5736b3e84e55ee944fbd7c67 - Sigstore transparency entry: 1245461520
- Sigstore integration time:
-
Permalink:
luiscarbonel1991/nlp2sql@5590825c5ccfa909db3785b1a6f1e5f1c4612549 -
Branch / Tag:
refs/tags/v0.2.0rc13 - Owner: https://github.com/luiscarbonel1991
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@5590825c5ccfa909db3785b1a6f1e5f1c4612549 -
Trigger Event:
workflow_dispatch
-
Statement type:
File details
Details for the file nlp2sql-0.2.0rc13-py3-none-any.whl.
File metadata
- Download URL: nlp2sql-0.2.0rc13-py3-none-any.whl
- Upload date:
- Size: 128.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8c7d515b479b6d54f9d29c1da7880a003544a3a997698d8bb9d3532e670af1fe
|
|
| MD5 |
a750300d75fc637ef91eeebd8dc49ebf
|
|
| BLAKE2b-256 |
c1b4de9b376156888b63c42c110869a16b42a9d4808590cc386765b0d9aea42e
|
Provenance
The following attestation bundles were made for nlp2sql-0.2.0rc13-py3-none-any.whl:
Publisher:
publish-pypi.yml on luiscarbonel1991/nlp2sql
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nlp2sql-0.2.0rc13-py3-none-any.whl -
Subject digest:
8c7d515b479b6d54f9d29c1da7880a003544a3a997698d8bb9d3532e670af1fe - Sigstore transparency entry: 1245461522
- Sigstore integration time:
-
Permalink:
luiscarbonel1991/nlp2sql@5590825c5ccfa909db3785b1a6f1e5f1c4612549 -
Branch / Tag:
refs/tags/v0.2.0rc13 - Owner: https://github.com/luiscarbonel1991
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@5590825c5ccfa909db3785b1a6f1e5f1c4612549 -
Trigger Event:
workflow_dispatch
-
Statement type: