Enterprise-ready Natural Language to SQL converter with multi-provider support. Built for production scale (1000+ tables) with Clean Architecture.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

luiscarbonel1991

These details have not been verified by PyPI

Project links

Documentation

Project description

nlp2sql logo

nlp2sql

Enterprise-ready Natural Language to SQL converter with multi-provider support

Convert natural language queries to optimized SQL using multiple AI providers. Built with Clean Architecture principles for enterprise-scale applications handling 1000+ table databases.

Features

Multiple AI Providers: OpenAI, Anthropic Claude, Google Gemini - no vendor lock-in
Database Support: PostgreSQL, Amazon Redshift
Large Schema Handling: Vector embeddings and intelligent filtering for 1000+ tables
Smart Caching: Query and schema embedding caching for improved performance
Async Support: Full async/await support
Clean Architecture: Ports & Adapters pattern for maintainability

Documentation

Document	Description
Architecture	Component diagram and data flow
API Reference	Python API and CLI command reference
Configuration	Environment variables and schema filters
Enterprise Guide	Large-scale deployment and migration
Redshift Support	Amazon Redshift setup and examples
Contributing	Contribution guidelines

Installation

# With UV (recommended)
uv add nlp2sql

# With pip
pip install nlp2sql

# With specific providers
pip install nlp2sql[anthropic,gemini]
pip install nlp2sql[all-providers]

# With embeddings
pip install nlp2sql[embeddings-local]   # Local embeddings (free)
pip install nlp2sql[embeddings-openai]  # OpenAI embeddings

Quick Start

1. Set Environment Variables

# At least one AI provider key required
export OPENAI_API_KEY="your-openai-key"
# export ANTHROPIC_API_KEY="your-anthropic-key"
# export GOOGLE_API_KEY="your-google-key"

2. One-Line Usage

import asyncio
import os
from nlp2sql import generate_sql_from_db

async def main():
    result = await generate_sql_from_db(
        database_url="postgresql://user:pass@localhost:5432/mydb",
        question="Show me all active users",
        ai_provider="openai",
        api_key=os.getenv("OPENAI_API_KEY")
    )
    print(result['sql'])
    print(f"Confidence: {result['confidence']}")

asyncio.run(main())

3. Pre-Initialized Service (Better Performance)

from nlp2sql import create_and_initialize_service

async def main():
    # Initialize once
    service = await create_and_initialize_service(
        database_url="postgresql://user:pass@localhost:5432/mydb",
        ai_provider="openai",
        api_key=os.getenv("OPENAI_API_KEY")
    )

    # Use multiple times
    result1 = await service.generate_sql("Count total users")
    result2 = await service.generate_sql("Show recent orders")

4. Large Database with Schema Filtering

from nlp2sql import create_and_initialize_service

service = await create_and_initialize_service(
    database_url="postgresql://localhost/enterprise",
    ai_provider="anthropic",  # Best for large schemas (200K context)
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    schema_filters={
        "include_schemas": ["sales", "finance"],
        "exclude_system_tables": True
    }
)

result = await service.generate_sql("Show revenue by month")

5. CLI Usage

# Generate SQL
nlp2sql query \
  --database-url postgresql://user:pass@localhost:5432/mydb \
  --question "Show all active users" \
  --explain

# Inspect schema
nlp2sql inspect --database-url postgresql://localhost/mydb

# Benchmark providers
nlp2sql benchmark --database-url postgresql://localhost/mydb

How It Works

Question ──► Cache check ──► Schema retrieval ──► Relevance filtering ──► Context building ──► AI generation ──► Validation
                                    │                     │                      │
                              SchemaRepository    FAISS + TF-IDF hybrid   Reuses precomputed
                              (+ disk cache)      + batch scoring          relevance scores

Schema retrieval -- Fetches tables from database via SchemaRepository (with disk cache for Redshift)
Relevance filtering -- FAISS dense search + TF-IDF sparse search (50/50 hybrid) finds candidate tables; batch scoring refines with precomputed embeddings
Context building -- Builds optimized schema context within token limits, reusing scores from step 2 (zero additional embedding calls)
SQL generation -- AI provider (OpenAI, Anthropic, or Gemini) generates SQL from question + schema context
Validation -- SQL syntax and safety checks before returning results

See Architecture for the detailed flow with method references and design decisions.

Provider Comparison

Provider	Context Size	Best For
OpenAI GPT-4	128K	Complex reasoning
Anthropic Claude	200K	Large schemas
Google Gemini	1M	High volume, cost efficiency

See Configuration for detailed provider setup.

Architecture

Clean Architecture (Ports & Adapters) with three layers: core entities, port interfaces, and adapter implementations. The schema management layer uses FAISS + TF-IDF hybrid search for relevance filtering at scale.

nlp2sql/
├── core/           # Business entities (pure Python, no dependencies)
├── ports/          # Interfaces (AIProviderPort, SchemaRepositoryPort, EmbeddingProviderPort)
├── adapters/       # Implementations (OpenAI, Anthropic, Gemini, PostgreSQL, Redshift)
├── services/       # Orchestration (QueryGenerationService)
├── schema/         # Schema management (SchemaManager, SchemaAnalyzer, SchemaEmbeddingManager)
├── config/         # Pydantic Settings configuration
└── exceptions/     # Custom exception hierarchy

See Architecture for the full component diagram, data flow, and design decisions.

Development

# Clone and install
git clone https://github.com/luiscarbonel1991/nlp2sql.git
cd nlp2sql
uv sync

# Start test databases
cd docker && docker-compose up -d

# Run tests
uv run pytest

# Code quality
uv run ruff format .
uv run ruff check .
uv run mypy src/

MCP Server

nlp2sql includes a Model Context Protocol server for AI assistant integration.

{
  "mcpServers": {
    "nlp2sql": {
      "command": "python",
      "args": ["/path/to/nlp2sql/mcp_server/server.py"],
      "env": {
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "NLP2SQL_DEFAULT_DB_URL": "postgresql://user:pass@localhost:5432/mydb"
      }
    }
  }
}

Tools: ask_database, explore_schema, run_sql, list_databases, explain_sql

See mcp_server/README.md for complete setup.

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE.

Author

Luis Carbonel - @luiscarbonel1991

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

luiscarbonel1991

These details have not been verified by PyPI

Project links

Documentation

Release history Release notifications | RSS feed

0.2.0rc13 pre-release

Apr 7, 2026

0.2.0rc12 pre-release

Apr 2, 2026

0.2.0rc11 pre-release

Apr 2, 2026

0.2.0rc10 pre-release

Mar 24, 2026

0.2.0rc9 pre-release

Mar 23, 2026

0.2.0rc8 pre-release

Mar 23, 2026

0.2.0rc7 pre-release

Mar 19, 2026

This version

0.2.0rc6 pre-release

Mar 13, 2026

0.2.0rc5 pre-release

Jan 20, 2026

0.2.0rc4 pre-release

Jan 16, 2026

0.2.0rc2 pre-release

Aug 6, 2025

0.2.0rc1 pre-release

Jul 24, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlp2sql-0.2.0rc6.tar.gz (561.6 kB view details)

Uploaded Mar 13, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

nlp2sql-0.2.0rc6-py3-none-any.whl (82.0 kB view details)

Uploaded Mar 13, 2026 Python 3

File details

Details for the file nlp2sql-0.2.0rc6.tar.gz.

File metadata

Download URL: nlp2sql-0.2.0rc6.tar.gz
Upload date: Mar 13, 2026
Size: 561.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nlp2sql-0.2.0rc6.tar.gz
Algorithm	Hash digest
SHA256	`97b95a884a9a30a3a88e5208d399429a8fa47ae5c432f167acf6f2a0ada2504c`
MD5	`def1a4aff0b27f4b07f45b7ae0c44e6c`
BLAKE2b-256	`6f2777452e4d8173decf68a2c20b69cff0d9664fbfa5d050d736f43bac458392`

See more details on using hashes here.

Provenance

The following attestation bundles were made for nlp2sql-0.2.0rc6.tar.gz:

Publisher: publish-pypi.yml on luiscarbonel1991/nlp2sql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nlp2sql-0.2.0rc6.tar.gz
- Subject digest: 97b95a884a9a30a3a88e5208d399429a8fa47ae5c432f167acf6f2a0ada2504c
- Sigstore transparency entry: 1097182808
- Sigstore integration time: Mar 13, 2026
Source repository:
- Permalink: luiscarbonel1991/nlp2sql@7be24f29ae9a6f372b967d2cd0639ba112dd3438
- Branch / Tag: refs/tags/v0.2.0rc6
- Owner: https://github.com/luiscarbonel1991
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@7be24f29ae9a6f372b967d2cd0639ba112dd3438
- Trigger Event: workflow_dispatch

File details

Details for the file nlp2sql-0.2.0rc6-py3-none-any.whl.

File metadata

Download URL: nlp2sql-0.2.0rc6-py3-none-any.whl
Upload date: Mar 13, 2026
Size: 82.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nlp2sql-0.2.0rc6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`771cd26a932aaf505bf9fd25b3fb600a099bf2f16d2d84d70cc2849d89550e9c`
MD5	`18def4372e7ce86ceb6e08a820dff73d`
BLAKE2b-256	`d34cf1994def679048d4ea9ac851614d4b928f002dbd09515ed1b741d54227aa`

See more details on using hashes here.

Provenance

The following attestation bundles were made for nlp2sql-0.2.0rc6-py3-none-any.whl:

Publisher: publish-pypi.yml on luiscarbonel1991/nlp2sql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nlp2sql-0.2.0rc6-py3-none-any.whl
- Subject digest: 771cd26a932aaf505bf9fd25b3fb600a099bf2f16d2d84d70cc2849d89550e9c
- Sigstore transparency entry: 1097182839
- Sigstore integration time: Mar 13, 2026
Source repository:
- Permalink: luiscarbonel1991/nlp2sql@7be24f29ae9a6f372b967d2cd0639ba112dd3438
- Branch / Tag: refs/tags/v0.2.0rc6
- Owner: https://github.com/luiscarbonel1991
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@7be24f29ae9a6f372b967d2cd0639ba112dd3438
- Trigger Event: workflow_dispatch

nlp2sql 0.2.0rc6

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

nlp2sql

Features

Documentation

Installation

Quick Start

1. Set Environment Variables

2. One-Line Usage

3. Pre-Initialized Service (Better Performance)

4. Large Database with Schema Filtering

5. CLI Usage

How It Works

Provider Comparison

Architecture

Development

MCP Server

Contributing

License

Author

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance