Lightweight Natural Language Query validator — keep your LLM assistant on-topic

nlq-validator

A lightweight Natural Language Query (NLQ) validator that keeps your LLM assistant on-topic. Train it on a handful of example questions, and it will accept in-domain queries while rejecting off-topic ones — no server, no API key required for the core functionality.

Features

TF-IDF scoring out of the box — no model downloads needed
Semantic embeddings via sentence-transformers for paraphrase-aware matching
Threshold calibration — find the F1-optimal cutoff for your domain
Incremental retraining — add examples without rebuilding from scratch
LLM-powered question generation — auto-generate training data from your system prompt (Claude, ChatGPT, Gemini, Mistral, Grok, Perplexity)
Async support for all LLM integrations
Zero runtime dependencies beyond scikit-learn for the core validator

Installation

pip install nlq-validator

Optional extras

pip install 'nlq-validator[embeddings]'   # sentence-transformers for semantic matching
pip install 'nlq-validator[anthropic]'    # Claude integration
pip install 'nlq-validator[openai]'       # ChatGPT, Grok, Perplexity integrations
pip install 'nlq-validator[gemini]'       # Google Gemini integration
pip install 'nlq-validator[mistral]'      # Mistral integration
pip install 'nlq-validator[all-llm]'      # All LLM integrations

Quick start

from nlq_validator import NLQValidator

SYSTEM_PROMPT = (
    "You are a SQL assistant. You help users write queries, "
    "understand JOINs, indexes, and query optimization."
)

# Train from a plain-text file (one question per line)
v = NLQValidator.from_training_file("questions.txt", SYSTEM_PROMPT)

result = v.validate("How do I write a SELECT statement?")
print(result.is_valid)   # True

result = v.validate("What is my horoscope today?")
print(result.is_valid)   # False
print(result.errors)     # ['Query appears off-topic (score=0.000, threshold=0.250)']
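The score and threshold reported in the error message can be pictured with a small TF-IDF similarity sketch. This is an illustration only, written with the standard library, and is not the package's implementation (which, per the feature list, builds on scikit-learn). The class `TinyTfidfScorer`, the tokenizer, and the "maximum cosine similarity to any training question" rule are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTfidfScorer:
    """Hypothetical scorer: max cosine similarity to any training question."""

    def __init__(self, questions):
        docs = [tokenize(q) for q in questions]
        n_docs = len(docs)
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed inverse document frequency for every term seen in training.
        self.idf = {
            term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens):
        # Terms outside the training vocabulary are ignored.
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def score(self, query):
        q = self._vectorize(tokenize(query))
        # Vectors are unit length, so the dot product is the cosine similarity.
        return max(
            (sum(w * doc.get(t, 0.0) for t, w in q.items()) for doc in self.vectors),
            default=0.0,
        )


scorer = TinyTfidfScorer([
    "How do I write a SELECT statement?",
    "What is a SQL JOIN?",
    "How do I filter rows with WHERE clause?",
])
THRESHOLD = 0.25  # the default threshold mentioned in this README

for query in ["How do I write a SELECT statement?", "Horoscope today"]:
    s = scorer.score(query)
    print(f"{query!r}: score={s:.3f}, valid={s >= THRESHOLD}")
```

A query that shares no vocabulary with the training questions scores zero here, which is the situation the off-topic error above describes.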

Training data format

Supported file formats: .txt (one question per line), .csv (first column), .json (list of strings or list of {"text": "..."} objects).

# questions.txt
How do I write a SELECT statement?
What is a SQL JOIN?
How do I filter rows with WHERE clause?
What is the difference between INNER JOIN and LEFT JOIN?
...
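For reference, the three formats could be read with a helper along these lines. This is a sketch under stated assumptions: `load_questions` is a hypothetical name, not a function of the package, and whether the library skips comment lines, header rows, or blank entries is not documented here (the sketch only drops blanks).

```python
import csv
import json
from pathlib import Path


def load_questions(path):
    """Read training questions from a .txt, .csv, or .json file (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # One question per line.
        questions = path.read_text(encoding="utf-8").splitlines()
    elif suffix == ".csv":
        # Only the first column is used.
        with path.open(newline="", encoding="utf-8") as handle:
            questions = [row[0] for row in csv.reader(handle) if row]
    elif suffix == ".json":
        items = json.loads(path.read_text(encoding="utf-8"))
        # Accept either plain strings or {"text": "..."} objects.
        questions = [item["text"] if isinstance(item, dict) else item for item in items]
    else:
        raise ValueError(f"Unsupported training file format: {suffix}")
    # Trim whitespace and drop empty entries.
    return [q.strip() for q in questions if q and q.strip()]
```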

Threshold calibration

The default threshold of 0.25 is a conservative starting point. Use calibrate() to find the F1-optimal cutoff for your domain from labeled in-domain and off-topic examples:

in_domain = ["How do I use GROUP BY?", "What is a primary key?", ...]
off_domain = ["How do I bake bread?", "What is my horoscope?", ...]

result = v.calibrate(in_domain, off_domain)
result.summary()          # prints precision/recall/F1 table
v.apply_calibration(result)  # applies suggested threshold
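What an F1-optimal cutoff means can be sketched as a simple sweep over candidate thresholds. The function below is an illustration, not the package's `calibrate()`: the name `f1_optimal_threshold`, the choice of candidates (every observed score), and the sample scores are assumptions.

```python
def f1_optimal_threshold(in_scores, off_scores):
    """Return (threshold, f1) maximizing F1, with in-domain as the positive class."""
    best_threshold, best_f1 = 0.0, -1.0
    # Every observed score is a candidate cutoff; a query passes when score >= cutoff.
    for threshold in sorted(set(in_scores) | set(off_scores)):
        tp = sum(s >= threshold for s in in_scores)   # in-domain queries accepted
        fp = sum(s >= threshold for s in off_scores)  # off-topic queries accepted
        fn = len(in_scores) - tp                      # in-domain queries rejected
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:  # ties keep the lowest threshold
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Made-up similarity scores for four in-domain and four off-topic queries.
in_scores = [0.62, 0.48, 0.35, 0.18]
off_scores = [0.22, 0.10, 0.05, 0.00]
threshold, f1 = f1_optimal_threshold(in_scores, off_scores)
print(f"suggested threshold={threshold:.2f}, F1={f1:.3f}")
```

With these sample scores the sweep settles on 0.18: accepting the weakest in-domain query costs one false positive (0.22) but keeps recall at 1.0, which gives a higher F1 than any stricter cutoff.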

Incremental retraining

v.retrain(["How do I write a CTE?", "What is a window function?"])
# or from a file:
v.retrain_from_file("more_questions.txt")

Semantic embeddings

To match queries that use different words but mean the same thing, pass an embedding model when building the validator:

v = NLQValidator.from_training_file(
    "questions.txt",
    SYSTEM_PROMPT,
    embedding_model="all-MiniLM-L6-v2",   # requires nlq-validator[embeddings]
)
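The idea behind paraphrase-aware matching can be sketched as cosine similarity between vectors instead of word overlap. This is an assumption about the general technique, not the package's code: `semantic_score` and `toy_encode` are hypothetical names, and the toy encoder stands in for a real embedding model so the example runs without downloads.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_score(query, examples, encode):
    """Max cosine similarity between the query and any example.

    `encode` maps a list of strings to a sequence of vectors.
    """
    vectors = encode([query] + list(examples))
    query_vec, example_vecs = vectors[0], vectors[1:]
    return max((cosine(query_vec, v) for v in example_vecs), default=0.0)


def toy_encode(texts):
    """Stand-in encoder: counts hits in two hand-picked synonym groups (illustration only)."""
    concepts = [("join", "merge", "combine"), ("filter", "where", "restrict")]
    return [[sum(w in t.lower() for w in group) for group in concepts] for t in texts]


examples = ["What is a SQL JOIN?", "How do I filter rows with WHERE clause?"]
print(semantic_score("How can I combine two tables?", examples, toy_encode))
```

With sentence-transformers installed, `SentenceTransformer("all-MiniLM-L6-v2").encode` could be passed as `encode` in place of `toy_encode`; a plain TF-IDF scorer would see no shared content words between "combine two tables" and "SQL JOIN".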

LLM-powered question generation

Generate training data automatically from your system prompt:

from nlq_validator.integrations.claude import ClaudeIntegration

llm = ClaudeIntegration()   # reads ANTHROPIC_API_KEY env var
v = NLQValidator.from_llm(llm, SYSTEM_PROMPT, count=50)

Async variant:

# await must run inside a coroutine (async def) or an async-aware REPL
v = await NLQValidator.from_llm_async(llm, SYSTEM_PROMPT, count=50)

Supported LLM providers

Provider	Extra	Class	Env vars
Claude	`[anthropic]`	`ClaudeIntegration`	`ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL`
ChatGPT	`[openai]`	`ChatGPTIntegration`	`OPENAI_API_KEY`, `OPENAI_MODEL`
Gemini	`[gemini]`	`GeminiIntegration`	`GEMINI_API_KEY`, `GEMINI_MODEL`
Mistral	`[mistral]`	`MistralIntegration`	`MISTRAL_API_KEY`, `MISTRAL_MODEL`
Grok	`[openai]`	`GrokIntegration`	`XAI_API_KEY`, `XAI_MODEL`
Perplexity	`[openai]`	`PerplexityIntegration`	`PERPLEXITY_API_KEY`, `PERPLEXITY_MODEL`
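The table can also be used programmatically, for example to see which integrations have an API key configured before choosing one. The mapping below is copied from the table; the helper names `configured_providers` and `install_hint` are hypothetical and not part of the package.

```python
import os

# From the provider table above: class name -> (extra, API-key variable).
PROVIDERS = {
    "ClaudeIntegration": ("anthropic", "ANTHROPIC_API_KEY"),
    "ChatGPTIntegration": ("openai", "OPENAI_API_KEY"),
    "GeminiIntegration": ("gemini", "GEMINI_API_KEY"),
    "MistralIntegration": ("mistral", "MISTRAL_API_KEY"),
    "GrokIntegration": ("openai", "XAI_API_KEY"),
    "PerplexityIntegration": ("openai", "PERPLEXITY_API_KEY"),
}


def configured_providers(env=None):
    """Return the integration class names whose API key is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, (_extra, key) in PROVIDERS.items() if env.get(key)]


def install_hint(class_name):
    """Return the pip command for the extra that the given integration needs."""
    extra, _key = PROVIDERS[class_name]
    return f"pip install 'nlq-validator[{extra}]'"


for name in configured_providers():
    print(f"{name}: {install_hint(name)}")
```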

Save and load

v.save("my_model.pkl")
v2 = NLQValidator.load("my_model.pkl")

License

MIT — see LICENSE.

Project details

Maintainers

balajeekalyan

Release history

This version

0.1.0

May 14, 2026

Download files

Source Distribution

nlq_validator-0.1.0.tar.gz (15.6 kB)

Uploaded May 14, 2026 Source

Built Distribution

nlq_validator-0.1.0-py3-none-any.whl (15.7 kB)

Uploaded May 14, 2026 Python 3

File details

Details for the file nlq_validator-0.1.0.tar.gz.

File metadata

Download URL: nlq_validator-0.1.0.tar.gz
Upload date: May 14, 2026
Size: 15.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nlq_validator-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`468a9fdde3956fae09ef8dfd693c2ff5d57dc9e9cba20eb5f0b8d794cfe0c828`
MD5	`d937683d1bea319e6bdd1ce239345a28`
BLAKE2b-256	`8195107978ce9a49487b9d3fc6f8513397bffff13e5481718cbd026edaab55e0`
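A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below is one way to do it; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the local file path in the final comment is assumed to be wherever the sdist was saved.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with the published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# SHA256 of the source distribution, as listed in the table above.
EXPECTED_SDIST_SHA256 = "468a9fdde3956fae09ef8dfd693c2ff5d57dc9e9cba20eb5f0b8d794cfe0c828"
# Example call, assuming the file sits in the current directory:
# matches_published_hash("nlq_validator-0.1.0.tar.gz", EXPECTED_SDIST_SHA256)
```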

Provenance

The following attestation bundles were made for nlq_validator-0.1.0.tar.gz:

Publisher: publish.yml on balajeekalyan/nlq-validator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nlq_validator-0.1.0.tar.gz
- Subject digest: 468a9fdde3956fae09ef8dfd693c2ff5d57dc9e9cba20eb5f0b8d794cfe0c828
- Sigstore transparency entry: 1529439532
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: balajeekalyan/nlq-validator@8555e1ff36692c5d731840f310b0ad0c9f60bfe9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/balajeekalyan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8555e1ff36692c5d731840f310b0ad0c9f60bfe9
- Trigger Event: push

File details

Details for the file nlq_validator-0.1.0-py3-none-any.whl.

File metadata

Download URL: nlq_validator-0.1.0-py3-none-any.whl
Upload date: May 14, 2026
Size: 15.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nlq_validator-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`27e2b622219c06c38408e6bc8bae382de0d20a4ecee6ce709213b5ed3d8b7fef`
MD5	`4f60f137ee4c6c9c11005fee8e24865d`
BLAKE2b-256	`b2aa55625e1f82e4961ad37c82c7faff5ce6f0e618cb7a8d51dc4830c85326fd`

Provenance

The following attestation bundles were made for nlq_validator-0.1.0-py3-none-any.whl:

Publisher: publish.yml on balajeekalyan/nlq-validator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nlq_validator-0.1.0-py3-none-any.whl
- Subject digest: 27e2b622219c06c38408e6bc8bae382de0d20a4ecee6ce709213b5ed3d8b7fef
- Sigstore transparency entry: 1529439742
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: balajeekalyan/nlq-validator@8555e1ff36692c5d731840f310b0ad0c9f60bfe9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/balajeekalyan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8555e1ff36692c5d731840f310b0ad0c9f60bfe9
- Trigger Event: push
