Reasoning Interface for Text-to-Analytics (RITA) - Natural language SQL and NoSQL (MongoDB) query interface powered by LangChain and LLMs
Ask RITA (Reasoning Interface for Text-to-Analytics)
Ask what. Get answers. RITA turns a natural-language question into SQL, statistics, and insights, with no code required.
Go beyond simple text-to-SQL. Ask RITA is an LLM-powered analytics framework that generates queries, runs scipy-backed statistical tests, conducts CRISP-DM research workflows, classifies data, and visualizes results across SQL and NoSQL databases, all from a single natural-language question.
IMPORTANT: Read-Only Database Access Required
AskRITA generates and executes SQL/NoSQL queries against your database. LLM-generated queries are inherently unpredictable. To prevent inadvertent writes, deletes, or schema changes:
- Always connect with a read-only database user. Grant only SELECT (SQL) or find/aggregate (MongoDB) permissions. Never use credentials with INSERT, UPDATE, DELETE, DROP, or DDL privileges.
- Do not rely on application-level safeguards alone. AskRITA includes prompt-injection detection and blocks known destructive patterns, but these are defence-in-depth measures, not substitutes for proper database permissions.
- Store credentials in environment variables (${DB_USER}, ${DB_PASSWORD}), never in config files. See Configuration Guide.
The database user's granted permissions are the only reliable boundary between AskRITA and your data.
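For code that builds a connection string outside the YAML config, the same rule can be followed by reading the credentials from the environment at runtime. The following is a minimal sketch, not part of AskRITA: the function name `build_readonly_url` and the PostgreSQL URL layout are assumptions chosen for illustration, and only the variable names DB_USER and DB_PASSWORD come from the text above.

```python
import os
from urllib.parse import quote


def build_readonly_url(host: str, database: str, port: int = 5432) -> str:
    """Build a PostgreSQL URL from the DB_USER / DB_PASSWORD environment variables."""
    try:
        user = os.environ["DB_USER"]
        password = os.environ["DB_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None
    # Percent-encode so characters such as '@' or '/' cannot corrupt the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

The account named in DB_USER should still be the read-only user described above; this helper only keeps the secret out of files.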
What's New in v0.13.0
- Research Agent - Real Statistical Tests: scipy-powered hypothesis testing replaces LLM-generated statistics
  - Auto-selects Pearson vs Spearman correlation based on Shapiro-Wilk normality test
  - Tukey HSD post-hoc pairwise comparisons after significant ANOVA
  - Bonferroni correction across multiple tests in a single research run
  - analyze_hypothesis_data() auto-routes to the correct test family based on column types
- Research Agent - Parallel Evidence Execution: Evidence queries now execute concurrently via ThreadPoolExecutor, so wall time ≈ max(query_times) instead of the sum
- Research Agent - Architecture Separation: SQL Agent generates SQL only; Research Agent executes queries directly via db_manager
- Bug Fixes: Thread-safety for parallel queries, aggregated data detection, Bonferroni-aware confidence scoring, schema decorator recursion storm fix
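The parallel evidence execution noted above can be illustrated with the standard library alone. This is a sketch of the general ThreadPoolExecutor pattern, not AskRITA's internal code: `run_evidence_queries` and `fake_execute` are hypothetical names, and the sleep stands in for a real database round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_evidence_queries(queries, execute, max_workers=4):
    """Run each query through `execute` concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(execute, queries))


def fake_execute(query):
    """Hypothetical stand-in for a database call that takes about 0.3 s."""
    time.sleep(0.3)
    return f"rows for: {query}"


queries = ["SELECT 1", "SELECT 2", "SELECT 3"]
start = time.perf_counter()
results = run_evidence_queries(queries, fake_execute)
elapsed = time.perf_counter() - start  # close to one query's time, not three
```

Because the three calls overlap, the elapsed time tracks the slowest query rather than the total, which is the behaviour the release note describes.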
Previous Release (v0.12.2):
- SQL Prompt Injection Prevention: Defence-in-depth protection against malicious inputs
- SonarQube Fixes: S2737, S3776, S1481, S1135, S1871
Four Powerful Workflows
SQLAgentWorkflow - Natural Language to SQL
- Natural Language to SQL - Ask questions in plain English
- Conversational Queries - Follow-up questions with context awareness
- Multi-Database Support - PostgreSQL, MySQL, SQLite, SQL Server, BigQuery, Snowflake, IBM Db2
- Smart Visualization - Automatic chart recommendations
- Error Recovery - Automatic SQL retry with error feedback
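The error-recovery idea in the list above (retry with the database error fed back to the generator) can be sketched as follows. This is an illustration under stated assumptions, not AskRITA's implementation: `query_with_retry` and `fake_generate_sql` are hypothetical names, SQLite stands in for the target database, and the fake generator replaces the LLM call.

```python
import sqlite3


def query_with_retry(conn, question, generate_sql, max_attempts=3):
    """Ask `generate_sql` for SQL, feeding back the last database error on failure."""
    last_error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, last_error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            last_error = str(exc)  # passed to the generator on the next attempt
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts: {last_error}")


def fake_generate_sql(question, last_error):
    """Hypothetical stand-in for the LLM: wrong table first, corrected after feedback."""
    if last_error is None:
        return "SELECT name FROM customer"
    return "SELECT name FROM customers ORDER BY revenue DESC"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [("Acme", 120.0), ("Globex", 340.0)]
)
rows = query_with_retry(conn, "Top customers by revenue?", fake_generate_sql)
```

The first attempt fails with a "no such table" error; the second attempt sees that message and succeeds.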
NoSQLAgentWorkflow - Natural Language to MongoDB
- Natural Language to MongoDB - Ask questions, get aggregation pipelines
- MongoDB Support - mongodb:// and mongodb+srv:// (Atlas) connections
- Safety Validation - Blocks destructive operations, read-only analytics
- Full Feature Parity - PII detection, visualization, follow-up questions, Chain-of-Thoughts
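One way to picture the safety validation mentioned above is a check that rejects aggregation pipelines containing $out or $merge, the aggregation stages that write to collections. The sketch below is an assumption about what such a check could look like, not AskRITA's actual validator; `find_write_stages` and `assert_read_only` are hypothetical names.

```python
WRITE_STAGES = {"$out", "$merge"}  # aggregation stages that persist results


def find_write_stages(pipeline):
    """Return the write stages used anywhere in a pipeline, including sub-pipelines."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in WRITE_STAGES:
                    found.append(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(pipeline)
    return found


def assert_read_only(pipeline):
    """Raise ValueError if the pipeline would write to a collection."""
    stages = find_write_stages(pipeline)
    if stages:
        raise ValueError(f"Pipeline contains write stages: {stages}")
    return pipeline
```

As the read-only notice at the top of this document stresses, a check like this complements database permissions and does not replace them.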
ResearchAgent - CRISP-DM Data Science Research
- CRISP-DM Methodology - Complete 6-phase data science workflow
- Hypothesis Testing - Automated research question formulation and testing
- Real Statistics - scipy-powered t-tests, ANOVA, correlation, chi-square (not LLM-generated!)
- Effect Sizes - Cohen's d, η², Cramér's V with automatic interpretation
- Actionable Insights - Data-driven recommendations with confidence levels
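To make the effect-size item above concrete, here is a small standard-library sketch of Cohen's d with the conventional 0.2 / 0.5 / 0.8 interpretation thresholds. The function names and the label wording are assumptions for illustration; the ResearchAgent's own computation and labels may differ.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def interpret_d(d):
    """Label |d| with Cohen's conventional thresholds (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"
```

Reporting an effect size next to the p-value shows how large a difference is, not only whether it is statistically detectable.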
DataClassificationWorkflow - LLM-Powered Data Processing
- Image Classification - AI extracts data directly from images (medical bills, invoices, documents)
- Excel/CSV Processing - Process large datasets with AI classification
- API-First Design - Perfect for microservices with dynamic field definitions per request
- Multi-Tenant Support - Different schemas per customer/organization without server restarts
Model Performance Comparison (BIRD Benchmark)
BIRD Mini-Dev text-to-SQL execution accuracy (EX) across 500 questions, with oracle knowledge (evidence) enabled.
| Model | Overall | Simple (148) | Moderate (250) | Challenging (102) |
|---|---|---|---|---|
| Gemini 2.5 Pro | 64.4% | 77.0% | 61.2% | 53.9% |
| Gemini 2.5 Flash | 60.6% | 76.3% | 53.6% | 54.9% |
| GPT-5.4 | 54.8% | 68.9% | 50.8% | 44.1% |
| GPT-5.4 Mini | 53.2% | 70.3% | 49.6% | 37.2% |
| GPT-5.4 Nano | 40.0% | 53.4% | 36.0% | 30.4% |
| Gemini 2.5 Flash-Lite | 39.4% | 56.1% | 33.2% | 30.4% |
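The Overall column is consistent with a question-weighted average of the three difficulty tiers. The short check below assumes that interpretation and uses the Gemini 2.5 Pro row from the table; `weighted_accuracy` is a name introduced here for illustration.

```python
def weighted_accuracy(tiers):
    """Question-weighted average of per-tier accuracy percentages."""
    total_questions = sum(count for _, count in tiers.values())
    weighted_sum = sum(rate * count for rate, count in tiers.values())
    return weighted_sum / total_questions


# (accuracy %, question count) per difficulty tier for Gemini 2.5 Pro
gemini_pro = {
    "simple": (77.0, 148),
    "moderate": (61.2, 250),
    "challenging": (53.9, 102),
}
overall = round(weighted_accuracy(gemini_pro), 1)  # matches the table's 64.4
```

Because the Moderate tier holds half of the 500 questions, it moves the overall score more than the other two tiers.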
Core Features
- Multi-Cloud LLM Integration - OpenAI, Azure, Google Cloud Vertex AI, AWS Bedrock
- Configurable Workflows - Enable/disable steps, customize prompts, enhanced security options
- Enterprise Security - Credential management, access controls, audit logging
- PII/PHI Detection - Automatic privacy protection with Microsoft Presidio analyzer
- Production Ready - Design pattern architecture, comprehensive logging, error handling, monitoring
- Advanced BigQuery - Cross-project dataset access, 3-step validation, configurable access patterns
- Token Management - Built-in token utilities for cost optimization and LLM efficiency
- Extensive Testing - Full test suite with quality assurance tools (550+ tests passing)
- Type-Safe Integration - Exported Pydantic models for seamless downstream application integration
Quick Start
1. Install
pip install askrita
More options: Installation Guide (pip, Poetry, from-source, development setup)
2. Configure
export OPENAI_API_KEY="your-api-key-here"
cp example-configs/query-openai.yaml my-config.yaml
Full reference: Configuration Guide
3. Use
from askrita import SQLAgentWorkflow, ConfigManager
config = ConfigManager("my-config.yaml")
workflow = SQLAgentWorkflow(config)
result = workflow.query("What are the top 10 customers by revenue?")
print(result['answer'])
NoSQL (MongoDB)
from askrita import NoSQLAgentWorkflow, ConfigManager
config = ConfigManager("mongodb-config.yaml")
workflow = NoSQLAgentWorkflow(config)
result = workflow.query("How many orders were placed last month?")
print(result.answer)
Research Agent - CRISP-DM Data Science
from askrita import ConfigManager
from askrita.research import ResearchAgent
config = ConfigManager("my-config.yaml")
research = ResearchAgent(config)
result = research.test_hypothesis(
    research_question="How does customer satisfaction differ across business lines?",
    hypothesis="Medicare members have higher NPS scores than Commercial members"
)
print(f"Conclusion: {result['conclusion']}") # SUPPORTED, REFUTED, or INCONCLUSIVE
print(f"P-value: {result['key_metrics'].get('p_value')}") # Real scipy computation
All examples: Usage Examples & API Reference (conversational queries, data classification, exports, CLI, result format)
Important: A configuration file with LLM provider settings and prompts is always required. API keys are read from environment variables.
Type-Safe Integration
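from askrita import (
    SQLAgentWorkflow, ConfigManager,
    UniversalChartData, ChartDataset, DataPoint, WorkflowState
)
# Build the workflow as in the Quick Start
config = ConfigManager("my-config.yaml")
workflow = SQLAgentWorkflow(config)
result: WorkflowState = workflow.query("Show me sales by region")
# Validate the chart payload against the exported Pydantic model
chart = UniversalChartData(**result['chart_data'])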
Supported Platforms
Databases: PostgreSQL, MySQL, SQLite, SQL Server, BigQuery, Snowflake, IBM Db2, MongoDB
LLM Providers: OpenAI, Azure OpenAI, Google Cloud Vertex AI, AWS Bedrock
Connection strings, auth details, config templates: Supported Platforms
Configuration
Required Components
| Component | Required | Description |
|---|---|---|
| LLM | Yes | Provider, model + env variables |
| Database | Yes | Connection string |
| Prompts | Yes | All 5 workflow prompts |
Quick Setup
export OPENAI_API_KEY="your-api-key-here"
cp example-configs/query-openai.yaml my-config.yaml
Configuration Templates
example-configs/query-openai.yaml # OpenAI + PostgreSQL
example-configs/query-azure-openai.yaml # Azure OpenAI
example-configs/query-snowflake.yaml # Snowflake database
example-configs/query-mongodb.yaml # MongoDB (NoSQL)
example-configs/example-zscaler-config.yaml # Corporate proxy setup
example-configs/data-classification-*.yaml # Data processing workflows
Complete reference: Configuration Guide
Corporate Proxy & SSL
llm:
ca_bundle_path: "./credentials/zscaler-ca.pem"
Full guide: CA Bundle Setup
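Before pointing ca_bundle_path at a corporate certificate, it can help to confirm that Python itself can load the bundle. The sketch below uses only the standard ssl module; `build_ssl_context` is a hypothetical helper, and how AskRITA consumes ca_bundle_path internally is not shown in this document, so treat this as a standalone sanity check.

```python
import ssl
from pathlib import Path


def build_ssl_context(ca_bundle_path: str) -> ssl.SSLContext:
    """Create a TLS client context that trusts the CAs in the given PEM bundle."""
    bundle = Path(ca_bundle_path)
    if not bundle.is_file():
        raise FileNotFoundError(f"CA bundle not found: {bundle}")
    # When cafile is given, the context loads that bundle rather than
    # the system default certificates.
    return ssl.create_default_context(cafile=str(bundle))
```

A missing file raises FileNotFoundError from the helper, and a malformed PEM file raises ssl.SSLError when the context is created, which makes a wrong path or a bad export easy to tell apart.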
MCP Server (for AI Assistants)
{
"mcpServers": {
"askrita": {
"command": "askrita",
"args": ["mcp", "--config", "/path/to/your/config.yaml"]
}
}
}
Setup guide: Claude Desktop Setup
Development
Setup
git clone https://github.com/cvs-health/askRITA.git
cd askRITA
pip install poetry && poetry install
Quality Checks
poetry run pytest # Tests
poetry run black askrita/ # Format
poetry run flake8 askrita/ # Lint
poetry run mypy askrita/ # Type check
Documentation
| Guide | Description |
|---|---|
| Installation | pip, Poetry, from-source, development setup |
| Configuration | YAML configuration: database, LLM, prompts, PII, security |
| Usage Examples & API | Code examples, CLI, API reference, result format |
| Supported Platforms | Databases, LLM providers, connection strings, auth |
| SQL Workflow | Core text-to-SQL workflow: query, chat, export, schema |
| Conversational SQL | Multi-turn chat mode, follow-up questions, clarification |
| Research Workflow | CRISP-DM hypothesis testing with scipy statistics |
| Data Classification | LLM-powered classification of CSV/Excel with dynamic schemas |
| NoSQL Workflow | MongoDB workflow setup and usage |
| Export (PPTX, PDF, Excel) | Export query results to branded reports and spreadsheets |
| Security | SQL safety, prompt injection detection, PII/PHI scanning |
| Schema Enrichment | Schema caching, descriptions, decorators, cross-project access |
| Chain of Thoughts | Step-by-step reasoning traces and progress callbacks |
| CLI Reference | askrita command: query, interactive, test, mcp |
| MCP Server | Model Context Protocol server setup |
| Claude Desktop Setup | MCP integration with Claude Desktop |
| CA Bundle Setup | Certificate authority configuration |
| Benchmark Results | BIRD Mini-Dev model comparison and per-model analysis |
| Chart Types | Google Charts: 13 chart types, React & Angular guides |
| Contributing | Dev setup, branching, pull requests, code quality |
| Versioning & Releases | Semantic versioning, version bump scripts, release checklist |
| Docker Testing | Cross-version compatibility testing in isolated environments |
| Changelog | Version history and updates |
Complete index: Documentation Site
License
Apache License 2.0 - see LICENSE file for details.