
Reasoning Interface for Text-to-Analytics (RITA) - Natural language SQL and NoSQL (MongoDB) query interface powered by LangChain and LLMs

Project description

Ask RITA (Reasoning Interface for Text-to-Analytics)

Ask what. Get answers. RITA turns a natural-language question into SQL, statistics, and insights: no code required.

Go beyond simple text-to-SQL. Ask RITA is an LLM-powered analytics framework that generates queries, runs scipy-backed statistical tests, conducts CRISP-DM research workflows, classifies data, and visualizes results across SQL and NoSQL databases, all from a single natural-language question.

Python 3.11+ License: Apache 2.0

🔒 IMPORTANT: Read-Only Database Access Required

AskRITA generates and executes SQL/NoSQL queries against your database. LLM-generated queries are inherently unpredictable. To prevent inadvertent writes, deletes, or schema changes:

  1. Always connect with a read-only database user. Grant only SELECT (SQL) or find/aggregate (MongoDB) permissions. Never use credentials with INSERT, UPDATE, DELETE, DROP, or DDL privileges.
  2. Do not rely on application-level safeguards alone. AskRITA includes prompt-injection detection and blocks known destructive patterns, but these are defence-in-depth measures, not substitutes for proper database permissions.
  3. Store credentials in environment variables (${DB_USER}, ${DB_PASSWORD}), never in config files. See Configuration Guide.

The database user's granted permissions are the only reliable boundary between AskRITA and your data.
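The same boundary can be seen in miniature with SQLite, which ships with Python: a connection opened through a `mode=ro` URI has every write refused by the engine itself, whatever SQL the application sends. This is an illustrative sketch, not how AskRITA connects; `open_read_only` and `try_statement` are hypothetical helpers. For server databases the equivalent is a dedicated role granted only SELECT (or find/aggregate in MongoDB).

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Hypothetical helper: open an existing SQLite file so the engine rejects all writes."""
    # SQLite URI filename syntax: file:<path>?mode=ro (requires uri=True).
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)


def try_statement(conn: sqlite3.Connection, sql: str) -> str:
    """Hypothetical helper: execute one statement and report whether it was permitted."""
    try:
        conn.execute(sql)
        return "allowed"
    except sqlite3.OperationalError as exc:
        return f"blocked: {exc}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo.db"
        # Create sample data with an ordinary read-write connection.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
        setup.execute("INSERT INTO customers VALUES (1, 'Ada')")
        setup.commit()
        setup.close()

        ro = open_read_only(path)
        print(try_statement(ro, "SELECT * FROM customers"))  # reads succeed
        print(try_statement(ro, "DELETE FROM customers"))    # the engine refuses the write
        ro.close()
```

The point is where the refusal happens: in the database, not in application code that an unexpected query could slip past.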

🆕 What's New in v0.13.0

  • 🧠 Research Agent (Real Statistical Tests): scipy-powered hypothesis testing replaces LLM-generated statistics
    • Auto-selects Pearson vs Spearman correlation based on Shapiro-Wilk normality test
    • Tukey HSD post-hoc pairwise comparisons after significant ANOVA
    • Bonferroni correction across multiple tests in a single research run
    • analyze_hypothesis_data() auto-routes to the correct test family based on column types
  • ⚡ Research Agent (Parallel Evidence Execution): evidence queries now execute concurrently via ThreadPoolExecutor, so wall time ≈ max(query_times) instead of sum(query_times)
  • 🏗️ Research Agent (Architecture Separation): the SQL Agent generates SQL only; the Research Agent executes queries directly via db_manager
  • 🐛 Bug Fixes: Thread-safety for parallel queries, aggregated data detection, Bonferroni-aware confidence scoring, schema decorator recursion storm fix
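Bonferroni correction, listed above, controls the family-wise error rate when several tests run in one research session: with m tests at level α, each test is judged against α/m, which is equivalent to multiplying each p-value by m (capped at 1) and comparing it with α. The sketch below shows only that arithmetic; `bonferroni` is a hypothetical helper, not part of AskRITA's API.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[tuple[float, bool]]:
    """Bonferroni-adjust raw p-values from one research run.

    Hypothetical helper for illustration; not part of AskRITA's API.
    Returns (adjusted p-value, significant at alpha?) for each input.
    """
    m = len(p_values)
    results: list[tuple[float, bool]] = []
    for p in p_values:
        adjusted = min(p * m, 1.0)  # multiply by the number of tests, cap at 1
        results.append((adjusted, adjusted <= alpha))
    return results


# Three tests in one run: only the smallest p-value survives at alpha = 0.05.
for adjusted, significant in bonferroni([0.01, 0.03, 0.20]):
    print(f"adjusted p = {adjusted:.2f}, significant = {significant}")
```

Note that 0.03 would pass an uncorrected 0.05 threshold on its own; the correction is what makes it inconclusive when three tests share the run.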

Previous Release (v0.12.2):

  • 🛡️ SQL Prompt Injection Prevention: Defence-in-depth protection against malicious inputs
  • 🔧 SonarQube Fixes: S2737, S3776, S1481, S1135, S1871

🚀 Four Powerful Workflows

📊 SQLAgentWorkflow - Natural Language to SQL

  • 🗣️ Natural Language to SQL - Ask questions in plain English
  • 💬 Conversational Queries - Follow-up questions with context awareness
  • 🗄️ Multi-Database Support - PostgreSQL, MySQL, SQLite, SQL Server, BigQuery, Snowflake, IBM Db2
  • 📊 Smart Visualization - Automatic chart recommendations
  • 🔄 Error Recovery - Automatic SQL retry with error feedback
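The error-recovery idea (asking again for SQL with the database's error message as feedback) can be sketched as a small loop. `generate_sql` stands in for an LLM call and, like `query_with_retry`, is hypothetical; AskRITA's own retry logic and limits may differ. SQLite keeps the example self-contained.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical generator signature: (question, previous_error) -> SQL string.
SqlGenerator = Callable[[str, Optional[str]], str]


def query_with_retry(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: SqlGenerator,
    max_attempts: int = 3,
) -> list[tuple]:
    """Generate SQL, run it, and feed any database error into the next attempt."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)  # handed back to the generator on the next pass
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts; last error: {error}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [("a", 10.0), ("b", 25.0)])

    def fake_llm(question: str, error: Optional[str]) -> str:
        # First attempt uses a wrong column name; the retry corrects it.
        if error is None:
            return "SELECT customer, SUM(total) FROM orders GROUP BY customer"
        return "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"

    print(query_with_retry(conn, "Revenue by customer?", fake_llm))
```

Bounding the loop with `max_attempts` matters: without it, a model that keeps producing invalid SQL would retry indefinitely.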

🍃 NoSQLAgentWorkflow - Natural Language to MongoDB

  • 🗣️ Natural Language to MongoDB - Ask questions, get aggregation pipelines
  • 🍃 MongoDB Support - mongodb:// and mongodb+srv:// (Atlas) connections
  • 🛡️ Safety Validation - Blocks destructive operations, read-only analytics
  • 🔄 Full Feature Parity - PII detection, visualization, follow-up questions, Chain-of-Thoughts
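Read-only validation for MongoDB can be approached by inspecting the aggregation pipeline before it is sent. In MongoDB, `$out` and `$merge` are the stages that write results to a collection, and `$function`, `$accumulator`, and `$where` run server-side JavaScript. The checker below is a simplified, hypothetical sketch, not AskRITA's validator; it also walks nested sub-pipelines such as those inside `$lookup` and `$facet`.

```python
from typing import Any

# Write stages ($out, $merge) and server-side JavaScript operators.
# Illustrative deny-list only; not AskRITA's actual rule set.
FORBIDDEN_KEYS = {"$out", "$merge", "$function", "$accumulator", "$where"}


def find_forbidden(node: Any) -> list[str]:
    """Recursively collect forbidden operator names anywhere in a pipeline."""
    found: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in FORBIDDEN_KEYS:
                found.append(key)
            found.extend(find_forbidden(value))  # descend into nested stages/pipelines
    elif isinstance(node, list):
        for item in node:
            found.extend(find_forbidden(item))
    return found


def validate_pipeline(pipeline: list[dict]) -> None:
    """Raise ValueError if the pipeline is malformed or contains forbidden operators."""
    if not isinstance(pipeline, list):
        raise ValueError("pipeline must be a list of stages")
    bad = find_forbidden(pipeline)
    if bad:
        raise ValueError(f"blocked operators: {sorted(set(bad))}")


safe = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$region", "orders": {"$sum": 1}}},
]
unsafe = safe + [{"$out": "region_summary"}]

validate_pipeline(safe)  # passes silently
try:
    validate_pipeline(unsafe)
except ValueError as exc:
    print(exc)  # blocked operators: ['$out']
```

A deny-list like this complements, but cannot replace, a MongoDB user restricted to find/aggregate.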

🔬 ResearchAgent - CRISP-DM Data Science Research

  • 📋 CRISP-DM Methodology - Complete 6-phase data science workflow
  • 🧪 Hypothesis Testing - Automated research question formulation and testing
  • 📊 Real Statistics - scipy-powered t-tests, ANOVA, correlation, chi-square (not LLM-generated!)
  • 📈 Effect Sizes - Cohen's d, η², Cramér's V with automatic interpretation
  • 🎯 Actionable Insights - Data-driven recommendations with confidence levels
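As a reference for what an effect-size figure means, here is Cohen's d for two independent groups (difference in means divided by the pooled sample standard deviation), labeled with Cohen's conventional 0.2 / 0.5 / 0.8 thresholds. This is a standard-library sketch of the textbook formula with made-up sample data; AskRITA's scipy-based implementation and its exact interpretation labels may differ.

```python
import math
import statistics


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d for two independent samples, using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero; d is undefined")
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def interpret_d(d: float) -> str:
    """Label |d| with Cohen's conventional thresholds."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Made-up scores for two groups, for illustration only.
group_a = [9, 8, 10, 9, 7, 9, 8]
group_b = [7, 6, 8, 7, 6, 8, 7]
d = cohens_d(group_a, group_b)
print(f"d = {d:.2f} ({interpret_d(d)})")
```

Unlike a p-value, d does not shrink as sample size grows, which is why it is reported alongside the significance test.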

🏷️ DataClassificationWorkflow - LLM-Powered Data Processing

  • 🖼️ Image Classification - AI extracts data directly from images (medical bills, invoices, documents)
  • 📄 Excel/CSV Processing - Process large datasets with AI classification
  • 🚀 API-First Design - Perfect for microservices with dynamic field definitions per request
  • 🧠 Multi-Tenant Support - Different schemas per customer/organization without server restarts
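The per-request schema idea behind the API-first and multi-tenant bullets can be illustrated with the standard library: a record type is built at runtime from a field specification sent with each request, so a new tenant schema needs no restart. This is a hypothetical sketch using `dataclasses.make_dataclass`, not AskRITA's request format; the field specs and helper names are invented for illustration.

```python
from dataclasses import asdict, fields, make_dataclass
from typing import Any


def build_record_type(name: str, field_spec: dict[str, type]) -> type:
    """Hypothetical helper: create a record class at runtime from a per-request field spec."""
    return make_dataclass(name, list(field_spec.items()))


def coerce_row(record_type: type, raw: dict[str, Any]) -> Any:
    """Keep only declared fields and cast each value to its declared type.

    Naive casting: fine for str/int/float, but bool("False") is True,
    so richer types would need dedicated parsers.
    """
    values = {}
    for f in fields(record_type):
        if f.name not in raw:
            raise ValueError(f"missing field: {f.name}")
        values[f.name] = f.type(raw[f.name])
    return record_type(**values)


# Two tenants, two schemas, one running process.
Invoice = build_record_type("Invoice", {"vendor": str, "total": float})
Claim = build_record_type("Claim", {"member_id": str, "amount": float, "visits": int})

row = coerce_row(Invoice, {"vendor": "Acme", "total": "19.99", "extra": "ignored"})
print(asdict(row))  # {'vendor': 'Acme', 'total': 19.99}
```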

📊 Model Performance Comparison (BIRD Benchmark)

BIRD Mini-Dev text-to-SQL execution accuracy (EX) across 500 questions, with oracle knowledge (evidence) enabled.

BIRD Benchmark Results

| Model | Overall | Simple (148) | Moderate (250) | Challenging (102) |
|---|---|---|---|---|
| Gemini 2.5 Pro | 64.4% | 77.0% | 61.2% | 53.9% |
| Gemini 2.5 Flash | 60.6% | 76.3% | 53.6% | 54.9% |
| GPT-5.4 | 54.8% | 68.9% | 50.8% | 44.1% |
| GPT-5.4 Mini | 53.2% | 70.3% | 49.6% | 37.2% |
| GPT-5.4 Nano | 40.0% | 53.4% | 36.0% | 30.4% |
| Gemini 2.5 Flash-Lite | 39.4% | 56.1% | 33.2% | 30.4% |
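The Overall column is the question-weighted average of the three difficulty tiers (148 + 250 + 102 = 500 questions), not their simple mean. The arithmetic can be checked from the per-tier figures; rounding each tier back to a whole number of correct answers is an assumption, since the table reports percentages only.

```python
# Tier sizes from the table header (BIRD Mini-Dev, 500 questions total).
TIER_SIZES = {"simple": 148, "moderate": 250, "challenging": 102}


def overall_accuracy(tier_pct: dict[str, float]) -> float:
    """Question-weighted overall accuracy (in %) from per-tier percentages."""
    total = sum(TIER_SIZES.values())
    # Recover each tier's count of correct answers, then pool them.
    correct = sum(round(tier_pct[tier] / 100 * n) for tier, n in TIER_SIZES.items())
    return 100 * correct / total


gemini_pro = {"simple": 77.0, "moderate": 61.2, "challenging": 53.9}
print(f"{overall_accuracy(gemini_pro):.1f}%")  # 64.4%
```

For Gemini 2.5 Pro this gives 114 + 153 + 55 = 322 correct answers, and 322 / 500 = 64.4%; every other row reconciles the same way.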

Core Features

  • 🤖 Multi-Cloud LLM Integration - OpenAI, Azure, Google Cloud Vertex AI, AWS Bedrock
  • ⚙️ Configurable Workflows - Enable/disable steps, customize prompts, enhanced security options
  • 🔒 Enterprise Security - Credential management, access controls, audit logging
  • 🛡️ PII/PHI Detection - Automatic privacy protection with Microsoft Presidio analyzer
  • 🏗️ Production Ready - Design pattern architecture, comprehensive logging, error handling, monitoring
  • 🌐 Advanced BigQuery - Cross-project dataset access, 3-step validation, configurable access patterns
  • 📊 Token Management - Built-in token utilities for cost optimization and LLM efficiency
  • 🧪 Extensive Testing - Full test suite with quality assurance tools (550+ tests passing)
  • 🔌 Type-Safe Integration - Exported Pydantic models for seamless downstream application integration

Quick Start

1. Install

pip install askrita

📋 More options: Installation Guide (pip, Poetry, from-source, development setup)

2. Configure

export OPENAI_API_KEY="your-api-key-here"
cp example-configs/query-openai.yaml my-config.yaml

⚙️ Full reference: Configuration Guide

3. Use

from askrita import SQLAgentWorkflow, ConfigManager

config = ConfigManager("my-config.yaml")
workflow = SQLAgentWorkflow(config)
result = workflow.query("What are the top 10 customers by revenue?")
print(result['answer'])

NoSQL (MongoDB)

from askrita import NoSQLAgentWorkflow, ConfigManager

config = ConfigManager("mongodb-config.yaml")
workflow = NoSQLAgentWorkflow(config)
result = workflow.query("How many orders were placed last month?")
print(result.answer)

Research Agent - CRISP-DM Data Science

from askrita import ConfigManager
from askrita.research import ResearchAgent

config = ConfigManager("my-config.yaml")
research = ResearchAgent(config)

result = research.test_hypothesis(
    research_question="How does customer satisfaction differ across business lines?",
    hypothesis="Medicare members have higher NPS scores than Commercial members"
)

print(f"Conclusion: {result['conclusion']}")  # SUPPORTED, REFUTED, or INCONCLUSIVE
print(f"P-value: {result['key_metrics'].get('p_value')}")  # Real scipy computation

📖 All examples: Usage Examples & API Reference (conversational queries, data classification, exports, CLI, result format)

⚠️ Important: A configuration file containing LLM provider settings and prompts is always required. API keys are read from environment variables.
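Because keys and credentials come from the environment, a config value can carry `${VAR}` placeholders instead of secrets. The sketch below shows one way to expand such placeholders and fail loudly when one is unset, using `string.Template` from the standard library. It illustrates the pattern only and is not AskRITA's config loader, whose substitution rules may differ; `expand_placeholders` and the connection string are hypothetical.

```python
import os
from string import Template
from typing import Mapping, Optional


def expand_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} placeholders from the environment; raise if any are unset."""
    source = os.environ if env is None else env
    try:
        return Template(value).substitute(source)
    except KeyError as missing:
        raise ValueError(f"environment variable {missing} is not set") from None


# Hypothetical connection string; only the ${DB_USER}/${DB_PASSWORD} names come from this README.
template = "postgresql://${DB_USER}:${DB_PASSWORD}@db.example.internal:5432/analytics"
url = expand_placeholders(template, {"DB_USER": "rita_ro", "DB_PASSWORD": "example"})
# Avoid logging `url`: after expansion it contains the password.
```

Failing on a missing variable is deliberate: silently substituting an empty string would produce a connection attempt with blank credentials.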

Type-Safe Integration

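from askrita import (
    SQLAgentWorkflow, ConfigManager,
    UniversalChartData, ChartDataset, DataPoint, WorkflowState
)

config = ConfigManager("my-config.yaml")
workflow = SQLAgentWorkflow(config)

result: WorkflowState = workflow.query("Show me sales by region")
chart = UniversalChartData(**result['chart_data'])  # rebuild the chart payload as a typed model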

Supported Platforms

Databases: PostgreSQL, MySQL, SQLite, SQL Server, BigQuery, Snowflake, IBM Db2, MongoDB

LLM Providers: OpenAI, Azure OpenAI, Google Cloud Vertex AI, AWS Bedrock

📋 Connection strings, auth details, config templates: Supported Platforms

Configuration

Required Components

| Component | Required | Description |
|---|---|---|
| 🔑 LLM | ✅ Yes | Provider, model + env variables |
| 🗄️ Database | ✅ Yes | Connection string |
| 📝 Prompts | ✅ Yes | All 5 workflow prompts |

Quick Setup

export OPENAI_API_KEY="your-api-key-here"
cp example-configs/query-openai.yaml my-config.yaml

Configuration Templates

example-configs/query-openai.yaml           # OpenAI + PostgreSQL
example-configs/query-azure-openai.yaml     # Azure OpenAI
example-configs/query-snowflake.yaml        # Snowflake database
example-configs/query-mongodb.yaml          # MongoDB (NoSQL)
example-configs/example-zscaler-config.yaml # Corporate proxy setup
example-configs/data-classification-*.yaml  # Data processing workflows

📚 Complete reference: Configuration Guide

Corporate Proxy & SSL

llm:
  ca_bundle_path: "./credentials/zscaler-ca.pem"

📚 Full guide: CA Bundle Setup
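For context, the underlying mechanism of a custom CA bundle in Python is an SSL context that trusts the proxy's root certificate in addition to the system store, instead of switching verification off. A minimal standard-library sketch, assuming a PEM file such as the `./credentials/zscaler-ca.pem` path above; `build_ssl_context` is hypothetical, and how AskRITA passes `ca_bundle_path` to each LLM client is not shown here.

```python
import ssl
from typing import Optional


def build_ssl_context(ca_bundle_path: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying client context, optionally trusting an extra CA bundle (PEM)."""
    # create_default_context() loads the system trust store and enables
    # certificate verification and hostname checking.
    context = ssl.create_default_context()
    if ca_bundle_path:
        # Add the proxy's root CA to the trusted set; raises OSError if the file is missing.
        context.load_verify_locations(cafile=ca_bundle_path)
    return context


ctx = build_ssl_context()  # pass the PEM path in a real proxy environment
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```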

MCP Server (for AI Assistants)

{
  "mcpServers": {
    "askrita": {
      "command": "askrita",
      "args": ["mcp", "--config", "/path/to/your/config.yaml"]
    }
  }
}

📖 Setup guide: Claude Desktop Setup

Development

Setup

git clone https://github.com/cvs-health/askRITA.git
cd askRITA
pip install poetry && poetry install

Quality Checks

poetry run pytest              # Tests
poetry run black askrita/      # Format
poetry run flake8 askrita/     # Lint
poetry run mypy askrita/       # Type check

📚 Documentation

| Guide | Description |
|---|---|
| Installation | pip, Poetry, from-source, development setup |
| Configuration | YAML configuration: database, LLM, prompts, PII, security |
| Usage Examples & API | Code examples, CLI, API reference, result format |
| Supported Platforms | Databases, LLM providers, connection strings, auth |
| SQL Workflow | Core text-to-SQL workflow: query, chat, export, schema |
| Conversational SQL | Multi-turn chat mode, follow-up questions, clarification |
| Research Workflow | CRISP-DM hypothesis testing with scipy statistics |
| Data Classification | LLM-powered classification of CSV/Excel with dynamic schemas |
| NoSQL Workflow | MongoDB workflow setup and usage |
| Export (PPTX, PDF, Excel) | Export query results to branded reports and spreadsheets |
| Security | SQL safety, prompt injection detection, PII/PHI scanning |
| Schema Enrichment | Schema caching, descriptions, decorators, cross-project access |
| Chain of Thoughts | Step-by-step reasoning traces and progress callbacks |
| CLI Reference | askrita command: query, interactive, test, mcp |
| MCP Server | Model Context Protocol server setup |
| Claude Desktop Setup | MCP integration with Claude Desktop |
| CA Bundle Setup | Certificate authority configuration |
| Benchmark Results | BIRD Mini-Dev model comparison and per-model analysis |
| Chart Types | Google Charts: 13 chart types, React & Angular guides |
| Contributing | Dev setup, branching, pull requests, code quality |
| Versioning & Releases | Semantic versioning, version bump scripts, release checklist |
| Docker Testing | Cross-version compatibility testing in isolated environments |
| Changelog | Version history and updates |

📖 Complete index: Documentation Site

License

Apache License 2.0 - see LICENSE file for details.


Download files

Download the file for your platform.

Source Distribution

askrita-0.13.13.tar.gz (212.2 kB)

Uploaded Source

Built Distribution


askrita-0.13.13-py3-none-any.whl (249.7 kB)

Uploaded Python 3

File details

Details for the file askrita-0.13.13.tar.gz.

File metadata

  • Download URL: askrita-0.13.13.tar.gz
  • Upload date:
  • Size: 212.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for askrita-0.13.13.tar.gz
Algorithm Hash digest
SHA256 cc10109338b2903c0c4c84c03a71de9f87b31822d6f8e51103652042f43957c6
MD5 bd7119da0248cac321d949db419069ee
BLAKE2b-256 304eda5987be7e566921eb0563810388f45061a3e99147f55fc9d45da11e7212


Provenance

The following attestation bundles were made for askrita-0.13.13.tar.gz:

Publisher: publish.yaml on cvs-health/AskRITA

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file askrita-0.13.13-py3-none-any.whl.

File metadata

  • Download URL: askrita-0.13.13-py3-none-any.whl
  • Upload date:
  • Size: 249.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for askrita-0.13.13-py3-none-any.whl
Algorithm Hash digest
SHA256 2c79a4baef362bd6c1aadd4612055a635b258958a40fe59b890c5a66ba6d14ed
MD5 4aa8c483f4e13464041d13d7c095473c
BLAKE2b-256 1efdd2be55bf95c584a722788ce6bb9cd073dfe704e95c20bb5976aab8f6f7fa


Provenance

The following attestation bundles were made for askrita-0.13.13-py3-none-any.whl:

Publisher: publish.yaml on cvs-health/AskRITA

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
