SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - AI-powered data analysis with custom visualizations, fuzzy matching, and smart caching

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
Topic
- Database

Project description

QuerySUTRA

SUTRA: Structured-Unstructured-Text-Retrieval-Architecture

Professional Python library for AI-powered data analysis with automatic entity extraction, natural language querying, and intelligent caching.

Installation

pip install QuerySUTRA

# Optional features
pip install "QuerySUTRA[embeddings]"  # Smart caching
pip install "QuerySUTRA[mysql]"       # MySQL support
pip install "QuerySUTRA[postgres]"    # PostgreSQL support
pip install "QuerySUTRA[all]"         # All features (quotes keep shells such as zsh from expanding the brackets)

Key Features

1. Automatic Multi-Table Creation

Upload PDFs, Word documents, or text files and automatically extract structured entities.

from sutra import SUTRA

sutra = SUTRA(api_key="your-openai-key")
sutra.upload("employee_data.pdf")

# Automatically creates:
# - employee_data_people (20 rows, 6 columns)
# - employee_data_contacts (20 rows, 4 columns)
# - employee_data_events (15 rows, 4 columns)

2. Natural Language Querying

result = sutra.ask("Show me all people from New York")
print(result.data)

# With visualization
result = sutra.ask("Show sales by region", viz="pie")

3. Load Existing Databases

# Load SQLite database
sutra = SUTRA.load_from_db("sutra.db", api_key="your-key")

# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "password", "database")

# Connect to PostgreSQL
sutra = SUTRA.connect_postgres("localhost", "postgres", "password", "database")

4. Custom Visualizations

result = sutra.ask("Sales by region", viz="pie")       # Pie chart
result = sutra.ask("Trends", viz="line")               # Line chart
result = sutra.ask("Compare", viz="bar")               # Bar chart
result = sutra.ask("Correlation", viz="scatter")       # Scatter plot
result = sutra.ask("Data", viz="table")                # Table view
result = sutra.ask("Analysis", viz="heatmap")          # Heatmap
result = sutra.ask("Auto", viz=True)                   # Auto-detect

5. Smart Fuzzy Matching

sutra = SUTRA(api_key="your-key", fuzzy_match=True)

# "New York City" matches "New York" automatically
result = sutra.ask("Who is from New York City?")

6. Intelligent Caching with Embeddings

sutra = SUTRA(api_key="your-key", use_embeddings=True)

result = sutra.ask("Show sales")           # Calls API
result = sutra.ask("Display sales data")   # Uses cache (no API call)

7. Irrelevant Query Detection

sutra = SUTRA(api_key="your-key", check_relevance=True)

result = sutra.ask("What is the weather?")
# Warns: "This question seems irrelevant to your database"
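
The README does not say how relevance is decided. As a rough illustration only, one simple heuristic is to check whether the question shares any word with the table and column names. The function name looks_relevant and the schema dictionary below are hypothetical and are not part of QuerySUTRA.

```python
import re
from typing import Dict, List


def looks_relevant(question: str, schema: Dict[str, List[str]]) -> bool:
    """Return True if the question mentions any table or column word."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    vocabulary = set()
    for table, columns in schema.items():
        for identifier in [table, *columns]:
            # Split snake_case names so "employee_data_people" yields "people".
            vocabulary.update(identifier.lower().split("_"))
    return bool(words & vocabulary)


# Hypothetical schema, modelled on the people table shown in this README.
schema = {"people": ["id", "name", "address", "city", "email"]}

print(looks_relevant("Show me all people from New York", schema))
print(looks_relevant("What is the weather?", schema))
```

A word-overlap check like this is cheap but misses synonyms, so a real implementation would probably need something stronger.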

8. Direct SQL Access (Free)

result = sutra.sql("SELECT * FROM people WHERE city='New York'")
print(result.data)
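
When a value such as the city comes from user input, it is safer to bind it as a parameter than to paste it into the SQL string. The README does not show whether sutra.sql() accepts parameters, so the sketch below uses Python's built-in sqlite3 module directly against a throwaway in-memory table. The helper people_in_city and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [("John Doe", "Dallas"), ("Ana Ruiz", "New York")],  # invented sample rows
)


def people_in_city(connection, city):
    # The ? placeholder lets SQLite bind the value, so quotes in `city`
    # cannot change the meaning of the statement.
    cursor = connection.execute(
        "SELECT name, city FROM people WHERE city = ? ORDER BY name", (city,)
    )
    return cursor.fetchall()


print(people_in_city(conn, "New York"))
```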

Complete Configuration

sutra = SUTRA(
    api_key="your-openai-key",
    db="database.db",              # SQLite path
    use_embeddings=True,           # Smart caching (saves API calls)
    check_relevance=True,          # Detect irrelevant queries
    fuzzy_match=True,              # Match near-identical values ("New York City" vs. "New York")
    cache_queries=True             # Cache exact repeats of a question
)
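
Rather than hard-coding the key in source files, it can be read from the environment and passed in explicitly. This is a small sketch: the helper read_api_key is hypothetical, and OPENAI_API_KEY is simply the variable name the OpenAI tooling conventionally uses. Nothing here implies QuerySUTRA reads that variable on its own.

```python
import os


def read_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in `env_var`, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating SUTRA."
        )
    return key


# Usage (requires QuerySUTRA to be installed):
# from sutra import SUTRA
# sutra = SUTRA(api_key=read_api_key(), db="database.db")
```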

Supported Formats

CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame

Usage Examples

Basic Workflow

sutra = SUTRA(api_key="your-key")
sutra.upload("data.pdf")
sutra.tables()                    # View tables
sutra.schema()                    # View schema
sutra.peek("table_name", n=10)    # Preview data
result = sutra.ask("Your question?")

Database Export

sutra.export_db("backup.db", format="sqlite")
sutra.export_db("schema.sql", format="sql")
sutra.save_to_mysql("localhost", "root", "pass", "db")
sutra.save_to_postgres("localhost", "postgres", "pass", "db")
sutra.backup("./backups")
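
For orientation, the two export formats plausibly correspond to a binary database copy and a plain SQL script. That mapping is an assumption, not something the README states. The sketch below shows both ideas with the standard sqlite3 module only and does not call QuerySUTRA.

```python
import sqlite3

# A small source database to export (sample row taken from this README).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO people (name) VALUES ('John Doe')")
source.commit()

# Binary copy: Connection.backup() clones every page into another database.
target = sqlite3.connect(":memory:")
source.backup(target)

# SQL script: Connection.iterdump() yields the statements that rebuild the data.
sql_script = "\n".join(source.iterdump())

print(target.execute("SELECT name FROM people").fetchall())
print(sql_script)
```

Passing a file path instead of ":memory:" when opening the target would write the copy to disk.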

How It Works

Entity Extraction Example

Input PDF:

John Doe lives at 123 Main St, Dallas. Email: john@company.com.
Sarah Smith lives at 456 Oak Ave, Boston. Email: sarah@company.com.

Output Tables:

people

id	name	address	city	email
1	John Doe	123 Main St	Dallas	john@company.com
2	Sarah Smith	456 Oak Ave	Boston	sarah@company.com
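
To make the input-to-table step concrete, here is a deliberately simplified sketch. QuerySUTRA describes its extraction as AI-powered. The version below swaps in a hand-written regular expression that only understands the two sample sentences, then loads the result into SQLite. The names PERSON and extract_people are hypothetical.

```python
import re
import sqlite3

TEXT = (
    "John Doe lives at 123 Main St, Dallas. Email: john@company.com.\n"
    "Sarah Smith lives at 456 Oak Ave, Boston. Email: sarah@company.com."
)

# Stand-in for the AI extraction step: one pattern for one sentence shape.
PERSON = re.compile(
    r"(?P<name>[A-Z]\w+ [A-Z]\w+) lives at (?P<address>[^,]+), (?P<city>[^.]+)\. "
    r"Email: (?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
)


def extract_people(text):
    """Return (name, address, city, email) tuples found in the text."""
    return [m.group("name", "address", "city", "email") for m in PERSON.finditer(text)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO people (name, address, city, email) VALUES (?, ?, ?, ?)",
    extract_people(TEXT),
)
rows = conn.execute("SELECT * FROM people ORDER BY id").fetchall()
for row in rows:
    print(row)
```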

Embeddings for Smart Caching

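Uses the all-MiniLM-L6-v2 sentence-embedding model (about 80 MB, runs locally), so a similar question can be answered from the cache without another API call: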

Query 1: "Show sales" → API call
Query 2: "Display sales" → 92% similar → Cached (no API call)
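
The caching idea can be sketched as a lookup by cosine similarity. This is an illustration under assumptions, not QuerySUTRA's code: the class SemanticCache, the 0.9 threshold, and the two-dimensional vectors are all made up. In practice the vectors would come from an embedding model such as all-MiniLM-L6-v2.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.9):  # threshold value is an assumption
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_result) pairs

    def get(self, embedding):
        """Return the result of the most similar stored query, or None."""
        best_score, best_result = 0.0, None
        for stored, result in self.entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_result = score, result
        return best_result if best_score >= self.threshold else None

    def put(self, embedding, result):
        self.entries.append((embedding, result))


cache = SemanticCache()
show_sales = [1.0, 0.0]      # stand-in embedding for "Show sales"
display_sales = [0.9, 0.1]   # stand-in embedding for "Display sales"
weather = [0.0, 1.0]         # stand-in embedding for an unrelated question

cache.put(show_sales, "cached answer for sales")
print(cache.get(display_sales))  # similar enough, served from the cache
print(cache.get(weather))        # not similar, so the caller must query the API
```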

Fuzzy Matching

Query: "New York City"
Database: ["New York", "Dallas", "Boston"]
Match: "New York City" → "New York" (85% similar)
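
A comparable lookup can be sketched with the standard library's difflib. The README does not name the scorer QuerySUTRA uses, so the similarity figures will differ (difflib rates this pair at roughly 0.76 rather than 85%). The helper best_match and the 0.6 cutoff are assumptions.

```python
import difflib


def best_match(term, candidates, cutoff=0.6):
    """Return the candidate closest to `term`, or None if nothing is close."""
    lowered = {c.lower(): c for c in candidates}  # compare case-insensitively
    hits = difflib.get_close_matches(term.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


cities = ["New York", "Dallas", "Boston"]
print(best_match("New York City", cities))
print(best_match("Tokyo", cities))
```

Raising the cutoff makes matching stricter. At 0.8, for example, "New York City" would no longer match "New York" with this scorer.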

API Reference

Class Methods

SUTRA.load_from_db(db_path, api_key, **kwargs) - Load existing SQLite database

SUTRA.connect_mysql(host, user, password, database, ...) - Connect to MySQL

SUTRA.connect_postgres(host, user, password, database, ...) - Connect to PostgreSQL

Instance Methods

upload(data, name=None) - Upload data

ask(question, viz=False, table=None) - Natural language query

sql(query, viz=False) - Raw SQL query

tables() - List all tables

schema(table=None) - Show schema

peek(table=None, n=5) - Preview data

export_db(path, format) - Export database

save_to_mysql(...) - Export to MySQL

save_to_postgres(...) - Export to PostgreSQL

backup(path=None) - Create backup

close() - Close connection

Performance Tips

Use load_from_db() to avoid re-uploading
Use sql() for complex queries (no API cost)
Enable use_embeddings=True for caching
Enable cache_queries=True for exact matches

Troubleshooting

No API key error: sutra = SUTRA(api_key="sk-...")

PDF fails: pip install PyPDF2

MySQL error: pip install "QuerySUTRA[mysql]"

Embeddings error: pip install "QuerySUTRA[embeddings]"

Requirements

Python 3.8+
OpenAI API key
100MB disk space (if using embeddings)

License

MIT License

Changelog

v0.3.1

Semantic embeddings for smart caching
Fuzzy matching for better NLP
Irrelevant query detection
Load existing databases
MySQL/PostgreSQL connectivity
Custom visualizations
All features optional

Made by Aditya Batta


Release history

0.6.2 - Feb 6, 2026
0.6.1 - Feb 5, 2026
0.6.0 - Feb 5, 2026
0.5.3 - Nov 18, 2025
0.5.2 - Nov 17, 2025
0.5.1 - Nov 17, 2025
0.5.0 - Nov 17, 2025
0.4.6 - Nov 17, 2025
0.4.5 - Nov 17, 2025
0.4.4 - Nov 17, 2025
0.4.3 - Nov 17, 2025
0.4.2 - Nov 17, 2025
0.4.1 - Nov 17, 2025
0.4.0 - Nov 17, 2025
0.3.3 - Nov 16, 2025
0.3.2 - Nov 14, 2025 (this version)
0.3.1 - Nov 14, 2025
0.3.0 - Nov 14, 2025
0.2.3 - Nov 14, 2025
0.2.1 - Nov 14, 2025
0.2.0 - Nov 14, 2025
0.1.4 - Nov 13, 2025
0.1.3 - Nov 13, 2025
0.1.2 - Nov 13, 2025
0.1.0 - Nov 13, 2025

Download files

Source Distribution

querysutra-0.3.2.tar.gz (49.0 kB)

Uploaded Nov 14, 2025 Source

Built Distribution

querysutra-0.3.2-py3-none-any.whl (50.2 kB)

Uploaded Nov 14, 2025 Python 3

File details

Details for the file querysutra-0.3.2.tar.gz.

File metadata

Download URL: querysutra-0.3.2.tar.gz
Upload date: Nov 14, 2025
Size: 49.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for querysutra-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`722d991934e40d42bfde34adcbe6622909e3c253a7f9bcb4a1bb9a201cbb7fcd`
MD5	`c0e268afeec7f3e8125282c7bf469e0e`
BLAKE2b-256	`73ec2f3c5503c5765d2d658d7d5054192556549c2e4b3a6581e4fbda02c02dc1`

File details

Details for the file querysutra-0.3.2-py3-none-any.whl.

File metadata

Download URL: querysutra-0.3.2-py3-none-any.whl
Upload date: Nov 14, 2025
Size: 50.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for querysutra-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3174d4508c9f7886f1bdfeec2e8cee445b655d451d4531f1af575a5a7f1ad72b`
MD5	`6d687ba2c50d4e3d62fed77907c88ad5`
BLAKE2b-256	`57e31e5d8c6b8336a7294a2bfe0426c17f6d3fc4bc7440bf831f86e25b36721d`
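
A downloaded file can be checked against the published SHA256 digests with the standard hashlib module. This is a minimal sketch. The helper names sha256_of and matches_published_hash are made up for illustration, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, with the digest listed for the source distribution:
# matches_published_hash(
#     "querysutra-0.3.2.tar.gz",
#     "722d991934e40d42bfde34adcbe6622909e3c253a7f9bcb4a1bb9a201cbb7fcd",
# )
```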
