SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - AI-powered data analysis with custom visualizations, fuzzy matching, and smart caching
QuerySUTRA
SUTRA: Structured-Unstructured-Text-Retrieval-Architecture
Professional Python library for AI-powered data analysis with automatic entity extraction, natural language querying, and intelligent caching.
Installation
pip install QuerySUTRA
# Optional features
pip install QuerySUTRA[embeddings] # Smart caching
pip install QuerySUTRA[mysql] # MySQL support
pip install QuerySUTRA[postgres] # PostgreSQL support
pip install QuerySUTRA[all] # All features
Key Features
1. Automatic Multi-Table Creation
Upload PDFs, Word documents, or text files and automatically extract structured entities.
from sutra import SUTRA
sutra = SUTRA(api_key="your-openai-key")
sutra.upload("employee_data.pdf")
# Automatically creates:
# - employee_data_people (20 rows, 6 columns)
# - employee_data_contacts (20 rows, 4 columns)
# - employee_data_events (15 rows, 4 columns)
2. Natural Language Querying
result = sutra.ask("Show me all people from New York")
print(result.data)
# With visualization
result = sutra.ask("Show sales by region", viz="pie")
3. Load Existing Databases
# Load SQLite database
sutra = SUTRA.load_from_db("sutra.db", api_key="your-key")
# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "password", "database")
# Connect to PostgreSQL
sutra = SUTRA.connect_postgres("localhost", "postgres", "password", "database")
4. Custom Visualizations
result = sutra.ask("Sales by region", viz="pie") # Pie chart
result = sutra.ask("Trends", viz="line") # Line chart
result = sutra.ask("Compare", viz="bar") # Bar chart
result = sutra.ask("Correlation", viz="scatter") # Scatter plot
result = sutra.ask("Data", viz="table") # Table view
result = sutra.ask("Analysis", viz="heatmap") # Heatmap
result = sutra.ask("Auto", viz=True) # Auto-detect
5. Smart Fuzzy Matching
sutra = SUTRA(api_key="your-key", fuzzy_match=True)
# "New York City" matches "New York" automatically
result = sutra.ask("Who is from New York City?")
6. Intelligent Caching with Embeddings
sutra = SUTRA(api_key="your-key", use_embeddings=True)
result = sutra.ask("Show sales") # Calls API
result = sutra.ask("Display sales data") # Uses cache (no API call)
7. Irrelevant Query Detection
sutra = SUTRA(api_key="your-key", check_relevance=True)
result = sutra.ask("What is the weather?")
# Warns: "This question seems irrelevant to your database"
8. Direct SQL Access (No API Cost)
result = sutra.sql("SELECT * FROM people WHERE city='New York'")
print(result.data)
Complete Configuration
sutra = SUTRA(
    api_key="your-openai-key",
    db="database.db",        # SQLite path
    use_embeddings=True,     # Smart caching (saves API calls)
    check_relevance=True,    # Detect irrelevant queries
    fuzzy_match=True,        # Fuzzy-match query terms against stored values
    cache_queries=True       # Simple caching of exact-match queries
)
Supported Formats
CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame
Usage Examples
Basic Workflow
sutra = SUTRA(api_key="your-key")
sutra.upload("data.pdf")
sutra.tables() # View tables
sutra.schema() # View schema
sutra.peek("table_name", n=10) # Preview data
result = sutra.ask("Your question?")
Database Export
sutra.export_db("backup.db", format="sqlite")
sutra.export_db("schema.sql", format="sql")
sutra.save_to_mysql("localhost", "root", "pass", "db")
sutra.save_to_postgres("localhost", "postgres", "pass", "db")
sutra.backup("./backups")
How It Works
Entity Extraction Example
Input PDF:
John Doe lives at 123 Main St, Dallas. Email: john@company.com.
Sarah Smith lives at 456 Oak Ave, Boston. Email: sarah@company.com.
Output Tables:
people
| id | name | address | city | email |
|---|---|---|---|---|
| 1 | John Doe | 123 Main St | Dallas | john@company.com |
| 2 | Sarah Smith | 456 Oak Ave | Boston | sarah@company.com |
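To make the extraction result concrete, the following sketch loads the two rows shown above into an in-memory SQLite table and queries them with plain SQL. It uses only the standard-library sqlite3 module, not QuerySUTRA itself; the column types and the email column name are assumptions inferred from the example table, not the library's documented schema.

```python
import sqlite3

# Rows as they appear in the example output table above.
rows = [
    (1, "John Doe", "123 Main St", "Dallas", "john@company.com"),
    (2, "Sarah Smith", "456 Oak Ave", "Boston", "sarah@company.com"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: column types are a guess based on the example values.
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, email TEXT)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# A parameterized query, similar in spirit to the sutra.sql() example.
boston = conn.execute(
    "SELECT name, email FROM people WHERE city = ?", ("Boston",)
).fetchall()
conn.close()

print(boston)
```

Once the entities are in a table like this, the same query could presumably be run through sutra.sql() without any API call.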
Embeddings for Smart Caching
Uses the all-MiniLM-L6-v2 model (80 MB, runs locally):
- Query 1: "Show sales" → API call
- Query 2: "Display sales" → 92% similar → Cached (no API call)
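The caching idea above can be sketched as a small similarity cache: embed each question, compare it with the embeddings of earlier questions, and reuse the stored answer when the similarity clears a threshold. This is a minimal illustration, not QuerySUTRA's implementation. The SemanticCache class, the toy_embed function, and the 0.85 threshold are all hypothetical; toy_embed is a word-count stand-in for a real sentence-embedding model, so unlike all-MiniLM-L6-v2 it cannot recognize synonyms such as "show" and "display".

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding (word counts). A real setup would call an embedding model."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: returns a stored answer for sufficiently similar questions."""

    def __init__(self, embed: Callable[[str], Vector] = toy_embed,
                 threshold: float = 0.85) -> None:
        self.embed = embed
        self.threshold = threshold  # assumed value, not the library's setting
        self.entries: List[Tuple[Vector, str]] = []

    def put(self, question: str, answer: str) -> None:
        self.entries.append((self.embed(question), answer))

    def get(self, question: str) -> Optional[str]:
        if not self.entries:
            return None
        query = self.embed(question)
        score, answer = max(
            ((cosine(query, vec), ans) for vec, ans in self.entries),
            key=lambda pair: pair[0],
        )
        return answer if score >= self.threshold else None


cache = SemanticCache()
cache.put("show sales by region",
          "SELECT region, SUM(amount) FROM sales GROUP BY region")

hit = cache.get("show sales by the region")   # near-duplicate wording
miss = cache.get("what is the weather")       # unrelated question
print(hit, miss)
```

Swapping toy_embed for a function that wraps a real embedding model would keep the rest of the sketch unchanged; only the threshold would need retuning.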
Fuzzy Matching
- Query: "New York City"
- Database: ["New York", "Dallas", "Boston"]
- Match: "New York City" → "New York" (85% similar)
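The matching step above can be approximated with the standard-library difflib module. This is a sketch under assumptions: the source does not say which similarity metric QuerySUTRA uses, so the scores here will not reproduce the 85% figure, and the fuzzy_lookup helper and its 0.6 cutoff are hypothetical.

```python
from difflib import SequenceMatcher, get_close_matches
from typing import List, Optional, Tuple


def fuzzy_lookup(term: str, candidates: List[str],
                 cutoff: float = 0.6) -> Optional[Tuple[str, float]]:
    """Return the closest candidate and its similarity ratio, or None if nothing clears the cutoff."""
    matches = get_close_matches(term, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    # Same argument order as get_close_matches uses internally.
    score = SequenceMatcher(None, best, term).ratio()
    return best, score


cities = ["New York", "Dallas", "Boston"]

match = fuzzy_lookup("New York City", cities)
no_match = fuzzy_lookup("Tokyo", cities)

print(match)     # closest stored value and its ratio
print(no_match)  # None when no candidate is similar enough
```

A matched value could then be substituted into the generated SQL in place of the user's original term, which is presumably what fuzzy_match=True enables.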
API Reference
Class Methods
SUTRA.load_from_db(db_path, api_key, **kwargs) - Load existing SQLite database
SUTRA.connect_mysql(host, user, password, database, ...) - Connect to MySQL
SUTRA.connect_postgres(host, user, password, database, ...) - Connect to PostgreSQL
Instance Methods
upload(data, name=None) - Upload data
ask(question, viz=False, table=None) - Natural language query
sql(query, viz=False) - Raw SQL query
tables() - List all tables
schema(table=None) - Show schema
peek(table=None, n=5) - Preview data
export_db(path, format) - Export database
save_to_mysql(...) - Export to MySQL
save_to_postgres(...) - Export to PostgreSQL
backup(path=None) - Create backup
close() - Close connection
Performance Tips
- Use load_from_db() to avoid re-uploading
- Use sql() for complex queries (no API cost)
- Enable use_embeddings=True for caching
- Enable cache_queries=True for exact matches
Troubleshooting
No API key error: sutra = SUTRA(api_key="sk-...")
PDF fails: pip install PyPDF2
MySQL error: pip install QuerySUTRA[mysql]
Embeddings error: pip install QuerySUTRA[embeddings]
Requirements
- Python 3.8+
- OpenAI API key
- 100MB disk space (if using embeddings)
License
MIT License
Changelog
v0.3.1
- Semantic embeddings for smart caching
- Fuzzy matching for better NLP
- Irrelevant query detection
- Load existing databases
- MySQL/PostgreSQL connectivity
- Custom visualizations
- All features optional
Made by Aditya Batta
File details
Details for the file querysutra-0.3.2.tar.gz.
File metadata
- Download URL: querysutra-0.3.2.tar.gz
- Upload date:
- Size: 49.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 722d991934e40d42bfde34adcbe6622909e3c253a7f9bcb4a1bb9a201cbb7fcd |
| MD5 | c0e268afeec7f3e8125282c7bf469e0e |
| BLAKE2b-256 | 73ec2f3c5503c5765d2d658d7d5054192556549c2e4b3a6581e4fbda02c02dc1 |
File details
Details for the file querysutra-0.3.2-py3-none-any.whl.
File metadata
- Download URL: querysutra-0.3.2-py3-none-any.whl
- Upload date:
- Size: 50.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3174d4508c9f7886f1bdfeec2e8cee445b655d451d4531f1af575a5a7f1ad72b |
| MD5 | 6d687ba2c50d4e3d62fed77907c88ad5 |
| BLAKE2b-256 | 57e31e5d8c6b8336a7294a2bfe0426c17f6d3fc4bc7440bf831f86e25b36721d |