SUTRA
QuerySUTRA
SUTRA: Structured-Unstructured-Text-Retrieval-Architecture
AI-powered data analysis. Upload any data (PDF, Word, Text, CSV, Excel), query with natural language, export to MySQL.
Installation
pip install QuerySUTRA
pip install QuerySUTRA[mysql] # MySQL support
pip install QuerySUTRA[embeddings] # Smart caching
pip install QuerySUTRA[all] # All features
Quick Start
from sutra import SUTRA
sutra = SUTRA(api_key="your-openai-key")
sutra.upload("data.pdf") # or .docx, .txt, .csv, .xlsx, .json
result = sutra.ask("Show me all people")
print(result.data)
Supported Formats
Structured Data:
- CSV (.csv)
- Excel (.xlsx, .xls)
- JSON (.json)
- SQL (.sql)
- Pandas DataFrame
Unstructured Documents (AI Extraction):
- PDF (.pdf)
- Word (.docx)
- Text (.txt)
Core Features
1. Upload Any Data Format
# Structured data
sutra.upload("sales.csv")
sutra.upload("report.xlsx")
sutra.upload("api_data.json")
sutra.upload("dump.sql")
# Unstructured documents (AI extracts entities)
sutra.upload("resume.pdf")
sutra.upload("meeting_notes.docx")
sutra.upload("transcript.txt")
# DataFrame
import pandas as pd
df = pd.DataFrame({'name': ['Alice'], 'score': [95]})
sutra.upload(df, name="scores")
2. Complete Data Extraction
Processes entire documents in chunks, so no pages or lines are skipped.
# PDF - Extracts ALL pages
sutra.upload("50_page_report.pdf") # Gets all 50 pages, all employees
# Word - Extracts ALL content
sutra.upload("large_document.docx") # Full document processed
# Text - Processes ALL lines
sutra.upload("log_file.txt") # Entire file analyzed
# All create multiple related tables
sutra.tables()
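The chunked processing described above can be illustrated with a small standalone sketch. This is not QuerySUTRA's internal code: the function name `chunk_text`, the chunk size, and the overlap are all assumptions chosen for illustration. It only shows how a long text can be split into overlapping pieces so that every character ends up in at least one chunk.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into overlapping chunks that together cover every character.

    Hypothetical helper for illustration; the library's real chunking may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


# Stand-in for text extracted from a long document.
document = "line of text\n" * 500
chunks = chunk_text(document, chunk_size=1000, overlap=100)

# Dropping the overlapping prefix of every later chunk rebuilds the original,
# which confirms that nothing was skipped.
rebuilt = chunks[0] + "".join(c[100:] for c in chunks[1:])
print(len(chunks), rebuilt == document)
```

The overlap is a design choice: it keeps a record that straddles a chunk boundary visible in full in at least one chunk, at the cost of processing a little text twice.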
3. Automatic MySQL Export
One-line upload and export. Database auto-created.
sutra.upload("data.pdf", auto_export_mysql={
    'host': 'localhost',
    'user': 'root',
    'password': 'your_password',
    'database': 'my_database'  # Created automatically if it does not exist
})
4. Natural Language Queries
result = sutra.ask("Show all people from California")
result = sutra.ask("Who has Python skills?", table="skills")
result = sutra.ask("Count employees by state", viz="pie")
5. Custom Visualizations
result = sutra.ask("Sales by region", viz="pie") # Pie chart
result = sutra.ask("Trends over time", viz="line") # Line chart
result = sutra.ask("Compare values", viz="bar") # Bar chart
result = sutra.ask("Correlations", viz="scatter") # Scatter
result = sutra.ask("Show table", viz="table") # Table
result = sutra.ask("Heatmap", viz="heatmap") # Heatmap
result = sutra.ask("Auto", viz=True) # Auto-detect
6. Load Existing Databases
# Load SQLite
sutra = SUTRA.load_from_db("data.db", api_key="key")
# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "pass", "database")
# Connect to PostgreSQL
sutra = SUTRA.connect_postgres("localhost", "postgres", "pass", "database")
7. Fuzzy Matching
sutra = SUTRA(api_key="key", fuzzy_match=True)
# "New York City" matches "New York" automatically
result = sutra.ask("Who are from New York City?")
# Fuzzy: 'City' -> 'New York'
Uses difflib.get_close_matches with 60% threshold.
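To make the threshold concrete, here is a minimal standalone sketch of `difflib.get_close_matches` with `cutoff=0.6`. The helper `fuzzy_lookup` and the list `known_values` are hypothetical and not part of QuerySUTRA, and the library may apply the matching differently (for example word by word).

```python
import difflib

# Hypothetical set of values already present in a column.
known_values = ["New York", "California", "Texas", "Florida"]


def fuzzy_lookup(term, candidates, cutoff=0.6):
    """Return the closest candidate to term, or None if nothing reaches the cutoff."""
    matches = difflib.get_close_matches(term, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(fuzzy_lookup("New York City", known_values))  # New York
print(fuzzy_lookup("Californa", known_values))      # California
print(fuzzy_lookup("Weather", known_values))        # None
```

Note that the comparison is case-sensitive, so lowercasing both sides first may be worthwhile in practice.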
8. Embeddings for Smart Caching
Can cut API costs by up to 90% when many queries are rephrasings of one another.
sutra = SUTRA(api_key="key", use_embeddings=True)
result = sutra.ask("Show sales") # API call
result = sutra.ask("Display sales data") # Cached (92% similar)
result = sutra.ask("Give me sales info") # Cached (88% similar)
How it works:
- Model: all-MiniLM-L6-v2 (80 MB, runs locally)
- Converts queries to 384D vectors
- 85% similarity threshold
- No external API calls
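The idea behind the steps above can be sketched as a small similarity cache. This is an illustration under stated assumptions, not the library's implementation: the class `SemanticCache`, the function `toy_embed`, and the tiny vocabulary are all hypothetical. The toy embedding stands in for a real sentence-embedding model so the example runs without extra dependencies; with the sentence-transformers package installed, the `encode` method of a `SentenceTransformer("all-MiniLM-L6-v2")` instance could presumably be passed as `embed` instead.

```python
import math
from typing import Callable, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical cache keyed by query embeddings rather than exact strings."""

    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.85):
        self.embed = embed
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []  # (query vector, answer)

    def get(self, query: str) -> Optional[str]:
        """Return the answer of the most similar cached query, if similar enough."""
        vector = self.embed(query)
        best_score, best_answer = 0.0, None
        for cached_vector, answer in self.entries:
            score = cosine_similarity(vector, cached_vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))


# Toy stand-in for a real embedding model: counts of a few known words.
VOCABULARY = ["show", "all", "sales", "data", "weather", "today"]


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


cache = SemanticCache(toy_embed, threshold=0.85)
cache.put("show all sales data", "SELECT * FROM sales")

hit = cache.get("show sales data")   # similarity is about 0.87, above the threshold
miss = cache.get("weather today")    # similarity is 0.0, so no cached answer
print(hit, miss)
```

On a miss, the caller would make the API call and store the result with `put`, so later rephrasings of the same question can be answered locally.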
Cost savings:
- 10 similar queries: 1 API call vs 10 = 90% savings
9. Irrelevant Query Detection
sutra = SUTRA(api_key="key", check_relevance=True)
result = sutra.ask("What's the weather?")
# Warning: Query may be irrelevant
10. Direct SQL
result = sutra.sql("SELECT * FROM people WHERE state='CA'")
Complete Example
from sutra import SUTRA
# Initialize with all features
sutra = SUTRA(
    api_key="your-key",
    use_embeddings=True,
    fuzzy_match=True,
    check_relevance=True
)
# Upload any format
sutra.upload("employees.pdf") # PDF
sutra.upload("skills.docx") # Word
sutra.upload("projects.txt") # Text
sutra.upload("sales.csv") # CSV
sutra.upload("budget.xlsx") # Excel
# View tables
sutra.tables()
# Query
result = sutra.ask("Show all people", viz="bar")
# Export to MySQL
sutra.save_to_mysql("localhost", "root", "pass", "my_db")
Import to MySQL Workflow
In Colab (extract the data and download it as a SQLite file):
sutra.upload("data.pdf")
sutra.export_db("data.db", "sqlite")
from google.colab import files
files.download("data.db")
On Windows (load the downloaded SQLite file and push it to MySQL):
sutra = SUTRA.load_from_db("data.db", api_key="key")
sutra.save_to_mysql("localhost", "root", "pass", "my_db")
Export Options
sutra.export_db("backup.db", "sqlite")
sutra.export_db("schema.sql", "sql")
sutra.export_db("data.json", "json")
sutra.export_db("data.xlsx", "excel")
sutra.save_to_mysql("localhost", "root", "pass", "db")
sutra.save_to_postgres("localhost", "postgres", "pass", "db")
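A SQLite export can be checked with Python's standard `sqlite3` module before it is pushed anywhere else. The sketch below is an assumption-laden illustration: the helper `list_tables` is hypothetical, and the `people` table is a stand-in created here so the example runs on its own. In practice `db_path` would point at the file written by `export_db`.

```python
import os
import sqlite3
import tempfile

# Build a small stand-in database so the example is self-contained.
db_path = os.path.join(tempfile.mkdtemp(), "backup.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE people (name TEXT, state TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Alice", "CA"), ("Bob", "NY")])
conn.commit()
conn.close()


def list_tables(path):
    """Return {table name: row count} for every user table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


summary = list_tables(db_path)
print(summary)
```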
API Reference
Methods
- upload(data, name, auto_export_mysql) - Upload any format
- ask(question, viz, table) - Natural language query
- sql(query, viz) - Direct SQL
- tables() - List tables
- schema() - Show schema
- peek(table, n) - Preview
- save_to_mysql(...) - Export MySQL (auto-creates DB)
- export_db(path, format) - Export database
- load_from_db(path) - Load SQLite
- connect_mysql(...) - Connect MySQL
Requirements
Python 3.8+, OpenAI API key
License
MIT
Made by Aditya Batta
File details
Details for the file querysutra-0.4.6.tar.gz.
File metadata
- Download URL: querysutra-0.4.6.tar.gz
- Upload date:
- Size: 42.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e36f318bea251c9b1131a70fb5ce359b96e29cafd64392b35d600cef9fd8060 |
| MD5 | 4f69653f13e3f3515627af382f9d4e85 |
| BLAKE2b-256 | 6ad3d85da828a15676c98f1102233b953884159d84a43bc5eec8d9b80aac1581 |
File details
Details for the file querysutra-0.4.6-py3-none-any.whl.
File metadata
- Download URL: querysutra-0.4.6-py3-none-any.whl
- Upload date:
- Size: 46.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6f66f34c0d3a70b0d79dfdaa68f3c2d617349157abb16bfca669d69019c5ef0 |
| MD5 | 68db5c0b161be5703494cc240a6e9705 |
| BLAKE2b-256 | 388c6fde7e5a1b591b0800090ea8c5baaa3ee6d4483d53e44f8b74a2d87515f0 |