SUTRA: AI-powered data analysis
QuerySUTRA
SUTRA: Structured-Unstructured-Text-Retrieval-Architecture
AI-powered data analysis library. Upload PDFs, query with natural language, export to MySQL automatically.
Installation
pip install QuerySUTRA
pip install "QuerySUTRA[mysql]" # For MySQL export (quotes stop shells such as zsh from expanding the brackets)
Quick Start
from sutra import SUTRA
# Upload PDF and auto-export to MySQL in ONE step
sutra = SUTRA(api_key="your-openai-key")
sutra.upload("data.pdf", auto_export_mysql={
'host': 'localhost',
'user': 'root',
'password': 'your_password',
'database': 'my_database' # Auto-creates if not exists
})
# Query immediately
result = sutra.ask("Show me all people")
print(result.data)
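The Quick Start hardcodes credentials for brevity. A minimal sketch follows, assuming you would rather read them from environment variables. Both helpers are hypothetical and not part of QuerySUTRA. The `MYSQL_*` variable names are illustrative assumptions, and `OPENAI_API_KEY` is the name conventionally used by the OpenAI SDK. The returned dict uses the same keys as the `auto_export_mysql` argument above.

```python
import os


def mysql_config_from_env(env=os.environ):
    """Build the dict passed as auto_export_mysql from environment variables.

    The MYSQL_* names are illustrative assumptions, not variables that
    QuerySUTRA itself is known to read.
    """
    password = env.get("MYSQL_PASSWORD")
    if not password:
        raise RuntimeError("Set MYSQL_PASSWORD before exporting to MySQL")
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "user": env.get("MYSQL_USER", "root"),
        "password": password,
        "database": env.get("MYSQL_DATABASE", "my_database"),
    }


def api_key_from_env(env=os.environ):
    """Return the OpenAI API key, failing early if it is missing."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before creating SUTRA")
    return key
```

With these in place, the Quick Start calls would become `SUTRA(api_key=api_key_from_env())` and `sutra.upload("data.pdf", auto_export_mysql=mysql_config_from_env())`.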
Features
1. Automatic MySQL Export
The target database is created automatically if it does not already exist, so exporting to a new database name does not fail.
# Upload and export to MySQL automatically
sutra.upload("data.pdf", auto_export_mysql={
'host': 'localhost',
'user': 'root',
'password': 'your_password',
'database': 'my_new_database' # Creates automatically
})
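For readers wondering what "creates automatically" typically amounts to on the MySQL side, here is a minimal sketch of building a `CREATE DATABASE IF NOT EXISTS` statement. This is an assumption about the general technique, not a description of QuerySUTRA's internals, and `create_database_sql` is a hypothetical helper.

```python
import re

# Conservative whitelist: letters, digits and underscores only.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def create_database_sql(name):
    """Return a MySQL statement that creates `name` only if it is missing."""
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe database name: {name!r}")
    # Identifiers cannot be bound as query parameters, so the name is
    # validated above before being placed between backticks.
    return f"CREATE DATABASE IF NOT EXISTS `{name}`"


print(create_database_sql("my_new_database"))
# prints: CREATE DATABASE IF NOT EXISTS `my_new_database`
```

Running such a statement before the export makes the operation safe to repeat, because MySQL skips creation when the database is already present.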
2. Complete Data Extraction
Processes the entire PDF in chunks, so every record is extracted (for example, all 50+ employees in a document rather than only the first 10).
sutra.upload("large_document.pdf") # Extracts all 50+ employees
sutra.tables() # Shows all extracted tables
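Chunked processing can be pictured as splitting the extracted text into overlapping windows so that no record is lost at a boundary. The sketch below illustrates that general idea only. The chunk size, the overlap, and the function name are assumptions, and the library's actual chunking strategy is not documented here.

```python
def split_into_chunks(text, chunk_size=2000, overlap=200):
    """Split text into windows of chunk_size characters that overlap.

    The overlap lets a record that straddles a boundary appear whole
    in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
# prints: ['abcd', 'defg', 'ghij']
```

Each chunk would then be sent for extraction separately and the results merged, with duplicates from the overlapping regions removed.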
3. Natural Language Queries
result = sutra.ask("Show all people from California")
result = sutra.ask("Who has Python skills?", table="skills")
result = sutra.ask("Count employees by state", viz="pie")
4. Custom Visualizations
result = sutra.ask("Sales by region", viz="pie")
result = sutra.ask("Trends", viz="line")
result = sutra.ask("Compare", viz="bar")
result = sutra.ask("Data", viz="scatter")
5. Load Existing Databases
# Load SQLite
sutra = SUTRA.load_from_db("data.db", api_key="key")
# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "pass", "database")
# Connect to PostgreSQL
sutra = SUTRA.connect_postgres("localhost", "postgres", "pass", "database")
6. Smart Features (Optional)
sutra = SUTRA(
api_key="your-key",
use_embeddings=True, # Cache similar queries (saves API calls)
fuzzy_match=True, # "New York City" matches "New York"
check_relevance=True, # Detect irrelevant queries
cache_queries=True # Cache exact queries
)
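To make two of these options concrete, the sketch below shows one common way to do fuzzy value matching and exact-query caching using only the standard library. It is an illustration under assumptions, not QuerySUTRA's implementation. `fuzzy_lookup` and `QueryCache` are hypothetical names, and the case and whitespace normalization in the cache is an added assumption. Embedding-based caching of similar queries is not covered.

```python
import difflib


def fuzzy_lookup(value, known_values, cutoff=0.6):
    """Return the known value closest to `value`, or None if nothing is close."""
    by_lower = {known.lower(): known for known in known_values}
    matches = difflib.get_close_matches(
        value.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


class QueryCache:
    """Exact-match cache keyed on case- and whitespace-normalized questions."""

    def __init__(self):
        self._answers = {}

    @staticmethod
    def _key(question):
        return " ".join(question.lower().split())

    def get(self, question):
        return self._answers.get(self._key(question))

    def put(self, question, answer):
        self._answers[self._key(question)] = answer


print(fuzzy_lookup("New York City", ["New York", "Boston", "Chicago"]))
# prints: New York
```

A cache hit means the question never reaches the API, which is the saving the comments above refer to.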
7. Direct SQL (Free)
result = sutra.sql("SELECT * FROM people WHERE state='CA'")
print(result.data)
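If the SQL text is built from user input, it is worth binding values as parameters rather than formatting them into the string. The sketch below demonstrates this with the standard-library `sqlite3` module on an in-memory database. Whether `sutra.sql()` accepts bound parameters is not stated here, so the example deliberately does not use it. The rows are made-up sample data and the `people` columns are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "CA"), ("Grace", "NY"), ("Linus", "CA")],  # sample rows
)

# Bind the value with a ? placeholder instead of string formatting.
in_california = conn.execute(
    "SELECT name FROM people WHERE state = ? ORDER BY name", ("CA",)
).fetchall()

# The kind of aggregate a question like "Count employees by state" implies.
per_state = conn.execute(
    "SELECT state, COUNT(*) FROM people GROUP BY state ORDER BY state"
).fetchall()

print(in_california)  # prints: [('Ada',), ('Linus',)]
print(per_state)      # prints: [('CA', 2), ('NY', 1)]
conn.close()
```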
Complete Workflow
In Colab:
from sutra import SUTRA
sutra = SUTRA(api_key="your-key")
sutra.upload("employee_data.pdf")
sutra.tables() # See extracted tables
# Export and download
sutra.export_db("data.db", format="sqlite")
from google.colab import files
files.download("data.db")
On Windows:
from sutra import SUTRA
# Load downloaded database
sutra = SUTRA.load_from_db("data.db", api_key="your-key")
# Export to MySQL (auto-creates database)
sutra.save_to_mysql("localhost", "root", "password", "my_database")
# Verify in MySQL
sutra_mysql = SUTRA.connect_mysql("localhost", "root", "password", "my_database")
sutra_mysql.tables()
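Before loading the downloaded file, it can help to confirm that it really contains the expected tables. A minimal sketch using only the standard library follows. `list_tables` is a hypothetical helper, independent of QuerySUTRA.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of user tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("data.db")` returns an empty list if the download was truncated to an empty database. Note that `sqlite3.connect` creates the file when it does not exist, so an empty result can also mean the path is wrong.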
Export Options
# SQLite
sutra.export_db("backup.db", format="sqlite")
# SQL dump
sutra.export_db("schema.sql", format="sql")
# JSON
sutra.export_db("data.json", format="json")
# Excel
sutra.export_db("data.xlsx", format="excel")
# MySQL (auto-creates database)
sutra.save_to_mysql("localhost", "root", "pass", "new_db")
# PostgreSQL
sutra.save_to_postgres("localhost", "postgres", "pass", "new_db")
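For orientation, a "SQL dump" is a text file of statements that recreate the schema and rows. The sketch below produces one from a SQLite connection with the standard library's `Connection.iterdump()`. Whether `export_db(..., format="sql")` emits the same statements is an assumption not confirmed here, and the table and row are sample data.

```python
import sqlite3


def dump_sql(conn):
    """Return the whole database as one string of SQL statements."""
    return "\n".join(conn.iterdump())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, state TEXT)")
conn.execute("INSERT INTO people VALUES ('Ada', 'CA')")  # sample row
conn.commit()

dump = dump_sql(conn)
conn.close()
print(dump)
```

Such a dump is plain text, which makes it convenient for version control and for inspecting exactly what was extracted.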
API Reference
Initialize
SUTRA(api_key, db, use_embeddings, check_relevance, fuzzy_match, cache_queries)
Class Methods
- load_from_db(path, api_key) - Load SQLite
- connect_mysql(host, user, password, database) - Connect MySQL
- connect_postgres(host, user, password, database) - Connect PostgreSQL
Instance Methods
- upload(data, name, auto_export_mysql) - Upload with optional auto-export
- ask(question, viz, table) - Natural language query
- sql(query, viz) - Direct SQL
- tables() - List tables
- schema(table) - Show schema
- peek(table, n) - Preview data
- export_db(path, format) - Export database
- save_to_mysql(host, user, password, database) - Export to MySQL (auto-creates DB)
- save_to_postgres(...) - Export to PostgreSQL
- backup(path) - Backup
- close() - Close
Troubleshooting
MySQL database doesn't exist
- Fixed in v0.4.0: the database is now created automatically
- No need to create the database manually
Only 10 employees extracted from 50-employee PDF
- Fixed in v0.4.0 - processes entire PDF in chunks
- Upgrade:
pip install --upgrade QuerySUTRA
connect_mysql() not found
- Update:
pip install --upgrade QuerySUTRA
- Install MySQL support:
pip install "QuerySUTRA[mysql]"
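Since the fixes above depend on v0.4.0 or later, it can be useful to check which version is actually installed in the active environment. A minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8) follows. Both helpers are hypothetical, and `is_at_least` handles only plain dotted numeric versions.

```python
from importlib import metadata


def installed_version(package="QuerySUTRA"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_at_least(version, minimum=(0, 4, 0)):
    """Compare the leading numeric parts of a version string to `minimum`."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    # Pad so that "0.4" compares equal to (0, 4, 0).
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= minimum


version = installed_version()
if version is None:
    print("QuerySUTRA is not installed in this environment")
elif not is_at_least(version):
    print(f"QuerySUTRA {version} predates v0.4.0; upgrade recommended")
```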
Supported Formats
CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame
Requirements
- Python 3.8+
- OpenAI API key
- MySQL/PostgreSQL (optional)
License
MIT License
Changelog
v0.4.0
- AUTO-CREATES MySQL database (no more errors)
- Complete PDF extraction (all pages, all employees)
- Chunk processing for large documents
- One-line auto-export to MySQL
- Simplified everything
v0.3.x
- MySQL/PostgreSQL connectivity
- Embeddings caching
- Fuzzy matching
- Custom visualizations
Made by Aditya Batta
File details
Details for the file querysutra-0.4.2.tar.gz.
File metadata
- Download URL: querysutra-0.4.2.tar.gz
- Upload date:
- Size: 42.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf1200c5230191bae97b48e0ea978e6c5162c25bc3882321eeae8f99a5f1aa09 |
| MD5 | 9d099b9045e3f5ab8fa42d22ac335e07 |
| BLAKE2b-256 | 481cb60e37659f2857d2834abbde176ffe7aa7b54e5b5ecc164b03758bfbecaf |
File details
Details for the file querysutra-0.4.2-py3-none-any.whl.
File metadata
- Download URL: querysutra-0.4.2-py3-none-any.whl
- Upload date:
- Size: 46.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bac14366def1c108288e7046d9269eec2ec5142189552b68524aa8537d75d363 |
| MD5 | 79a1d93d0cd3abee4ef10e198e3cf789 |
| BLAKE2b-256 | 3c5f914c1bbb7ea1c80689369d83a61a932dec15a092a6a44f745b9434273a5b |