SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - Creates multiple structured tables from ANY data
Project description
QuerySUTRA
SUTRA: Structured-Unstructured-Text-Retrieval-Architecture
Transform any data into structured, queryable databases with AI-powered entity extraction.
๐ฏ Key Features
โ
Multi-Table Creation - Automatically extracts entities and creates multiple related tables
โ
Smart Entity Extraction - Identifies people, contacts, events, organizations from unstructured data
โ
Natural Language Queries - Ask questions in plain English
โ
Multiple Data Formats - CSV, Excel, JSON, PDF, DOCX, TXT, SQL, DataFrames
โ
Direct SQL Access - Query without API costs
โ
Auto Visualization - Built-in charts and graphs
โ
Cloud Export - Save to MySQL, PostgreSQL, or local SQLite
๐ฆ Installation
pip install QuerySUTRA
# With MySQL support
pip install QuerySUTRA[mysql]
# With PostgreSQL support
pip install QuerySUTRA[postgres]
# With all database support
pip install QuerySUTRA[all]
๐ Quick Start
from sutra import SUTRA
# Initialize
sutra = SUTRA(api_key="your-openai-key")
# Upload any data - AI creates multiple structured tables!
sutra.upload("employee_story.pdf")
# View all created tables
sutra.tables()
# Output:
# ๐ TABLES IN DATABASE
# 1. employee_story_people (20 rows, 6 columns)
# Columns: id, name, address, city, email, phone
# 2. employee_story_contacts (20 rows, 4 columns)
# Columns: id, person_id, email, phone
# 3. employee_story_events (15 rows, 4 columns)
# Columns: id, host_id, description, city
# View detailed schema
sutra.schema()
# Query with natural language
result = sutra.ask("Show all people from New York")
print(result.data)
# With visualization
result = sutra.ask("Show events by city", viz=True)
# Direct SQL (no API cost!)
result = sutra.sql("SELECT * FROM employee_story_people WHERE city='Dallas'")
print(result.data)
๐ How It Works
From Unstructured PDF to Structured Tables
Input: PDF with employee information
AI Automatically Creates:
๐ Created 3 structured tables:
๐ employee_story_people: 20 rows, 6 columns
- id, name, address, city, email, phone
๐ employee_story_contacts: 20 rows, 4 columns
- id, person_id, email, phone
๐ employee_story_events: 15 rows, 4 columns
- id, host_id, description, city
๐ก Usage Examples
1. Upload Different Formats
# CSV file
sutra.upload("sales_data.csv")
# Excel file
sutra.upload("quarterly_report.xlsx")
# PDF document (AI extracts entities!)
sutra.upload("company_directory.pdf")
# Word document
sutra.upload("meeting_notes.docx")
# Text file
sutra.upload("log_data.txt")
# DataFrame
import pandas as pd
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [95, 87]})
sutra.upload(df, name="test_scores")
2. View Your Data
# List all tables with details
sutra.tables()
# Show schema with data types
sutra.schema()
# Show schema for specific table
sutra.schema("employee_story_people")
# Preview data
sutra.peek("employee_story_people", n=10)
3. Query Your Data
# Natural language (uses OpenAI)
result = sutra.ask("What are the top 5 sales by region?")
print(result.data)
# With visualization
result = sutra.ask("Show sales trends by month", viz=True)
# Interactive mode (asks if you want viz)
result = sutra.interactive("Compare revenue across quarters")
# Direct SQL (free, no API!)
result = sutra.sql("SELECT city, COUNT(*) as count FROM employee_story_people GROUP BY city")
print(result.data)
4. Export Your Database
# Export to MySQL (local or cloud)
sutra.save_to_mysql(
host="localhost",
user="root",
password="password",
database="my_database"
)
# Export to PostgreSQL
sutra.save_to_postgres(
host="mydb.amazonaws.com",
user="admin",
password="password",
database="production_db"
)
# Export to SQLite file
sutra.export_db("backup.db", format="sqlite")
# Export to SQL dump
sutra.export_db("schema.sql", format="sql")
# Export to JSON
sutra.export_db("data.json", format="json")
# Export to Excel (all tables as sheets)
sutra.export_db("data.xlsx", format="excel")
# Complete backup
sutra.backup("./backups")
๐ฅ Advanced Features
Entity Extraction
QuerySUTRA automatically identifies and extracts:
- ๐ฅ People - Names, addresses, contact info
- ๐ง Contacts - Emails, phone numbers
- ๐ Events - Meetings, activities, locations
- ๐ข Organizations - Companies, departments
- ๐ Locations - Cities, addresses, coordinates
Multiple Table Relationships
# AI creates relational structure
sutra.upload("company_data.pdf")
# Result:
# people table with person_id
# contacts table with foreign key to person_id
# events table with host_id linking to people
Query Across Tables
# Natural language handles joins automatically
result = sutra.ask("Show all events hosted by people from Dallas")
# Or write SQL joins manually
result = sutra.sql("""
SELECT e.description, p.name, p.city
FROM employee_story_events e
JOIN employee_story_people p ON e.host_id = p.id
WHERE p.city = 'Dallas'
""")
๐ Visualization
# Auto-detect best chart type
result = sutra.ask("Show revenue by product", viz=True)
# Interactive charts with Plotly
# - Bar charts for categorical data
# - Line charts for time series
# - Tables for detailed data
# - Pie charts for distributions
๐ Cloud Database Integration
AWS RDS MySQL
sutra.save_to_mysql(
host="mydb.xxxx.us-east-1.rds.amazonaws.com",
user="admin",
password="password",
database="production",
port=3306
)
Google Cloud SQL
sutra.save_to_postgres(
host="35.123.456.789",
user="postgres",
password="password",
database="analytics"
)
Heroku Postgres
sutra.save_to_postgres(
host="ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com",
user="username",
password="password",
database="dbname",
port=5432
)
โก Performance Tips
# Use direct SQL for complex queries (faster, no API cost)
result = sutra.sql("SELECT * FROM data WHERE status='active'")
# Cache is automatic for repeated questions
result1 = sutra.ask("Show total sales") # Calls API
result2 = sutra.ask("Show total sales") # From cache โก
# Export results for reuse
result.data.to_csv("results.csv")
๐ API Key Security
# Option 1: Pass directly (not recommended for production)
sutra = SUTRA(api_key="sk-...")
# Option 2: Environment variable (recommended)
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
sutra = SUTRA()
# Option 3: .env file
# Create .env file with: OPENAI_API_KEY=sk-...
from dotenv import load_dotenv
load_dotenv()
sutra = SUTRA()
๐ Complete Example
from sutra import SUTRA
import pandas as pd
# Initialize
sutra = SUTRA(api_key="your-openai-key")
# Upload PDF - creates multiple tables
sutra.upload("employee_directory.pdf")
# View what was created
tables_info = sutra.tables()
print(f"Created {len(tables_info)} tables")
# View detailed schema
sutra.schema()
# Query specific table
result = sutra.ask("How many people are in each city?",
table="employee_directory_people")
print(result.data)
# Visualize
result = sutra.ask("Show distribution of people by city", viz=True)
# Export to MySQL
sutra.save_to_mysql("localhost", "root", "password", "company_db")
# Backup everything
sutra.backup("./backups")
# Close connection
sutra.close()
๐ Method Reference
Core Methods
| Method | Description |
|---|---|
upload(data, name) |
Upload any data format, creates multiple tables |
tables() |
List all tables with row/column counts |
schema(table) |
Show detailed schema with data types |
peek(table, n) |
Preview first n rows |
ask(question, viz) |
Natural language query |
sql(query, viz) |
Direct SQL query |
interactive(question) |
Query with viz prompt |
Export Methods
| Method | Description |
|---|---|
export_db(path, format) |
Export database (sqlite/sql/json/excel) |
save_to_mysql(...) |
Save to MySQL database |
save_to_postgres(...) |
Save to PostgreSQL database |
backup(path) |
Complete backup with timestamp |
๐ Troubleshooting
Q: Only one table created instead of multiple?
A: Make sure you have OpenAI API key set. Without it, falls back to simple parsing.
Q: "No API key" error?
A: Set your OpenAI key: sutra = SUTRA(api_key="sk-...")
Q: PDF extraction failed?
A: Install PyPDF2: pip install PyPDF2
Q: MySQL export error?
A: Install extras: pip install QuerySUTRA[mysql]
๐ License
MIT License - see LICENSE file
๐ค Contributing
Contributions welcome! Open an issue or submit a PR.
๐ Support
- Issues: GitHub Issues
- Email: your@email.com
Made with โค๏ธ by Aditya Batta
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file querysutra-0.2.3.tar.gz.
File metadata
- Download URL: querysutra-0.2.3.tar.gz
- Upload date:
- Size: 46.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e5d0a8d4a907d950c6268e8758ccdbfee8e6a3580dcc7fb126669e1e7b65c608
|
|
| MD5 |
b381708ba24d28dca0f044cd1adc69c2
|
|
| BLAKE2b-256 |
01196ff918f86324f71d5c22cc2204e922023439ae1510a011fb8f9b8130edcc
|
File details
Details for the file querysutra-0.2.3-py3-none-any.whl.
File metadata
- Download URL: querysutra-0.2.3-py3-none-any.whl
- Upload date:
- Size: 47.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9746ca5baedd48355c35a5680aa16c2f4779b7e18a15919aad4c99c068f3b621
|
|
| MD5 |
f39c7ae288c06d86cf6525b0c823db7a
|
|
| BLAKE2b-256 |
071f5ebaff7b11bef51a8c909e554878e24b01562efccd5d253e42cd01cba11b
|