Skip to main content

SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - Creates multiple structured tables from ANY data

Project description

QuerySUTRA

SUTRA: Structured-Unstructured-Text-Retrieval-Architecture

Transform any data into structured, queryable databases with AI-powered entity extraction.

๐ŸŽฏ Key Features

โœ… Multi-Table Creation - Automatically extracts entities and creates multiple related tables
โœ… Smart Entity Extraction - Identifies people, contacts, events, organizations from unstructured data
โœ… Natural Language Queries - Ask questions in plain English
โœ… Multiple Data Formats - CSV, Excel, JSON, PDF, DOCX, TXT, SQL, DataFrames
โœ… Direct SQL Access - Query without API costs
โœ… Auto Visualization - Built-in charts and graphs
โœ… Cloud Export - Save to MySQL, PostgreSQL, or local SQLite

๐Ÿ“ฆ Installation

pip install QuerySUTRA

# With MySQL support
pip install QuerySUTRA[mysql]

# With PostgreSQL support
pip install QuerySUTRA[postgres]

# With all database support
pip install QuerySUTRA[all]

๐Ÿš€ Quick Start

from sutra import SUTRA

# Initialize
sutra = SUTRA(api_key="your-openai-key")

# Upload any data - AI creates multiple structured tables!
sutra.upload("employee_story.pdf")

# View all created tables
sutra.tables()
# Output:
# ๐Ÿ“‹ TABLES IN DATABASE
# 1. employee_story_people (20 rows, 6 columns)
#    Columns: id, name, address, city, email, phone
# 2. employee_story_contacts (20 rows, 4 columns)
#    Columns: id, person_id, email, phone
# 3. employee_story_events (15 rows, 4 columns)
#    Columns: id, host_id, description, city

# View detailed schema
sutra.schema()

# Query with natural language
result = sutra.ask("Show all people from New York")
print(result.data)

# With visualization
result = sutra.ask("Show events by city", viz=True)

# Direct SQL (no API cost!)
result = sutra.sql("SELECT * FROM employee_story_people WHERE city='Dallas'")
print(result.data)

๐Ÿ“Š How It Works

From Unstructured PDF to Structured Tables

Input: PDF with employee information

AI Automatically Creates:

๐Ÿ“‹ Created 3 structured tables:
  ๐Ÿ“Š employee_story_people: 20 rows, 6 columns
     - id, name, address, city, email, phone
  ๐Ÿ“Š employee_story_contacts: 20 rows, 4 columns
     - id, person_id, email, phone  
  ๐Ÿ“Š employee_story_events: 15 rows, 4 columns
     - id, host_id, description, city

๐Ÿ’ก Usage Examples

1. Upload Different Formats

# CSV file
sutra.upload("sales_data.csv")

# Excel file
sutra.upload("quarterly_report.xlsx")

# PDF document (AI extracts entities!)
sutra.upload("company_directory.pdf")

# Word document
sutra.upload("meeting_notes.docx")

# Text file
sutra.upload("log_data.txt")

# DataFrame
import pandas as pd
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [95, 87]})
sutra.upload(df, name="test_scores")

2. View Your Data

# List all tables with details
sutra.tables()

# Show schema with data types
sutra.schema()

# Show schema for specific table
sutra.schema("employee_story_people")

# Preview data
sutra.peek("employee_story_people", n=10)

3. Query Your Data

# Natural language (uses OpenAI)
result = sutra.ask("What are the top 5 sales by region?")
print(result.data)

# With visualization
result = sutra.ask("Show sales trends by month", viz=True)

# Interactive mode (asks if you want viz)
result = sutra.interactive("Compare revenue across quarters")

# Direct SQL (free, no API!)
result = sutra.sql("SELECT city, COUNT(*) as count FROM employee_story_people GROUP BY city")
print(result.data)

4. Export Your Database

# Export to MySQL (local or cloud)
sutra.save_to_mysql(
    host="localhost",
    user="root",
    password="password",
    database="my_database"
)

# Export to PostgreSQL
sutra.save_to_postgres(
    host="mydb.amazonaws.com",
    user="admin",
    password="password",
    database="production_db"
)

# Export to SQLite file
sutra.export_db("backup.db", format="sqlite")

# Export to SQL dump
sutra.export_db("schema.sql", format="sql")

# Export to JSON
sutra.export_db("data.json", format="json")

# Export to Excel (all tables as sheets)
sutra.export_db("data.xlsx", format="excel")

# Complete backup
sutra.backup("./backups")

๐Ÿ”ฅ Advanced Features

Entity Extraction

QuerySUTRA automatically identifies and extracts:

  • ๐Ÿ‘ฅ People - Names, addresses, contact info
  • ๐Ÿ“ง Contacts - Emails, phone numbers
  • ๐Ÿ“… Events - Meetings, activities, locations
  • ๐Ÿข Organizations - Companies, departments
  • ๐Ÿ“ Locations - Cities, addresses, coordinates

Multiple Table Relationships

# AI creates relational structure
sutra.upload("company_data.pdf")

# Result:
# people table with person_id
# contacts table with foreign key to person_id
# events table with host_id linking to people

Query Across Tables

# Natural language handles joins automatically
result = sutra.ask("Show all events hosted by people from Dallas")

# Or write SQL joins manually
result = sutra.sql("""
    SELECT e.description, p.name, p.city
    FROM employee_story_events e
    JOIN employee_story_people p ON e.host_id = p.id
    WHERE p.city = 'Dallas'
""")

๐Ÿ“ˆ Visualization

# Auto-detect best chart type
result = sutra.ask("Show revenue by product", viz=True)

# Interactive charts with Plotly
# - Bar charts for categorical data
# - Line charts for time series  
# - Tables for detailed data
# - Pie charts for distributions

๐ŸŒ Cloud Database Integration

AWS RDS MySQL

sutra.save_to_mysql(
    host="mydb.xxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="password",
    database="production",
    port=3306
)

Google Cloud SQL

sutra.save_to_postgres(
    host="35.123.456.789",
    user="postgres",
    password="password",
    database="analytics"
)

Heroku Postgres

sutra.save_to_postgres(
    host="ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com",
    user="username",
    password="password",
    database="dbname",
    port=5432
)

โšก Performance Tips

# Use direct SQL for complex queries (faster, no API cost)
result = sutra.sql("SELECT * FROM data WHERE status='active'")

# Cache is automatic for repeated questions
result1 = sutra.ask("Show total sales")  # Calls API
result2 = sutra.ask("Show total sales")  # From cache โšก

# Export results for reuse
result.data.to_csv("results.csv")

๐Ÿ”’ API Key Security

# Option 1: Pass directly (not recommended for production)
sutra = SUTRA(api_key="sk-...")

# Option 2: Environment variable (recommended)
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
sutra = SUTRA()

# Option 3: .env file
# Create .env file with: OPENAI_API_KEY=sk-...
from dotenv import load_dotenv
load_dotenv()
sutra = SUTRA()

๐ŸŽ“ Complete Example

from sutra import SUTRA
import pandas as pd

# Initialize
sutra = SUTRA(api_key="your-openai-key")

# Upload PDF - creates multiple tables
sutra.upload("employee_directory.pdf")

# View what was created
tables_info = sutra.tables()
print(f"Created {len(tables_info)} tables")

# View detailed schema
sutra.schema()

# Query specific table
result = sutra.ask("How many people are in each city?", 
                   table="employee_directory_people")
print(result.data)

# Visualize
result = sutra.ask("Show distribution of people by city", viz=True)

# Export to MySQL
sutra.save_to_mysql("localhost", "root", "password", "company_db")

# Backup everything
sutra.backup("./backups")

# Close connection
sutra.close()

๐Ÿ“š Method Reference

Core Methods

Method Description
upload(data, name) Upload any data format, creates multiple tables
tables() List all tables with row/column counts
schema(table) Show detailed schema with data types
peek(table, n) Preview first n rows
ask(question, viz) Natural language query
sql(query, viz) Direct SQL query
interactive(question) Query with viz prompt

Export Methods

Method Description
export_db(path, format) Export database (sqlite/sql/json/excel)
save_to_mysql(...) Save to MySQL database
save_to_postgres(...) Save to PostgreSQL database
backup(path) Complete backup with timestamp

๐Ÿ› Troubleshooting

Q: Only one table created instead of multiple?
A: Make sure you have OpenAI API key set. Without it, falls back to simple parsing.

Q: "No API key" error?
A: Set your OpenAI key: sutra = SUTRA(api_key="sk-...")

Q: PDF extraction failed?
A: Install PyPDF2: pip install PyPDF2

Q: MySQL export error?
A: Install extras: pip install QuerySUTRA[mysql]

๐Ÿ“„ License

MIT License - see LICENSE file

๐Ÿค Contributing

Contributions welcome! Open an issue or submit a PR.

๐Ÿ“ž Support


Made with โค๏ธ by Aditya Batta

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

querysutra-0.2.3.tar.gz (46.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

querysutra-0.2.3-py3-none-any.whl (47.2 kB view details)

Uploaded Python 3

File details

Details for the file querysutra-0.2.3.tar.gz.

File metadata

  • Download URL: querysutra-0.2.3.tar.gz
  • Upload date:
  • Size: 46.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for querysutra-0.2.3.tar.gz
Algorithm Hash digest
SHA256 e5d0a8d4a907d950c6268e8758ccdbfee8e6a3580dcc7fb126669e1e7b65c608
MD5 b381708ba24d28dca0f044cd1adc69c2
BLAKE2b-256 01196ff918f86324f71d5c22cc2204e922023439ae1510a011fb8f9b8130edcc

See more details on using hashes here.

File details

Details for the file querysutra-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: querysutra-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 47.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for querysutra-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9746ca5baedd48355c35a5680aa16c2f4779b7e18a15919aad4c99c068f3b621
MD5 f39c7ae288c06d86cf6525b0c823db7a
BLAKE2b-256 071f5ebaff7b11bef51a8c909e554878e24b01562efccd5d253e42cd01cba11b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page