Skip to main content

SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - AI-powered data analysis with custom visualizations, fuzzy matching, and smart caching

Project description

QuerySUTRA

SUTRA: Structured-Unstructured-Text-Retrieval-Architecture

Transform any data into structured, queryable databases with AI-powered entity extraction.

๐ŸŽฏ Key Features

โœ… Multi-Table Creation - Automatically extracts entities and creates multiple related tables
โœ… Smart Entity Extraction - Identifies people, contacts, events, organizations from unstructured data
โœ… Natural Language Queries - Ask questions in plain English
โœ… Multiple Data Formats - CSV, Excel, JSON, PDF, DOCX, TXT, SQL, DataFrames
โœ… Direct SQL Access - Query without API costs
โœ… Auto Visualization - Built-in charts and graphs
โœ… Cloud Export - Save to MySQL, PostgreSQL, or local SQLite

๐Ÿ“ฆ Installation

pip install QuerySUTRA

# With MySQL support
pip install QuerySUTRA[mysql]

# With PostgreSQL support
pip install QuerySUTRA[postgres]

# With all database support
pip install QuerySUTRA[all]

๐Ÿš€ Quick Start

from sutra import SUTRA

# Initialize
sutra = SUTRA(api_key="your-openai-key")

# Upload any data - AI creates multiple structured tables!
sutra.upload("employee_story.pdf")

# View all created tables
sutra.tables()
# Output:
# ๐Ÿ“‹ TABLES IN DATABASE
# 1. employee_story_people (20 rows, 6 columns)
#    Columns: id, name, address, city, email, phone
# 2. employee_story_contacts (20 rows, 4 columns)
#    Columns: id, person_id, email, phone
# 3. employee_story_events (15 rows, 4 columns)
#    Columns: id, host_id, description, city

# View detailed schema
sutra.schema()

# Query with natural language
result = sutra.ask("Show all people from New York")
print(result.data)

# With visualization
result = sutra.ask("Show events by city", viz=True)

# Direct SQL (no API cost!)
result = sutra.sql("SELECT * FROM employee_story_people WHERE city='Dallas'")
print(result.data)

๐Ÿ“Š How It Works

From Unstructured PDF to Structured Tables

Input: PDF with employee information

AI Automatically Creates:

๐Ÿ“‹ Created 3 structured tables:
  ๐Ÿ“Š employee_story_people: 20 rows, 6 columns
     - id, name, address, city, email, phone
  ๐Ÿ“Š employee_story_contacts: 20 rows, 4 columns
     - id, person_id, email, phone  
  ๐Ÿ“Š employee_story_events: 15 rows, 4 columns
     - id, host_id, description, city

๐Ÿ’ก Usage Examples

1. Upload Different Formats

# CSV file
sutra.upload("sales_data.csv")

# Excel file
sutra.upload("quarterly_report.xlsx")

# PDF document (AI extracts entities!)
sutra.upload("company_directory.pdf")

# Word document
sutra.upload("meeting_notes.docx")

# Text file
sutra.upload("log_data.txt")

# DataFrame
import pandas as pd
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [95, 87]})
sutra.upload(df, name="test_scores")

2. View Your Data

# List all tables with details
sutra.tables()

# Show schema with data types
sutra.schema()

# Show schema for specific table
sutra.schema("employee_story_people")

# Preview data
sutra.peek("employee_story_people", n=10)

3. Query Your Data

# Natural language (uses OpenAI)
result = sutra.ask("What are the top 5 sales by region?")
print(result.data)

# With visualization
result = sutra.ask("Show sales trends by month", viz=True)

# Interactive mode (asks if you want viz)
result = sutra.interactive("Compare revenue across quarters")

# Direct SQL (free, no API!)
result = sutra.sql("SELECT city, COUNT(*) as count FROM employee_story_people GROUP BY city")
print(result.data)

4. Export Your Database

# Export to MySQL (local or cloud)
sutra.save_to_mysql(
    host="localhost",
    user="root",
    password="password",
    database="my_database"
)

# Export to PostgreSQL
sutra.save_to_postgres(
    host="mydb.amazonaws.com",
    user="admin",
    password="password",
    database="production_db"
)

# Export to SQLite file
sutra.export_db("backup.db", format="sqlite")

# Export to SQL dump
sutra.export_db("schema.sql", format="sql")

# Export to JSON
sutra.export_db("data.json", format="json")

# Export to Excel (all tables as sheets)
sutra.export_db("data.xlsx", format="excel")

# Complete backup
sutra.backup("./backups")

๐Ÿ”ฅ Advanced Features

Entity Extraction

QuerySUTRA automatically identifies and extracts:

  • ๐Ÿ‘ฅ People - Names, addresses, contact info
  • ๐Ÿ“ง Contacts - Emails, phone numbers
  • ๐Ÿ“… Events - Meetings, activities, locations
  • ๐Ÿข Organizations - Companies, departments
  • ๐Ÿ“ Locations - Cities, addresses, coordinates

Multiple Table Relationships

# AI creates relational structure
sutra.upload("company_data.pdf")

# Result:
# people table with person_id
# contacts table with foreign key to person_id
# events table with host_id linking to people

Query Across Tables

# Natural language handles joins automatically
result = sutra.ask("Show all events hosted by people from Dallas")

# Or write SQL joins manually
result = sutra.sql("""
    SELECT e.description, p.name, p.city
    FROM employee_story_events e
    JOIN employee_story_people p ON e.host_id = p.id
    WHERE p.city = 'Dallas'
""")

๐Ÿ“ˆ Visualization

# Auto-detect best chart type
result = sutra.ask("Show revenue by product", viz=True)

# Interactive charts with Plotly
# - Bar charts for categorical data
# - Line charts for time series  
# - Tables for detailed data
# - Pie charts for distributions

๐ŸŒ Cloud Database Integration

AWS RDS MySQL

sutra.save_to_mysql(
    host="mydb.xxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="password",
    database="production",
    port=3306
)

Google Cloud SQL

sutra.save_to_postgres(
    host="35.123.456.789",
    user="postgres",
    password="password",
    database="analytics"
)

Heroku Postgres

sutra.save_to_postgres(
    host="ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com",
    user="username",
    password="password",
    database="dbname",
    port=5432
)

โšก Performance Tips

# Use direct SQL for complex queries (faster, no API cost)
result = sutra.sql("SELECT * FROM data WHERE status='active'")

# Cache is automatic for repeated questions
result1 = sutra.ask("Show total sales")  # Calls API
result2 = sutra.ask("Show total sales")  # From cache โšก

# Export results for reuse
result.data.to_csv("results.csv")

๐Ÿ”’ API Key Security

# Option 1: Pass directly (not recommended for production)
sutra = SUTRA(api_key="sk-...")

# Option 2: Environment variable (recommended)
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
sutra = SUTRA()

# Option 3: .env file
# Create .env file with: OPENAI_API_KEY=sk-...
from dotenv import load_dotenv
load_dotenv()
sutra = SUTRA()

๐ŸŽ“ Complete Example

from sutra import SUTRA
import pandas as pd

# Initialize
sutra = SUTRA(api_key="your-openai-key")

# Upload PDF - creates multiple tables
sutra.upload("employee_directory.pdf")

# View what was created
tables_info = sutra.tables()
print(f"Created {len(tables_info)} tables")

# View detailed schema
sutra.schema()

# Query specific table
result = sutra.ask("How many people are in each city?", 
                   table="employee_directory_people")
print(result.data)

# Visualize
result = sutra.ask("Show distribution of people by city", viz=True)

# Export to MySQL
sutra.save_to_mysql("localhost", "root", "password", "company_db")

# Backup everything
sutra.backup("./backups")

# Close connection
sutra.close()

๐Ÿ“š Method Reference

Core Methods

Method Description
upload(data, name) Upload any data format, creates multiple tables
tables() List all tables with row/column counts
schema(table) Show detailed schema with data types
peek(table, n) Preview first n rows
ask(question, viz) Natural language query
sql(query, viz) Direct SQL query
interactive(question) Query with viz prompt

Export Methods

Method Description
export_db(path, format) Export database (sqlite/sql/json/excel)
save_to_mysql(...) Save to MySQL database
save_to_postgres(...) Save to PostgreSQL database
backup(path) Complete backup with timestamp

๐Ÿ› Troubleshooting

Q: Only one table created instead of multiple?
A: Make sure you have OpenAI API key set. Without it, falls back to simple parsing.

Q: "No API key" error?
A: Set your OpenAI key: sutra = SUTRA(api_key="sk-...")

Q: PDF extraction failed?
A: Install PyPDF2: pip install PyPDF2

Q: MySQL export error?
A: Install extras: pip install QuerySUTRA[mysql]

๐Ÿ“„ License

MIT License - see LICENSE file

๐Ÿค Contributing

Contributions welcome! Open an issue or submit a PR.

๐Ÿ“ž Support


Made with โค๏ธ by Aditya Batta

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

querysutra-0.3.1.tar.gz (50.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

querysutra-0.3.1-py3-none-any.whl (51.3 kB view details)

Uploaded Python 3

File details

Details for the file querysutra-0.3.1.tar.gz.

File metadata

  • Download URL: querysutra-0.3.1.tar.gz
  • Upload date:
  • Size: 50.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for querysutra-0.3.1.tar.gz
Algorithm Hash digest
SHA256 0f4feb88539e1e50ba2981a9ec84aa33393f897b9bd026df62f30e1afe6e17e2
MD5 906c0c6a47988ead806b70f8a23ba5df
BLAKE2b-256 da85dd58516f65cd6af200a8c2a0c3bac977ab69ada0b8d2b2984b7d6ac12c84

See more details on using hashes here.

File details

Details for the file querysutra-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: querysutra-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 51.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for querysutra-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4e2d95686af38dd2cf8f9f28e329cfe6798d25fa682bc19794c6f154a6a2f8fb
MD5 67249307518983626c06c1b67518ef25
BLAKE2b-256 31a37c55777849b6060294c7b707fcae933a5fac1925447b36e8c4e7bffaa3cb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page