SUTRA: Structured-Unstructured-Text-Retrieval-Architecture | AI-powered entity extraction from PDF/DOCX/TXT

🚀 QuerySUTRA v0.1.3

Structured-Unstructured-Text-Retrieval-Architecture

Natural Language to SQL with Cloud Export | PDF, DOCX, TXT Support

QuerySUTRA is a Python library that converts natural language questions into SQL queries. It loads data from CSV, Excel, JSON, SQL, PDF, DOCX, and TXT files (or a DataFrame) and can export the resulting database to MySQL or PostgreSQL.


✨ Key Features

  • Natural Language to SQL - Ask questions in plain English
  • Multiple Formats - CSV, Excel, JSON, SQL, PDF, DOCX, TXT, DataFrame
  • Cloud Export - MySQL, PostgreSQL (local & cloud)
  • Direct SQL - No API cost option
  • Auto Visualization - Plotly/Matplotlib charts
  • Interactive Mode - Ask user for visualization choice
  • Complete Backup - Export to SQLite, JSON, Excel
  • Jupyter Ready - Perfect for notebooks


📦 Installation

# Basic installation
pip install QuerySUTRA

# With MySQL support
pip install QuerySUTRA[mysql]

# With PostgreSQL support
pip install QuerySUTRA[postgres]

# With all database support
pip install QuerySUTRA[all]

🎯 Quick Start

from sutra import SUTRA

# Initialize with OpenAI API key
sutra = SUTRA(api_key="your-openai-key")

# Upload any format
sutra.upload("data.csv")      # CSV
sutra.upload("report.pdf")    # PDF ✨
sutra.upload("doc.docx")      # Word ✨
sutra.upload("data.xlsx")     # Excel
sutra.upload(dataframe)       # DataFrame

# Query with natural language
result = sutra.ask("What are the top 5 products?", viz=True)
print(result.data)

# Export to cloud
sutra.save_to_mysql("localhost", "root", "pass", "mydb")
sutra.save_to_postgres("host", "user", "pass", "db")

# Complete backup
sutra.backup()

📄 Supported File Formats

| Format    | Extension    | Example                          |
|-----------|--------------|----------------------------------|
| CSV       | .csv         | sutra.upload("data.csv")         |
| Excel     | .xlsx, .xls  | sutra.upload("data.xlsx")        |
| JSON      | .json        | sutra.upload("data.json")        |
| SQL       | .sql         | sutra.upload("schema.sql")       |
| PDF       | .pdf         | sutra.upload("report.pdf")       |
| Word      | .docx        | sutra.upload("document.docx")    |
| Text      | .txt         | sutra.upload("data.txt")         |
| DataFrame | pd.DataFrame | sutra.upload(df, name="sales")   |
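
The extension column above can double as a pre-flight check before calling sutra.upload. The sketch below is only an illustration: detect_format and EXTENSION_FORMATS are hypothetical names that are not part of QuerySUTRA, and the library may detect formats differently. Only the extensions themselves come from the table.

```python
from pathlib import Path

# Extensions copied from the table above; the dict itself is an
# illustrative assumption, not QuerySUTRA's internal mapping.
EXTENSION_FORMATS = {
    ".csv": "CSV",
    ".xlsx": "Excel",
    ".xls": "Excel",
    ".json": "JSON",
    ".sql": "SQL",
    ".pdf": "PDF",
    ".docx": "Word",
    ".txt": "Text",
}


def detect_format(path):
    """Return the format name for a path, or None if the extension is not listed."""
    suffix = Path(path).suffix.lower()
    return EXTENSION_FORMATS.get(suffix)


for name in ["data.csv", "Report.PDF", "notes.md"]:
    print(name, "->", detect_format(name))
```

A None result means the file type is not in the table, so it is probably worth converting the file before uploading it.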

🔥 New in v0.1.3

1. PDF Support

# Upload PDF files
sutra.upload("annual_report.pdf")

# Query the content
result = sutra.ask("What are the key findings in this report?")
print(result.data)

2. Word Document Support

# Upload DOCX files with tables
sutra.upload("sales_report.docx")

# Query the data
result = sutra.ask("Show me sales by region", viz=True)

3. Cloud Database Export

MySQL (Local or Cloud)

# Local MySQL
sutra.save_to_mysql("localhost", "root", "password", "mydb")

# AWS RDS MySQL
sutra.save_to_mysql(
    host="mydb.xxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="cloudpass",
    database="production"
)

# Google Cloud SQL (replace the placeholder with your instance's public IP)
sutra.save_to_mysql(
    host="203.0.113.10",
    user="admin",
    password="pass",
    database="mydb"
)

PostgreSQL (Local or Cloud)

# Local PostgreSQL
sutra.save_to_postgres("localhost", "postgres", "password", "mydb")

# Heroku PostgreSQL
sutra.save_to_postgres(
    host="ec2-xxx.compute-1.amazonaws.com",
    user="user",
    password="pass",
    database="dbname"
)

# AWS RDS PostgreSQL
sutra.save_to_postgres(
    host="mydb.xxxx.us-west-2.rds.amazonaws.com",
    user="admin",
    password="pass",
    database="prod"
)
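
The examples above pass passwords as string literals, which is fine for a demo but easy to leak from a notebook. One common alternative is to read the connection settings from environment variables. The sketch below assumes hypothetical variable names (SUTRA_MYSQL_HOST and so on) and a hypothetical helper, db_settings_from_env; neither is defined by QuerySUTRA. The keyword names host, user, password, and database are the ones used in the calls above.

```python
import os


def db_settings_from_env(prefix):
    """Build connection keyword arguments from PREFIX_HOST, PREFIX_USER, etc.

    Raises KeyError naming the first missing variable.
    """
    fields = ("host", "user", "password", "database")
    settings = {}
    for field in fields:
        var = f"{prefix}_{field.upper()}"
        if var not in os.environ:
            raise KeyError(f"missing environment variable: {var}")
        settings[field] = os.environ[var]
    return settings


# Usage sketch (needs QuerySUTRA installed and a reachable server):
# sutra.save_to_mysql(**db_settings_from_env("SUTRA_MYSQL"))
```

The same helper works for save_to_postgres with a different prefix, since both calls take the same four keyword arguments.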

4. Complete Export & Backup

# Export entire database
sutra.export_db("backup.db", format="sqlite")
sutra.export_db("dump.sql", format="sql")
sutra.export_db("data.json", format="json")
sutra.export_db("data.xlsx", format="excel")

# Export schema only
sutra.save_schema("schema.sql", format="sql")
sutra.save_schema("schema.json", format="json")
sutra.save_schema("schema.md", format="markdown")

# Complete backup (creates 3 files)
sutra.backup()  # Creates .db, .sql, .json files with timestamp

📖 Complete Examples

Example 1: PDF Analysis

from sutra import SUTRA

sutra = SUTRA(api_key="your-openai-key")

# Upload PDF
sutra.upload("financial_report.pdf")

# View extracted data
sutra.peek(n=10)

# Query the content
result = sutra.ask("What are the total revenues?")
print(result.data)

# Visualize
result = sutra.ask("Show revenue by quarter", viz=True)

Example 2: Multi-Format Analysis

sutra = SUTRA(api_key="your-key")

# Upload multiple formats
sutra.upload("sales.csv")
sutra.upload("report.docx")
sutra.upload("data.xlsx")

# List all tables
print(sutra.tables())

# Query across data
result = sutra.ask("What are total sales?")
print(result.data)

Example 3: Cloud Deployment

# Analyze in Colab/Jupyter
sutra = SUTRA(api_key="your-key")
sutra.upload("local_analysis.csv")

# Query and analyze
result = sutra.ask("Show top performers", viz=True)

# Deploy to production MySQL
sutra.save_to_mysql(
    host="production.mysql.com",
    user="admin",
    password="prod_password",
    database="analytics_db"
)

# Backup everything
sutra.backup("/backups")

Example 4: Direct SQL (No API Cost)

# Execute SQL directly - FREE!
result = sutra.sql("""
    SELECT region, 
           SUM(sales) as total_sales,
           AVG(sales) as avg_sales
    FROM sales_data 
    GROUP BY region
    ORDER BY total_sales DESC
""")

print(result.data)
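
To see what this aggregation returns without QuerySUTRA or an API key, the same query can be run with Python's built-in sqlite3 module. This is a standalone sketch: the rows are made up for illustration, and it assumes a sales_data table with region and sales columns like the one the query refers to.

```python
import sqlite3

# In-memory database with made-up rows, standing in for an uploaded
# sales_data table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (region TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("East", 100), ("East", 300), ("West", 500), ("West", 100), ("North", 50)],
)

# Same query as in Example 4.
rows = conn.execute("""
    SELECT region,
           SUM(sales) AS total_sales,
           AVG(sales) AS avg_sales
    FROM sales_data
    GROUP BY region
    ORDER BY total_sales DESC
""").fetchall()

for region, total, avg in rows:
    print(region, total, avg)
conn.close()
```

With these sample rows, West comes first (total 600, average 300.0), then East (400, 200.0), then North (50, 50.0).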

Example 5: Interactive Mode

# Ask user for visualization preference
result = sutra.interactive("What are sales trends?")
# Prompts: "Do you want visualization? (yes/no):"

if result.success:
    print(result.data)

🛠️ API Reference

Initialization

sutra = SUTRA(api_key="your-openai-key", db="sutra.db")

Upload Data

sutra.upload(data, name="table_name")
# data = file path (str) or DataFrame

View Database

sutra.tables()          # List all tables
sutra.schema()          # Show database schema
sutra.peek(n=10)        # Preview data

Query Data

# Direct SQL (no API cost)
result = sutra.sql("SELECT * FROM table_name", viz=False)

# Natural language (uses API)
result = sutra.ask("question", viz=False)

# Interactive (prompts user)
result = sutra.interactive("question")

Export & Backup

# Export results
sutra.export(dataframe, "output.csv", format="csv")

# Export database
sutra.export_db("backup.db", format="sqlite")

# Save to cloud
sutra.save_to_mysql(host, user, password, database)
sutra.save_to_postgres(host, user, password, database)

# Complete backup
sutra.backup("/backup/path")

QueryResult Object

result.success   # bool - query succeeded
result.sql       # str - generated SQL
result.data      # DataFrame - query results
result.viz       # figure - visualization (if viz=True)
result.error     # str - error message (if failed)
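
Because a failed query still returns a result object, it is safer to check success before reading data. The sketch below shows that pattern with a stand-in class: FakeResult is hypothetical and only mirrors the five fields listed above, and summarize is an illustrative helper, not a QuerySUTRA function.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in with the five fields listed above."""
    success: bool
    sql: Optional[str] = None
    data: Any = None
    viz: Any = None
    error: Optional[str] = None


def summarize(result):
    """Return a one-line summary, checking success before touching data."""
    if not result.success:
        return f"query failed: {result.error}"
    return f"{len(result.data)} rows from: {result.sql}"


ok = FakeResult(success=True, sql="SELECT 1", data=[(1,)])
bad = FakeResult(success=False, error="no such table: sales")
print(summarize(ok))
print(summarize(bad))
```

The helper only relies on len(), which also works on a pandas DataFrame, so it should apply unchanged to a real result whose data field is a DataFrame.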

💡 Use Cases

Data Analysis

sutra.upload("sales_data.csv")
result = sutra.ask("What products have declining sales?", viz=True)

Document Processing

sutra.upload("contract.pdf")
result = sutra.ask("What are the key terms and dates?")

Multi-Source Integration

sutra.upload("sales.csv")
sutra.upload("inventory.xlsx")
sutra.upload("report.docx")
result = sutra.ask("Combine all data sources")

Cloud Migration

# Local analysis
sutra.upload("data.csv")
result = sutra.ask("Analyze trends")

# Deploy to cloud
sutra.save_to_postgres("cloud-db.com", "user", "pass", "prod")

🎨 Features Comparison

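| Feature                  | Available | Cost          |
|--------------------------|-----------|---------------|
| CSV/Excel/JSON Upload    | Yes       | Free          |
| PDF Upload               | Yes       | Free          |
| DOCX Upload              | Yes       | Free          |
| Direct SQL Queries       | Yes       | Free          |
| Natural Language Queries | Yes       | ~$0.001/query |
| Visualization            | Yes       | Free          |
| MySQL Export             | Yes       | Free          |
| PostgreSQL Export        | Yes       | Free          |
| Backup & Export          | Yes       | Free          |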

💰 Cost Optimization

# FREE - Direct SQL (no API calls)
result = sutra.sql("SELECT * FROM data WHERE sales > 1000")

# PAID - Natural language (uses OpenAI API)
result = sutra.ask("Show products with sales over 1000")

# Tip: Use direct SQL when you know the query!
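
For budgeting, the per-query figure can be turned into a quick estimate. This is back-of-the-envelope arithmetic only: estimate_api_cost is a hypothetical helper, and the default rate is the approximate ~$0.001/query from the Features Comparison table, not a quoted price. Actual spend depends on the OpenAI model and on how large each prompt is.

```python
def estimate_api_cost(n_questions, cost_per_question=0.001):
    """Rough spend estimate in USD for natural-language queries.

    The default rate is the approximate figure from the comparison table;
    treat the result as an order-of-magnitude guide.
    """
    if n_questions < 0:
        raise ValueError("n_questions must be non-negative")
    return n_questions * cost_per_question


# 500 sutra.ask() calls at the approximate rate
print(f"${estimate_api_cost(500):.2f}")
```

Calls to sutra.sql() do not count toward n_questions, since they do not use the API.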

🧪 Testing

# Install
pip install QuerySUTRA

# Test
python -c "from sutra import SUTRA; print('✅ Success!')"
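
The import check confirms the package loads but not which release is installed. A small sketch using the standard library's importlib.metadata can report the version. The helper name installed_version is an assumption for illustration; the distribution name "QuerySUTRA" is the one used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if QuerySUTRA is installed, otherwise None.
print(installed_version("QuerySUTRA"))
```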

📚 Documentation

  • Full Guide: See SUTRA_Complete_Guide.ipynb
  • Publishing: See PUBLISHING_GUIDE.md
  • Examples: See complete_example.py

🤝 Contributing

Contributions welcome! The main code is in sutra/sutra.py - a single, well-documented file.


📄 License

MIT License - Free to use in your projects!


🏆 Why QuerySUTRA?

  • SUTRA = Structured-Unstructured-Text-Retrieval-Architecture
  • Single-file design for simplicity
  • Production-ready with error handling
  • Cloud-native with export capabilities
  • Comprehensive format support (PDF, DOCX, CSV, Excel, JSON)
  • Cost-effective with free SQL mode

🌟 Credits

Author: Aditya Batta
Version: 0.1.3
License: MIT


📞 Support


Made with ❤️ for data analysts and developers

Start analyzing with natural language today! 🚀
