SUTRA: Structured-Unstructured-Text-Retrieval-Architecture

QuerySUTRA

QuerySUTRA is a Python library for AI-powered data analysis. It extracts entities from uploaded documents into tables, answers natural language questions about them, and caches queries to reduce API calls.

Installation

pip install QuerySUTRA

# Optional features (quotes keep shells such as zsh from expanding the brackets)
pip install "QuerySUTRA[embeddings]"  # Smart caching
pip install "QuerySUTRA[mysql]"       # MySQL support
pip install "QuerySUTRA[postgres]"    # PostgreSQL support
pip install "QuerySUTRA[all]"         # All features

Key Features

1. Automatic Multi-Table Creation

Upload PDFs, Word documents, or text files and automatically extract structured entities.

from sutra import SUTRA

sutra = SUTRA(api_key="your-openai-key")
sutra.upload("employee_data.pdf")

# Automatically creates:
# - employee_data_people (20 rows, 6 columns)
# - employee_data_contacts (20 rows, 4 columns)
# - employee_data_events (15 rows, 4 columns)

2. Natural Language Querying

result = sutra.ask("Show me all people from New York")
print(result.data)

# With visualization
result = sutra.ask("Show sales by region", viz="pie")

3. Load Existing Databases

# Load SQLite database
sutra = SUTRA.load_from_db("sutra.db", api_key="your-key")

# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "password", "database")

# Connect to PostgreSQL
sutra = SUTRA.connect_postgres("localhost", "postgres", "password", "database")

4. Custom Visualizations

result = sutra.ask("Sales by region", viz="pie")       # Pie chart
result = sutra.ask("Trends", viz="line")               # Line chart
result = sutra.ask("Compare", viz="bar")               # Bar chart
result = sutra.ask("Correlation", viz="scatter")       # Scatter plot
result = sutra.ask("Data", viz="table")                # Table view
result = sutra.ask("Analysis", viz="heatmap")          # Heatmap
result = sutra.ask("Auto", viz=True)                   # Auto-detect

5. Smart Fuzzy Matching

sutra = SUTRA(api_key="your-key", fuzzy_match=True)

# "New York City" matches "New York" automatically
result = sutra.ask("Who is from New York City?")

6. Intelligent Caching with Embeddings

sutra = SUTRA(api_key="your-key", use_embeddings=True)

result = sutra.ask("Show sales")           # Calls API
result = sutra.ask("Display sales data")   # Uses cache (no API call)

7. Irrelevant Query Detection

sutra = SUTRA(api_key="your-key", check_relevance=True)

result = sutra.ask("What is the weather?")
# Warns: "This question seems irrelevant to your database"
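
The source does not say how relevance is decided. As a rough illustration only, one simple heuristic is to check whether the question shares any word with the table and column names. The functions `schema_terms` and `seems_relevant` and the sample schema below are hypothetical and are not part of QuerySUTRA.

```python
import re

def schema_terms(schema):
    """Collect lowercase table and column names from a {table: [columns]} dict."""
    terms = set()
    for table, columns in schema.items():
        terms.add(table.lower())
        terms.update(column.lower() for column in columns)
    return terms

def seems_relevant(question, schema):
    """Return True if the question mentions at least one table or column name."""
    words = set(re.findall(r"[a-z_]+", question.lower()))
    return bool(words & schema_terms(schema))

# Hypothetical schema used only for this example.
schema = {
    "people": ["id", "name", "city", "email"],
    "sales": ["region", "amount"],
}

for question in ("What is the weather?", "Show sales by region"):
    if seems_relevant(question, schema):
        print(f"OK: {question}")
    else:
        print(f"This question seems irrelevant to your database: {question}")
```

A word-overlap check like this misses paraphrases and synonyms, so it is a starting point rather than a substitute for the library's own check.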

8. Direct SQL Access (No API Cost)

result = sutra.sql("SELECT * FROM people WHERE city='New York'")
print(result.data)
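
Since the default store is a SQLite file (the db parameter), a similar query can presumably also be run with Python's built-in sqlite3 module. The sketch below is an assumption-based illustration: it uses an in-memory database with made-up rows instead of a real QuerySUTRA file, and the helper `people_in_city` is hypothetical. The `?` placeholder passes the city as a parameter, which avoids quoting problems in values.

```python
import sqlite3

def people_in_city(conn, city):
    """Return (name, city) rows for one city using a parameterized query."""
    cursor = conn.execute(
        "SELECT name, city FROM people WHERE city = ? ORDER BY name",
        (city,),
    )
    return cursor.fetchall()

# Stand-in data. A real file would be opened with sqlite3.connect("database.db").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [("John Doe", "Dallas"), ("Ann Lee", "New York"), ("Sarah Smith", "Boston")],
)

rows = people_in_city(conn, "New York")
print(rows)
```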

Complete Configuration

sutra = SUTRA(
    api_key="your-openai-key",
    db="database.db",              # SQLite path
    use_embeddings=True,           # Smart caching (saves API calls)
    check_relevance=True,          # Detect irrelevant queries
    fuzzy_match=True,              # Match near-identical values ("New York City" → "New York")
    cache_queries=True             # Exact-match query caching
)

Supported Formats

CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame

Usage Examples

Basic Workflow

sutra = SUTRA(api_key="your-key")
sutra.upload("data.pdf")
sutra.tables()                    # View tables
sutra.schema()                    # View schema
sutra.peek("table_name", n=10)    # Preview data
result = sutra.ask("Your question?")

Database Export

sutra.export_db("backup.db", format="sqlite")
sutra.export_db("schema.sql", format="sql")
sutra.save_to_mysql("localhost", "root", "pass", "db")
sutra.save_to_postgres("localhost", "postgres", "pass", "db")
sutra.backup("./backups")
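
For comparison, Python's standard library offers similar operations for a plain SQLite database. The sketch below shows a full copy through `Connection.backup` and a SQL script dump through `Connection.iterdump`. It says nothing about how `export_db` or `backup` work internally; the helper names `copy_database` and `dump_sql` are hypothetical, and in-memory databases stand in for real files.

```python
import sqlite3

def copy_database(source, target):
    """Copy the whole source database into target with SQLite's backup API."""
    source.backup(target)

def dump_sql(conn):
    """Return the database as a SQL script (schema plus INSERT statements)."""
    return "\n".join(conn.iterdump())

# Stand-in source database with one table and one row.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO people (name) VALUES ('John Doe')")
source.commit()

# Use a file path such as "backup.db" here for a real backup.
target = sqlite3.connect(":memory:")
copy_database(source, target)

copied = target.execute("SELECT name FROM people").fetchall()
script = dump_sql(source)
print(copied)
print(script)
```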

How It Works

Entity Extraction Example

Input PDF:

John Doe lives at 123 Main St, Dallas. Email: john@company.com.
Sarah Smith lives at 456 Oak Ave, Boston. Email: sarah@company.com.

Output Tables:

people

id | name        | address     | city   | email
1  | John Doe    | 123 Main St | Dallas | john@company.com
2  | Sarah Smith | 456 Oak Ave | Boston | sarah@company.com
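
To make the input-to-table mapping concrete, the sketch below reproduces the example with a regular expression and sqlite3. This is an illustration only: the source does not describe QuerySUTRA's extraction method, and a fixed pattern like `PERSON` only works for sentences shaped exactly like the sample. The names `PERSON`, `extract_people`, and `store_people` are hypothetical.

```python
import re
import sqlite3

TEXT = """John Doe lives at 123 Main St, Dallas. Email: john@company.com.
Sarah Smith lives at 456 Oak Ave, Boston. Email: sarah@company.com."""

# Matches "<First Last> lives at <address>, <city>. Email: <email>"
PERSON = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) lives at (?P<address>[^,]+), "
    r"(?P<city>[^.]+)\. Email: (?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
)

def extract_people(text):
    """Return one dict per person found in the text."""
    return [match.groupdict() for match in PERSON.finditer(text)]

def store_people(conn, people):
    """Create the people table and insert the extracted records."""
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO people (name, address, city, email) "
        "VALUES (:name, :address, :city, :email)",
        people,
    )

conn = sqlite3.connect(":memory:")
store_people(conn, extract_people(TEXT))
rows = conn.execute("SELECT * FROM people ORDER BY id").fetchall()
for row in rows:
    print(row)
```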

Embeddings for Smart Caching

Uses the all-MiniLM-L6-v2 model (80 MB, runs locally):

  • Query 1: "Show sales" → API call
  • Query 2: "Display sales" → 92% similar → Cached (no API call)
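
The caching idea above can be sketched as follows: embed each question, compare it with stored questions by cosine similarity, and reuse the stored answer when the similarity clears a threshold. Everything here is an assumption for illustration. `toy_embed` is a word-count stand-in for a sentence model such as all-MiniLM-L6-v2, the 0.85 threshold is arbitrary, and `SemanticCache`, `ask`, and `expensive_api` are hypothetical names rather than QuerySUTRA internals.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in embedding: lowercase word counts instead of a sentence model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SemanticCache:
    def __init__(self, embed=toy_embed, threshold=0.85):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, question):
        """Return the answer of the most similar stored question, or None."""
        vector = self.embed(question)
        best = max(self.entries, key=lambda entry: cosine(vector, entry[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None

    def store(self, question, answer):
        self.entries.append((self.embed(question), answer))

calls = []  # records every simulated API call

def expensive_api(question):
    """Stand-in for a paid API call."""
    calls.append(question)
    return f"answer to: {question}"

def ask(cache, question):
    cached = cache.lookup(question)
    if cached is not None:
        return cached
    answer = expensive_api(question)
    cache.store(question, answer)
    return answer

cache = SemanticCache()
first = ask(cache, "show sales by region")          # miss: calls the API
second = ask(cache, "show sales by region please")  # similar enough: cache hit
third = ask(cache, "list all employees")            # unrelated: calls the API
print(len(calls))
```

Word counts cannot tell that "Show" and "Display" mean the same thing, which is the gap a real sentence-embedding model is meant to close. The cache logic around the embedding stays the same either way.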

Fuzzy Matching

  • Query: "New York City"
  • Database: ["New York", "Dallas", "Boston"]
  • Match: "New York City" → "New York" (85% similar)
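
A comparable match can be sketched with Python's standard difflib module. This is not necessarily the metric QuerySUTRA uses, so the score it reports need not equal the 85% figure above. The helper `fuzzy_match` and the 0.6 cutoff are assumptions for illustration.

```python
import difflib

def fuzzy_match(term, candidates, cutoff=0.6):
    """Return (best_candidate, score), or (None, 0.0) if nothing clears the cutoff."""
    best = difflib.get_close_matches(term, candidates, n=1, cutoff=cutoff)
    if not best:
        return None, 0.0
    score = difflib.SequenceMatcher(None, term, best[0]).ratio()
    return best[0], score

cities = ["New York", "Dallas", "Boston"]
match, score = fuzzy_match("New York City", cities)
print(match, round(score, 2))
```

A lower cutoff accepts looser matches at the risk of false positives, so the value is worth tuning against real column values.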

API Reference

Class Methods

SUTRA.load_from_db(db_path, api_key, **kwargs) - Load existing SQLite database

SUTRA.connect_mysql(host, user, password, database, ...) - Connect to MySQL

SUTRA.connect_postgres(host, user, password, database, ...) - Connect to PostgreSQL

Instance Methods

upload(data, name=None) - Upload data

ask(question, viz=False, table=None) - Natural language query

sql(query, viz=False) - Raw SQL query

tables() - List all tables

schema(table=None) - Show schema

peek(table=None, n=5) - Preview data

export_db(path, format) - Export database

save_to_mysql(...) - Export to MySQL

save_to_postgres(...) - Export to PostgreSQL

backup(path=None) - Create backup

close() - Close connection

Performance Tips

  1. Use load_from_db() to avoid re-uploading
  2. Use sql() for complex queries (no API cost)
  3. Enable use_embeddings=True for caching
  4. Enable cache_queries=True for exact matches

Troubleshooting

No API key error: sutra = SUTRA(api_key="sk-...")

PDF fails: pip install PyPDF2

MySQL error: pip install "QuerySUTRA[mysql]"

Embeddings error: pip install "QuerySUTRA[embeddings]"

Requirements

  • Python 3.8+
  • OpenAI API key
  • 100MB disk space (if using embeddings)

License

MIT License

Changelog

v0.3.1

  • Semantic embeddings for smart caching
  • Fuzzy matching for better NLP
  • Irrelevant query detection
  • Load existing databases
  • MySQL/PostgreSQL connectivity
  • Custom visualizations
  • All features optional

Made by Aditya Batta
