SUTRA: Structured-Unstructured-Text-Retrieval-Architecture - AI-powered data analysis with custom visualizations, fuzzy matching, and smart caching

QuerySUTRA

SUTRA: Structured-Unstructured-Text-Retrieval-Architecture

QuerySUTRA is a Python library for AI-powered data analysis. It extracts entities from documents into tables, answers natural language questions over them, and caches queries to reduce API calls.

Installation

pip install QuerySUTRA

# Optional features
pip install QuerySUTRA[embeddings]  # Smart caching
pip install QuerySUTRA[mysql]       # MySQL support
pip install QuerySUTRA[postgres]    # PostgreSQL support
pip install QuerySUTRA[all]         # All features

Key Features

1. Automatic Multi-Table Creation

Upload PDFs, Word documents, or text files and automatically extract structured entities.

from sutra import SUTRA

sutra = SUTRA(api_key="your-openai-key")
sutra.upload("employee_data.pdf")

# Automatically creates:
# - employee_data_people (20 rows, 6 columns)
# - employee_data_contacts (20 rows, 4 columns)
# - employee_data_events (15 rows, 4 columns)

2. Natural Language Querying

result = sutra.ask("Show me all people from New York")
print(result.data)

# With visualization
result = sutra.ask("Show sales by region", viz="pie")

3. Load Existing Databases

# Load SQLite database
sutra = SUTRA.load_from_db("sutra.db", api_key="your-key")

# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "password", "database")

# Connect to PostgreSQL
sutra = SUTRA.connect_postgres("localhost", "postgres", "password", "database")

4. Custom Visualizations

result = sutra.ask("Sales by region", viz="pie")       # Pie chart
result = sutra.ask("Trends", viz="line")               # Line chart
result = sutra.ask("Compare", viz="bar")               # Bar chart
result = sutra.ask("Correlation", viz="scatter")       # Scatter plot
result = sutra.ask("Data", viz="table")                # Table view
result = sutra.ask("Analysis", viz="heatmap")          # Heatmap
result = sutra.ask("Auto", viz=True)                   # Auto-detect

5. Smart Fuzzy Matching

sutra = SUTRA(api_key="your-key", fuzzy_match=True)

# "New York City" matches "New York" automatically
result = sutra.ask("Who is from New York City?")

6. Intelligent Caching with Embeddings

sutra = SUTRA(api_key="your-key", use_embeddings=True)

result = sutra.ask("Show sales")           # Calls API
result = sutra.ask("Display sales data")   # Uses cache (no API call)

7. Irrelevant Query Detection

sutra = SUTRA(api_key="your-key", check_relevance=True)

result = sutra.ask("What is the weather?")
# Warns: "This question seems irrelevant to your database"

8. Direct SQL Access (No API Cost)

result = sutra.sql("SELECT * FROM people WHERE city='New York'")
print(result.data)

Complete Configuration

sutra = SUTRA(
    api_key="your-openai-key",
    db="database.db",              # SQLite path
    use_embeddings=True,           # Smart caching (saves API calls)
    check_relevance=True,          # Detect irrelevant queries
    fuzzy_match=True,              # Match query terms to similar database values
    cache_queries=True             # Simple caching
)

Supported Formats

CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame

Usage Examples

Basic Workflow

sutra = SUTRA(api_key="your-key")
sutra.upload("data.pdf")
sutra.tables()                    # View tables
sutra.schema()                    # View schema
sutra.peek("table_name", n=10)    # Preview data
result = sutra.ask("Your question?")

Database Export

sutra.export_db("backup.db", format="sqlite")
sutra.export_db("schema.sql", format="sql")
sutra.save_to_mysql("localhost", "root", "pass", "db")
sutra.save_to_postgres("localhost", "postgres", "pass", "db")
sutra.backup("./backups")

How It Works

Entity Extraction Example

Input PDF:

John Doe lives at 123 Main St, Dallas. Email: john@company.com.
Sarah Smith lives at 456 Oak Ave, Boston. Email: sarah@company.com.

Output Tables:

people

id  name         address      city    email
1   John Doe     123 Main St  Dallas  john@company.com
2   Sarah Smith  456 Oak Ave  Boston  sarah@company.com
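
To make the mapping concrete, the rows above can be reproduced in a plain SQLite table and queried with ordinary SQL. This is a minimal sketch using only Python's standard sqlite3 module. The column types and the in-memory database are assumptions for illustration, not QuerySUTRA's actual schema.

```python
import sqlite3

# In-memory database standing in for the SQLite file the library writes (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, email TEXT)"
)

# The two rows shown in the example output table.
rows = [
    (1, "John Doe", "123 Main St", "Dallas", "john@company.com"),
    (2, "Sarah Smith", "456 Oak Ave", "Boston", "sarah@company.com"),
]
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", rows)

# Parameterized lookup by city, similar in spirit to a raw SQL query.
dallas = conn.execute(
    "SELECT name, email FROM people WHERE city = ?", ("Dallas",)
).fetchall()
print(dallas)
```

Once the data is in this shape, questions such as "who lives in Dallas" reduce to a single WHERE clause, which is the kind of SQL a natural language query would need to produce.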

Embeddings for Smart Caching

Uses the all-MiniLM-L6-v2 model (80MB, runs locally) to compare each new question with previously answered ones:

  • Query 1: "Show sales" → API call
  • Query 2: "Display sales" → 92% similar → Cached (no API call)
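
The cache lookup described above can be sketched as a nearest-neighbour search over stored question vectors. The example below is an illustration only, not the library's implementation: `SemanticCache`, `toy_embed`, the synonym table, and the 0.85 threshold are all hypothetical, and `toy_embed` is a tiny stand-in for a real embedding model so the sketch runs without downloads.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache that returns a stored answer for similar questions."""

    def __init__(self, embed: Callable[[str], Sequence[float]], threshold: float = 0.85):
        self.embed = embed
        self.threshold = threshold  # assumed cut-off, not taken from the library
        self.entries: List[Tuple[Sequence[float], str]] = []

    def put(self, question: str, answer: str) -> None:
        self.entries.append((self.embed(question), answer))

    def get(self, question: str) -> Optional[str]:
        vec = self.embed(question)
        best_score, best_answer = 0.0, None
        for stored_vec, answer in self.entries:
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


# Toy embedding: word counts over a fixed vocabulary, with one synonym folded in.
VOCAB = ["show", "sales", "weather"]
SYNONYMS = {"display": "show"}


def toy_embed(text: str) -> List[float]:
    words = [SYNONYMS.get(w, w) for w in text.lower().split()]
    return [float(words.count(term)) for term in VOCAB]


cache = SemanticCache(toy_embed)
cache.put("Show sales", "SELECT * FROM sales")
hit = cache.get("Display sales")    # similar question, served from the cache
miss = cache.get("weather today")   # unrelated question, would need an API call
print(hit, miss)
```

With the sentence-transformers package installed, `SentenceTransformer("all-MiniLM-L6-v2").encode` could be passed in place of `toy_embed`. The threshold would then need tuning against real queries.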

Fuzzy Matching

  • Query: "New York City"
  • Database: ["New York", "Dallas", "Boston"]
  • Match: "New York City" → "New York" (85% similar)
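
The matching step above can be approximated with Python's standard difflib module. This is a sketch of the general technique rather than QuerySUTRA's own matcher. The helper name `closest_value` and the 0.7 cutoff are assumptions, and difflib's similarity score uses a different formula from whatever produces the 85% figure quoted above.

```python
import difflib
from typing import List, Optional


def closest_value(term: str, candidates: List[str], cutoff: float = 0.7) -> Optional[str]:
    """Return the candidate most similar to term, or None if nothing clears the cutoff."""
    matches = difflib.get_close_matches(term, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


cities = ["New York", "Dallas", "Boston"]

match = closest_value("New York City", cities)   # close enough to "New York"
no_match = closest_value("Tokyo", cities)        # nothing similar in the list
print(match, no_match)
```

A matcher like this would typically run on the literal values in a question before SQL is generated, so that the WHERE clause uses a value that actually exists in the column.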

API Reference

Class Methods

SUTRA.load_from_db(db_path, api_key, **kwargs) - Load existing SQLite database

SUTRA.connect_mysql(host, user, password, database, ...) - Connect to MySQL

SUTRA.connect_postgres(host, user, password, database, ...) - Connect to PostgreSQL

Instance Methods

upload(data, name=None) - Upload data

ask(question, viz=False, table=None) - Natural language query

sql(query, viz=False) - Raw SQL query

tables() - List all tables

schema(table=None) - Show schema

peek(table=None, n=5) - Preview data

export_db(path, format) - Export database

save_to_mysql(...) - Export to MySQL

save_to_postgres(...) - Export to PostgreSQL

backup(path=None) - Create backup

close() - Close connection

Performance Tips

  1. Use load_from_db() to avoid re-uploading
  2. Use sql() for complex queries (no API cost)
  3. Enable use_embeddings=True for caching
  4. Enable cache_queries=True for exact matches
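
Tips 3 and 4 describe two different kinds of cache. The sketch below illustrates the exact-match kind: repeated questions are served from a dictionary, while a reworded question still reaches the backend. `ExactQueryCache`, `fake_backend`, and the case and whitespace normalization are hypothetical, and the library may key its cache differently.

```python
from typing import Callable, Dict


class ExactQueryCache:
    """Hypothetical exact-match cache keyed on a normalized question string."""

    def __init__(self, backend: Callable[[str], str]):
        self.backend = backend          # stands in for the paid API call
        self.store: Dict[str, str] = {}
        self.misses = 0                 # number of times the backend was called

    @staticmethod
    def normalize(question: str) -> str:
        # Assumed normalization: lowercase and collapse runs of whitespace.
        return " ".join(question.lower().split())

    def ask(self, question: str) -> str:
        key = self.normalize(question)
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.backend(question)
        return self.store[key]


def fake_backend(question: str) -> str:
    return f"SQL for: {question}"


cache = ExactQueryCache(fake_backend)
first = cache.ask("Show sales")
second = cache.ask("  show   SALES ")   # same question after normalization: cache hit
third = cache.ask("Display sales")      # different wording: backend is called again
print(cache.misses)
```

The reworded third question is a miss here, which is the gap that embedding-based caching is meant to close.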

Troubleshooting

No API key error: sutra = SUTRA(api_key="sk-...")

PDF fails: pip install PyPDF2

MySQL error: pip install QuerySUTRA[mysql]

Embeddings error: pip install QuerySUTRA[embeddings]

Requirements

  • Python 3.8+
  • OpenAI API key
  • 100MB disk space (if using embeddings)

License

MIT License

Changelog

v0.3.1

  • Semantic embeddings for smart caching
  • Fuzzy matching for better NLP
  • Irrelevant query detection
  • Load existing databases
  • MySQL/PostgreSQL connectivity
  • Custom visualizations
  • All features optional

Made by Aditya Batta
