SUTRA: AI-powered data analysis

QuerySUTRA

SUTRA: Structured-Unstructured-Text-Retrieval-Architecture

QuerySUTRA is an AI-powered data analysis library: upload PDFs, query them in natural language, and export the extracted tables to MySQL automatically.

Installation

pip install QuerySUTRA
pip install "QuerySUTRA[mysql]"  # For MySQL export (quotes keep shells such as zsh from expanding the brackets)

Quick Start

from sutra import SUTRA

# Upload PDF and auto-export to MySQL in ONE step
sutra = SUTRA(api_key="your-openai-key")

sutra.upload("data.pdf", auto_export_mysql={
    'host': 'localhost',
    'user': 'root',
    'password': 'your_password',
    'database': 'my_database'  # Created automatically if it does not exist
})

# Query immediately
result = sutra.ask("Show me all people")
print(result.data)
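
The Quick Start above writes the MySQL password directly into the script. A minimal sketch of building the same `auto_export_mysql` dictionary from environment variables instead; the helper `mysql_config_from_env` and the `SUTRA_MYSQL_*` variable names are hypothetical and not part of QuerySUTRA.

```python
import os


def mysql_config_from_env(env=None):
    """Build the dict passed as auto_export_mysql from environment variables.

    The SUTRA_MYSQL_* names are hypothetical, chosen only for this sketch.
    """
    env = os.environ if env is None else env
    password = env.get("SUTRA_MYSQL_PASSWORD")
    if not password:
        # Fail early rather than attempt a connection with an empty password.
        raise KeyError("SUTRA_MYSQL_PASSWORD is not set")
    return {
        "host": env.get("SUTRA_MYSQL_HOST", "localhost"),
        "user": env.get("SUTRA_MYSQL_USER", "root"),
        "password": password,
        "database": env.get("SUTRA_MYSQL_DATABASE", "my_database"),
    }
```

The result can then be passed as `sutra.upload("data.pdf", auto_export_mysql=mysql_config_from_env())`.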

Features

1. Automatic MySQL Export

The target database is created automatically if it does not exist, so the export does not fail on a missing database.

# Upload and export to MySQL automatically
sutra.upload("data.pdf", auto_export_mysql={
    'host': 'localhost',
    'user': 'root',
    'password': 'your_password',
    'database': 'my_new_database'  # Creates automatically
})
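
The text does not say how the database gets created. In MySQL the usual way to create a database only when it is missing is `CREATE DATABASE IF NOT EXISTS`; the sketch below builds that statement with a validated name. It illustrates the idea only and is not taken from QuerySUTRA; `create_database_sql` is a hypothetical helper.

```python
import re

# Conservative whitelist: the characters MySQL allows in unquoted identifiers,
# capped at MySQL's 64-character limit for database names.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_$]{1,64}")


def create_database_sql(name):
    """Return a CREATE DATABASE IF NOT EXISTS statement for a safe name."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe database name: {name!r}")
    # Identifiers cannot be bound as query parameters, hence the validation
    # above before the name is placed inside backticks.
    return f"CREATE DATABASE IF NOT EXISTS `{name}`"
```

Running the returned statement twice is harmless, which is what makes a one-step export safe to repeat.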

2. Complete Data Extraction

Processes the entire PDF in chunks and extracts every record (for example, all employees in the document rather than only the first 10).

sutra.upload("large_document.pdf")  # Extracts all 50+ employees
sutra.tables()  # Shows all extracted tables
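
Chunked processing can be pictured as follows: split the pages into consecutive groups, extract rows from each group, and merge the results. This is a sketch under assumptions, not the library's implementation; `chunk_pages`, `extract_all`, and the `extract_rows` callback (a stand-in for whatever performs the per-chunk extraction) are hypothetical names.

```python
def chunk_pages(pages, pages_per_chunk=5):
    """Yield consecutive groups of pages so that no page is skipped."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    for start in range(0, len(pages), pages_per_chunk):
        yield pages[start:start + pages_per_chunk]


def extract_all(pages, extract_rows, pages_per_chunk=5):
    """Run extract_rows on every chunk and merge the rows in order.

    Rows must be hashable (for example tuples); exact duplicates are dropped.
    """
    seen = set()
    merged = []
    for chunk in chunk_pages(pages, pages_per_chunk):
        for row in extract_rows(chunk):
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged
```

Because every chunk is visited, the number of extracted rows no longer depends on how much of the document fits into a single request.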

3. Natural Language Queries

result = sutra.ask("Show all people from California")
result = sutra.ask("Who has Python skills?", table="skills")
result = sutra.ask("Count employees by state", viz="pie")

4. Custom Visualizations

result = sutra.ask("Sales by region", viz="pie")
result = sutra.ask("Trends", viz="line")
result = sutra.ask("Compare", viz="bar")
result = sutra.ask("Data", viz="scatter")

5. Load Existing Databases

# Load SQLite
sutra = SUTRA.load_from_db("data.db", api_key="key")

# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "pass", "database")

# Connect to PostgreSQL  
sutra = SUTRA.connect_postgres("localhost", "postgres", "pass", "database")

6. Smart Features (Optional)

sutra = SUTRA(
    api_key="your-key",
    use_embeddings=True,    # Cache similar queries (saves API calls)
    fuzzy_match=True,       # "New York City" matches "New York"
    check_relevance=True,   # Detect irrelevant queries
    cache_queries=True      # Cache exact queries
)
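
To give a feel for what fuzzy matching means here, the sketch below maps a user-supplied value to the closest known value using the standard library's `difflib`. How QuerySUTRA implements `fuzzy_match` is not documented in this text; `fuzzy_lookup` and the 0.7 cutoff are assumptions made for illustration.

```python
import difflib


def fuzzy_lookup(value, known_values, cutoff=0.7):
    """Return the known value most similar to value, or None if none is close."""
    # Compare case-insensitively, then map the match back to its original form.
    lowered = {known.lower(): known for known in known_values}
    matches = difflib.get_close_matches(
        value.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None
```

With the cities from the comment above, `fuzzy_lookup("New York City", ["New York", "California", "Texas"])` resolves to `"New York"`, while an unrelated value such as `"Berlin"` yields `None`.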

7. Direct SQL (Free)

result = sutra.sql("SELECT * FROM people WHERE state='CA'")
print(result.data)
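
The query above embeds the value `'CA'` in the SQL string. When the value comes from user input, a placeholder is safer. Whether `sutra.sql()` accepts parameters is not stated here, so this sketch uses the standard library's `sqlite3` against a SQLite connection instead; the `people` table and its `state` column are taken from the example, and `people_in_state` is a hypothetical helper.

```python
import sqlite3


def people_in_state(conn, state):
    """Return all rows of the people table whose state column equals state."""
    # The ? placeholder lets sqlite3 bind the value, so quotes inside the
    # input cannot change the structure of the query.
    cursor = conn.execute("SELECT * FROM people WHERE state = ?", (state,))
    return cursor.fetchall()
```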

Complete Workflow

In Colab:

from sutra import SUTRA

sutra = SUTRA(api_key="your-key")
sutra.upload("employee_data.pdf")
sutra.tables()  # See extracted tables

# Export and download
sutra.export_db("data.db", format="sqlite")
from google.colab import files
files.download("data.db")

On Windows:

from sutra import SUTRA

# Load downloaded database
sutra = SUTRA.load_from_db("data.db", api_key="your-key")

# Export to MySQL (auto-creates database)
sutra.save_to_mysql("localhost", "root", "password", "my_database")

# Verify in MySQL
sutra_mysql = SUTRA.connect_mysql("localhost", "root", "password", "my_database")
sutra_mysql.tables()

Export Options

# SQLite
sutra.export_db("backup.db", format="sqlite")

# SQL dump
sutra.export_db("schema.sql", format="sql")

# JSON
sutra.export_db("data.json", format="json")

# Excel
sutra.export_db("data.xlsx", format="excel")

# MySQL (auto-creates database)
sutra.save_to_mysql("localhost", "root", "pass", "new_db")

# PostgreSQL
sutra.save_to_postgres("localhost", "postgres", "pass", "new_db")
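
A file written with `format="sqlite"` is an ordinary SQLite database, so it can be checked without QuerySUTRA. A minimal sketch that lists the tables in such a file using only the standard library; `list_tables` is a hypothetical helper.

```python
import sqlite3
from contextlib import closing


def list_tables(path):
    """Return the names of the user tables in a SQLite file, sorted by name."""
    # Note: sqlite3.connect creates an empty file if path does not exist.
    with closing(sqlite3.connect(path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables("backup.db")` after the SQLite export above shows which tables made it into the file.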

API Reference

Initialize

SUTRA(api_key, db, use_embeddings, check_relevance, fuzzy_match, cache_queries)

Class Methods

  • load_from_db(path, api_key) - Load SQLite
  • connect_mysql(host, user, password, database) - Connect MySQL
  • connect_postgres(host, user, password, database) - Connect PostgreSQL

Instance Methods

  • upload(data, name, auto_export_mysql) - Upload with optional auto-export
  • ask(question, viz, table) - Natural language query
  • sql(query, viz) - Direct SQL
  • tables() - List tables
  • schema(table) - Show schema
  • peek(table, n) - Preview data
  • export_db(path, format) - Export database
  • save_to_mysql(host, user, password, database) - Export to MySQL (auto-creates DB)
  • save_to_postgres(...) - Export to PostgreSQL
  • backup(path) - Backup
  • close() - Close
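
Since instances expose `close()`, the standard library's `contextlib.closing` can make sure it is called even when a query raises. The sketch below demonstrates the pattern with a stand-in class; `FakeSession` and `run_with_cleanup` are hypothetical and only show the mechanics.

```python
from contextlib import closing


class FakeSession:
    """Stand-in for any object with a close() method; not part of QuerySUTRA."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_cleanup(session, work):
    """Call work(session) and always close the session afterwards."""
    with closing(session):
        return work(session)
```

The same wrapper should work for any object that has a `close()` method, for example `run_with_cleanup(sutra, lambda s: s.tables())`.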

Troubleshooting

MySQL database doesn't exist

  • Fixed in v0.4.0: the database is created automatically if it does not exist
  • There is no need to create the database manually

Only 10 employees extracted from 50-employee PDF

  • Fixed in v0.4.0 - processes entire PDF in chunks
  • Upgrade: pip install --upgrade QuerySUTRA

connect_mysql() not found

  • Update: pip install --upgrade QuerySUTRA
  • Install MySQL support: pip install "QuerySUTRA[mysql]"
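
Several of the fixes above depend on running v0.4.0 or later. A small sketch for checking the installed version from Python with `importlib.metadata` (available from Python 3.8); the distribution name `QuerySUTRA` is taken from the install command, while the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="QuerySUTRA"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn the leading numeric parts of '0.4.2' into (0, 4, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(found, minimum=(0, 4, 0)):
    """True if the version string found is not older than minimum."""
    parts = version_tuple(found)
    # Pad short versions such as '0.4' so they compare equal to (0, 4, 0).
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= tuple(minimum)
```

If `installed_version()` returns `None`, or `is_at_least(...)` is false for the returned string, the upgrade command above is the next step.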

Supported Formats

CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame

Requirements

  • Python 3.8+
  • OpenAI API key
  • MySQL/PostgreSQL (optional)

License

MIT License

Changelog

v0.4.0

  • Auto-creates the MySQL database if it does not exist (no more missing-database errors)
  • Complete PDF extraction (all pages, all employees)
  • Chunk processing for large documents
  • One-line auto-export to MySQL
  • Simplified everything

v0.3.x

  • MySQL/PostgreSQL connectivity
  • Embeddings caching
  • Fuzzy matching
  • Custom visualizations

Made by Aditya Batta
