SUTRA: AI-powered data analysis with automatic MySQL export

QuerySUTRA

SUTRA: Structured-Unstructured-Text-Retrieval-Architecture

AI-powered data analysis library. Upload PDFs, query with natural language, export to MySQL automatically.

Installation

pip install QuerySUTRA
pip install QuerySUTRA[mysql]  # For MySQL export

Quick Start

from sutra import SUTRA

# Upload PDF and auto-export to MySQL in ONE step
sutra = SUTRA(api_key="your-openai-key")

sutra.upload("data.pdf", auto_export_mysql={
    'host': 'localhost',
    'user': 'root',
    'password': 'your_password',
    'database': 'my_database'  # Created automatically if it does not exist
})
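
The call above hardcodes the MySQL password in source. A minimal sketch of building the same auto_export_mysql dictionary from environment variables instead. The helper name mysql_config_from_env and the variable names (MYSQL_HOST, MYSQL_USER, MYSQL_PASSWORD, MYSQL_DATABASE) are assumptions made for illustration; they are not part of QuerySUTRA.

```python
import os


def mysql_config_from_env(env=None):
    """Build a dict shaped like the auto_export_mysql argument.

    The environment variable names are hypothetical; rename them to
    match your own deployment.
    """
    env = os.environ if env is None else env
    password = env.get("MYSQL_PASSWORD")
    if password is None:
        # Fail early rather than silently connecting with no password.
        raise KeyError("MYSQL_PASSWORD is not set")
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "user": env.get("MYSQL_USER", "root"),
        "password": password,
        "database": env.get("MYSQL_DATABASE", "my_database"),
    }
```

The result can then be passed as sutra.upload("data.pdf", auto_export_mysql=mysql_config_from_env()).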

# Query immediately
result = sutra.ask("Show me all people")
print(result.data)

Features

1. Automatic MySQL Export

The target database is created automatically if it does not already exist, so the export does not fail on a missing database.

# Upload and export to MySQL automatically
sutra.upload("data.pdf", auto_export_mysql={
    'host': 'localhost',
    'user': 'root',
    'password': 'your_password',
    'database': 'my_new_database'  # Creates automatically
})
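
This README does not say how the database is created. In MySQL the usual way to make creation idempotent is the CREATE DATABASE IF NOT EXISTS statement, so a rough sketch of the idea looks like the following. The helper create_database_sql is hypothetical, and the name check is a deliberately strict assumption (database names cannot be passed as bound parameters, so they need validating before being placed in SQL).

```python
import re

# Conservative whitelist: letters, digits, underscore, dollar sign, up to 64 characters.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_$]{1,64}")


def create_database_sql(name):
    """Return an idempotent CREATE DATABASE statement for a validated name."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe database name: {name!r}")
    # Backticks quote the identifier; the whitelist above rules out backticks in name.
    return f"CREATE DATABASE IF NOT EXISTS `{name}`"
```

Running the returned statement before the export means a second run against an existing database is a no-op rather than an error.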

2. Complete Data Extraction

Processes the entire PDF in chunks, so every record is extracted (for example, all employees in the document, not just the first 10).

sutra.upload("large_document.pdf")  # Extracts all 50+ employees
sutra.tables()  # Shows all extracted tables
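
How QuerySUTRA splits a document is not described here. As an illustration of chunked processing in general, the sketch below cuts extracted text into fixed-size pieces with a small overlap, so a record that straddles a boundary appears whole in at least one chunk. The function chunk_text and its size and overlap defaults are assumptions, not library settings.

```python
def chunk_text(text, size=2000, overlap=200):
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters so that content cut
    at a boundary is still seen intact in the following chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

Each chunk would then be sent for extraction separately and the results merged, which is what lets a long document yield all of its rows instead of only those in the first segment.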

3. Natural Language Queries

result = sutra.ask("Show all people from California")
result = sutra.ask("Who has Python skills?", table="skills")
result = sutra.ask("Count employees by state", viz="pie")

4. Custom Visualizations

result = sutra.ask("Sales by region", viz="pie")
result = sutra.ask("Trends", viz="line")
result = sutra.ask("Compare", viz="bar")
result = sutra.ask("Data", viz="scatter")

5. Load Existing Databases

# Load SQLite
sutra = SUTRA.load_from_db("data.db", api_key="key")

# Connect to MySQL
sutra = SUTRA.connect_mysql("localhost", "root", "pass", "database")

# Connect to PostgreSQL  
sutra = SUTRA.connect_postgres("localhost", "postgres", "pass", "database")

6. Smart Features (Optional)

sutra = SUTRA(
    api_key="your-key",
    use_embeddings=True,    # Cache similar queries (saves API calls)
    fuzzy_match=True,       # "New York City" matches "New York"
    check_relevance=True,   # Detect irrelevant queries
    cache_queries=True      # Cache exact queries
)
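
To make two of these options concrete, here is a standard-library sketch of what fuzzy matching and exact-query caching can look like. It is not QuerySUTRA's implementation: fuzzy_lookup, QueryCache, and the 0.6 cutoff are hypothetical, and the library may use a different similarity measure.

```python
import difflib


def fuzzy_lookup(value, known_values, cutoff=0.6):
    """Return the known value closest to `value`, or None if nothing is close."""
    by_lower = {k.lower(): k for k in known_values}
    matches = difflib.get_close_matches(
        value.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


class QueryCache:
    """Cache answers keyed by the question with case and spacing normalized."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(question):
        return " ".join(question.lower().split())

    def get_or_compute(self, question, compute):
        key = self._key(question)
        if key not in self._store:
            # Only the first occurrence of a question triggers the expensive call.
            self._store[key] = compute(question)
        return self._store[key]
```

With these helpers, fuzzy_lookup("New York City", ["New York", "California"]) resolves to "New York", and asking the same question twice through QueryCache invokes the underlying function once.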

7. Direct SQL (Free)

result = sutra.sql("SELECT * FROM people WHERE state='CA'")
print(result.data)
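
When the filter value comes from user input, pasting it into the SQL string invites injection. Whether sutra.sql accepts bound parameters is not documented here, so the sketch below shows the idea with the standard-library sqlite3 module against an in-memory table. The people schema, the sample rows, and both helper names are assumptions for illustration.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with a small, made-up people table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Ada", "CA"), ("Grace", "NY"), ("Linus", "CA")],
    )
    return conn


def people_in_state(conn, state):
    """Query by state using a bound parameter instead of string formatting."""
    cur = conn.execute(
        "SELECT name, state FROM people WHERE state = ? ORDER BY name",
        (state,),
    )
    return cur.fetchall()
```

Because the value is bound rather than interpolated, an input such as "CA' OR '1'='1" is compared as a literal string and matches no rows.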

Complete Workflow

In Colab:

from sutra import SUTRA

sutra = SUTRA(api_key="your-key")
sutra.upload("employee_data.pdf")
sutra.tables()  # See extracted tables

# Export and download
sutra.export_db("data.db", format="sqlite")
from google.colab import files
files.download("data.db")

On Windows:

from sutra import SUTRA

# Load downloaded database
sutra = SUTRA.load_from_db("data.db", api_key="your-key")

# Export to MySQL (auto-creates database)
sutra.save_to_mysql("localhost", "root", "password", "my_database")

# Verify in MySQL
sutra_mysql = SUTRA.connect_mysql("localhost", "root", "password", "my_database")
sutra_mysql.tables()
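
Before loading the downloaded file, it can be worth confirming that it actually contains tables. A small sketch using only the standard-library sqlite3 module; list_tables is a hypothetical helper and is independent of QuerySUTRA.

```python
import sqlite3


def list_tables(conn):
    """Return the names of user tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

For the file from the Colab step this would be list_tables(sqlite3.connect("data.db")). Note that sqlite3.connect creates an empty file when the path does not exist, so an empty list can also mean the path is wrong.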

Export Options

# SQLite
sutra.export_db("backup.db", format="sqlite")

# SQL dump
sutra.export_db("schema.sql", format="sql")

# JSON
sutra.export_db("data.json", format="json")

# Excel
sutra.export_db("data.xlsx", format="excel")

# MySQL (auto-creates database)
sutra.save_to_mysql("localhost", "root", "pass", "new_db")

# PostgreSQL
sutra.save_to_postgres("localhost", "postgres", "pass", "new_db")
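
For readers curious what an SQL dump of a SQLite database contains, Python's sqlite3 module can produce one directly through Connection.iterdump(). The sketch below is only an illustration of the format; it makes no claim about how export_db generates its output, and dump_sql is a hypothetical helper.

```python
import sqlite3


def dump_sql(conn):
    """Return the whole database as a single SQL script (schema plus data)."""
    return "\n".join(conn.iterdump())
```

The returned script can be replayed into a fresh database with executescript, which makes it a convenient plain-text backup.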

API Reference

Initialize

SUTRA(api_key, db, use_embeddings, check_relevance, fuzzy_match, cache_queries)

Class Methods

  • load_from_db(path, api_key) - Load SQLite
  • connect_mysql(host, user, password, database) - Connect MySQL
  • connect_postgres(host, user, password, database) - Connect PostgreSQL

Instance Methods

  • upload(data, name, auto_export_mysql) - Upload with optional auto-export
  • ask(question, viz, table) - Natural language query
  • sql(query, viz) - Direct SQL
  • tables() - List tables
  • schema(table) - Show schema
  • peek(table, n) - Preview data
  • export_db(path, format) - Export database
  • save_to_mysql(host, user, password, database) - Export to MySQL (auto-creates DB)
  • save_to_postgres(...) - Export to PostgreSQL
  • backup(path) - Backup
  • close() - Close
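
Since instances expose close(), one way to make sure it always runs is contextlib.closing from the standard library. Whether SUTRA objects also work directly in a with statement is not stated here, so the sketch uses a stand-in class; FakeSession and list_tables_then_close are hypothetical and exist only to keep the example self-contained.

```python
from contextlib import closing


class FakeSession:
    """Hypothetical stand-in for a SUTRA instance; only close() matters here."""

    def __init__(self):
        self.closed = False

    def tables(self):
        return ["people", "skills"]

    def close(self):
        self.closed = True


def list_tables_then_close(session):
    # closing() calls session.close() on exit, even if the body raises.
    with closing(session) as s:
        return s.tables()
```

The same pattern should apply to any object with a close() method, for example with closing(SUTRA.load_from_db("data.db", api_key="your-key")) as sutra.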

Troubleshooting

MySQL database doesn't exist

  • Fixed in v0.4.0: the database is created automatically if it does not exist
  • No need to create the database manually

Only 10 employees extracted from 50-employee PDF

  • Fixed in v0.4.0 - processes entire PDF in chunks
  • Upgrade: pip install --upgrade QuerySUTRA

connect_mysql() not found

  • Update: pip install --upgrade QuerySUTRA
  • Install MySQL support: pip install QuerySUTRA[mysql]
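
Several of the fixes above depend on running v0.4.0 or later. A quick way to check what is installed, using importlib.metadata from the standard library (available in Python 3.8+). The helper names are hypothetical, and the version parsing is deliberately simple: it reads only the leading digits of the first three components.

```python
import re
from importlib import metadata


def installed_version(dist_name="QuerySUTRA"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.4.1' into (0, 4, 1); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(version, minimum=(0, 4, 0)):
    """True when the package is missing or older than `minimum`."""
    return version is None or version_tuple(version) < minimum
```

If needs_upgrade(installed_version()) is true, run pip install --upgrade QuerySUTRA.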

Supported Formats

CSV, Excel, JSON, SQL, PDF, Word, Text, Pandas DataFrame

Requirements

  • Python 3.8+
  • OpenAI API key
  • MySQL/PostgreSQL (optional)

License

MIT License

Changelog

v0.4.0

  • Auto-creates the MySQL database if it does not exist (no more missing-database errors)
  • Complete PDF extraction (all pages, all employees)
  • Chunk processing for large documents
  • One-line auto-export to MySQL
  • Simplified everything

v0.3.x

  • MySQL/PostgreSQL connectivity
  • Embeddings caching
  • Fuzzy matching
  • Custom visualizations

Made by Aditya Batta
