Skip to main content

A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.

Project description

Thoth Database Manager

A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.

Features

  • Multi-Database Support: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase
  • Plugin Architecture: Extensible design for adding new database types
  • LSH Search: Locality-Sensitive Hashing for finding similar values across database columns
  • Type Safety: Pydantic-based document models for structured data
  • Modern API: Clean, plugin-based interface for all database operations

Installation

Basic Installation

uv add thoth-dbmanager

Installation with Database Support

To use specific databases, you'll need to install the corresponding database drivers. You can install them individually or use our convenience extras:

Install with All Database Support

uv add thoth-dbmanager[all]

Install with Specific Database Support

PostgreSQL:

uv add thoth-dbmanager[postgresql]
# or manually: uv add psycopg2-binary

MySQL:

uv add thoth-dbmanager[mysql]
# or manually: uv add mysql-connector-python

MariaDB:

uv add thoth-dbmanager[mariadb]
# or manually: uv add mariadb

SQL Server:

uv add thoth-dbmanager[sqlserver]
# or manually: uv add pyodbc

Oracle:

uv add thoth-dbmanager[oracle]
# or manually: uv add cx_Oracle

Informix Support

ThothDBManager provides native support for IBM Informix databases using a pure Python SSH-based approach. This implementation requires no ODBC drivers or native dependencies, making it ideal for containerized environments.

Key Features:

  • ✅ SSH-based connectivity (no ODBC required)
  • ✅ Zero native dependencies
  • ✅ Identical implementation across all platforms
  • ✅ Full metadata support (tables, columns, indexes, foreign keys)
  • ✅ Example data extraction

Requirements:

  • Python 3.8+
  • paramiko package
  • dbaccess command-line tool on the remote Informix server

Installation:

pip install thoth-dbmanager[informix]

Configuration:

manager = ThothDbManager.get_instance(
    db_type="informix",
    db_root_path="./data",
    db_mode="production",
    
    # SSH connection
    ssh_host="informix.server.com",
    ssh_username="sshuser",
    ssh_private_key_path="/path/to/private/key",
    ssh_private_key_passphrase="keypass",  # Optional
    
    # Database connection
    database="mydb",
    server="informix_server",  # INFORMIXSERVER value
    user="dbuser",
    password="dbpass",
    
    # Optional: SSH settings
    ssh_port=22,  # Default
    ssh_auth_method="private_key",  # or "password", "password_and_key"
)

Supabase:

uv add thoth-dbmanager[supabase]
# or manually: uv add supabase gotrue

SQLite is supported out of the box (no additional drivers needed).

Development Installation

uv add thoth-dbmanager[dev]

Quick Start

Basic Usage

from thoth_dbmanager import ThothDbManager

# Create a database manager instance
manager = ThothDbManager.get_instance(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="dev",
    host="localhost",
    port=5432,
    database="mydb",
    user="username",
    password="password"
)

# Get database information
tables = manager.get_tables()
columns = manager.get_columns("my_table")
foreign_keys = manager.get_foreign_keys()

# Execute SQL queries
results = manager.execute_sql("SELECT * FROM my_table LIMIT 10")

# Get example data for understanding table contents
example_data = manager.get_example_data("my_table", number_of_rows=20)

LSH Similarity Search

# Query for similar values across all database columns
similar_values = manager.query_lsh(
    keyword="john doe",
    top_n=10,
    signature_size=30,
    n_gram=3
)

# Results are organized by table and column
for table_name, columns in similar_values.items():
    for column_name, values in columns.items():
        print(f"Similar values in {table_name}.{column_name}: {values}")

Supported Databases

Database Connection Parameters
PostgreSQL host, port, database, user, password
MySQL host, port, database, user, password
MariaDB host, port, database, user, password
SQLite database_path
Supabase host, port, database, user, password, project_url, api_key
SQL Server server, database, user, password
Oracle host, port, service_name, user, password
Informix ssh_host, ssh_username, ssh_key_file, database, server, informixdir

API Reference

ThothDbManager

The main class for database operations.

Class Methods

get_instance(db_type: str, **kwargs) -> ThothDbManager

Creates or retrieves a singleton instance of the database manager.

Parameters:

  • db_type (str): Database type identifier
  • db_root_path (str): Path to store database-related files
  • db_mode (str): Operating mode ("dev", "prod", etc.)
  • **kwargs: Database-specific connection parameters

Returns: Database manager instance

Instance Methods

execute_sql(sql: str, params: dict = None, fetch: str = "all", timeout: int = 60) -> Any

Execute SQL queries against the database.

Parameters:

  • sql (str): SQL query string
  • params (dict, optional): Query parameters for prepared statements
  • fetch (str): How to fetch results ("all", "one", or number)
  • timeout (int): Query timeout in seconds

Returns: Query results

get_tables() -> List[Dict[str, str]]

Get list of tables in the database.

Returns: List of dictionaries with name and comment keys

get_columns(table_name: str) -> List[Dict[str, Any]]

Get column information for a specific table.

Parameters:

  • table_name (str): Name of the table

Returns: List of dictionaries with column metadata (name, data_type, comment, is_pk)

get_foreign_keys() -> List[Dict[str, str]]

Get foreign key relationships in the database.

Returns: List of dictionaries with foreign key information

get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]

Get the most frequent values for each column in a table.

Parameters:

  • table_name (str): Name of the table
  • number_of_rows (int): Maximum number of example values per column

Returns: Dictionary mapping column names to lists of example values

query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]

Search for similar values using LSH (Locality-Sensitive Hashing).

Parameters:

  • keyword (str): Search term
  • signature_size (int): MinHash signature size
  • n_gram (int): N-gram size for text processing
  • top_n (int): Number of similar values to return

Returns: Nested dictionary: {table_name: {column_name: [similar_values]}}

Factory Pattern (Alternative API)

For more advanced usage, you can use the factory pattern:

from thoth_dbmanager import ThothDbFactory

# Create manager using factory
manager = ThothDbFactory.create_manager(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="dev",
    host="localhost",
    port=5432,
    database="mydb",
    user="username",
    password="password"
)

# List available database types
available_dbs = ThothDbFactory.list_available_databases()

# Get required parameters for a database type
params = ThothDbFactory.get_required_parameters("postgresql")

Document-Based API (Advanced)

The library also provides a document-based API for structured data access:

# Get tables as structured documents
if hasattr(manager, 'get_tables_as_documents'):
    table_docs = manager.get_tables_as_documents()
    for doc in table_docs:
        print(f"Table: {doc.table_name}")
        print(f"Schema: {doc.schema_name}")
        print(f"Comment: {doc.comment}")

# Get columns as structured documents
if hasattr(manager, 'get_columns_as_documents'):
    column_docs = manager.get_columns_as_documents("my_table")
    for doc in column_docs:
        print(f"Column: {doc.column_name} ({doc.data_type})")
        print(f"Nullable: {doc.is_nullable}")
        print(f"Primary Key: {doc.is_pk}")

Database Connection Examples

PostgreSQL

Installation:

uv add thoth-dbmanager[postgresql]

Usage:

manager = ThothDbManager.get_instance(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=5432,
    database="myapp",
    user="dbuser",
    password="dbpass"
)

MySQL

Installation:

uv add thoth-dbmanager[mysql]

Usage:

manager = ThothDbManager.get_instance(
    db_type="mysql",
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=3306,
    database="myapp",
    user="dbuser",
    password="dbpass"
)

MariaDB

Installation:

uv add thoth-dbmanager[mariadb]

Usage:

manager = ThothDbManager.get_instance(
    db_type="mariadb",
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=3306,
    database="myapp",
    user="dbuser",
    password="dbpass"
)

SQLite

Installation:

# No additional drivers needed - SQLite is included with Python
uv add thoth-dbmanager

Usage:

manager = ThothDbManager.get_instance(
    db_type="sqlite",
    db_root_path="./data",
    db_mode="dev",
    database_path="./data/myapp.db"
)

SQL Server

Installation:

uv add thoth-dbmanager[sqlserver]

Usage:

manager = ThothDbManager.get_instance(
    db_type="sqlserver",
    db_root_path="./data",
    db_mode="production",
    server="localhost",
    database="myapp",
    user="dbuser",
    password="dbpass"
)

Oracle

Installation:

uv add thoth-dbmanager[oracle]

Usage:

manager = ThothDbManager.get_instance(
    db_type="oracle",
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=1521,
    service_name="ORCL",
    user="dbuser",
    password="dbpass"
)

Informix

Installation:

uv add thoth-dbmanager[informix]
# Only requires paramiko - NO ODBC drivers needed!

Usage (SSH + dbaccess):

manager = ThothDbManager.get_instance(
    db_type="informix",
    db_root_path="./data",
    db_mode="production",
    # SSH connection
    ssh_host="informix.server.com",
    ssh_username="sshuser",
    ssh_private_key_path="/path/to/private/key",
    ssh_private_key_passphrase="keypass",  # Optional
    # Informix configuration
    database="myapp",
    server="informix_server",  # INFORMIXSERVER name
    informixdir="/u/appl/ids10",  # INFORMIXDIR path on remote server
    user="dbuser",
    password="dbpass"
)

Docker-Friendly: The SSH + dbaccess approach works identically in development and production (Docker):

  • ✅ Zero native dependencies (only paramiko)
  • ✅ Works on all platforms (macOS, Linux, Windows, ARM64)
  • ✅ Lightweight Docker images (200 MB vs 1+ GB with ODBC)
  • ✅ Same code for dev and prod

Documentation:

Features:

  • Full metadata retrieval (tables, columns, indexes, foreign keys)
  • SSH + dbaccess (no ODBC required)
  • Zero native dependencies
  • Informix-specific SQL syntax support (FIRST, SKIP, etc.)
  • System catalog queries for database analysis
  • Identical implementation for dev and production

Supabase

Installation:

uv add thoth-dbmanager[supabase]

Usage:

manager = ThothDbManager.get_instance(
    db_type="supabase",
    db_root_path="./data",
    db_mode="production",
    host="db.example.supabase.co",
    port=5432,
    database="postgres",
    user="postgres",
    password="your_password",
    project_url="https://your-project.supabase.co",
    api_key="your_supabase_api_key"
)

Error Handling

The library provides clear error messages for common issues:

try:
    manager = ThothDbManager.get_instance(
        db_type="postgresql",
        db_root_path="./data",
        # Missing required parameters
    )
except ValueError as e:
    print(f"Configuration error: {e}")

try:
    results = manager.execute_sql("SELECT * FROM nonexistent_table")
except Exception as e:
    print(f"Query error: {e}")

LSH Search Details

The LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:

  1. Automatic Setup: LSH indexes are created automatically from your database content
  2. Cross-Column Search: Search across all text columns in all tables
  3. Fuzzy Matching: Find similar values even with typos or variations
  4. Configurable: Adjust similarity sensitivity with parameters

LSH Use Cases

  • Data Deduplication: Find duplicate or near-duplicate records
  • Data Quality: Identify inconsistent data entry
  • Search Enhancement: Provide "did you mean?" functionality
  • Data Exploration: Discover related content across tables

Architecture

The library uses a modern plugin architecture:

  • Plugins: Database-specific implementations
  • Adapters: Low-level database operations
  • Factory: Plugin instantiation and management
  • Documents: Type-safe data models using Pydantic
  • Registry: Plugin discovery and registration

This design makes it easy to:

  • Add support for new database types
  • Maintain consistent APIs across databases
  • Extend functionality without breaking existing code

Contributing

To add support for a new database type:

  1. Create an adapter class implementing DbAdapter
  2. Create a plugin class implementing DbPlugin
  3. Register the plugin with @register_plugin("db_type")
  4. Add connection parameter validation
  5. Implement required abstract methods

License

This project is licensed under the MIT License.

Support

For issues, questions, or contributions, please visit the project repository.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

thoth_dbmanager-0.7.5.tar.gz (70.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

thoth_dbmanager-0.7.5-py3-none-any.whl (92.3 kB view details)

Uploaded Python 3

File details

Details for the file thoth_dbmanager-0.7.5.tar.gz.

File metadata

  • Download URL: thoth_dbmanager-0.7.5.tar.gz
  • Upload date:
  • Size: 70.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for thoth_dbmanager-0.7.5.tar.gz
Algorithm Hash digest
SHA256 cc913bf2291bbf649792e1dc228dd69e61d2d10a9563e816fb29d8355a51ffcf
MD5 2766c767d2e3f4bea74e9eb9a64e0137
BLAKE2b-256 49ebc21843663a2038b67b1d8f967b4a7f6f940a91fa31729334cf38203069e2

See more details on using hashes here.

File details

Details for the file thoth_dbmanager-0.7.5-py3-none-any.whl.

File metadata

File hashes

Hashes for thoth_dbmanager-0.7.5-py3-none-any.whl
Algorithm Hash digest
SHA256 38ddee5ae3562cfe6deb19d021011fdccc10805bf9c050811443938add775bd9
MD5 ac35b0fa5ab010df41aea3c03f75a35e
BLAKE2b-256 ad3a6c439c6dbbfc225a4d73f5d08cb0f9e4982cb598410d75d5d037be051271

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page