Skip to main content

A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.

Project description

Thoth Database Manager

A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.

Features

  • Multi-Database Support: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase
  • Plugin Architecture: Extensible design for adding new database types
  • LSH Search: Locality-Sensitive Hashing for finding similar values across database columns
  • Type Safety: Pydantic-based document models for structured data
  • Backward Compatibility: Maintains compatibility with existing code

Installation

pip install thoth-dbmanager

Quick Start

Basic Usage

from thoth_dbmanager import ThothDbManager

# Create a database manager instance
manager = ThothDbManager.get_instance(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="dev",
    host="localhost",
    port=5432,
    database="mydb",
    user="username",
    password="password"
)

# Get database information
tables = manager.get_tables()
columns = manager.get_columns("my_table")
foreign_keys = manager.get_foreign_keys()

# Execute SQL queries
results = manager.execute_sql("SELECT * FROM my_table LIMIT 10")

# Get example data for understanding table contents
example_data = manager.get_example_data("my_table", number_of_rows=20)

LSH Similarity Search

# Query for similar values across all database columns
similar_values = manager.query_lsh(
    keyword="john doe",
    top_n=10,
    signature_size=30,
    n_gram=3
)

# Results are organized by table and column
for table_name, columns in similar_values.items():
    for column_name, values in columns.items():
        print(f"Similar values in {table_name}.{column_name}: {values}")

Supported Databases

Database Connection Parameters
PostgreSQL host, port, database, user, password
MySQL host, port, database, user, password
MariaDB host, port, database, user, password
SQLite database_path
Supabase host, port, database, user, password, project_url, api_key
SQL Server server, database, user, password
Oracle host, port, service_name, user, password
Informix server, database, host, user, password

API Reference

ThothDbManager

The main class for database operations.

Class Methods

get_instance(db_type: str, **kwargs) -> ThothDbManager

Creates or retrieves a singleton instance of the database manager.

Parameters:

  • db_type (str): Database type identifier
  • db_root_path (str): Path to store database-related files
  • db_mode (str): Operating mode ("dev", "prod", etc.)
  • **kwargs: Database-specific connection parameters

Returns: Database manager instance

Instance Methods

execute_sql(sql: str, params: dict = None, fetch: str = "all", timeout: int = 60) -> Any

Execute SQL queries against the database.

Parameters:

  • sql (str): SQL query string
  • params (dict, optional): Query parameters for prepared statements
  • fetch (str): How to fetch results ("all", "one", or number)
  • timeout (int): Query timeout in seconds

Returns: Query results

get_tables() -> List[Dict[str, str]]

Get list of tables in the database.

Returns: List of dictionaries with name and comment keys

get_columns(table_name: str) -> List[Dict[str, Any]]

Get column information for a specific table.

Parameters:

  • table_name (str): Name of the table

Returns: List of dictionaries with column metadata (name, data_type, comment, is_pk)

get_foreign_keys() -> List[Dict[str, str]]

Get foreign key relationships in the database.

Returns: List of dictionaries with foreign key information

get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]

Get the most frequent values for each column in a table.

Parameters:

  • table_name (str): Name of the table
  • number_of_rows (int): Maximum number of example values per column

Returns: Dictionary mapping column names to lists of example values

query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]

Search for similar values using LSH (Locality-Sensitive Hashing).

Parameters:

  • keyword (str): Search term
  • signature_size (int): MinHash signature size
  • n_gram (int): N-gram size for text processing
  • top_n (int): Number of similar values to return

Returns: Nested dictionary: {table_name: {column_name: [similar_values]}}

Factory Pattern (Alternative API)

For more advanced usage, you can use the factory pattern:

from thoth_dbmanager import ThothDbFactory

# Create manager using factory
manager = ThothDbFactory.create_manager(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="dev",
    host="localhost",
    port=5432,
    database="mydb",
    user="username",
    password="password"
)

# List available database types
available_dbs = ThothDbFactory.list_available_databases()

# Get required parameters for a database type
params = ThothDbFactory.get_required_parameters("postgresql")

Document-Based API (Advanced)

The library also provides a document-based API for structured data access:

# Get tables as structured documents
if hasattr(manager, 'get_tables_as_documents'):
    table_docs = manager.get_tables_as_documents()
    for doc in table_docs:
        print(f"Table: {doc.table_name}")
        print(f"Schema: {doc.schema_name}")
        print(f"Comment: {doc.comment}")

# Get columns as structured documents
if hasattr(manager, 'get_columns_as_documents'):
    column_docs = manager.get_columns_as_documents("my_table")
    for doc in column_docs:
        print(f"Column: {doc.column_name} ({doc.data_type})")
        print(f"Nullable: {doc.is_nullable}")
        print(f"Primary Key: {doc.is_pk}")

Configuration Examples

PostgreSQL

manager = ThothDbManager.get_instance(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=5432,
    database="myapp",
    user="dbuser",
    password="dbpass"
)

SQLite

manager = ThothDbManager.get_instance(
    db_type="sqlite",
    db_root_path="./data",
    db_mode="dev",
    database_path="./data/myapp.db"
)

MySQL/MariaDB

manager = ThothDbManager.get_instance(
    db_type="mysql",  # or "mariadb"
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=3306,
    database="myapp",
    user="dbuser",
    password="dbpass"
)

Error Handling

The library provides clear error messages for common issues:

try:
    manager = ThothDbManager.get_instance(
        db_type="postgresql",
        db_root_path="./data",
        # Missing required parameters
    )
except ValueError as e:
    print(f"Configuration error: {e}")

try:
    results = manager.execute_sql("SELECT * FROM nonexistent_table")
except Exception as e:
    print(f"Query error: {e}")

LSH Search Details

The LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:

  1. Automatic Setup: LSH indexes are created automatically from your database content
  2. Cross-Column Search: Search across all text columns in all tables
  3. Fuzzy Matching: Find similar values even with typos or variations
  4. Configurable: Adjust similarity sensitivity with parameters

LSH Use Cases

  • Data Deduplication: Find duplicate or near-duplicate records
  • Data Quality: Identify inconsistent data entry
  • Search Enhancement: Provide "did you mean?" functionality
  • Data Exploration: Discover related content across tables

Architecture

The library uses a modern plugin architecture:

  • Plugins: Database-specific implementations
  • Adapters: Low-level database operations
  • Factory: Plugin instantiation and management
  • Documents: Type-safe data models using Pydantic
  • Registry: Plugin discovery and registration

This design makes it easy to:

  • Add support for new database types
  • Maintain consistent APIs across databases
  • Extend functionality without breaking existing code

Contributing

To add support for a new database type:

  1. Create an adapter class implementing DbAdapter
  2. Create a plugin class implementing DbPlugin
  3. Register the plugin with @register_plugin("db_type")
  4. Add connection parameter validation
  5. Implement required abstract methods

License

This project is licensed under the MIT License.

Support

For issues, questions, or contributions, please visit the project repository.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

thoth_dbmanager-0.4.10.tar.gz (73.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

thoth_dbmanager-0.4.10-py3-none-any.whl (84.4 kB view details)

Uploaded Python 3

File details

Details for the file thoth_dbmanager-0.4.10.tar.gz.

File metadata

  • Download URL: thoth_dbmanager-0.4.10.tar.gz
  • Upload date:
  • Size: 73.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for thoth_dbmanager-0.4.10.tar.gz
Algorithm Hash digest
SHA256 0a24333a8baa0bebf30679a4617f9d7752fd6163c9fb0d4f1b288cf0a9bc3356
MD5 9ff8dc8d7499cfba5e28be8f29bf5ebf
BLAKE2b-256 121791764980c5c55ab9d41b6cff28f5e05c7c1195812278cd5c8c56820335a1

See more details on using hashes here.

File details

Details for the file thoth_dbmanager-0.4.10-py3-none-any.whl.

File metadata

File hashes

Hashes for thoth_dbmanager-0.4.10-py3-none-any.whl
Algorithm Hash digest
SHA256 a299290922c9551d403b415b865794458641247eac649e39252ef727af5ccd86
MD5 43ebe7ff9cf5609d9436621459a4088c
BLAKE2b-256 777e87e101282eb832b54cdffd75d66e20cf1411b29ca8c244084c02a2c66bce

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page