A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.
Project description
Thoth Database Manager
A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.
Features
- Multi-Database Support: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase
- Plugin Architecture: Extensible design for adding new database types
- LSH Search: Locality-Sensitive Hashing for finding similar values across database columns
- Type Safety: Pydantic-based document models for structured data
- Backward Compatibility: Maintains compatibility with existing code
Installation
pip install thoth-dbmanager
Quick Start
Basic Usage
from thoth_dbmanager import ThothDbManager
# Create a database manager instance
manager = ThothDbManager.get_instance(
db_type="postgresql",
db_root_path="./data",
db_mode="dev",
host="localhost",
port=5432,
database="mydb",
user="username",
password="password"
)
# Get database information
tables = manager.get_tables()
columns = manager.get_columns("my_table")
foreign_keys = manager.get_foreign_keys()
# Execute SQL queries
results = manager.execute_sql("SELECT * FROM my_table LIMIT 10")
# Get example data for understanding table contents
example_data = manager.get_example_data("my_table", number_of_rows=20)
LSH Similarity Search
# Query for similar values across all database columns
similar_values = manager.query_lsh(
keyword="john doe",
top_n=10,
signature_size=30,
n_gram=3
)
# Results are organized by table and column
for table_name, columns in similar_values.items():
for column_name, values in columns.items():
print(f"Similar values in {table_name}.{column_name}: {values}")
Supported Databases
| Database | Connection Parameters |
|---|---|
| PostgreSQL | host, port, database, user, password |
| MySQL | host, port, database, user, password |
| MariaDB | host, port, database, user, password |
| SQLite | database_path |
| Supabase | host, port, database, user, password, project_url, api_key |
| SQL Server | server, database, user, password |
| Oracle | host, port, service_name, user, password |
| Informix | server, database, host, user, password |
API Reference
ThothDbManager
The main class for database operations.
Class Methods
get_instance(db_type: str, **kwargs) -> ThothDbManager
Creates or retrieves a singleton instance of the database manager.
Parameters:
db_type(str): Database type identifierdb_root_path(str): Path to store database-related filesdb_mode(str): Operating mode ("dev", "prod", etc.)**kwargs: Database-specific connection parameters
Returns: Database manager instance
Instance Methods
execute_sql(sql: str, params: dict = None, fetch: str = "all", timeout: int = 60) -> Any
Execute SQL queries against the database.
Parameters:
sql(str): SQL query stringparams(dict, optional): Query parameters for prepared statementsfetch(str): How to fetch results ("all", "one", or number)timeout(int): Query timeout in seconds
Returns: Query results
get_tables() -> List[Dict[str, str]]
Get list of tables in the database.
Returns: List of dictionaries with name and comment keys
get_columns(table_name: str) -> List[Dict[str, Any]]
Get column information for a specific table.
Parameters:
table_name(str): Name of the table
Returns: List of dictionaries with column metadata (name, data_type, comment, is_pk)
get_foreign_keys() -> List[Dict[str, str]]
Get foreign key relationships in the database.
Returns: List of dictionaries with foreign key information
get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]
Get the most frequent values for each column in a table.
Parameters:
table_name(str): Name of the tablenumber_of_rows(int): Maximum number of example values per column
Returns: Dictionary mapping column names to lists of example values
query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]
Search for similar values using LSH (Locality-Sensitive Hashing).
Parameters:
keyword(str): Search termsignature_size(int): MinHash signature sizen_gram(int): N-gram size for text processingtop_n(int): Number of similar values to return
Returns: Nested dictionary: {table_name: {column_name: [similar_values]}}
Factory Pattern (Alternative API)
For more advanced usage, you can use the factory pattern:
from thoth_dbmanager import ThothDbFactory
# Create manager using factory
manager = ThothDbFactory.create_manager(
db_type="postgresql",
db_root_path="./data",
db_mode="dev",
host="localhost",
port=5432,
database="mydb",
user="username",
password="password"
)
# List available database types
available_dbs = ThothDbFactory.list_available_databases()
# Get required parameters for a database type
params = ThothDbFactory.get_required_parameters("postgresql")
Document-Based API (Advanced)
The library also provides a document-based API for structured data access:
# Get tables as structured documents
if hasattr(manager, 'get_tables_as_documents'):
table_docs = manager.get_tables_as_documents()
for doc in table_docs:
print(f"Table: {doc.table_name}")
print(f"Schema: {doc.schema_name}")
print(f"Comment: {doc.comment}")
# Get columns as structured documents
if hasattr(manager, 'get_columns_as_documents'):
column_docs = manager.get_columns_as_documents("my_table")
for doc in column_docs:
print(f"Column: {doc.column_name} ({doc.data_type})")
print(f"Nullable: {doc.is_nullable}")
print(f"Primary Key: {doc.is_pk}")
Configuration Examples
PostgreSQL
manager = ThothDbManager.get_instance(
db_type="postgresql",
db_root_path="./data",
db_mode="production",
host="localhost",
port=5432,
database="myapp",
user="dbuser",
password="dbpass"
)
SQLite
manager = ThothDbManager.get_instance(
db_type="sqlite",
db_root_path="./data",
db_mode="dev",
database_path="./data/myapp.db"
)
MySQL/MariaDB
manager = ThothDbManager.get_instance(
db_type="mysql", # or "mariadb"
db_root_path="./data",
db_mode="production",
host="localhost",
port=3306,
database="myapp",
user="dbuser",
password="dbpass"
)
Error Handling
The library provides clear error messages for common issues:
try:
manager = ThothDbManager.get_instance(
db_type="postgresql",
db_root_path="./data",
# Missing required parameters
)
except ValueError as e:
print(f"Configuration error: {e}")
try:
results = manager.execute_sql("SELECT * FROM nonexistent_table")
except Exception as e:
print(f"Query error: {e}")
LSH Search Details
The LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:
- Automatic Setup: LSH indexes are created automatically from your database content
- Cross-Column Search: Search across all text columns in all tables
- Fuzzy Matching: Find similar values even with typos or variations
- Configurable: Adjust similarity sensitivity with parameters
LSH Use Cases
- Data Deduplication: Find duplicate or near-duplicate records
- Data Quality: Identify inconsistent data entry
- Search Enhancement: Provide "did you mean?" functionality
- Data Exploration: Discover related content across tables
Architecture
The library uses a modern plugin architecture:
- Plugins: Database-specific implementations
- Adapters: Low-level database operations
- Factory: Plugin instantiation and management
- Documents: Type-safe data models using Pydantic
- Registry: Plugin discovery and registration
This design makes it easy to:
- Add support for new database types
- Maintain consistent APIs across databases
- Extend functionality without breaking existing code
Contributing
To add support for a new database type:
- Create an adapter class implementing
DbAdapter - Create a plugin class implementing
DbPlugin - Register the plugin with
@register_plugin("db_type") - Add connection parameter validation
- Implement required abstract methods
License
This project is licensed under the MIT License.
Support
For issues, questions, or contributions, please visit the project repository.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file thoth_dbmanager-0.4.10.tar.gz.
File metadata
- Download URL: thoth_dbmanager-0.4.10.tar.gz
- Upload date:
- Size: 73.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0a24333a8baa0bebf30679a4617f9d7752fd6163c9fb0d4f1b288cf0a9bc3356
|
|
| MD5 |
9ff8dc8d7499cfba5e28be8f29bf5ebf
|
|
| BLAKE2b-256 |
121791764980c5c55ab9d41b6cff28f5e05c7c1195812278cd5c8c56820335a1
|
File details
Details for the file thoth_dbmanager-0.4.10-py3-none-any.whl.
File metadata
- Download URL: thoth_dbmanager-0.4.10-py3-none-any.whl
- Upload date:
- Size: 84.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a299290922c9551d403b415b865794458641247eac649e39252ef727af5ccd86
|
|
| MD5 |
43ebe7ff9cf5609d9436621459a4088c
|
|
| BLAKE2b-256 |
777e87e101282eb832b54cdffd75d66e20cf1411b29ca8c244084c02a2c66bce
|