SQLAlchemy dialects for multiple databases
Project description
ns-sqlalchemy
SQLAlchemy dialects for multiple databases.
Overview
ns-sqlalchemy is a Python package that provides SQLAlchemy dialect implementations for various databases. Currently, it supports:
Cloud Warehouses
- ClickHouse: Full dialect support with HTTP, Native, and asynch drivers
- Amazon Redshift: Synchronous driver via
redshift_connector - Google BigQuery: Synchronous driver via
google.cloud.bigquery - Snowflake: Synchronous driver via
snowflake.connector - Firebolt: Synchronous driver via
firebolt.db - Databricks: Synchronous driver via
databricks.sql - Amazon Athena: Synchronous driver via
pyathena
MySQL-Compatible
- TiDB: MySQL protocol via
pymysql(port 4000) - OceanBase: MySQL protocol via
pymysql(port 2883)
PostgreSQL-Compatible
- CockroachDB: PostgreSQL protocol via
psycopg2(port 26257)
Big Data / Analytics
- Apache Druid: Synchronous driver via
pydruid - Apache Hive: Synchronous driver via
pyhive.hive - Presto: Synchronous driver via
pyhive.presto
Enterprise Relational
- IBM DB2: Synchronous driver via
ibm_db_dbi - SAP Hana: Synchronous driver via
hdbcli.dbapi - Teradata: Synchronous driver via
teradatasql - Firebird: Synchronous driver via
fdb - Microsoft Access: Synchronous driver via
pyodbc - CrateDB: Synchronous driver via
crate.client
NoSQL / Other
- MongoDB: Read-only dialect via
mongodb-sqlalchemy - Elasticsearch: Read-only dialect via
elasticsearch-dbapi - Google Sheets: Read-only dialect via
shillelagh - Dameng (DM): Full dialect support with dmPython and dmAsync drivers
Features
ClickHouse
- Multiple Drivers: HTTP (
clickhouse+http://), Native (clickhouse+native://), asynch (clickhouse+asynch://) - Full Type Support: Int8–UInt256, Float32/64, Decimal, Date/Date32, DateTime/DateTime64, Enum8/16, Array, Nullable, LowCardinality, Tuple, Map, Nested, IPv4/IPv6, UUID, AggregateFunction, SimpleAggregateFunction
- DDL Support: MergeTree-family engines (PARTITION BY, ORDER BY, PRIMARY KEY, SAMPLE BY, TTL, SETTINGS), Replicated engines, Distributed, Buffer, MaterializedView, column CODEC/MATERIALIZED/ALIAS/AFTER
- SQL Extensions: FINAL, SAMPLE, LIMIT BY, ARRAY JOIN, WITH CUBE/ROLLUP/TOTALS, JOIN strictness/distribution, ALTER TABLE DELETE/UPDATE/ADD/DROP/MODIFY/RENAME
- Alembic Integration: Full migration support including Materialized Views
- ORM Support: Custom Query class with ClickHouse-specific methods, declarative base integration
- Reflection: Inspect tables, views, materialized views
Dameng
- Synchronous Driver:
dmPythondriver for synchronous operations - Asynchronous Driver:
dmAsyncdriver for async operations - Advanced Type Support: NUMBER, VARCHAR2, NVARCHAR2, CHAR, DATE, DATETIME, TIMESTAMP, INTERVAL, BLOB, CLOB, NCLOB, BFILE
- Vector Search: Built-in support for vector similarity search with IVF and HNSW indexes
- JSON Support: JSON data type with path indexing
- Compatibility Modes: DM, MySQL, TSQL, and Oracle compatibility modes
Installation
# Base install (supports ClickHouse HTTP driver)
pip install ns-sqlalchemy
# With Dameng support
pip install ns-sqlalchemy[dameng]
# Install optional database drivers
pip install pymysql # TiDB, OceanBase
pip install psycopg2-binary # CockroachDB
pip install redshift-connector # Redshift
pip install google-cloud-bigquery # BigQuery
pip install snowflake-connector-python # Snowflake
pip install firebolt-sdk # Firebolt
pip install databricks-sql-connector # Databricks
pip install pydruid # Druid
pip install pyhive # Hive, Presto
pip install ibm-db # DB2
pip install hdbcli # SAP Hana
pip install pyodbc # Access
pip install teradatasql # Teradata
pip install pyathena # Athena
pip install crate # CrateDB
pip install fdb # Firebird
pip install shillelagh # Google Sheets
pip install elasticsearch-dbapi # Elasticsearch
Usage
Basic Usage
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session
from ns_sqlalchemy.dameng import dialect
# Create engine with Dameng dialect
engine = create_engine(
"dm+dmPython://user:password@host:port/database",
dialect_options={"driver": "dmPython"}
)
# Create declarative base
Base = declarative_base()
# Define a model
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True)
name = Column(String(100))
email = Column(String(255))
# Create tables
Base.metadata.create_all(engine)
# Use with session
with Session(engine) as session:
# Insert data
new_user = User(name="John Doe", email="john@example.com")
session.add(new_user)
session.commit()
# Query data
users = session.query(User).filter(User.name == "John Doe").all()
Vector Search
from ns_sqlalchemy.dameng import VectorWordSeek
# Create vector search instance
vector_search = VectorWordSeek(
connection_str="dm+dmPython://user:password@host:port/database",
table_name="documents",
vector_dim=128,
drop_if_existing=False,
)
# Insert vectors
ids = vector_search.insert(
texts=["Hello world", "Goodbye world"],
metadatas=[{"source": "test"}, {"source": "test"}]
)
# Query similar vectors
results = vector_search.query(
DistanceMetric="COSINE",
query_vector=[0.1, 0.2, 0.3, ...],
count=5,
filter={"source": "test"}
)
ClickHouse Usage
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.orm import Session
from ns_sqlalchemy.clickhouse import engines, types
from ns_sqlalchemy.clickhouse.sql.schema import Table
from ns_sqlalchemy.clickhouse.sql.ddl import CreateTable
# Create engine with ClickHouse HTTP driver
engine = create_engine("clickhouse+http://user:password@localhost:8123/default")
# Define a table with ClickHouse-specific engine
table = Table(
"events",
MetaData(),
Column("event_id", String, primary_key=True),
Column("timestamp", DateTime),
Column("value", Float32),
engine=engines.MergeTree(
order_by=func.tuple("event_id"),
partition_by="toYYYYMM(timestamp)",
),
)
# Create table
CreateTable(table).execute(engine)
# Insert data with engine
with engine.connect() as conn:
conn.execute(
table.insert(),
{"event_id": "abc", "timestamp": datetime.now(), "value": 1.5},
)
# Query with ClickHouse-specific features
from sqlalchemy import select, func
from ns_sqlalchemy.clickhouse.sql.selectable import Select
s = (
Select(table)
.final()
.sample(0.1)
.limit_by(10, table.event_id)
)
Async ClickHouse
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
engine = create_async_engine(
"clickhouse+asynch://user:password@localhost:9000/default"
)
Alembic Integration
# In alembic/env.py:
from ns_sqlalchemy.clickhouse.alembic.dialect import (
ClickHouseDialectImpl,
patch_alembic_version,
include_object,
)
# Register the ClickHouse dialect implementation
context.configure(
connection=connection,
target_metadata=target_metadata,
include_object=include_object,
)
# For clustered ClickHouse:
# patch_alembic_version(context, cluster="my_cluster")
Dameng Async Usage
import asyncio
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import declarative_base
async def main():
# Create async engine
engine = create_async_engine(
"dm+dmAsync://user:password@host:port/database",
echo=True
)
# Create declarative base
Base = declarative_base()
# Define a model
class Product(Base):
__tablename__ = "products"
id = Column(Integer, primary_key=True)
name = Column(String(100))
price = Column(Float)
# Create tables
async with engine.begin() as conn:
await conn.run_sync(Base.metadata.create_all)
# Use with async session
async with AsyncSession(engine) as session:
# Insert data
new_product = Product(name="Laptop", price=999.99)
session.add(new_product)
await session.commit()
# Query data
result = await session.execute(
select(Product).filter(Product.price > 500)
)
products = result.scalars().all()
if __name__ == "__main__":
asyncio.run(main())
Supported Databases
ClickHouse
- URL Schemes:
clickhouse://,clickhouse+http://,clickhouse+native://,clickhouse+asynch:// - Drivers: HTTP (
httpx), Native (clickhouse-driver), asynch (clickhouse-driverasync) - Features: Full type support, DDL (MergeTree-family engines, column CODEC), SQL extensions (FINAL, SAMPLE, ARRAY JOIN), Alembic migrations, ORM support
TiDB
- URL Scheme:
tidb+pymysql://user:pass@host:4000/db - Protocol: MySQL (inherits MySQL DDL/SQL compilers)
- Default Port: 4000
OceanBase
- URL Scheme:
oceanbase+pymysql://user:pass@host:2883/db - Protocol: MySQL (inherits MySQL DDL/SQL compilers)
- Default Port: 2883
CockroachDB
- URL Scheme:
cockroachdb+psycopg2://user:pass@host:26257/db - Protocol: PostgreSQL (inherits PostgreSQL DDL/SQL compilers)
- Default Port: 26257
Amazon Redshift
- URL Scheme:
redshift+redshift_connector://user:pass@host:5439/db - Driver:
redshift_connector - Default Port: 5439
Google BigQuery
- URL Scheme:
bigquery://project/dataset - Driver:
google.cloud.bigquery - Auth: Uses Application Default Credentials or service account JSON
Snowflake
- URL Scheme:
snowflake://user:pass@account/db - Driver:
snowflake.connector
Firebolt
- URL Scheme:
firebolt://user:pass@engine/db - Driver:
firebolt.db
Databricks
- URL Scheme:
databricks://token@workspace/catalog - Driver:
databricks.sql
Apache Druid
- URL Scheme:
druid://host:8082/datasource - Driver:
pydruid
Apache Hive
- URL Scheme:
hive://host:10000/default - Driver:
pyhive.hive
Presto
- URL Scheme:
presto://host:8080/catalog - Driver:
pyhive.presto
IBM DB2
- URL Scheme:
db2://user:pass@host:50000/db - Driver:
ibm_db_dbi
SAP Hana
- URL Scheme:
hana://user:pass@host:30015/db - Driver:
hdbcli.dbapi
Teradata
- URL Scheme:
teradata://user:pass@host/db - Driver:
teradatasql
Microsoft Access
- URL Scheme:
access:///path/to/db.accdb - Driver:
pyodbc
CrateDB
- URL Scheme:
cratedb://host:4200 - Driver:
crate.client
Firebird
- URL Scheme:
firebird://user:pass@host:3050/db - Driver:
fdb
Amazon Athena
- URL Scheme:
athena://key:secret@region/schema?s3_staging_dir=s3://bucket/ - Driver:
pyathena
MongoDB
- URL Scheme:
mongodb://host:27017/db - Read-only:
supports_alter=False, no DDL support - Driver:
mongodb-sqlalchemy
Elasticsearch
- URL Scheme:
elasticsearch://host:9200 - Read-only:
supports_alter=False, no DDL support - Driver:
elasticsearch-dbapi
Google Sheets
- URL Scheme:
gsheets:// - Read-only:
supports_alter=False, no DDL support - Driver:
shillelagh
Dameng (DM)
- URL Schemes:
dm+dmPython://user:pass@host:port/db,dm+dmAsync://user:pass@host:port/db - Drivers:
dmPython(synchronous),dmAsync(asynchronous) - Compatibility Modes: DM, MySQL, TSQL, Oracle
- Features: Advanced type support, vector similarity search (IVF/HNSW indexes), JSON support
Type Support
ClickHouse Types
| SQLAlchemy Type | ClickHouse Type | Notes |
|---|---|---|
types.Int8 |
Int8 |
Signed 8-bit |
types.Int16 |
Int16 |
Signed 16-bit |
types.Int32 |
Int32 |
Signed 32-bit |
types.Int64 |
Int64 |
Signed 64-bit |
types.Int128 |
Int128 |
Signed 128-bit |
types.Int256 |
Int256 |
Signed 256-bit |
types.UInt8 |
UInt8 |
Unsigned 8-bit |
types.UInt16 |
UInt16 |
Unsigned 16-bit |
types.UInt32 |
UInt32 |
Unsigned 32-bit |
types.UInt64 |
UInt64 |
Unsigned 64-bit |
types.Float32 |
Float32 |
32-bit float |
types.Float64 |
Float64 |
64-bit float |
types.String |
String |
Variable-length string |
types.FixedString(n) |
FixedString(n) |
Fixed-length string |
types.Date |
Date |
Date |
types.Date32 |
Date32 |
Extended range date |
types.DateTime |
DateTime |
DateTime |
types.DateTime64(p) |
DateTime64(p) |
DateTime with precision |
types.Decimal(p,s) |
Decimal(p,s) |
Fixed-point number |
types.Boolean |
Boolean |
Boolean |
types.Enum8 |
Enum8 |
8-bit enum |
types.Enum16 |
Enum16 |
16-bit enum |
types.UUID |
UUID |
UUID |
types.IPv4 |
IPv4 |
IPv4 address |
types.IPv6 |
IPv6 |
IPv6 address |
types.Array(t) |
Array(t) |
Array of type |
types.Nullable(t) |
Nullable(t) |
Nullable wrapper |
types.LowCardinality(t) |
LowCardinality(t) |
Low cardinality wrapper |
types.Tuple(...) |
Tuple(...) |
Named tuple |
types.Map(k,v) |
Map(k,v) |
Key-value map |
types.Nested(...) |
Nested(...) |
Nested structure |
types.AggregateFunction(f,t) |
AggregateFunction(f,t) |
Aggregate function type |
types.SimpleAggregateFunction(f,t) |
SimpleAggregateFunction(f,t) |
Simple aggregate type |
Dameng Scalar Types
| SQLAlchemy Type | Dameng Type | Notes |
|---|---|---|
NUMBER |
NUMBER | Supports precision and scale |
VARCHAR2 |
VARCHAR2 | Variable character |
NVARCHAR2 |
NVARCHAR2 | National character varying |
CHAR |
CHAR | Fixed character |
DATE |
DATE | Date only |
DATETIME |
DATETIME | Date and time |
TIMESTAMP |
TIMESTAMP | Timestamp with timezone support |
INTERVAL |
INTERVAL | Time interval |
BLOB |
BLOB | Binary large object |
CLOB |
CLOB | Character large object |
NCLOB |
NCLOB | National character large object |
BFILE |
BFILE | Binary file |
Vector Types
VECTOR(dim=128, format="FLOAT32"): Vector type for similarity search- Supports L1, L2, cosine, dot product, and Hamming distances
JSON Types
JSON: JSON data typeJSONIndexType: JSON path index typeJSONPathType: JSON path type
Configuration
Connection Parameters
engine = create_engine(
"dm+dmPython://user:password@host:port/database?charset=utf8",
dialect_options={
"driver": "dmPython",
"auto_convert_lobs": True,
"coerce_to_decimal": True,
"arraysize": 50,
"connection_timeout": 30,
}
)
Compatibility Modes
# MySQL compatibility
engine = create_engine(
"dm+dmPython://user:password@host:port/database?parse_type=MYSQL",
dialect_options={"compatible_mode": "MYSQL"}
)
# TSQL compatibility
engine = create_engine(
"dm+dmPython://user:password@host:port/database?parse_type=TSQL",
dialect_options={"compatible_mode": "TSQL"}
)
# Oracle compatibility
engine = create_engine(
"dm+dmPython://user:password@host:port/database",
dialect_options={"compatible_mode": "ORACLE"}
)
Vector Index Operations
Create IVF Index
from ns_sqlalchemy.dameng import VectorAdaptor
vector_adaptor = VectorAdaptor(engine)
# Create IVF index
vector_adaptor.create_vector_ivf_index(
column=Product.embedding,
metric_name="COSINE",
percentage_value=90,
num_of_partitions=100,
)
Create HNSW Index
# Create HNSW index
vector_adaptor.create_vector_hnsw_index(
column=Product.embedding,
metric_name="COSINE",
percentage_value=90,
max_connection=16,
ef_construction=200,
)
Testing
Run the tests with pytest:
pytest tests/
Or run specific test suites:
# ClickHouse tests
pytest tests/test_clickhouse/
pytest tests/test_clickhouse/test_sql.py
pytest tests/test_clickhouse/test_ddl.py
# Dameng tests
pytest tests/test_dameng/
pytest tests/test_dameng/test_dialect.py
pytest tests/test_dameng/test_vector.py
# Database dialect registration tests
pytest tests/test_databases/
pytest tests/test_databases/test_dialects.py
Development
Project Structure
ns-sqlalchemy/
├── src/
│ └── ns_sqlalchemy/
│ ├── __init__.py
│ ├── clickhouse/
│ │ ├── __init__.py
│ │ ├── alembic/
│ │ │ ├── comparators.py
│ │ │ ├── dialect.py
│ │ │ ├── operations.py
│ │ │ ├── renderers.py
│ │ │ └── toimpl.py
│ │ ├── drivers/
│ │ │ ├── base.py
│ │ │ ├── compilers/
│ │ │ │ ├── ddlcompiler.py
│ │ │ │ ├── sqlcompiler.py
│ │ │ │ └── typecompiler.py
│ │ │ ├── http/
│ │ │ │ └── base.py
│ │ │ ├── native/
│ │ │ │ └── base.py
│ │ │ ├── asynch/
│ │ │ │ └── base.py
│ │ │ └── reflection.py
│ │ ├── engines/
│ │ │ ├── __init__.py
│ │ │ └── base.py
│ │ ├── ext/
│ │ │ ├── clauses.py
│ │ │ └── declarative.py
│ │ ├── orm/
│ │ │ ├── query.py
│ │ │ └── session.py
│ │ ├── sql/
│ │ │ ├── ddl.py
│ │ │ ├── functions.py
│ │ │ ├── schema.py
│ │ │ └── selectable.py
│ │ └── types.py
│ ├── databases/
│ │ ├── __init__.py
│ │ ├── access.py
│ │ ├── athena.py
│ │ ├── bigquery.py
│ │ ├── cockroachdb.py
│ │ ├── cratedb.py
│ │ ├── databricks.py
│ │ ├── db2.py
│ │ ├── druid.py
│ │ ├── elasticsearch.py
│ │ ├── firebird.py
│ │ ├── firebolt.py
│ │ ├── gsheets.py
│ │ ├── hana.py
│ │ ├── hive.py
│ │ ├── mongodb.py
│ │ ├── oceanbase.py
│ │ ├── presto.py
│ │ ├── redshift.py
│ │ ├── snowflake.py
│ │ ├── teradata.py
│ │ └── tidb.py
│ └── dameng/
│ ├── __init__.py
│ ├── _compat.py
│ ├── base.py
│ ├── dmPython.py
│ ├── dmAsync.py
│ ├── globalvars.py
│ ├── json.py
│ ├── types.py
│ └── vector.py
├── tests/
│ ├── test_clickhouse/
│ │ ├── test_clickhouse_dialect.py
│ │ ├── test_ddl.py
│ │ ├── test_engines.py
│ │ ├── test_orm.py
│ │ ├── test_sql.py
│ │ ├── test_type_compiler.py
│ │ └── test_types.py
│ ├── test_databases/
│ │ ├── __init__.py
│ │ └── test_dialects.py
│ └── test_dameng/
│ ├── test_basic.py
│ ├── test_dialect.py
│ └── test_vector.py
├── pyproject.toml
├── README.md
└── LICENSE
Code Style
This project uses:
- Ruff: Linting and formatting
- MyPy: Type checking
Run code quality checks:
ruff check src/ tests/
ruff format --check src/ tests/
ruff check src/ns_sqlalchemy/clickhouse/ # ClickHouse should be clean
Testing
The project uses pytest for testing. All tests are located in the tests/ directory.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing
Contributions are welcome! Please follow the existing code style and conventions.
Support
For issues and questions, please create an issue in the GitHub repository.
Acknowledgments
- Based on the original ns-dm-sqlalchemy project
- Inspired by SQLAlchemy's dialect architecture
- Thanks to the open-source community for their contributions
Created with ❤️ for high-performance database applications
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ns_sqlalchemy-0.1.1.tar.gz.
File metadata
- Download URL: ns_sqlalchemy-0.1.1.tar.gz
- Upload date:
- Size: 127.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
915d60c2a32d639766ee9ab48c7a99fc56c434a06cdbaef46e250fd9acfc8379
|
|
| MD5 |
c681bbec9af1d5736b49ce8f9fa6d005
|
|
| BLAKE2b-256 |
b8c235ec84fa5ee9de0410b5a07803225e4676e2ffde086cf4af38fe58b494af
|
File details
Details for the file ns_sqlalchemy-0.1.1-py3-none-any.whl.
File metadata
- Download URL: ns_sqlalchemy-0.1.1-py3-none-any.whl
- Upload date:
- Size: 228.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bd1852eee859bb82dae97bfb73fce78949ca8ab9961af8ee47e9cb1e78b294c6
|
|
| MD5 |
1d331dacdbbc09e25d6b0627f529b195
|
|
| BLAKE2b-256 |
338bb120e5b3d814c1a0b7b477664bfba577ef8f033593ac17515820beab50c0
|