Skip to main content

SQLAlchemy dialects for multiple databases

Project description

ns-sqlalchemy

SQLAlchemy dialects for multiple databases.

Overview

ns-sqlalchemy is a Python package that provides SQLAlchemy dialect implementations for various databases. Currently, it supports:

Cloud Warehouses

  • ClickHouse: Full dialect support with HTTP, Native, and asynch drivers
  • Amazon Redshift: Synchronous driver via redshift_connector
  • Google BigQuery: Synchronous driver via google.cloud.bigquery
  • Snowflake: Synchronous driver via snowflake.connector
  • Firebolt: Synchronous driver via firebolt.db
  • Databricks: Synchronous driver via databricks.sql
  • Amazon Athena: Synchronous driver via pyathena

MySQL-Compatible

  • TiDB: MySQL protocol via pymysql (port 4000)
  • OceanBase: MySQL protocol via pymysql (port 2883)

PostgreSQL-Compatible

  • CockroachDB: PostgreSQL protocol via psycopg2 (port 26257)

Big Data / Analytics

  • Apache Druid: Synchronous driver via pydruid
  • Apache Hive: Synchronous driver via pyhive.hive
  • Presto: Synchronous driver via pyhive.presto

Enterprise Relational

  • IBM DB2: Synchronous driver via ibm_db_dbi
  • SAP Hana: Synchronous driver via hdbcli.dbapi
  • Teradata: Synchronous driver via teradatasql
  • Firebird: Synchronous driver via fdb
  • Microsoft Access: Synchronous driver via pyodbc
  • CrateDB: Synchronous driver via crate.client

NoSQL / Other

  • MongoDB: Read-only dialect via mongodb-sqlalchemy
  • Elasticsearch: Read-only dialect via elasticsearch-dbapi
  • Google Sheets: Read-only dialect via shillelagh
  • Dameng (DM): Full dialect support with dmPython and dmAsync drivers

Features

ClickHouse

  • Multiple Drivers: HTTP (clickhouse+http://), Native (clickhouse+native://), asynch (clickhouse+asynch://)
  • Full Type Support: Int8–UInt256, Float32/64, Decimal, Date/Date32, DateTime/DateTime64, Enum8/16, Array, Nullable, LowCardinality, Tuple, Map, Nested, IPv4/IPv6, UUID, AggregateFunction, SimpleAggregateFunction
  • DDL Support: MergeTree-family engines (PARTITION BY, ORDER BY, PRIMARY KEY, SAMPLE BY, TTL, SETTINGS), Replicated engines, Distributed, Buffer, MaterializedView, column CODEC/MATERIALIZED/ALIAS/AFTER
  • SQL Extensions: FINAL, SAMPLE, LIMIT BY, ARRAY JOIN, WITH CUBE/ROLLUP/TOTALS, JOIN strictness/distribution, ALTER TABLE DELETE/UPDATE/ADD/DROP/MODIFY/RENAME
  • Alembic Integration: Full migration support including Materialized Views
  • ORM Support: Custom Query class with ClickHouse-specific methods, declarative base integration
  • Reflection: Inspect tables, views, materialized views

Dameng

  • Synchronous Driver: dmPython driver for synchronous operations
  • Asynchronous Driver: dmAsync driver for async operations
  • Advanced Type Support: NUMBER, VARCHAR2, NVARCHAR2, CHAR, DATE, DATETIME, TIMESTAMP, INTERVAL, BLOB, CLOB, NCLOB, BFILE
  • Vector Search: Built-in support for vector similarity search with IVF and HNSW indexes
  • JSON Support: JSON data type with path indexing
  • Compatibility Modes: DM, MySQL, TSQL, and Oracle compatibility modes

Installation

# Base install (supports ClickHouse HTTP driver)
pip install ns-sqlalchemy

# With Dameng support
pip install ns-sqlalchemy[dameng]

# Install optional database drivers
pip install pymysql          # TiDB, OceanBase
pip install psycopg2-binary  # CockroachDB
pip install redshift-connector  # Redshift
pip install google-cloud-bigquery  # BigQuery
pip install snowflake-connector-python  # Snowflake
pip install firebolt-sdk     # Firebolt
pip install databricks-sql-connector  # Databricks
pip install pydruid          # Druid
pip install pyhive           # Hive, Presto
pip install ibm-db           # DB2
pip install hdbcli           # SAP Hana
pip install pyodbc           # Access
pip install teradatasql      # Teradata
pip install pyathena         # Athena
pip install crate            # CrateDB
pip install fdb              # Firebird
pip install shillelagh       # Google Sheets
pip install elasticsearch-dbapi  # Elasticsearch

Usage

Basic Usage

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session
from ns_sqlalchemy.dameng import dialect

# Create engine with Dameng dialect
engine = create_engine(
    "dm+dmPython://user:password@host:port/database",
    dialect_options={"driver": "dmPython"}
)

# Create declarative base
Base = declarative_base()

# Define a model
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    email = Column(String(255))

# Create tables
Base.metadata.create_all(engine)

# Use with session
with Session(engine) as session:
    # Insert data
    new_user = User(name="John Doe", email="john@example.com")
    session.add(new_user)
    session.commit()

    # Query data
    users = session.query(User).filter(User.name == "John Doe").all()

Vector Search

from ns_sqlalchemy.dameng import VectorWordSeek

# Create vector search instance
vector_search = VectorWordSeek(
    connection_str="dm+dmPython://user:password@host:port/database",
    table_name="documents",
    vector_dim=128,
    drop_if_existing=False,
)

# Insert vectors
ids = vector_search.insert(
    texts=["Hello world", "Goodbye world"],
    metadatas=[{"source": "test"}, {"source": "test"}]
)

# Query similar vectors
results = vector_search.query(
    DistanceMetric="COSINE",
    query_vector=[0.1, 0.2, 0.3, ...],
    count=5,
    filter={"source": "test"}
)

ClickHouse Usage

from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.orm import Session
from ns_sqlalchemy.clickhouse import engines, types
from ns_sqlalchemy.clickhouse.sql.schema import Table
from ns_sqlalchemy.clickhouse.sql.ddl import CreateTable

# Create engine with ClickHouse HTTP driver
engine = create_engine("clickhouse+http://user:password@localhost:8123/default")

# Define a table with ClickHouse-specific engine
table = Table(
    "events",
    MetaData(),
    Column("event_id", String, primary_key=True),
    Column("timestamp", DateTime),
    Column("value", Float32),
    engine=engines.MergeTree(
        order_by=func.tuple("event_id"),
        partition_by="toYYYYMM(timestamp)",
    ),
)

# Create table
CreateTable(table).execute(engine)

# Insert data with engine
with engine.connect() as conn:
    conn.execute(
        table.insert(),
        {"event_id": "abc", "timestamp": datetime.now(), "value": 1.5},
    )

# Query with ClickHouse-specific features
from sqlalchemy import select, func
from ns_sqlalchemy.clickhouse.sql.selectable import Select

s = (
    Select(table)
    .final()
    .sample(0.1)
    .limit_by(10, table.event_id)
)

Async ClickHouse

from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

engine = create_async_engine(
    "clickhouse+asynch://user:password@localhost:9000/default"
)

Alembic Integration

# In alembic/env.py:
from ns_sqlalchemy.clickhouse.alembic.dialect import (
    ClickHouseDialectImpl,
    patch_alembic_version,
    include_object,
)

# Register the ClickHouse dialect implementation
context.configure(
    connection=connection,
    target_metadata=target_metadata,
    include_object=include_object,
)

# For clustered ClickHouse:
# patch_alembic_version(context, cluster="my_cluster")

Dameng Async Usage

import asyncio
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import declarative_base

async def main():
    # Create async engine
    engine = create_async_engine(
        "dm+dmAsync://user:password@host:port/database",
        echo=True
    )

    # Create declarative base
    Base = declarative_base()

    # Define a model
    class Product(Base):
        __tablename__ = "products"
        id = Column(Integer, primary_key=True)
        name = Column(String(100))
        price = Column(Float)

    # Create tables
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

    # Use with async session
    async with AsyncSession(engine) as session:
        # Insert data
        new_product = Product(name="Laptop", price=999.99)
        session.add(new_product)
        await session.commit()

        # Query data
        result = await session.execute(
            select(Product).filter(Product.price > 500)
        )
        products = result.scalars().all()

if __name__ == "__main__":
    asyncio.run(main())

Supported Databases

ClickHouse

  • URL Schemes: clickhouse://, clickhouse+http://, clickhouse+native://, clickhouse+asynch://
  • Drivers: HTTP (httpx), Native (clickhouse-driver), asynch (clickhouse-driver async)
  • Features: Full type support, DDL (MergeTree-family engines, column CODEC), SQL extensions (FINAL, SAMPLE, ARRAY JOIN), Alembic migrations, ORM support

TiDB

  • URL Scheme: tidb+pymysql://user:pass@host:4000/db
  • Protocol: MySQL (inherits MySQL DDL/SQL compilers)
  • Default Port: 4000

OceanBase

  • URL Scheme: oceanbase+pymysql://user:pass@host:2883/db
  • Protocol: MySQL (inherits MySQL DDL/SQL compilers)
  • Default Port: 2883

CockroachDB

  • URL Scheme: cockroachdb+psycopg2://user:pass@host:26257/db
  • Protocol: PostgreSQL (inherits PostgreSQL DDL/SQL compilers)
  • Default Port: 26257

Amazon Redshift

  • URL Scheme: redshift+redshift_connector://user:pass@host:5439/db
  • Driver: redshift_connector
  • Default Port: 5439

Google BigQuery

  • URL Scheme: bigquery://project/dataset
  • Driver: google.cloud.bigquery
  • Auth: Uses Application Default Credentials or service account JSON

Snowflake

  • URL Scheme: snowflake://user:pass@account/db
  • Driver: snowflake.connector

Firebolt

  • URL Scheme: firebolt://user:pass@engine/db
  • Driver: firebolt.db

Databricks

  • URL Scheme: databricks://token@workspace/catalog
  • Driver: databricks.sql

Apache Druid

  • URL Scheme: druid://host:8082/datasource
  • Driver: pydruid

Apache Hive

  • URL Scheme: hive://host:10000/default
  • Driver: pyhive.hive

Presto

  • URL Scheme: presto://host:8080/catalog
  • Driver: pyhive.presto

IBM DB2

  • URL Scheme: db2://user:pass@host:50000/db
  • Driver: ibm_db_dbi

SAP Hana

  • URL Scheme: hana://user:pass@host:30015/db
  • Driver: hdbcli.dbapi

Teradata

  • URL Scheme: teradata://user:pass@host/db
  • Driver: teradatasql

Microsoft Access

  • URL Scheme: access:///path/to/db.accdb
  • Driver: pyodbc

CrateDB

  • URL Scheme: cratedb://host:4200
  • Driver: crate.client

Firebird

  • URL Scheme: firebird://user:pass@host:3050/db
  • Driver: fdb

Amazon Athena

  • URL Scheme: athena://key:secret@region/schema?s3_staging_dir=s3://bucket/
  • Driver: pyathena

MongoDB

  • URL Scheme: mongodb://host:27017/db
  • Read-only: supports_alter=False, no DDL support
  • Driver: mongodb-sqlalchemy

Elasticsearch

  • URL Scheme: elasticsearch://host:9200
  • Read-only: supports_alter=False, no DDL support
  • Driver: elasticsearch-dbapi

Google Sheets

  • URL Scheme: gsheets://
  • Read-only: supports_alter=False, no DDL support
  • Driver: shillelagh

Dameng (DM)

  • URL Schemes: dm+dmPython://user:pass@host:port/db, dm+dmAsync://user:pass@host:port/db
  • Drivers: dmPython (synchronous), dmAsync (asynchronous)
  • Compatibility Modes: DM, MySQL, TSQL, Oracle
  • Features: Advanced type support, vector similarity search (IVF/HNSW indexes), JSON support

Type Support

ClickHouse Types

SQLAlchemy Type ClickHouse Type Notes
types.Int8 Int8 Signed 8-bit
types.Int16 Int16 Signed 16-bit
types.Int32 Int32 Signed 32-bit
types.Int64 Int64 Signed 64-bit
types.Int128 Int128 Signed 128-bit
types.Int256 Int256 Signed 256-bit
types.UInt8 UInt8 Unsigned 8-bit
types.UInt16 UInt16 Unsigned 16-bit
types.UInt32 UInt32 Unsigned 32-bit
types.UInt64 UInt64 Unsigned 64-bit
types.Float32 Float32 32-bit float
types.Float64 Float64 64-bit float
types.String String Variable-length string
types.FixedString(n) FixedString(n) Fixed-length string
types.Date Date Date
types.Date32 Date32 Extended range date
types.DateTime DateTime DateTime
types.DateTime64(p) DateTime64(p) DateTime with precision
types.Decimal(p,s) Decimal(p,s) Fixed-point number
types.Boolean Boolean Boolean
types.Enum8 Enum8 8-bit enum
types.Enum16 Enum16 16-bit enum
types.UUID UUID UUID
types.IPv4 IPv4 IPv4 address
types.IPv6 IPv6 IPv6 address
types.Array(t) Array(t) Array of type
types.Nullable(t) Nullable(t) Nullable wrapper
types.LowCardinality(t) LowCardinality(t) Low cardinality wrapper
types.Tuple(...) Tuple(...) Named tuple
types.Map(k,v) Map(k,v) Key-value map
types.Nested(...) Nested(...) Nested structure
types.AggregateFunction(f,t) AggregateFunction(f,t) Aggregate function type
types.SimpleAggregateFunction(f,t) SimpleAggregateFunction(f,t) Simple aggregate type

Dameng Scalar Types

SQLAlchemy Type Dameng Type Notes
NUMBER NUMBER Supports precision and scale
VARCHAR2 VARCHAR2 Variable character
NVARCHAR2 NVARCHAR2 National character varying
CHAR CHAR Fixed character
DATE DATE Date only
DATETIME DATETIME Date and time
TIMESTAMP TIMESTAMP Timestamp with timezone support
INTERVAL INTERVAL Time interval
BLOB BLOB Binary large object
CLOB CLOB Character large object
NCLOB NCLOB National character large object
BFILE BFILE Binary file

Vector Types

  • VECTOR(dim=128, format="FLOAT32"): Vector type for similarity search
  • Supports L1, L2, cosine, dot product, and Hamming distances

JSON Types

  • JSON: JSON data type
  • JSONIndexType: JSON path index type
  • JSONPathType: JSON path type

Configuration

Connection Parameters

engine = create_engine(
    "dm+dmPython://user:password@host:port/database?charset=utf8",
    dialect_options={
        "driver": "dmPython",
        "auto_convert_lobs": True,
        "coerce_to_decimal": True,
        "arraysize": 50,
        "connection_timeout": 30,
    }
)

Compatibility Modes

# MySQL compatibility
engine = create_engine(
    "dm+dmPython://user:password@host:port/database?parse_type=MYSQL",
    dialect_options={"compatible_mode": "MYSQL"}
)

# TSQL compatibility
engine = create_engine(
    "dm+dmPython://user:password@host:port/database?parse_type=TSQL",
    dialect_options={"compatible_mode": "TSQL"}
)

# Oracle compatibility
engine = create_engine(
    "dm+dmPython://user:password@host:port/database",
    dialect_options={"compatible_mode": "ORACLE"}
)

Vector Index Operations

Create IVF Index

from ns_sqlalchemy.dameng import VectorAdaptor

vector_adaptor = VectorAdaptor(engine)

# Create IVF index
vector_adaptor.create_vector_ivf_index(
    column=Product.embedding,
    metric_name="COSINE",
    percentage_value=90,
    num_of_partitions=100,
)

Create HNSW Index

# Create HNSW index
vector_adaptor.create_vector_hnsw_index(
    column=Product.embedding,
    metric_name="COSINE",
    percentage_value=90,
    max_connection=16,
    ef_construction=200,
)

Testing

Run the tests with pytest:

pytest tests/

Or run specific test suites:

# ClickHouse tests
pytest tests/test_clickhouse/
pytest tests/test_clickhouse/test_sql.py
pytest tests/test_clickhouse/test_ddl.py

# Dameng tests
pytest tests/test_dameng/
pytest tests/test_dameng/test_dialect.py
pytest tests/test_dameng/test_vector.py

# Database dialect registration tests
pytest tests/test_databases/
pytest tests/test_databases/test_dialects.py

Development

Project Structure

ns-sqlalchemy/
├── src/
│   └── ns_sqlalchemy/
│       ├── __init__.py
│       ├── clickhouse/
│       │   ├── __init__.py
│       │   ├── alembic/
│       │   │   ├── comparators.py
│       │   │   ├── dialect.py
│       │   │   ├── operations.py
│       │   │   ├── renderers.py
│       │   │   └── toimpl.py
│       │   ├── drivers/
│       │   │   ├── base.py
│       │   │   ├── compilers/
│       │   │   │   ├── ddlcompiler.py
│       │   │   │   ├── sqlcompiler.py
│       │   │   │   └── typecompiler.py
│       │   │   ├── http/
│       │   │   │   └── base.py
│       │   │   ├── native/
│       │   │   │   └── base.py
│       │   │   ├── asynch/
│       │   │   │   └── base.py
│       │   │   └── reflection.py
│       │   ├── engines/
│       │   │   ├── __init__.py
│       │   │   └── base.py
│       │   ├── ext/
│       │   │   ├── clauses.py
│       │   │   └── declarative.py
│       │   ├── orm/
│       │   │   ├── query.py
│       │   │   └── session.py
│       │   ├── sql/
│       │   │   ├── ddl.py
│       │   │   ├── functions.py
│       │   │   ├── schema.py
│       │   │   └── selectable.py
│       │   └── types.py
│       ├── databases/
│       │   ├── __init__.py
│       │   ├── access.py
│       │   ├── athena.py
│       │   ├── bigquery.py
│       │   ├── cockroachdb.py
│       │   ├── cratedb.py
│       │   ├── databricks.py
│       │   ├── db2.py
│       │   ├── druid.py
│       │   ├── elasticsearch.py
│       │   ├── firebird.py
│       │   ├── firebolt.py
│       │   ├── gsheets.py
│       │   ├── hana.py
│       │   ├── hive.py
│       │   ├── mongodb.py
│       │   ├── oceanbase.py
│       │   ├── presto.py
│       │   ├── redshift.py
│       │   ├── snowflake.py
│       │   ├── teradata.py
│       │   └── tidb.py
│       └── dameng/
│           ├── __init__.py
│           ├── _compat.py
│           ├── base.py
│           ├── dmPython.py
│           ├── dmAsync.py
│           ├── globalvars.py
│           ├── json.py
│           ├── types.py
│           └── vector.py
├── tests/
│   ├── test_clickhouse/
│   │   ├── test_clickhouse_dialect.py
│   │   ├── test_ddl.py
│   │   ├── test_engines.py
│   │   ├── test_orm.py
│   │   ├── test_sql.py
│   │   ├── test_type_compiler.py
│   │   └── test_types.py
│   ├── test_databases/
│   │   ├── __init__.py
│   │   └── test_dialects.py
│   └── test_dameng/
│       ├── test_basic.py
│       ├── test_dialect.py
│       └── test_vector.py
├── pyproject.toml
├── README.md
└── LICENSE

Code Style

This project uses:

  • Ruff: Linting and formatting
  • MyPy: Type checking

Run code quality checks:

ruff check src/ tests/
ruff format --check src/ tests/
ruff check src/ns_sqlalchemy/clickhouse/  # ClickHouse should be clean

Testing

The project uses pytest for testing. All tests are located in the tests/ directory.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please follow the existing code style and conventions.

Support

For issues and questions, please create an issue in the GitHub repository.

Acknowledgments

  • Based on the original ns-dm-sqlalchemy project
  • Inspired by SQLAlchemy's dialect architecture
  • Thanks to the open-source community for their contributions

Created with ❤️ for high-performance database applications

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ns_sqlalchemy-0.1.1.tar.gz (127.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ns_sqlalchemy-0.1.1-py3-none-any.whl (228.5 kB view details)

Uploaded Python 3

File details

Details for the file ns_sqlalchemy-0.1.1.tar.gz.

File metadata

  • Download URL: ns_sqlalchemy-0.1.1.tar.gz
  • Upload date:
  • Size: 127.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for ns_sqlalchemy-0.1.1.tar.gz
Algorithm Hash digest
SHA256 915d60c2a32d639766ee9ab48c7a99fc56c434a06cdbaef46e250fd9acfc8379
MD5 c681bbec9af1d5736b49ce8f9fa6d005
BLAKE2b-256 b8c235ec84fa5ee9de0410b5a07803225e4676e2ffde086cf4af38fe58b494af

See more details on using hashes here.

File details

Details for the file ns_sqlalchemy-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: ns_sqlalchemy-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 228.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for ns_sqlalchemy-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bd1852eee859bb82dae97bfb73fce78949ca8ab9961af8ee47e9cb1e78b294c6
MD5 1d331dacdbbc09e25d6b0627f529b195
BLAKE2b-256 338bb120e5b3d814c1a0b7b477664bfba577ef8f033593ac17515820beab50c0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page