SQL Batcher

A Python library for batching SQL statements to optimize database operations.

Why SQL Batcher?

Data engineers and developers face significant challenges when working with large-scale database operations: performance bottlenecks, memory constraints, network overhead, and the complexity of managing transactions across different database systems.

SQL Batcher attempts to address these pain points with:

  • Performance Optimization: Intelligently batch and merge SQL statements to reduce round-trips to the database, dramatically improving throughput for large-scale operations
  • Memory Efficiency: Control memory usage with configurable batch sizes and smart statement merging, preventing out-of-memory errors during massive data operations
  • Database Adaptability: Leverage database-specific optimizations with adapters for PostgreSQL, Trino, Snowflake, and more
  • Transaction Management: Simplify complex transaction handling with built-in savepoints, retries, and error recovery
  • Developer Experience: Write clean, maintainable code with an intuitive API that abstracts away the complexities of efficient database interactions

SQL Batcher is particularly valuable in data engineering workflows, ETL pipelines, large dataset ingestion, and any scenario requiring high-performance database operations.

Key Features

SQL Batcher provides a comprehensive set of features for efficient SQL statement execution:

SQL Batcher

Efficiently batch SQL statements based on size limits and other constraints. The core component handles the following (a simplified sketch of the idea appears after this list):

  • Smart batching based on database-specific size limits
  • Dynamic batch size adjustment based on column count
  • Memory and network optimization
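
A simplified sketch of size-based batching (illustrative, not the library's internals): accumulate statements until adding one more would exceed the byte limit, then flush the batch as a single round-trip.

def batch_by_size(statements, max_bytes):
    """Yield ';'-joined batches whose encoded size stays under max_bytes."""
    batch, size = [], 0
    for stmt in statements:
        stmt_size = len(stmt.encode("utf-8")) + 1  # +1 for the ';' separator
        if batch and size + stmt_size > max_bytes:
            yield ";\n".join(batch)  # flush: one round-trip to the database
            batch, size = [], 0
        batch.append(stmt)
        size += stmt_size
    if batch:
        yield ";\n".join(batch)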

Query Collector

Collect and track SQL queries for debugging, logging, and monitoring.
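
A minimal illustration of the idea (not SQL Batcher's own collector API): wrap the execute callback so every statement sent to the database is recorded, reusing the adapter, batcher, and statements set up as in the Quick Start below.

executed_queries = []

def tracked_execute(sql):
    executed_queries.append(sql)  # record for debugging/monitoring
    adapter.execute(sql)          # then run it as usual

batcher.process_statements(statements, tracked_execute)
print(f"{len(executed_queries)} statements sent to the database")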

Insert Merging

Optimize database operations by combining compatible INSERT statements (a simplified sketch follows this list):

  • Automatic detection of compatible statements
  • Size-aware merging respecting query limits
  • Table and column structure awareness
  • Preserves execution order of non-INSERT statements
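
A simplified sketch of the merging idea (SQL Batcher's real implementation also respects size limits and column structure): consecutive single-row INSERT ... VALUES statements with the same prefix collapse into one multi-row statement, while anything else passes through in order.

def merge_inserts(statements):
    merged = []
    for stmt in statements:
        prefix, sep, values = stmt.partition(" VALUES ")
        if sep and merged and merged[-1].startswith(prefix + " VALUES "):
            merged[-1] += ", " + values  # same table/columns: append the row
        else:
            merged.append(stmt)          # different shape: keep as-is, in order
    return merged

merge_inserts([
    "INSERT INTO table1 VALUES (1, 'a')",
    "INSERT INTO table1 VALUES (2, 'b')",
])
# -> ["INSERT INTO table1 VALUES (1, 'a'), (2, 'b')"]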

Database Adapters

Optimized adapters for popular databases, including PostgreSQL, Trino, Snowflake, and BigQuery.
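
Because the batcher ultimately just calls an execute callable (see the Quick Start below), a custom adapter can wrap any DB-API connection. A hypothetical sketch; the class and parameter names here are illustrative, not SQL Batcher's base classes:

class DBAPIAdapter:
    def __init__(self, connection, max_query_size=500_000):
        self.connection = connection
        self.max_query_size = max_query_size

    def execute(self, sql):
        with self.connection.cursor() as cur:  # e.g. a psycopg2 connection
            cur.execute(sql)
        self.connection.commit()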

Async Support

Comprehensive async support for modern Python applications; see the async example in the Quick Start below.

Context Manager

Clean resource management and automatic flushing of batched statements (the pattern is sketched after this list):

  • Automatic flushing when exiting the context
  • Proper resource cleanup and error handling
  • Support for both synchronous and asynchronous operations
  • Seamless integration with transaction management
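
A minimal sketch of the flush-on-exit pattern (illustrative; the names are not SQL Batcher's own API): whatever is still buffered when the with block exits cleanly gets flushed, and exceptions still propagate.

class FlushingBatcher:
    def __init__(self, execute_fn, max_bytes=500_000):
        self.execute_fn = execute_fn
        self.max_bytes = max_bytes
        self.buffer = []

    def add(self, stmt):
        self.buffer.append(stmt)
        if sum(len(s) for s in self.buffer) >= self.max_bytes:
            self.flush()

    def flush(self):
        if self.buffer:
            self.execute_fn(";\n".join(self.buffer))
            self.buffer.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.flush()  # automatic flush on a clean exit
        return False      # never swallow exceptions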

Transaction Management

Control transaction boundaries and ensure data consistency.
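
A generic sketch of explicit transaction boundaries around a batch run, using a plain DB-API connection ('conn' is an assumption here, not part of SQL Batcher's API):

try:
    batcher.process_statements(statements, adapter.execute)
    conn.commit()    # every batch succeeded: make the work durable
except Exception:
    conn.rollback()  # any failure: leave the database unchanged
    raise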

Savepoint Support

Create intermediate points within a transaction for partial rollbacks.
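
Savepoints are standard SQL, so the underlying mechanism looks like the following generic DB-API sketch (connection and savepoint names are illustrative):

cur = conn.cursor()  # a transaction starts implicitly with the first statement
cur.execute("INSERT INTO table1 VALUES (1, 'a')")
cur.execute("SAVEPOINT sp1")  # mark an intermediate point
try:
    cur.execute("INSERT INTO table1 VALUES (2, 'b')")
except Exception:
    # Undo only the work after the savepoint; the first insert survives.
    cur.execute("ROLLBACK TO SAVEPOINT sp1")
conn.commit()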

Installation

Install SQL Batcher using pip:

pip install sql-batcher

With database-specific dependencies:

# For Trino support
pip install "sql-batcher[trino]"

# For PostgreSQL support
pip install "sql-batcher[postgresql]"

# For Snowflake support
pip install "sql-batcher[snowflake]"

# For BigQuery support
pip install "sql-batcher[bigquery]"

# For all supported databases
pip install "sql-batcher[all]"

# For development (includes testing and linting tools)
pip install "sql-batcher[dev]"

Quick Start

Here's a simple example to get you started with SQL Batcher:

from sql_batcher import SQLBatcher
from sql_batcher.adapters import TrinoAdapter

# Create adapter and batcher
adapter = TrinoAdapter(
    host="trino.example.com",
    port=8080,
    user="trino",
    catalog="hive",
    schema="default",
    role="admin",  # Trino role (sets 'x-trino-role' HTTP header as 'system=ROLE{role}')
    max_query_size=600_000  # 600KB limit to provide buffer for Trino's 1MB limit
)

batcher = SQLBatcher(
    adapter=adapter,
    max_bytes=500_000,  # 500KB limit
    batch_mode=True,
    auto_adjust_for_columns=True  # Adjust batch size based on column count
)

# Process statements
statements = [
    "INSERT INTO table1 VALUES (1, 'a')",
    "INSERT INTO table1 VALUES (2, 'b')",
    # ... many more statements
]

# Process all statements in batches
batcher.process_statements(statements, adapter.execute)

For async usage:

import asyncio
from sql_batcher import AsyncSQLBatcher
from sql_batcher.adapters.async_trino import AsyncTrinoAdapter

async def main():
    # Create async adapter and batcher
    adapter = AsyncTrinoAdapter(
        host="trino.example.com",
        port=8080,
        user="trino",
        catalog="hive",
        schema="default",
        role="admin",  # Trino role (sets 'x-trino-role' HTTP header as 'system=ROLE{role}')
        max_query_size=600_000  # 600KB limit to provide buffer for Trino's 1MB limit
    )

    batcher = AsyncSQLBatcher(
        adapter=adapter,
        max_bytes=500_000,  # 500KB limit
        batch_mode=True,
        auto_adjust_for_columns=True  # Adjust batch size based on column count
    )

    # Process statements asynchronously
    statements = [
        "INSERT INTO table1 VALUES (1, 'a')",
        "INSERT INTO table1 VALUES (2, 'b')",
        # ... many more statements
    ]

    await batcher.process_statements(statements, adapter.execute)

    # Close the connection
    await adapter.close()

# Run the async function
asyncio.run(main())

Documentation

For more detailed documentation, see the documentation pages in the project repository.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Install pre-commit hooks (pip install pre-commit && pre-commit install)
  4. Make your changes (the pre-commit hooks will automatically format your code)
  5. Commit your changes (git commit -m 'Add some amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

The project uses pre-commit hooks to ensure code quality:

  • black for code formatting
  • isort for import sorting
  • flake8 for code linting
  • autoflake for removing unused imports and variables

License

This project is licensed under the MIT License - see the LICENSE file for details.
