A command-line tool to split large SQL files into smaller chunks based on size and SQL separators.

SQL Splitter 🚀

A high-performance command-line tool designed to split massive SQL dump files into smaller, manageable chunks. Unlike simple line-based splitters, SQL Splitter respects SQL statement boundaries: it only ends a chunk on a line that ends with the statement separator (; by default), so chunks break between statements rather than in the middle of one.


✨ Features

  • Smart Splitting: Splits files based on size while respecting SQL statement integrity.
  • Compression: Optionally remove comments and empty lines to reduce chunk size.
  • Configurable: Define custom separators, single-line comment markers, and multi-line comment markers.
  • Fast & Efficient: Processes files line-by-line using streaming I/O, minimizing memory usage for multi-gigabyte files.

🛠 Prerequisites

  • Python: 3.13 or higher.
  • uv: Recommended for dependency management (fast, reliable).

🚀 Installation & Setup

We use uv for easy environment management.

  1. Clone the repository:

    git clone <repository-url>
    cd sql_splitter
    
  2. Sync dependencies:

    uv sync
    

📖 Usage Guide

Command Line Arguments

Run the tool using uv run psql-splitter.

Flag  Required  Default  Description
-f    Yes       none     Path to the source SQL file.
-n    Yes       none     Number of chunks to split the file into (the file size divided by this value gives the target chunk size).
-s    No        ;        SQL statement separator.
-c    No        --       Single-line comment marker.
-m    No        /*       Start marker for multi-line comments.
-z    No        off      Compress the output by dropping empty lines and comment lines.

Only short flags are documented; there are no long-form equivalents.
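For contributors, the flags above map naturally onto Python's standard argparse module. The sketch below is hypothetical: it is not the project's main.py, and the dest names (file, chunks, separator, comment, multiline_comment, compress) are assumptions chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real main.py)."""
    parser = argparse.ArgumentParser(
        prog="psql-splitter",
        description="Split a large SQL file into chunks on statement boundaries.",
    )
    # Required inputs: source file and number of chunks.
    parser.add_argument("-f", dest="file", required=True, help="Path to the source SQL file.")
    parser.add_argument("-n", dest="chunks", type=int, required=True, help="Number of chunks.")
    # Optional markers, with the defaults listed in the table.
    parser.add_argument("-s", dest="separator", default=";", help="SQL statement separator.")
    parser.add_argument("-c", dest="comment", default="--", help="Single-line comment marker.")
    parser.add_argument("-m", dest="multiline_comment", default="/*", help="Multi-line comment start marker.")
    # -z is a boolean switch: absent means False, present means True.
    parser.add_argument("-z", dest="compress", action="store_true", help="Drop empty and comment lines.")
    return parser
```

Parsing ["-f", "big_dump.sql", "-n", "5"] with this parser yields the table's defaults for every optional flag, with compress set to False.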

Examples

Basic split into 5 chunks:

uv run psql-splitter -f big_dump.sql -n 5

Split with compression and custom separator:

uv run psql-splitter -f my_data.sql -n 3 -s '$$' -z
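In POSIX shells such as bash, $$ inside double quotes expands to the shell's process ID, so a $$ separator has to be single-quoted ('$$') to reach the tool literally. When scripting the splitter from Python, passing an argument list to subprocess.run avoids the shell altogether, and shlex.join prints the equivalent, correctly quoted command line. A minimal sketch; run_splitter is a hypothetical helper and my_data.sql is just the file name from the example above:

```python
import shlex
import subprocess

# Argument list for the "compression + custom separator" example.
# Each element reaches the program verbatim; no shell expands "$$".
cmd = ["uv", "run", "psql-splitter", "-f", "my_data.sql", "-n", "3", "-s", "$$", "-z"]


def run_splitter(args: list[str]) -> subprocess.CompletedProcess:
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(args, check=True)


# Shell-safe rendering of the same command, for pasting into a terminal.
print(shlex.join(cmd))  # uv run psql-splitter -f my_data.sql -n 3 -s '$$' -z
```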

🧑‍💻 Developer Guide

If you are a developer joining the project, here is how you can work with the codebase.

Project Structure

.
├── main.py              # CLI Entry point
├── Makefile             # Automated task runner
├── pyproject.toml       # Project metadata and dependencies
├── src/
│   ├── splitter.py      # Core splitting logic
│   └── tests/           # Unit tests
└── README.md            # This file

Automation with Makefile

The project includes a Makefile for common tasks:

  • Run Example:

    make run
    

    Runs the splitter on a test_dump.sql file (cleans previous chunks first).

  • Run Tests:

    make test
    

    Executes the test suite using pytest.

  • Cleanup:

    make clean
    

    Removes all generated .sql chunks (files matching [0-9]*.sql).

  • Help:

    make help
    

    Lists available commands.

Running Tests Manually

You can also run tests directly via uv:

uv run pytest src/tests -v

📝 How it works

  1. Size Calculation: The tool calculates the total file size and determines a target chunk size by dividing it by -n.
  2. Streaming Read: It reads the input file line-by-line to handle extremely large files without filling up RAM.
  3. Statement Boundary: It only closes a chunk if it has exceeded the target size and the current line ends with the specified separator (-s).
  4. Compression Mode: When -z is enabled, the tool skips lines that are empty or start with the specified comment characters (-c or -m).
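The four steps above can be sketched as a single generator. This is a hypothetical illustration, not the project's splitter.py: it assumes UTF-8 input, ignores leading and trailing whitespace when checking comment markers and separators, counts chunk size in bytes after compression, and yields chunks as lists of lines instead of writing numbered files.

```python
import os
from collections.abc import Iterator


def split_sql(
    path: str,
    n: int,
    separator: str = ";",
    comment: str = "--",
    multiline_comment: str = "/*",
    compress: bool = False,
) -> Iterator[list[str]]:
    """Yield chunks as lists of lines (hypothetical sketch of the steps above)."""
    if n < 1:
        raise ValueError("n must be at least 1")

    # 1. Size calculation: target bytes per chunk.
    target = os.path.getsize(path) / n

    chunk: list[str] = []
    chunk_bytes = 0

    # 2. Streaming read: iterating the file object reads one line at a time.
    with open(path, encoding="utf-8") as sql_file:
        for line in sql_file:
            stripped = line.strip()

            # 4. Compression: drop empty lines and lines starting with a
            #    comment marker (assumption: surrounding whitespace ignored).
            if compress and (
                not stripped or stripped.startswith((comment, multiline_comment))
            ):
                continue

            chunk.append(line)
            chunk_bytes += len(line.encode("utf-8"))

            # 3. Statement boundary: close the chunk only once it has exceeded
            #    the target size AND the line ends with the separator.
            if chunk_bytes > target and stripped.endswith(separator):
                yield chunk
                chunk, chunk_bytes = [], 0

    # Whatever remains after the last boundary becomes the final chunk.
    if chunk:
        yield chunk
```

Because every chunk except the last is closed only after exceeding the target, this sketch yields at most n chunks and can yield fewer when statements are large. It also inherits the limits of a line-based rule: a line inside a multi-line string literal that happens to end with the separator can end a chunk early, and in pg_dump plain-text output a `COPY ... FROM stdin;` line ends with `;` while its data rows follow on later lines, so the sketch could close a chunk between the command and its data.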

📄 License

MIT License

Download files

Source Distribution

psql_splitter-1.0.0.tar.gz (9.0 kB)

Built Distribution

psql_splitter-1.0.0-py3-none-any.whl (7.7 kB)

File details

Details for the file psql_splitter-1.0.0.tar.gz.

File metadata

  • Download URL: psql_splitter-1.0.0.tar.gz
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for psql_splitter-1.0.0.tar.gz

Algorithm    Hash digest
SHA256       ac557ce4a51ac944253daccd608286a1c7bb31cea3073a7caf9965e0958aef67
MD5          9af1d9957eca2ce5247a6654f5c1f79d
BLAKE2b-256  04ab0e9adfd760e2ca0b02c120a7748c3c7b79c7e7d8e742fb15f953090e15d6

File details

Details for the file psql_splitter-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: psql_splitter-1.0.0-py3-none-any.whl
  • Size: 7.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for psql_splitter-1.0.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       3cfac2e68fd0f72fbf75284a83308326d3af44afb2d036bd4edb7172bb74704d
MD5          ac9edc4892cf46f41fe6f8cd70dce302
BLAKE2b-256  02340b22df1621fda86a8bb6f04ba0ea4586e1db832a3a43ba27178f48217b05
