A command-line tool to split large SQL files into smaller chunks based on size and SQL separators.

SQL Splitter 🚀

A high-performance command-line tool designed to split massive SQL dump files into smaller, manageable chunks. Unlike simple line-based splitters, SQL Splitter respects SQL statement boundaries (using separators like ;) to ensure each chunk remains a valid SQL script.


✨ Features

  • Smart Splitting: Splits files based on size while respecting SQL statement integrity.
  • Compression: Optionally removes comments and empty lines to reduce chunk size.
  • Configurable: Define custom separators, single-line comment markers, and multi-line comment markers.
  • Fast & Efficient: Processes files line-by-line using streaming I/O, minimizing memory usage for multi-gigabyte files.

🛠 Prerequisites

  • Python: 3.13 or higher.
  • uv: Recommended for dependency management (fast, reliable).

🚀 Installation & Setup

We use uv for easy environment management.

  1. Clone the repository:

    git clone <repository-url>
    cd sql_splitter
    
  2. Sync dependencies:

    uv sync
    

📖 Usage Guide

Command Line Arguments

Run the tool using uv run psql-splitter.

Argument  Long Flag  Required  Default  Description
-f        N/A        Yes       -        Path to the source SQL file.
-n        N/A        Yes       -        Number of chunks to split the file into.
-s        N/A        No        ;        SQL statement separator.
-c        N/A        No        --       Single-line comment marker.
-m        N/A        No        /*       Opening marker of a multi-line comment.
-z        N/A        No        False    Compress output (removes empty lines and comments).
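
For readers who want to see how these flags fit together, here is a minimal sketch of an equivalent argparse declaration. It is an illustration only and not taken from the project's main.py; the function name build_parser and the dest names (file, chunks, separator, line_comment, block_comment, compress) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the six short flags from the table (illustrative, not the real CLI code)."""
    parser = argparse.ArgumentParser(prog="psql-splitter")
    # Required arguments: source file and number of chunks.
    parser.add_argument("-f", dest="file", required=True,
                        help="Path to the source SQL file.")
    parser.add_argument("-n", dest="chunks", type=int, required=True,
                        help="Number of chunks to split the file into.")
    # Optional arguments with the defaults listed in the table.
    parser.add_argument("-s", dest="separator", default=";",
                        help="SQL statement separator.")
    parser.add_argument("-c", dest="line_comment", default="--",
                        help="Single-line comment marker.")
    parser.add_argument("-m", dest="block_comment", default="/*",
                        help="Opening marker of a multi-line comment.")
    # Boolean switch: False unless -z is given.
    parser.add_argument("-z", dest="compress", action="store_true",
                        help="Remove empty lines and comments.")
    return parser
```

Calling build_parser().parse_args(["-f", "big_dump.sql", "-n", "5"]) would yield the defaults for every optional flag, mirroring the basic example in the next section.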

Examples

Basic split into 5 chunks:

uv run psql-splitter -f big_dump.sql -n 5

Split with compression and a custom separator (single quotes stop the shell from expanding $$ to its process ID):

uv run psql-splitter -f my_data.sql -n 3 -s '$$' -z

🧑‍💻 Developer Guide

If you are a developer joining the project, here is how you can work with the codebase.

Project Structure

.
├── main.py              # CLI Entry point
├── Makefile             # Automated task runner
├── pyproject.toml       # Project metadata and dependencies
├── src/
│   ├── splitter.py      # Core splitting logic
│   └── tests/           # Unit tests
└── README.md            # This file

Automation with Makefile

The project includes a Makefile for common tasks:

  • Run Example:

    make run
    

    Runs the splitter on a test_dump.sql file (cleans previous chunks first).

  • Run Tests:

    make test
    

    Executes the test suite using pytest.

  • Cleanup:

    make clean
    

    Removes all generated .sql chunks (files matching [0-9]*.sql).

  • Help:

    make help
    

    Lists available commands.
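
The cleanup pattern above suggests that generated chunks have file names starting with a digit. Assuming that naming holds, a small helper like the following could list the chunks in numeric order, for example before loading them one after another. The function list_chunks is hypothetical and not part of the project.

```python
import re
from pathlib import Path


def list_chunks(directory="."):
    """Return files matching [0-9]*.sql, ordered by their leading number."""
    def leading_number(path):
        # The glob guarantees the name starts with a digit, so the match succeeds.
        return int(re.match(r"\d+", path.name).group())

    return sorted(Path(directory).glob("[0-9]*.sql"), key=leading_number)
```

Sorting by the parsed number rather than by name keeps a chunk named 10.sql after 2.sql, which a plain alphabetical sort would not.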

Running Tests Manually

You can also run tests directly via uv:

uv run pytest src/tests -v

📝 How it works

  1. Size Calculation: The tool reads the total file size and divides it by the number of chunks (-n) to get a target chunk size.
  2. Streaming Read: It reads the input file line by line, so extremely large files never have to fit in RAM.
  3. Statement Boundary: A chunk is closed only once it has exceeded the target size and the current line ends with the specified separator (-s), so a chunk can end up somewhat larger than the target.
  4. Compression Mode: When -z is enabled, the tool skips lines that are empty or start with the specified comment characters (-c or -m).
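
The four steps above can be sketched in a few lines of Python. This is a simplified illustration of the described behavior, not the code in src/splitter.py; the function name split_sql, its parameters, the out_dir argument, and the 1.sql, 2.sql, ... output naming are all assumptions.

```python
import os
from pathlib import Path


def split_sql(source, chunks, separator=";", line_comment="--",
              block_comment="/*", compress=False, out_dir="."):
    """Split `source` into at most `chunks` files, cutting only after a separator."""
    target = os.path.getsize(source) / chunks  # step 1: target size in bytes
    sep = separator.encode()
    comment_starts = (line_comment.encode(), block_comment.encode())
    out_dir = Path(out_dir)
    written, index, size, out = [], 1, 0, None
    try:
        with open(source, "rb") as src:  # binary mode, so sizes are byte counts
            for raw in src:  # step 2: stream one line at a time
                stripped = raw.strip()
                if compress and (not stripped or stripped.startswith(comment_starts)):
                    continue  # step 4: skip blank lines and lines starting a comment
                if out is None:
                    path = out_dir / f"{index}.sql"  # hypothetical file naming
                    out = open(path, "wb")
                    written.append(path)
                out.write(raw)
                size += len(raw)
                # step 3: close only when over target AND at a statement boundary
                if size > target and stripped.endswith(sep):
                    out.close()
                    out, size, index = None, 0, index + 1
    finally:
        if out is not None:
            out.close()
    return written
```

Like the description, this sketch only tests how a line starts, so the continuation lines of a multi-line comment would be kept in compression mode. Whether the real tool handles that case differently is not stated here.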

📄 License

MIT License
