A collection of intelligent file splitting tools - PDF chapters, videos, audio, and more
Lazy Splitter
A collection of intelligent file splitting tools for the lazy developer. It currently splits PDFs and EPUBs using smart chapter detection; more formats are planned (see the Roadmap).
🚀 Current Tools
📄 PDF Splitter
Intelligently detects chapters in PDF files and splits them into separate PDF files.
📚 EPUB Splitter
Intelligently detects chapters in EPUB files and splits them into separate EPUB files.
Features
PDF Splitter
- 🔍 Smart Chapter Detection: Automatically detects chapters using PDF bookmarks/TOC or text analysis
- 📑 Multiple Detection Strategies:
  - Bookmark/TOC extraction (fastest and most reliable)
  - Heuristic text analysis (font size, heading patterns, "Chapter N" detection)
  - Hybrid approach (combines both methods)
- 📊 Preview Mode: See detected chapters before splitting
- 🎯 Flexible Output: Customizable output directory and filename patterns
- 🚀 Progress Tracking: Rich progress bars for large files
- ⚙️ Configurable: Fine-tune detection sensitivity and patterns
EPUB Splitter
- 🔍 Smart Chapter Detection: Automatically detects chapters using native TOC, HTML structure, or manifest
- 📑 Multiple Detection Strategies:
  - Native TOC extraction (EPUB 2 NCX and EPUB 3 navigation)
  - Structural analysis (HTML heading tags)
  - Manifest-based splitting (spine items)
  - Hybrid approach (combines all methods)
- 📊 Preview Mode: See detected chapters before splitting
- 🎯 Flexible Output: Customizable output directory and filename patterns
- 📦 Resource Handling: Automatically copies referenced images, CSS, and fonts
- ⚙️ Configurable: Fine-tune detection sensitivity and TOC levels
Installation
From PyPI (recommended)
pip install lazy-splitter
From Source
git clone https://github.com/shankarpandala/lazy-splitter.git
cd lazy-splitter
pip install -e .
Usage
PDF Splitter
Split a PDF by chapters
pdf-splitter split input.pdf
Preview detected chapters without splitting
pdf-splitter preview input.pdf
Specify output directory
pdf-splitter split input.pdf -o output_dir
Choose detection strategy
# Use bookmarks only (fastest)
pdf-splitter split input.pdf --strategy bookmarks
# Use text analysis only (when bookmarks are missing)
pdf-splitter split input.pdf --strategy heuristic
# Use both methods (default)
pdf-splitter split input.pdf --strategy hybrid
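One way to picture the three strategy names is as a small dispatcher. The sketch below is an illustration only: `detect_chapters`, `from_bookmarks`, and `from_heuristics` are hypothetical names, and it assumes hybrid means "prefer bookmarks, fall back to text analysis". The actual tool may merge the two results differently.

```python
def detect_chapters(strategy, from_bookmarks, from_heuristics):
    """Dispatch on a strategy name.

    Both callables are hypothetical detectors that return a list of chapters
    (an empty list when nothing is found).
    """
    if strategy == "bookmarks":
        return from_bookmarks()
    if strategy == "heuristic":
        return from_heuristics()
    if strategy == "hybrid":
        # Assumption: prefer bookmarks, fall back to text analysis when none exist.
        return from_bookmarks() or from_heuristics()
    raise ValueError(f"unknown strategy: {strategy!r}")


# A PDF without bookmarks: hybrid falls through to the heuristic result.
print(detect_chapters("hybrid", lambda: [], lambda: ["Chapter 1", "Chapter 2"]))
```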
Customize output filename pattern
pdf-splitter split input.pdf --pattern "{index:02d}_{title}.pdf"
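The placeholders in the pattern look like Python format-string fields, so `{index:02d}` would zero-pad the chapter number to two digits. The sketch below shows how such a pattern could be expanded with `str.format`; `render_filename` and the title-sanitizing rule are assumptions for illustration, not the tool's confirmed behavior.

```python
import re


def render_filename(pattern, index, title):
    """Fill a pattern such as "{index:02d}_{title}.pdf" using str.format."""
    # Assumption: runs of characters other than letters, digits, "_" and "-"
    # are collapsed into a single underscore to keep the name filesystem-safe.
    safe_title = re.sub(r"[^\w\-]+", "_", title).strip("_")
    return pattern.format(index=index, title=safe_title)


print(render_filename("{index:02d}_{title}.pdf", 3, "Getting Started: Part 1"))
# prints 03_Getting_Started_Part_1.pdf
```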
EPUB Splitter
Split an EPUB by chapters
epub-splitter split ebook.epub
Preview detected chapters without splitting
epub-splitter preview ebook.epub
Specify output directory
epub-splitter split ebook.epub -o output_dir
Choose detection strategy
# Use native TOC only (fastest and most reliable)
epub-splitter split ebook.epub --strategy native
# Use HTML structure analysis (when TOC is missing)
epub-splitter split ebook.epub --strategy structural
# Use manifest-based splitting (one chapter per file)
epub-splitter split ebook.epub --strategy manifest
# Use hybrid approach (default)
epub-splitter split ebook.epub --strategy hybrid
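For context on the manifest strategy: an EPUB's package document (the OPF file) lists every file in a manifest and gives the reading order in a spine. The sketch below reads that order with the standard library; `spine_hrefs` is a hypothetical helper, and whether the tool parses the OPF this way is an assumption.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_hrefs(opf_xml):
    """Return content-document hrefs in reading order from an OPF package document."""
    root = ET.fromstring(opf_xml)
    # The manifest maps item ids to file paths.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists item ids in reading order.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.get("idref") in hrefs
    ]


opf = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>"""
print(spine_hrefs(opf))
# prints ['chapter1.xhtml', 'chapter2.xhtml']
```

With one chapter per spine item, each returned href would become its own output file.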
Customize output filename pattern
epub-splitter split ebook.epub --pattern "{index:02d}_{title}.epub"
Examples
PDF Examples
# Basic usage
pdf-splitter split textbook.pdf
# Preview chapters first
pdf-splitter preview textbook.pdf
# Custom output location
pdf-splitter split textbook.pdf -o chapters/
# Force heuristic detection (for PDFs without bookmarks)
pdf-splitter split textbook.pdf --strategy heuristic --sensitivity high
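To give a feel for heading-pattern detection, here is a minimal sketch that flags pages whose first non-empty line looks like "Chapter 1" or "CHAPTER ONE". The regular expression, the word list, and `find_chapter_headings` are illustrative assumptions; the real heuristic also weighs font size and page breaks, which plain text cannot show.

```python
import re

# Hypothetical pattern: "Chapter 1", "CHAPTER ONE", "Chapter IV: Title", and so on.
NUMBER_WORDS = "one|two|three|four|five|six|seven|eight|nine|ten"
CHAPTER_RE = re.compile(
    rf"^\s*chapter\s+(\d+|[ivxlc]+|{NUMBER_WORDS})\b[\s:.\-]*(.*)$",
    re.IGNORECASE,
)


def find_chapter_headings(pages):
    """Return (page_index, heading) for pages whose first non-empty line is a chapter heading."""
    hits = []
    for index, text in enumerate(pages):
        first_line = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
        if CHAPTER_RE.match(first_line):
            hits.append((index, first_line))
    return hits


pages = [
    "Preface\nSome words.",
    "Chapter 1: Getting Started\nBody text.",
    "More body text that mentions chapter 2 in passing.",
    "CHAPTER TWO\nBody text.",
]
print(find_chapter_headings(pages))
# prints [(1, 'Chapter 1: Getting Started'), (3, 'CHAPTER TWO')]
```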
EPUB Examples
# Basic usage
epub-splitter split novel.epub
# Preview chapters first
epub-splitter preview novel.epub
# Custom output location
epub-splitter split novel.epub -o chapters/
# Use structural detection (for EPUBs without TOC)
epub-splitter split novel.epub --strategy structural --sensitivity high
# Split by TOC level 2 (chapters instead of parts)
epub-splitter split textbook.epub --toc-level 2
How It Works
PDF Splitter
- Bookmark/TOC Extraction: First tries to extract chapter information from PDF bookmarks or table of contents
- Text Analysis Fallback: If bookmarks are unavailable, analyzes text for:
  - Font size changes (larger fonts often indicate headings)
  - Common chapter patterns ("Chapter 1", "CHAPTER ONE", etc.)
  - Page breaks combined with heading-like text
- Smart Splitting: Creates individual PDF files for each detected chapter with preserved formatting and metadata
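The bookmark step can be sketched as turning a table of contents into page ranges. PyMuPDF's `Document.get_toc()` returns entries shaped like `[level, title, page]` with 1-based pages, and the example below uses hand-written data in that shape. `toc_to_page_ranges` is a hypothetical helper, and the rule that a chapter ends on the page before the next one starts is an assumption about how the split is done.

```python
def toc_to_page_ranges(toc, page_count, level=1):
    """Turn [level, title, start_page] entries (1-based pages) into
    (title, first_page, last_page) tuples for entries at the given level."""
    starts = [(title, page) for lvl, title, page in toc if lvl == level]
    ranges = []
    for i, (title, first) in enumerate(starts):
        # A chapter ends on the page before the next chapter starts,
        # or on the last page of the document.
        last = starts[i + 1][1] - 1 if i + 1 < len(starts) else page_count
        # Guard against two chapters that start on the same page.
        ranges.append((title, first, max(first, last)))
    return ranges


toc = [
    [1, "Introduction", 1],
    [2, "Motivation", 2],
    [1, "Methods", 5],
    [1, "Results", 12],
]
print(toc_to_page_ranges(toc, page_count=20))
# prints [('Introduction', 1, 4), ('Methods', 5, 11), ('Results', 12, 20)]
```

Each resulting range could then be copied into its own output file.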
EPUB Splitter
- Native TOC Extraction: First tries to extract chapter information from EPUB navigation (nav.xhtml or toc.ncx)
- Structural Analysis Fallback: If TOC is unavailable, analyzes HTML structure for:
  - Heading tags (h1, h2, h3) based on sensitivity level
  - Semantic HTML structure
  - Title extraction from content
- Manifest-based Fallback: Uses EPUB spine/manifest to create one chapter per content file
- Smart Splitting: Creates individual EPUB files for each detected chapter with:
  - Preserved metadata and styling
  - Automatically copied resources (images, CSS, fonts)
  - Valid EPUB structure with regenerated manifest and spine
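The structural fallback can be illustrated with the standard library's HTML parser. In this sketch the mapping from sensitivity level to heading tags (`SENSITIVITY_TAGS`) is a guess, and `HeadingCollector` and `find_headings` are hypothetical names; the tool's actual thresholds may differ.

```python
from html.parser import HTMLParser

# Hypothetical mapping from sensitivity to the heading tags treated as chapter starts.
SENSITIVITY_TAGS = {
    "low": {"h1"},
    "medium": {"h1", "h2"},
    "high": {"h1", "h2", "h3"},
}


class HeadingCollector(HTMLParser):
    """Collect (tag, text) pairs for the selected heading tags, in document order."""

    def __init__(self, tags):
        super().__init__()
        self.tags = tags
        self.headings = []
        self._current = None  # tag of the heading being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Join the text of nested inline tags and normalize whitespace.
            text = " ".join("".join(self._buffer).split())
            self.headings.append((tag, text))
            self._current = None


def find_headings(html, sensitivity="medium"):
    collector = HeadingCollector(SENSITIVITY_TAGS[sensitivity])
    collector.feed(html)
    collector.close()
    return collector.headings


sample = "<body><h1>Part One</h1><p>Text</p><h2>Chapter <em>1</em></h2><h3>Aside</h3></body>"
print(find_headings(sample, "low"))     # prints [('h1', 'Part One')]
print(find_headings(sample, "medium"))  # prints [('h1', 'Part One'), ('h2', 'Chapter 1')]
```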
Requirements
- Python 3.8+
- PyMuPDF (for PDF manipulation)
- EbookLib (for EPUB manipulation)
- lxml (for HTML/XML parsing)
- Click (for the command-line interface)
- Rich (for beautiful terminal output)
Development
# Install with development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black src/
# Type checking
mypy src/
License
MIT License - see LICENSE file for details
🗺️ Roadmap
✅ Completed
- ✅ PDF Splitter - Split PDFs by chapters with smart detection
- ✅ EPUB Splitter - Split EPUBs by chapters with TOC and structural analysis
Coming Soon
- 🎬 Video Splitter - Split videos by scenes, chapters, or silence detection
- 🎵 Audio Splitter - Split audio files by silence, chapters, or time intervals
- 📊 Document Splitter - Split Word docs, presentations, and more
- 🖼️ Image Splitter - Split image collections and multi-page TIFFs
Contributing
Contributions are welcome! We're building a suite of intelligent splitting tools. Please feel free to submit a Pull Request.
See CONTRIBUTING.md for guidelines.