A collection of intelligent file splitting tools - PDF chapters today, with video, audio, and more planned
Lazy Splitter
A collection of intelligent file splitting tools for the lazy developer. Currently featuring PDF chapter detection and splitting, with more formats coming soon.
🚀 Current Tools
📄 PDF Splitter
Detects chapters in a PDF file and writes each chapter to its own PDF file.
Features
- 🔍 Smart Chapter Detection: Automatically detects chapters using PDF bookmarks/TOC or text analysis
- 📑 Multiple Detection Strategies:
  - Bookmark/TOC extraction (fastest and most reliable)
  - Heuristic text analysis (font size, heading patterns, "Chapter N" detection)
  - Hybrid approach (combines both methods)
- 📊 Preview Mode: See detected chapters before splitting
- 🎯 Flexible Output: Customizable output directory and filename patterns
- 🚀 Progress Tracking: Rich progress bars for large files
- ⚙️ Configurable: Fine-tune detection sensitivity and patterns
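To illustrate the "Chapter N" heading patterns mentioned above, here is a minimal sketch of pattern-based detection. It is not lazy-splitter's actual implementation: the regular expression, the `find_chapter_headings` helper, and the rule of checking only the first non-empty line of each page are all assumptions made for this example.

```python
import re

# Hypothetical pattern: matches headings such as "Chapter 1", "CHAPTER ONE",
# or "Chapter IV: Title". Number words are limited to one..ten for brevity.
CHAPTER_RE = re.compile(
    r"^\s*chapter\s+"
    r"(\d+|[ivxlcdm]+|one|two|three|four|five|six|seven|eight|nine|ten)\b"
    r"[\s:.\-]*(.*)$",
    re.IGNORECASE,
)


def find_chapter_headings(page_texts):
    """Return (page_index, label, title) for pages that open with a chapter heading.

    page_texts is a list of plain-text strings, one per page (0-based index).
    """
    headings = []
    for page_index, text in enumerate(page_texts):
        # Assumption: only the first non-empty line of a page is considered.
        first_line = next((ln for ln in text.splitlines() if ln.strip()), "")
        match = CHAPTER_RE.match(first_line)
        if match:
            headings.append((page_index, match.group(1), match.group(2).strip()))
    return headings
```

A text-only check like this produces false positives (any page that happens to start with "Chapter ..."), which is presumably why the tool also looks at font size and offers a sensitivity setting.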
Installation
From PyPI (recommended)
pip install lazy-splitter
From Source
git clone https://github.com/shankarpandala/lazy-splitter.git
cd lazy-splitter
pip install -e .
Usage
Split a PDF by chapters
pdf-splitter split input.pdf
Preview detected chapters without splitting
pdf-splitter preview input.pdf
Specify output directory
pdf-splitter split input.pdf -o output_dir
Choose detection strategy
# Use bookmarks only (fastest)
pdf-splitter split input.pdf --strategy bookmarks
# Use text analysis only (when bookmarks are missing)
pdf-splitter split input.pdf --strategy heuristic
# Use both methods (default)
pdf-splitter split input.pdf --strategy hybrid
Customize output filename pattern
pdf-splitter split input.pdf --pattern "{index:02d}_{title}.pdf"
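The pattern uses Python format-string fields. The sketch below shows how such a pattern could expand for one chapter; `render_filename` is a hypothetical helper, and both the 1-based index and the replacement of unsafe characters with underscores are assumptions, not documented behavior of the tool.

```python
import re


def render_filename(pattern, index, title):
    """Fill an output pattern such as "{index:02d}_{title}.pdf"."""
    # Assumption: runs of characters other than letters, digits, "_" and "-"
    # are collapsed to a single underscore so the title is filename-safe.
    safe_title = re.sub(r"[^\w\-]+", "_", title).strip("_")
    return pattern.format(index=index, title=safe_title)


print(render_filename("{index:02d}_{title}.pdf", 1, "Getting Started"))
# prints: 01_Getting_Started.pdf
```

The `:02d` format spec zero-pads the index to two digits, so the output files sort in chapter order in a directory listing.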
Examples
# Basic usage
pdf-splitter split textbook.pdf
# Preview chapters first
pdf-splitter preview textbook.pdf
# Custom output location
pdf-splitter split textbook.pdf -o chapters/
# Force heuristic detection (for PDFs without bookmarks)
pdf-splitter split textbook.pdf --strategy heuristic --sensitivity high
How It Works
- Bookmark/TOC Extraction: First tries to extract chapter information from PDF bookmarks or table of contents
- Text Analysis Fallback: If bookmarks are unavailable, analyzes text for:
  - Font size changes (larger fonts often indicate headings)
  - Common chapter patterns ("Chapter 1", "CHAPTER ONE", etc.)
  - Page breaks combined with heading-like text
- Smart Splitting: Creates one PDF file per detected chapter, preserving formatting and metadata
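The step from detected chapters to output files comes down to turning chapter start pages into page ranges. The following is a sketch under stated assumptions rather than the project's own code: `chapter_ranges` is a hypothetical helper, and the input is assumed to be a list of `[level, title, page]` entries with 1-based page numbers, the shape PyMuPDF's `Document.get_toc()` returns by default.

```python
def chapter_ranges(toc, page_count, level=1):
    """Turn TOC entries into (title, first_page, last_page) tuples.

    Pages are 1-based and inclusive. Only entries at the given outline
    level are treated as chapter starts; deeper entries are ignored.
    """
    # Keep top-level entries only, ordered by the page they start on.
    starts = sorted(
        ((page, title) for lvl, title, page in toc if lvl == level),
        key=lambda item: item[0],
    )
    ranges = []
    for i, (page, title) in enumerate(starts):
        # A chapter ends on the page before the next chapter starts,
        # or on the last page of the document for the final chapter.
        next_start = starts[i + 1][0] if i + 1 < len(starts) else page_count + 1
        ranges.append((title, page, max(page, next_start - 1)))
    return ranges


toc = [[1, "Intro", 1], [2, "Background", 2], [1, "Methods", 5], [1, "Results", 9]]
print(chapter_ranges(toc, page_count=12))
# prints: [('Intro', 1, 4), ('Methods', 5, 8), ('Results', 9, 12)]
```

With PyMuPDF, each range could then be copied into a new document via `insert_pdf(doc, from_page=first - 1, to_page=last - 1)`, since that call takes 0-based page numbers. Whether lazy-splitter does exactly this is not stated here.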
Requirements
- Python 3.8+
- PyMuPDF (for PDF manipulation)
- Click (for the command-line interface)
- Rich (for beautiful terminal output)
Development
# Install with development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black src/
# Type checking
mypy src/
License
MIT License - see LICENSE file for details
🗺️ Roadmap
Coming Soon
- 🎬 Video Splitter - Split videos by scenes, chapters, or silence detection
- 🎵 Audio Splitter - Split audio files by silence, chapters, or time intervals
- 📊 Document Splitter - Split Word docs, presentations, and more
- 🖼️ Image Splitter - Split image collections and multi-page TIFFs
Contributing
Contributions are welcome! We're building a suite of intelligent splitting tools. Please feel free to submit a Pull Request.
See CONTRIBUTING.md for guidelines.
File details
Details for the file lazy_splitter-0.1.0.tar.gz.
File metadata
- Download URL: lazy_splitter-0.1.0.tar.gz
- Upload date:
- Size: 11.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb6a6236b429d286381b4b8fc52ade4e8cb989340208daa1d19a2bd816aeaee4 |
| MD5 | c46f263afcd2647b8a122b46a3647e18 |
| BLAKE2b-256 | 098cddeecb4dc752b61c06c8a8a024b86c1aa3d9df4eed2bad11c5351019f661 |
File details
Details for the file lazy_splitter-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lazy_splitter-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 83a09fa95dd45cc6a6e337a0b5d3905c9e3b78ed902baf8c448b37b19189c142 |
| MD5 | 583f9ae9c02098d3b5c3acfccfad5839 |
| BLAKE2b-256 | a2d48f8cff0621c829085add34f084f84e45f9d8d7abad4260393dadb3c009d8 |