Universal document and PDF toolkit - Convert, compress, edit, and process documents
Shift - Universal Document and PDF Toolkit
A comprehensive command-line toolkit for document conversion, PDF compression, page management, OCR text extraction, and more.
🚀 Quick Start
Install the package:
git clone https://github.com/adamn1225/shift.git
cd shift
pip install -e .
Use anywhere:
shift-convert document.docx --to pdf # Recommended (avoids bash builtin)
shift-compress large_file.pdf # Compress PDFs for email
shift-pages document.pdf # Interactive page removal
shift-edit document.pdf --pages # Advanced PDF editing
shift-ocr scanned.pdf --extract-text # Extract text from scanned PDFs
Important: Use shift-convert instead of shift to avoid conflicts with the bash builtin command of the same name. Alternatively, invoke the tool by its full path, for example /home/bender/.local/bin/shift
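The conflict arises because bash resolves builtins before it searches PATH, so a bare shift never reaches the installed executable. The following is a minimal sketch, not part of the toolkit, that asks bash how it would resolve a name and looks up the full path of an installed command. The function names are illustrative assumptions.

```python
import shutil
import subprocess


def is_bash_builtin(name):
    """Return True/False if bash reports `name` as a builtin, or None if bash is missing."""
    bash = shutil.which("bash")
    if bash is None:
        return None
    # `type -t` prints a single word describing how bash would resolve the name.
    result = subprocess.run(
        [bash, "-c", 'type -t "$1"', "bash", name],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "builtin"


def full_path(name):
    """Return the absolute path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    print("shift is a bash builtin:", is_bash_builtin("shift"))
    print("shift-convert lives at:", full_path("shift-convert"))
```

If full_path returns a path, that path can be used in place of the bare command name.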
📦 What's Included
| Command | Description | Main Use Case |
|---|---|---|
| shift | Universal document converter | Convert between PDF, Word, HTML, Markdown, Text |
| shift-compress | PDF compression tool | Make PDFs small enough for email attachments |
| shift-pages | PDF page manager | Remove pages interactively to reduce file size |
| shift-edit | Advanced PDF editor | Complex PDF editing with GUI interface |
| shift-ocr | OCR text extraction | Extract text from scanned PDFs and images |
🔧 Features
- Global commands: Work from any directory after installation
- Auto-detection: File formats detected from extensions
- Batch processing: Handle entire folders with single commands
- Quality options: Multiple compression and conversion levels
- External tools: Integrates with Pandoc, LibreOffice, Ghostscript when available
- Interactive modes: GUI and command-line interfaces
- Comprehensive help: Each tool provides detailed --help output
📄 Document Conversion (shift)
Convert between various document formats with intelligent format detection.
Supported Formats
- PDF ↔ Text, HTML, Markdown
- Word (DOCX) ↔ PDF, HTML, Text, Markdown
- HTML ↔ PDF, Text, Markdown
- Markdown ↔ HTML, PDF, Word
- Text ↔ PDF, HTML, Markdown
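The format list above can be read as a lookup table. The sketch below is an illustration only: the extension mapping and the format labels ("word", "markdown", and so on) are assumptions, not the toolkit's internals. It detects a format from a file extension and checks whether a conversion pair appears in the list.

```python
from pathlib import Path

# Extension-to-format mapping; the extensions chosen here are an assumption.
EXTENSIONS = {
    ".pdf": "pdf",
    ".docx": "word",
    ".html": "html",
    ".htm": "html",
    ".md": "markdown",
    ".txt": "text",
}

# Pairs transcribed from the "Supported Formats" list.
_LISTED = {
    "pdf": {"text", "html", "markdown"},
    "word": {"pdf", "html", "text", "markdown"},
    "html": {"pdf", "text", "markdown"},
    "markdown": {"html", "pdf", "word"},
    "text": {"pdf", "html", "markdown"},
}
SUPPORTED = {(a, b) for a, targets in _LISTED.items() for b in targets}
# Each arrow in the list is two-way, so add the reverse of every pair.
SUPPORTED |= {(b, a) for a, b in SUPPORTED}


def detect_format(filename):
    """Return the format label for a file name, or None if the extension is unknown."""
    return EXTENSIONS.get(Path(filename).suffix.lower())


def can_convert(source_file, target_format):
    """Check whether the detected source format and the target form a listed pair."""
    source = detect_format(source_file)
    return source is not None and (source, target_format) in SUPPORTED


if __name__ == "__main__":
    print(detect_format("Report.DOCX"))
    print(can_convert("document.docx", "pdf"))
    print(can_convert("photo.png", "pdf"))
```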
Examples
# Basic conversion (shift-convert avoids the bash builtin named shift)
shift-convert document.docx --to pdf
shift-convert report.md --to html --css professional.css
shift-convert presentation.html --to pdf
# Batch conversion
shift-convert documents/ --batch --from docx --to pdf --output converted/
# Advanced options
shift-convert file.pdf --to text --output extracted.txt
shift-convert *.md --to html --css bootstrap.min.css
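To make the batch example concrete, here is a small sketch of how a --batch run with --from docx --to pdf --output converted/ could pair input files with output paths. It is an assumption about the behaviour, not the toolkit's code: it only looks at the top level of the folder and keeps each file's base name.

```python
from pathlib import Path


def plan_batch(input_dir, from_ext, to_ext, output_dir):
    """Pair every *.from_ext file in input_dir with a target path in output_dir."""
    src = Path(input_dir)
    dst = Path(output_dir)
    return [
        (path, dst / f"{path.stem}.{to_ext}")
        for path in sorted(src.glob(f"*.{from_ext}"))
    ]


if __name__ == "__main__":
    for source, target in plan_batch("documents", "docx", "pdf", "converted"):
        print(source, "->", target)
```

Planning the pairs before converting makes it easy to preview a batch or skip files whose target already exists.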
🗜️ PDF Compression (shift-compress)
Compress PDFs for email attachments (under 9.5MB) with multiple quality options.
Basic Compression
shift-compress document.pdf # Compress to under 9.5MB
shift-compress large_file.pdf --output small.pdf
shift-compress --batch folder/ # Process whole folders
Advanced Compression Options
# Quality levels (using Ghostscript if available)
shift-compress file.pdf --quality screen # Smallest size, lowest quality
shift-compress file.pdf --quality ebook # Good balance (default)
shift-compress file.pdf --quality printer # High quality
# Custom settings
shift-compress file.pdf --dpi 72 --jpeg-quality 50 # Maximum compression
shift-compress file.pdf --dual # Create both quality & small versions
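The quality names screen, ebook, and printer match Ghostscript's -dPDFSETTINGS presets. The sketch below builds a plain Ghostscript command line for one of those presets. Whether shift-compress issues exactly this command is an assumption; the function name and defaults are illustrative.

```python
GS_PRESETS = ("screen", "ebook", "printer")


def build_gs_command(src, dst, quality="ebook", gs_binary="gs"):
    """Return the argument list for a Ghostscript pdfwrite run using a named preset."""
    if quality not in GS_PRESETS:
        raise ValueError(f"unknown quality {quality!r}; expected one of {GS_PRESETS}")
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",          # write a new PDF
        f"-dPDFSETTINGS=/{quality}",  # preset that controls image downsampling
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Ghostscript.
    print(" ".join(build_gs_command("file.pdf", "file_small.pdf", "screen")))
```

The list can be passed to subprocess.run once Ghostscript is installed.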
Two-Step Approach for Large Files
For very large PDFs (>30MB), combine page removal with compression:
shift-pages huge_file.pdf # Remove unnecessary pages first
shift-compress huge_file_edited.pdf # Then compress the result
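The two steps above can be sketched as a short planning helper. It assumes, based only on the example file names, that shift-pages writes its result next to the input with an _edited suffix, and it treats 1 MB as 1024 * 1024 bytes. Both are assumptions, and the function names are hypothetical.

```python
from pathlib import Path

LARGE_FILE_MB = 30  # threshold mentioned in the text above


def edited_name(pdf_path):
    """Derive the assumed output name of the page-removal step."""
    p = Path(pdf_path)
    return str(p.with_name(f"{p.stem}_edited{p.suffix}"))


def needs_two_steps(size_bytes):
    """True when a file is larger than the 30MB threshold."""
    return size_bytes > LARGE_FILE_MB * 1024 * 1024


def two_step_commands(pdf_path):
    """Return the two command lines in the order they should run."""
    return [
        ["shift-pages", pdf_path],
        ["shift-compress", edited_name(pdf_path)],
    ]


if __name__ == "__main__":
    for command in two_step_commands("huge_file.pdf"):
        print(" ".join(command))
```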
📖 PDF Page Management (shift-pages)
Analyze and remove pages from PDFs to reduce file size.
Interactive Mode
shift-pages document.pdf # Interactive page selection
Direct Commands
shift-pages document.pdf --analyze # Just show page analysis
shift-pages document.pdf --remove 1,3,5-7 # Remove specific pages
shift-pages document.pdf --split-pages # Split into individual files
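A page specification such as 1,3,5-7 mixes single pages and inclusive ranges. The sketch below shows one way to expand such a string into page numbers. It is an illustration, not the toolkit's parser, and it assumes 1-based numbering.

```python
def parse_page_spec(spec):
    """Expand a spec like '1,3,5-7' into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_spec("1,3,5-7"))  # [1, 3, 5, 6, 7]
```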
What It Shows
- File size and page count
- Pages with heavy image content
- Size estimates for each page
- Suggestions for pages to remove
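As a rough illustration of how removal suggestions could be derived from per-page size estimates, the sketch below flags pages that are much larger than the typical page. The rule (more than three times the median) and the function name are hypothetical; the actual heuristic used by shift-pages is not documented here.

```python
from statistics import median


def suggest_heavy_pages(page_sizes, factor=3.0):
    """Return page numbers whose estimated size exceeds factor times the median size."""
    if not page_sizes:
        return []
    typical = median(page_sizes.values())
    return sorted(page for page, size in page_sizes.items() if size > factor * typical)


if __name__ == "__main__":
    # Estimated bytes per page (made-up numbers for the demonstration).
    estimates = {1: 40_000, 2: 45_000, 3: 900_000, 4: 50_000, 5: 1_200_000}
    print(suggest_heavy_pages(estimates))  # [3, 5]
```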
✏️ Advanced PDF Editor (shift-edit)
Comprehensive PDF editing with both command-line and GUI interfaces.
Interactive Editing
shift-edit document.pdf --pages # Interactive page selection
shift-edit document.pdf --images # Image removal (experimental)
Direct Commands
shift-edit document.pdf --remove-pages 3,5,7-9
shift-edit document.pdf --keep-pages 1-5,10
shift-edit document.pdf --split-pages # Split into individual pages
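--remove-pages and --keep-pages describe the same operation from opposite directions: one lists what to drop, the other what to retain. The sketch below computes the surviving pages for either selection. It assumes the two options are mutually exclusive and that kept pages stay in their original order; neither is confirmed by the tool's documentation.

```python
def pages_after_edit(total_pages, remove=(), keep=()):
    """Return the 1-based page numbers that remain after one of the two selections."""
    if remove and keep:
        raise ValueError("choose either remove or keep, not both")
    all_pages = range(1, total_pages + 1)
    if keep:
        wanted = set(keep)
        return [page for page in all_pages if page in wanted]
    dropped = set(remove)
    return [page for page in all_pages if page not in dropped]


if __name__ == "__main__":
    print(pages_after_edit(10, remove=[3, 5, 7, 8, 9]))  # [1, 2, 4, 6, 10]
    print(pages_after_edit(12, keep=[1, 2, 3, 4, 5, 10]))  # [1, 2, 3, 4, 5, 10]
```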
Analysis Mode
shift-edit document.pdf --analyze # Detailed structure analysis
🔍 OCR Text Extraction (shift-ocr)
Extract text from scanned PDFs and images using Tesseract OCR.
Basic OCR
shift-ocr scanned_document.pdf # Extract text to console
shift-ocr document.pdf --output text.txt # Save to file
shift-ocr image.png --lang eng+spa # Multiple languages
Batch Processing
shift-ocr folder/ --batch --output results/ # Process entire folders
shift-ocr *.pdf --confidence 70 # Set confidence threshold
Preprocessing Options
shift-ocr blurry.pdf --denoise --deskew # Clean up image quality
shift-ocr document.pdf --preprocess aggressive
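For orientation, the sketch below shows the plain Tesseract command that a call like --lang eng+spa roughly corresponds to, plus one possible reading of --confidence 70 as "drop words recognised with lower confidence". Both function names and the filtering rule are assumptions. Note that Tesseract reads images, so a scanned PDF has to be rendered to images before this step.

```python
def build_tesseract_command(image_path, languages=("eng",)):
    """Return a Tesseract command that prints recognised text to standard output."""
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def filter_by_confidence(words, threshold=70):
    """Keep the text of (text, confidence) pairs at or above the threshold."""
    return [text for text, confidence in words if confidence >= threshold]


if __name__ == "__main__":
    print(" ".join(build_tesseract_command("image.png", ("eng", "spa"))))
    # Made-up recognition results for the demonstration.
    sample = [("Invoice", 96), ("t0tal", 41), ("2024", 88)]
    print(filter_by_confidence(sample, 70))  # ['Invoice', '2024']
```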
🛠️ Installation and Dependencies
Python Package Installation
git clone https://github.com/adamn1225/shift.git
cd shift
pip install -e . # Editable/development install
# OR
pip install . # Standard install
System Dependencies (Optional but Recommended)
For enhanced functionality, install these system tools:
Ubuntu/Debian:
sudo apt-get install ghostscript pandoc wkhtmltopdf tesseract-ocr qpdf
sudo apt-get install libreoffice-writer # For advanced document conversion
macOS:
brew install ghostscript pandoc wkhtmltopdf tesseract qpdf
Windows:
- Install Ghostscript
- Install Pandoc
- Install wkhtmltopdf
What Each Dependency Enables
- Ghostscript: Best PDF compression (essential for large files)
- Pandoc: Universal document conversion between many formats
- wkhtmltopdf: High-quality HTML to PDF conversion
- Tesseract: OCR text extraction from scanned documents
- qpdf: Additional PDF optimization options
- LibreOffice: Advanced document format support
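Since every dependency in this list is optional, it helps to see which ones are present on a machine. The sketch below looks up the usual executable names on PATH. The names are the common ones for each tool (for example gs, or gswin64c on Windows) and are listed here as assumptions, not as the names the toolkit itself probes.

```python
import shutil

# Typical executable names per tool; treat these as assumptions.
TOOLS = {
    "Ghostscript": ["gs", "gswin64c"],
    "Pandoc": ["pandoc"],
    "wkhtmltopdf": ["wkhtmltopdf"],
    "Tesseract": ["tesseract"],
    "qpdf": ["qpdf"],
    "LibreOffice": ["soffice", "libreoffice"],
}


def find_tools(which=shutil.which):
    """Map each tool to the first executable path found, or None if it is missing."""
    found = {}
    for tool, candidates in TOOLS.items():
        found[tool] = next((path for path in map(which, candidates) if path), None)
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:12} {path or 'not installed'}")
```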
📋 Usage Examples
Common Workflows
Make a large PDF email-friendly:
shift-compress presentation.pdf --quality ebook
Convert and compress a Word document:
shift-convert report.docx --to pdf
shift-compress report.pdf
Clean up a scanned document:
shift-ocr scanned.pdf --output clean_text.txt
shift-pages scanned.pdf # Remove blank pages
Batch process documents:
shift-convert documents/ --batch --from docx --to pdf
shift-compress *.pdf --batch
Real-World Examples
Research Paper Workflow:
# Convert markdown to formatted PDF
shift-convert paper.md --to pdf --css professional.css
# If too large for submission
shift-compress paper.pdf --quality printer
Business Document Processing:
# Convert presentations and compress for email
shift-convert *.pptx --to pdf
shift-compress *.pdf --quality ebook --batch
Legal Document Management:
# OCR scanned contracts
shift-ocr contracts/ --batch --output text_versions/
# Remove sensitive pages
shift-pages contract.pdf --remove 3,7-9
🤝 Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Commit changes: git commit -am 'Add feature'
- Push to branch: git push origin feature-name
- Submit a Pull Request
📝 License
MIT License - see LICENSE file for details
🐛 Issues
Report bugs and request features at: https://github.com/adamn1225/shift/issues
Made with ❤️ for document processing efficiency
- First, remove heavy pages:
pdf-pages large_file.pdf --analyze # See page breakdown
pdf-pages large_file.pdf # Interactive page removal
- Then compress the result:
pdf-compress edited_file.pdf --dual # Compress the page-reduced version
Example Results:
- Original: 47MB → Page-reduced: 32MB → Final: 13MB ✓
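The savings in the example result can be worked out step by step. The short calculation below uses only the three sizes quoted above.

```python
def reduction_percent(before_mb, after_mb):
    """Percentage by which the size shrank, rounded to one decimal place."""
    return round(100 * (before_mb - after_mb) / before_mb, 1)


if __name__ == "__main__":
    print(reduction_percent(47, 32))  # page removal alone: 31.9
    print(reduction_percent(32, 13))  # compression of the reduced file: 59.4
    print(reduction_percent(47, 13))  # both steps combined: 72.3
```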
PDF Page Management
Analyze PDF structure and remove pages to reduce file size:
pdf-pages document.pdf --analyze # Show page breakdown
pdf-pages document.pdf # Interactive page removal
pdf-pages document.pdf --remove 1,3,5-7 # Remove specific pages
The analyzer shows which pages have the most images and estimated size impact.
Document Conversion
Convert a Word document to PDF:
doc-convert document.docx --to pdf
Convert a Markdown file to HTML with a custom stylesheet:
doc-convert report.md --to html --css style.css
Extract text from a PDF file:
doc-convert file.pdf --to text --output extracted.txt
Batch convert all Word documents in a folder to PDF:
doc-convert folder/ --batch --from docx --to pdf --output converted/
Summary
You now have a complete PDF management toolkit:
- For regular PDFs: Use pdf-compress --dual to create both quality and email versions
- For large PDFs: Use pdf-pages first to remove heavy pages, then compress
- For document conversion: Use doc-convert to convert between formats
All tools work from anywhere in your terminal and provide detailed help with -h or --help.
File details
Details for the file shift_cli-1.0.4.tar.gz.
File metadata
- Download URL: shift_cli-1.0.4.tar.gz
- Upload date:
- Size: 34.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba626c3ccdef7c448d7297781ed215dae46515b9277e801db090487a8ec966c1 |
| MD5 | 239c33d7da0fbbc054f281f782235d13 |
| BLAKE2b-256 | fa44d6e4df2e5f18e89bf05f731c50e12a898d6ab06f369fa3a539fe2293999c |
File details
Details for the file shift_cli-1.0.4-py3-none-any.whl.
File metadata
- Download URL: shift_cli-1.0.4-py3-none-any.whl
- Upload date:
- Size: 34.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59bfbaea641a9429a2e96b20815928cb5b1727a26eb11619d49c6003c61b2e9c |
| MD5 | 7a055abbde7b245821e9b3351a99f481 |
| BLAKE2b-256 | 3b88bfa7ee4d37e6572f8e8944e909dffb57130409cf0102cedc981060f781d6 |