Universal document and PDF toolkit - Convert, compress, edit, and process documents

Shift - Universal Document and PDF Toolkit

A comprehensive command-line toolkit for document conversion, PDF compression, page management, OCR text extraction, and more.

🚀 Quick Start

Install the package:

git clone https://github.com/adamn1225/shift.git
cd shift
pip install -e .

Use anywhere:

shift-convert document.docx --to pdf            # Recommended (avoids bash builtin)
shift-compress large_file.pdf                   # Compress PDFs for email
shift-pages document.pdf                        # Interactive page removal  
shift-edit document.pdf --pages                 # Advanced PDF editing
shift-ocr scanned.pdf --extract-text            # Extract text from scanned PDFs

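Important: In bash and other POSIX shells, shift is a shell builtin, so typing shift on its own runs the builtin rather than this tool. Use shift-convert instead, or call the installed script by its full path (for example ~/.local/bin/shift; the exact location depends on where pip placed the scripts).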
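To see the conflict on your own system, the sketch below checks how the shell resolves a name. It relies only on the standard behaviour of command -v, which prints a bare name for builtins and functions and an absolute path for executables on PATH. The helper name is_shadowed is hypothetical and is not part of Shift.

```shell
# Hypothetical helper (not part of Shift): succeeds when the shell itself
# handles the name, so an executable of the same name on PATH is never reached.
is_shadowed() {
    case "$(command -v "$1")" in
        /*) return 1 ;;   # resolves to a file on PATH
        "") return 1 ;;   # not found at all
        *)  return 0 ;;   # builtin, function, alias, or keyword
    esac
}

if is_shadowed shift; then
    echo "shift is handled by the shell itself; use shift-convert"
fi
```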

📦 What's Included

Command | Description | Main Use Case
shift | Universal document converter | Convert between PDF, Word, HTML, Markdown, Text
shift-compress | PDF compression tool | Make PDFs small enough for email attachments
shift-pages | PDF page manager | Remove pages interactively to reduce file size
shift-edit | Advanced PDF editor | Complex PDF editing with GUI interface
shift-ocr | OCR text extraction | Extract text from scanned PDFs and images

🔧 Features

  • Global commands: Work from any directory after installation
  • Auto-detection: File formats detected from extensions
  • Batch processing: Handle entire folders with single commands
  • Quality options: Multiple compression and conversion levels
  • External tools: Integrates with Pandoc, LibreOffice, Ghostscript when available
  • Interactive modes: GUI and command-line interfaces
  • Comprehensive help: Each tool provides detailed --help

📄 Document Conversion (shift)

Convert between various document formats with intelligent format detection.

Supported Formats

  • PDF ↔ Text, HTML, Markdown
  • Word (DOCX) ↔ PDF, HTML, Text, Markdown
  • HTML ↔ PDF, Text, Markdown
  • Markdown ↔ HTML, PDF, Word
  • Text ↔ PDF, HTML, Markdown

Examples

# Basic conversion (shift-convert avoids the shell builtin named shift)
shift-convert document.docx --to pdf
shift-convert report.md --to html --css professional.css
shift-convert presentation.html --to pdf

# Batch conversion
shift-convert documents/ --batch --from docx --to pdf --output converted/

# Advanced options
shift-convert file.pdf --to text --output extracted.txt
shift-convert *.md --to html --css bootstrap.min.css
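If you want per-file control over a batch (for example, to log which inputs fail), a plain shell loop can do the same job as --batch. This is a sketch, not Shift's own batch logic. It uses only the --to and --output flags shown above, and the convert_all function and DRY_RUN variable are hypothetical names introduced here.

```shell
# Hypothetical wrapper: convert every .docx in $1 to a PDF in $2.
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
convert_all() {
    for f in "$1"/*.docx; do
        [ -e "$f" ] || continue          # nothing matched the pattern
        name=$(basename "$f" .docx)
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo shift-convert "$f" --to pdf --output "$2/$name.pdf"
        else
            mkdir -p "$2"
            shift-convert "$f" --to pdf --output "$2/$name.pdf" \
                || echo "failed: $f" >&2
        fi
    done
}

# Usage: DRY_RUN=0 convert_all documents converted
```

Leaving DRY_RUN at its default is a cheap way to review the exact commands before anything is written.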

🗜️ PDF Compression (shift-compress)

Compress PDFs for email attachments (under 9.5MB) with multiple quality options.

Basic Compression

shift-compress document.pdf                 # Compress to under 9.5MB
shift-compress large_file.pdf --output small.pdf
shift-compress --batch folder/              # Process whole folders  
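To confirm that a compressed file really fits the email limit, a small size check is enough. The sketch below assumes that "9.5MB" means 9.5 MiB (9,961,472 bytes); the README does not say whether binary or decimal megabytes are meant. The fits_email_limit helper is hypothetical.

```shell
# Assumption: 9.5MB is read as 9.5 * 1024 * 1024 bytes.
LIMIT_BYTES=9961472

# Hypothetical helper: succeeds when the file is at or below the limit.
fits_email_limit() {
    [ -f "$1" ] || return 2
    # Arithmetic expansion strips the padding some wc versions emit.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$LIMIT_BYTES" ]
}

# Usage: fits_email_limit small.pdf && echo "ready to attach"
```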

Advanced Compression Options

# Quality levels (using Ghostscript if available)
shift-compress file.pdf --quality screen    # Smallest size, lowest quality
shift-compress file.pdf --quality ebook     # Good balance (default)
shift-compress file.pdf --quality printer   # High quality

# Custom settings
shift-compress file.pdf --dpi 72 --jpeg-quality 50  # Maximum compression
shift-compress file.pdf --dual              # Create both quality & small versions
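The quality names above match Ghostscript's own -dPDFSETTINGS presets (/screen, /ebook, /printer), so the levels probably map onto them. That mapping is an assumption, not something the README confirms. As a rough sketch of what a direct Ghostscript call looks like, the hypothetical helper below prints the command rather than running it.

```shell
# Hypothetical helper: print a Ghostscript command for one preset.
# $1 = preset (screen|ebook|printer), $2 = input PDF, $3 = output PDF
gs_compress_cmd() {
    case "$1" in
        screen|ebook|printer) ;;
        *) echo "unknown preset: $1" >&2; return 1 ;;
    esac
    printf 'gs -sDEVICE=pdfwrite -dPDFSETTINGS=/%s -dNOPAUSE -dBATCH -dQUIET -sOutputFile=%s %s\n' \
        "$1" "$3" "$2"
}

gs_compress_cmd ebook report.pdf report_small.pdf
```

The printed command does not quote file names, so paths containing spaces need quoting before you run it.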

Two-Step Approach for Large Files

For very large PDFs (>30MB), combine page removal with compression:

shift-pages huge_file.pdf                   # Remove unnecessary pages first  
shift-compress huge_file_edited.pdf         # Then compress the result

📖 PDF Page Management (shift-pages)

Analyze and remove pages from PDFs to reduce file size.

Interactive Mode

shift-pages document.pdf                    # Interactive page selection

Direct Commands

shift-pages document.pdf --analyze          # Just show page analysis
shift-pages document.pdf --remove 1,3,5-7   # Remove specific pages
shift-pages document.pdf --split-pages      # Split into individual files
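The --remove argument takes a comma-separated list in which a dash marks an inclusive range. As an illustration of how such a spec expands (a sketch only, not Shift's parser), the hypothetical expand_pages function below turns 1,3,5-7 into individual page numbers. It does no validation of its input.

```shell
# Hypothetical helper: expand "1,3,5-7" into "1 3 5 6 7".
expand_pages() {
    spec=$1
    out=""
    old_ifs=$IFS
    IFS=,
    for part in $spec; do
        case "$part" in
            *-*) start=${part%-*}; end=${part#*-} ;;   # range such as 5-7
            *)   start=$part; end=$part ;;             # single page
        esac
        i=$start
        while [ "$i" -le "$end" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
    echo "$out"
}

expand_pages 1,3,5-7   # prints: 1 3 5 6 7
```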

What It Shows

  • File size and page count
  • Pages with heavy image content
  • Size estimates for each page
  • Suggestions for pages to remove

✏️ Advanced PDF Editor (shift-edit)

Comprehensive PDF editing with both command-line and GUI interfaces.

Interactive Editing

shift-edit document.pdf --pages             # Interactive page selection
shift-edit document.pdf --images            # Image removal (experimental)

Direct Commands

shift-edit document.pdf --remove-pages 3,5,7-9
shift-edit document.pdf --keep-pages 1-5,10  
shift-edit document.pdf --split-pages        # Split into individual pages

Analysis Mode

shift-edit document.pdf --analyze           # Detailed structure analysis

🔍 OCR Text Extraction (shift-ocr)

Extract text from scanned PDFs and images using Tesseract OCR.

Basic OCR

shift-ocr scanned_document.pdf              # Extract text to console
shift-ocr document.pdf --output text.txt    # Save to file
shift-ocr image.png --lang eng+spa          # Multiple languages

Batch Processing

shift-ocr folder/ --batch --output results/ # Process entire folders
shift-ocr *.pdf --confidence 70             # Set confidence threshold
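The README does not describe how --confidence is applied. One plausible reading is that words below the threshold are dropped. Tesseract can emit per-word confidence in its TSV output, where column 11 is conf and column 12 is text. The sketch below filters such output with awk; filter_by_confidence is a hypothetical name, and the behaviour is an assumption about the idea, not about shift-ocr itself.

```shell
# Hypothetical helper: read Tesseract TSV on stdin and print only the
# words whose confidence is at least $1 (0-100).
filter_by_confidence() {
    awk -F '\t' -v min="$1" \
        'NR > 1 && $11 + 0 >= min && $12 != "" { print $12 }'
}

# Usage (requires Tesseract):
#   tesseract scan.png stdout tsv | filter_by_confidence 70
```

Rows that describe pages, blocks, or lines carry a confidence of -1 and no text, so the same test skips them.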

Preprocessing Options

shift-ocr blurry.pdf --denoise --deskew     # Clean up image quality
shift-ocr document.pdf --preprocess aggressive

🛠️ Installation and Dependencies

Python Package Installation

git clone https://github.com/adamn1225/shift.git
cd shift  
pip install -e .                            # Editable/development install
# OR
pip install .                               # Standard install

System Dependencies (Optional but Recommended)

For enhanced functionality, install these system tools:

Ubuntu/Debian:

sudo apt-get install ghostscript pandoc wkhtmltopdf tesseract-ocr qpdf
sudo apt-get install libreoffice-writer    # For advanced document conversion

macOS:

brew install ghostscript pandoc wkhtmltopdf tesseract qpdf

Windows:

What Each Dependency Enables

  • Ghostscript: Best PDF compression (essential for large files)
  • Pandoc: Universal document conversion between many formats
  • wkhtmltopdf: High-quality HTML to PDF conversion
  • Tesseract: OCR text extraction from scanned documents
  • qpdf: Additional PDF optimization options
  • LibreOffice: Advanced document format support
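Because these tools are optional, it helps to check which ones are present before relying on a feature. The sketch below uses command -v for that. The executable names are assumptions based on how each project is usually packaged (Ghostscript as gs, LibreOffice as soffice); adjust them if your system differs.

```shell
# Hypothetical helper: print "<name>: found" or "<name>: missing" per tool.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# Assumed executable names for the dependencies listed above.
check_tools gs pandoc wkhtmltopdf tesseract qpdf soffice
```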

📋 Usage Examples

Common Workflows

Make a large PDF email-friendly:

shift-compress presentation.pdf --quality ebook

Convert and compress a Word document:

shift-convert report.docx --to pdf
shift-compress report.pdf

Clean up a scanned document:

shift-ocr scanned.pdf --output clean_text.txt
shift-pages scanned.pdf                     # Remove blank pages

Batch process documents:

shift-convert documents/ --batch --from docx --to pdf
shift-compress *.pdf --batch

Real-World Examples

Research Paper Workflow:

# Convert markdown to formatted PDF
shift-convert paper.md --to pdf --css professional.css

# If too large for submission
shift-compress paper.pdf --quality printer

Business Document Processing:

# Convert presentations and compress for email
shift-convert *.pptx --to pdf
shift-compress *.pdf --quality ebook --batch

Legal Document Management:

# OCR scanned contracts  
shift-ocr contracts/ --batch --output text_versions/

# Remove sensitive pages
shift-pages contract.pdf --remove 3,7-9

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit changes: git commit -am 'Add feature'
  4. Push to branch: git push origin feature-name
  5. Submit a Pull Request

📝 License

MIT License - see LICENSE file for details

🐛 Issues

Report bugs and request features at: https://github.com/adamn1225/shift/issues


Made with ❤️ for document processing efficiency

  1. First, remove heavy pages:
shift-pages large_file.pdf --analyze        # See page breakdown
shift-pages large_file.pdf                  # Interactive page removal
  2. Then compress the result:
shift-compress edited_file.pdf --dual       # Compress the page-reduced version

Example Results:

  • Original: 47MB → Page-reduced: 32MB → Final: 13MB ✓
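To put those figures in perspective, the saving at each step can be worked out with integer arithmetic. The pct_saved helper is a hypothetical name; the sizes are the ones quoted above.

```shell
# Hypothetical helper: percentage saved going from size $1 to size $2
# (same unit for both), rounded down to a whole number.
pct_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

pct_saved 47 32   # page removal step: prints 31
pct_saved 32 13   # compression step:  prints 59
pct_saved 47 13   # overall:           prints 72
```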

PDF Page Management

Analyze PDF structure and remove pages to reduce file size:

shift-pages document.pdf --analyze          # Show page breakdown
shift-pages document.pdf                    # Interactive page removal
shift-pages document.pdf --remove 1,3,5-7   # Remove specific pages

The analyzer shows which pages have the most images and estimated size impact.


Document Conversion

Convert a Word document to PDF:

shift-convert document.docx --to pdf

Convert a Markdown file to HTML with a custom stylesheet:

shift-convert report.md --to html --css style.css

Extract text from a PDF file:

shift-convert file.pdf --to text --output extracted.txt

Batch convert all Word documents in a folder to PDF:

shift-convert folder/ --batch --from docx --to pdf --output converted/

Summary

You now have a complete PDF management toolkit:

  1. For regular PDFs: Use shift-compress --dual to create both quality and email versions
  2. For large PDFs: Use shift-pages first to remove heavy pages, then compress with shift-compress
  3. For document conversion: Use shift-convert to convert between formats

All tools work from anywhere in your terminal and provide detailed help with -h or --help.
