Universal document and PDF toolkit - Convert, compress, edit, and process documents

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
Topic
- Text Processing
- Utilities

Project description

Shift - Universal Document and PDF Toolkit

A comprehensive command-line toolkit for document conversion, PDF compression, page management, OCR text extraction, and more.

🚀 Quick Start

Install the package:

git clone https://github.com/adamn1225/shift.git
cd shift
pip install -e .

Use anywhere:

shift-convert document.docx --to pdf            # Recommended (avoids bash builtin)
shift-compress large_file.pdf                   # Compress PDFs for email
shift-pages document.pdf                        # Interactive page removal  
shift-edit document.pdf --pages                 # Advanced PDF editing
shift-ocr scanned.pdf --extract-text            # Extract text from scanned PDFs

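Important: `shift` is also a bash builtin (it shifts positional parameters), and bash runs builtins before it searches your PATH, so typing `shift` never reaches this tool. Use `shift-convert` instead, or call the installed script by its full path (for a user-level pip install on Linux this is usually `~/.local/bin/shift`).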
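The conflict only arises when a shell interprets the command line. Programs that launch the tool from Python avoid it, because `subprocess.run` with an argument list executes the program directly. The minimal sketch below asks bash how it resolves a name and builds an argument list for the converter. It assumes only the `shift-convert` and `shift` entry points named above. `bash_resolution` and `convert_command` are hypothetical helpers, not part of the package.

```python
import shlex
import shutil
import subprocess


def bash_resolution(name: str) -> str:
    """Ask bash how it would resolve `name`: 'builtin', 'file', ... or '' if not found."""
    result = subprocess.run(
        ["bash", "-c", f"type -t {shlex.quote(name)}"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def convert_command(source: str, target_format: str) -> list[str]:
    """Build an argv list for the converter (hypothetical helper).

    The executable is looked up on PATH with shutil.which, so the bash
    builtin never comes into play.
    """
    exe = shutil.which("shift-convert") or shutil.which("shift") or "shift-convert"
    return [exe, source, "--to", target_format]


if __name__ == "__main__":
    if shutil.which("bash"):
        print("bash resolves 'shift' as:", bash_resolution("shift"))  # builtin
    cmd = convert_command("document.docx", "pdf")
    print("would run:", cmd)
    # subprocess.run(cmd, check=True)  # runs the program directly, no shell
```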

📦 What's Included

Command	Description	Main Use Case
`shift-convert` (also installed as `shift`)	Universal document converter	Convert between PDF, Word, HTML, Markdown, Text
`shift-compress`	PDF compression tool	Make PDFs small enough for email attachments
`shift-pages`	PDF page manager	Remove pages interactively to reduce file size
`shift-edit`	Advanced PDF editor	Complex PDF editing from the command line or a GUI
`shift-ocr`	OCR text extraction	Extract text from scanned PDFs and images

🔧 Features

Global commands: Work from any directory after installation
Auto-detection: File formats detected from extensions
Batch processing: Handle entire folders with single commands
Quality options: Multiple compression and conversion levels
External tools: Integrates with Pandoc, LibreOffice, and Ghostscript when they are installed
Interactive modes: GUI and command-line interfaces
Comprehensive help: Each tool provides detailed --help

📄 Document Conversion (`shift-convert` / `shift`)

Convert between document formats. The input format is detected from the file extension.
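Extension-based detection comes down to a lookup table. The sketch below is illustrative only. The extension table and the helper names `detect_format` and `default_output` are assumptions based on the formats listed in this README, not the package's actual code.

```python
from pathlib import Path

# Assumed extension-to-format table, based on the formats listed in this README.
EXTENSION_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
    ".md": "markdown",
    ".markdown": "markdown",
    ".txt": "text",
}

# Preferred extension when writing each format.
FORMAT_EXTENSIONS = {
    "pdf": ".pdf",
    "docx": ".docx",
    "html": ".html",
    "markdown": ".md",
    "text": ".txt",
}


def detect_format(path: Path) -> str:
    """Return the document format implied by the (case-insensitive) extension."""
    try:
        return EXTENSION_FORMATS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported file type: {path.name}") from None


def default_output(path: Path, target_format: str) -> Path:
    """Derive an output path next to the input, e.g. report.md -> report.html."""
    if detect_format(path) == target_format:
        raise ValueError(f"{path.name} is already {target_format}")
    return path.with_suffix(FORMAT_EXTENSIONS[target_format])


if __name__ == "__main__":
    print(detect_format(Path("report.MD")))               # markdown
    print(default_output(Path("document.docx"), "pdf"))   # document.pdf
```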

Supported Formats

PDF ↔ Text, HTML, Markdown
Word (DOCX) ↔ PDF, HTML, Text, Markdown
HTML ↔ PDF, Text, Markdown
Markdown ↔ HTML, PDF, Word
Text ↔ PDF, HTML, Markdown

Examples

# Basic conversion
shift-convert document.docx --to pdf
shift-convert report.md --to html --css professional.css
shift-convert presentation.html --to pdf

# Batch conversion
shift-convert documents/ --batch --from docx --to pdf --output converted/

# Advanced options
shift-convert file.pdf --to text --output extracted.txt
shift-convert *.md --to html --css bootstrap.min.css
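Before converting anything, a `--batch` run has to pair each input file with an output path. The planning step might look like the sketch below. `plan_batch` is hypothetical, and its non-recursive scan and sorted processing order are assumptions.

```python
import tempfile
from pathlib import Path

# Assumed mapping from format names used by --from/--to to file extensions.
EXTENSIONS = {"pdf": ".pdf", "docx": ".docx", "html": ".html", "markdown": ".md", "text": ".txt"}


def plan_batch(src_dir: Path, from_fmt: str, to_fmt: str, out_dir: Path) -> list[tuple[Path, Path]]:
    """Pair every `from_fmt` file directly inside src_dir with its output path.

    Files are returned in sorted order; out_dir is not created here.
    """
    jobs = []
    for src in sorted(src_dir.glob("*" + EXTENSIONS[from_fmt])):
        if src.is_file():
            dst = out_dir / src.with_suffix(EXTENSIONS[to_fmt]).name
            jobs.append((src, dst))
    return jobs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "documents"
        docs.mkdir()
        for name in ("b.docx", "a.docx", "notes.txt"):
            (docs / name).touch()
        for src, dst in plan_batch(docs, "docx", "pdf", Path(tmp) / "converted"):
            print(src.name, "->", dst.relative_to(tmp))
```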

🗜️ PDF Compression (`shift-compress`)

Compress PDFs for email attachments (under 9.5MB) with multiple quality options.

Basic Compression

shift-compress document.pdf                 # Compress to under 9.5MB
shift-compress large_file.pdf --output small.pdf
shift-compress --batch folder/              # Process whole folders
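One plausible way to reach a size target such as 9.5 MB is to try Ghostscript presets from highest to lowest quality and keep the first result that fits. This sketch shows that strategy; it does not describe what `shift-compress` does internally. `compress_for_email` is hypothetical, the real compression step is passed in as a function, and "9.5MB" is read as 9.5 MiB.

```python
from typing import Callable

EMAIL_LIMIT_BYTES = int(9.5 * 1024 * 1024)  # "9.5MB" read as MiB (an assumption)

# Ghostscript presets ordered from best quality to smallest output.
PRESETS = ["printer", "ebook", "screen"]


def compress_for_email(
    compress: Callable[[str], int], limit: int = EMAIL_LIMIT_BYTES
) -> tuple[str, int]:
    """Try presets from highest to lowest quality; return the first that fits.

    `compress(preset)` stands in for a real compression run: it should write
    the output file and return its size in bytes. If nothing fits, the
    smallest attempt is returned so the caller can decide what to do next
    (for example, remove pages first).
    """
    best = None
    for preset in PRESETS:
        size = compress(preset)
        if size <= limit:
            return preset, size
        if best is None or size < best[1]:
            best = (preset, size)
    return best


if __name__ == "__main__":
    # Made-up sizes standing in for real Ghostscript runs.
    fake_sizes = {"printer": 18_000_000, "ebook": 9_000_000, "screen": 4_000_000}
    print(compress_for_email(lambda preset: fake_sizes[preset]))  # ('ebook', 9000000)
```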

Advanced Compression Options

# Quality levels (using Ghostscript if available)
shift-compress file.pdf --quality screen    # Smallest size, lowest quality
shift-compress file.pdf --quality ebook     # Good balance (default)
shift-compress file.pdf --quality printer   # High quality

# Custom settings
shift-compress file.pdf --dpi 72 --jpeg-quality 50  # Maximum compression
shift-compress file.pdf --dual              # Create both quality & small versions
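The `--quality` names match Ghostscript's `-dPDFSETTINGS` presets (`/screen`, `/ebook`, `/printer`). Assuming `shift-compress` delegates to Ghostscript, the options above roughly correspond to the command lines built here. `ghostscript_args` and `dual_commands` are hypothetical, and the `_quality`/`_small` output names are invented for illustration. `--jpeg-quality` is left out of this sketch. On Windows the executable is usually `gswin64c`, which the `gs` parameter accepts.

```python
import os
import shlex
from typing import Optional

# Ghostscript's built-in -dPDFSETTINGS presets.
GS_PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def ghostscript_args(src: str, dst: str, quality: str = "ebook",
                     dpi: Optional[int] = None, gs: str = "gs") -> list[str]:
    """Build a Ghostscript pdfwrite command line (hypothetical helper).

    quality selects a -dPDFSETTINGS preset; dpi, if given, forces colour and
    greyscale images to be downsampled to that resolution.
    """
    if quality not in GS_PRESETS:
        raise ValueError(f"unknown quality preset: {quality}")
    args = [
        gs,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
    ]
    if dpi is not None:
        for kind in ("Color", "Gray"):
            args += [f"-dDownsample{kind}Images=true", f"-d{kind}ImageResolution={dpi}"]
    return args + [f"-sOutputFile={dst}", src]


def dual_commands(src: str) -> list[list[str]]:
    """Two runs in the spirit of --dual: a high-quality copy and a small copy."""
    stem, _ = os.path.splitext(src)
    return [
        ghostscript_args(src, f"{stem}_quality.pdf", "printer"),
        ghostscript_args(src, f"{stem}_small.pdf", "screen", dpi=72),
    ]


if __name__ == "__main__":
    for argv in dual_commands("file.pdf"):
        print(shlex.join(argv))
```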

Two-Step Approach for Large Files

For very large PDFs (>30MB), combine page removal with compression:

shift-pages huge_file.pdf                   # Remove unnecessary pages first  
shift-compress huge_file_edited.pdf         # Then compress the result

📖 PDF Page Management (`shift-pages`)

Analyze and remove pages from PDFs to reduce file size.

Interactive Mode

shift-pages document.pdf                    # Interactive page selection

Direct Commands

shift-pages document.pdf --analyze          # Just show page analysis
shift-pages document.pdf --remove 1,3,5-7   # Remove specific pages
shift-pages document.pdf --split-pages      # Split into individual files
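Page lists such as `1,3,5-7` are 1-based and comma-separated, with inclusive ranges. A minimal parser for that syntax, which also validates against the page count, could look like the sketch below. `parse_pages` and `remaining_pages` are hypothetical names. Rejecting out-of-range pages, rather than silently ignoring them, is an assumption.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Parse a 1-based page list such as "1,3,5-7" into sorted page numbers.

    Ranges are inclusive and duplicates are merged. Malformed entries,
    reversed ranges, and pages outside 1..page_count raise ValueError.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start > end:
            raise ValueError(f"reversed range: {part}")
        if start < 1 or end > page_count:
            raise ValueError(f"{part} is outside 1-{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def remaining_pages(spec: str, page_count: int) -> list[int]:
    """Pages left after removing `spec` from a page_count-page document."""
    removed = set(parse_pages(spec, page_count))
    return [p for p in range(1, page_count + 1) if p not in removed]


if __name__ == "__main__":
    print(parse_pages("1,3,5-7", 10))     # [1, 3, 5, 6, 7]
    print(remaining_pages("1,3,5-7", 8))  # [2, 4, 8]
```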

What It Shows

File size and page count
Pages with heavy image content
Size estimates for each page
Suggestions for pages to remove
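Removal suggestions can be framed as a greedy choice: drop the heaviest pages until the estimated total fits a target. The sketch assumes per-page byte estimates are already available, such as the size of the images each page draws. Such estimates are approximate, because fonts and images can be shared between pages. `suggest_removals` is hypothetical and is not the tool's actual heuristic.

```python
def suggest_removals(page_sizes: dict[int, int], target_bytes: int, overhead: int = 0) -> list[int]:
    """Pick the heaviest pages until the estimated file size fits target_bytes.

    page_sizes maps 1-based page numbers to estimated bytes attributable to
    each page; overhead is shared content (fonts, metadata) that removing
    pages will not shrink. If the target cannot be met, every page is listed.
    """
    total = overhead + sum(page_sizes.values())
    chosen = []
    # Visit pages from heaviest to lightest.
    for page, size in sorted(page_sizes.items(), key=lambda item: item[1], reverse=True):
        if total <= target_bytes:
            break
        chosen.append(page)
        total -= size
    return sorted(chosen)


if __name__ == "__main__":
    estimates = {1: 200_000, 2: 5_000_000, 3: 150_000, 4: 3_000_000, 5: 100_000}
    print(suggest_removals(estimates, target_bytes=2_000_000, overhead=300_000))  # [2, 4]
```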

✏️ Advanced PDF Editor (`shift-edit`)

Comprehensive PDF editing with both command-line and GUI interfaces.

Interactive Editing

shift-edit document.pdf --pages             # Interactive page selection
shift-edit document.pdf --images            # Image removal (experimental)

Direct Commands

shift-edit document.pdf --remove-pages 3,5,7-9
shift-edit document.pdf --keep-pages 1-5,10  
shift-edit document.pdf --split-pages        # Split into individual pages
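The same page operations can be expressed with qpdf, which appears in the dependency list later in this README. This sketch makes no claim about how `shift-edit` is implemented. It only uses qpdf's documented `--empty --pages` and `--split-pages` forms, plus hypothetical helpers that turn a removal list into the complementary keep list.

```python
def to_ranges(pages: list[int]) -> str:
    """Collapse sorted page numbers into range syntax, e.g. [1, 2, 4] -> "1-2,4"."""
    parts = []
    i = 0
    while i < len(pages):
        j = i
        # Extend the run while pages stay consecutive.
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1
        parts.append(str(pages[i]) if i == j else f"{pages[i]}-{pages[j]}")
        i = j + 1
    return ",".join(parts)


def keep_spec_after_removal(removed: set[int], page_count: int) -> str:
    """Turn a set of pages to remove into the complementary keep list."""
    kept = [p for p in range(1, page_count + 1) if p not in removed]
    if not kept:
        raise ValueError("removing every page would leave an empty PDF")
    return to_ranges(kept)


def qpdf_keep_args(src: str, keep_spec: str, dst: str) -> list[str]:
    """qpdf: start from an empty PDF and copy only the listed pages of src."""
    return ["qpdf", "--empty", "--pages", src, keep_spec, "--", dst]


def qpdf_split_args(src: str, pattern: str = "page-%d.pdf") -> list[str]:
    """qpdf: write one file per page; %d becomes the zero-padded page number."""
    return ["qpdf", "--split-pages", src, pattern]


if __name__ == "__main__":
    # Equivalent of removing pages 3,5,7-9 from a 10-page document.
    spec = keep_spec_after_removal({3, 5, 7, 8, 9}, 10)
    print(spec)  # 1-2,4,6,10
    print(qpdf_keep_args("document.pdf", spec, "document_edited.pdf"))
```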

Analysis Mode

shift-edit document.pdf --analyze           # Detailed structure analysis

🔍 OCR Text Extraction (`shift-ocr`)

Extract text from scanned PDFs and images using Tesseract OCR.

Basic OCR

shift-ocr scanned_document.pdf              # Extract text to console
shift-ocr document.pdf --output text.txt    # Save to file
shift-ocr image.png --lang eng+spa          # Multiple languages
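Tesseract reads images, not PDFs, so a scanned PDF has to be rendered to images first. Assuming Ghostscript and Tesseract handle these two steps, the commands could be built as shown below. The `-page-%03d.png` naming and the helper names are hypothetical. Passing `stdout` as Tesseract's output base prints the text instead of writing `<output>.txt`.

```python
from pathlib import Path


def rasterize_args(pdf: str, dpi: int = 300) -> tuple[list[str], str]:
    """Ghostscript command rendering every PDF page to a greyscale PNG.

    Returns (argv, glob pattern matching the generated page images).
    %03d zero-pads page numbers so the images sort in page order (up to 999 pages).
    """
    stem = Path(pdf).stem
    argv = [
        "gs", "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-sDEVICE=pnggray", f"-r{dpi}",
        f"-sOutputFile={stem}-page-%03d.png", pdf,
    ]
    return argv, f"{stem}-page-*.png"


def tesseract_args(image: str, lang: str = "eng", output: str = "stdout") -> list[str]:
    """Tesseract command for one image.

    output="stdout" prints the text; any other value is an output base name,
    and Tesseract writes <output>.txt. lang can combine packs, e.g. "eng+spa".
    """
    return ["tesseract", image, output, "-l", lang]


if __name__ == "__main__":
    argv, pattern = rasterize_args("scanned_document.pdf")
    print(argv)
    print(tesseract_args("image.png", "eng+spa"))
    # After running argv, OCR each file in sorted(Path().glob(pattern)).
```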

Batch Processing

shift-ocr folder/ --batch --output results/ # Process entire folders
shift-ocr *.pdf --confidence 70             # Set confidence threshold
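A confidence threshold can be applied to Tesseract's TSV output (`tesseract image.png out tsv`), which gives each recognised word a 0-100 `conf` value. Whether `shift-ocr --confidence` works this way is an assumption. The sketch only shows the filtering idea, and the sample rows are made up.

```python
import csv
import io

# Made-up sample in Tesseract's TSV layout (tab-separated, header row first).
SAMPLE_TSV = "\n".join(
    "\t".join(row)
    for row in [
        ["level", "page_num", "block_num", "par_num", "line_num", "word_num",
         "left", "top", "width", "height", "conf", "text"],
        ["1", "1", "0", "0", "0", "0", "0", "0", "800", "600", "-1", ""],
        ["5", "1", "1", "1", "1", "1", "10", "10", "80", "20", "96.5", "Invoice"],
        ["5", "1", "1", "1", "1", "2", "95", "10", "60", "20", "41.2", "N0."],
        ["5", "1", "1", "1", "1", "3", "160", "10", "50", "20", "88", "2025"],
    ]
)


def words_above(tsv_text: str, min_conf: float) -> list[str]:
    """Return recognised words whose confidence is at least min_conf (0-100).

    Structural rows (pages, blocks, lines) have conf -1 and empty text,
    so they are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if text and float(row["conf"]) >= min_conf:
            words.append(text)
    return words


if __name__ == "__main__":
    print(words_above(SAMPLE_TSV, 70))  # ['Invoice', '2025']
```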

Preprocessing Options

shift-ocr blurry.pdf --denoise --deskew     # Clean up image quality
shift-ocr document.pdf --preprocess aggressive

🛠️ Installation and Dependencies

Python Package Installation

git clone https://github.com/adamn1225/shift.git
cd shift  
pip install -e .                            # Editable/development install
# OR
pip install .                               # Standard install

System Dependencies (Optional but Recommended)

For enhanced functionality, install these system tools:

Ubuntu/Debian:

sudo apt-get install ghostscript pandoc wkhtmltopdf tesseract-ocr qpdf
sudo apt-get install libreoffice-writer    # For advanced document conversion

macOS:

brew install ghostscript pandoc wkhtmltopdf tesseract qpdf

Windows:

Install Ghostscript
Install Pandoc
Install wkhtmltopdf

What Each Dependency Enables

Ghostscript: Best PDF compression (essential for large files)
Pandoc: Universal document conversion between many formats
wkhtmltopdf: High-quality HTML to PDF conversion
Tesseract: OCR text extraction from scanned documents
qpdf: Additional PDF optimization options
LibreOffice: Advanced document format support
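Every tool above is optional, so a program that uses them has to check at runtime which ones are installed. A minimal probe with `shutil.which` follows. The candidate executable names, such as `gswin64c` for Ghostscript on Windows and `soffice` for LibreOffice, are assumptions about typical installs. `available_tools` is a hypothetical helper.

```python
import shutil
from typing import Optional

# Candidate executable names per tool; the first one found on PATH wins.
TOOLS = {
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],
    "Pandoc": ["pandoc"],
    "wkhtmltopdf": ["wkhtmltopdf"],
    "Tesseract": ["tesseract"],
    "qpdf": ["qpdf"],
    "LibreOffice": ["soffice", "libreoffice"],
}


def available_tools() -> dict[str, Optional[str]]:
    """Map each tool to the full path of its executable, or None if missing."""
    found = {}
    for tool, names in TOOLS.items():
        found[tool] = next((path for path in map(shutil.which, names) if path), None)
    return found


if __name__ == "__main__":
    for tool, path in available_tools().items():
        print(f"{tool:<12} {path or 'not found (related features unavailable)'}")
```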

📋 Usage Examples

Common Workflows

Make a large PDF email-friendly:

shift-compress presentation.pdf --quality ebook

Convert and compress a Word document:

shift-convert report.docx --to pdf
shift-compress report.pdf

Clean up a scanned document:

shift-ocr scanned.pdf --output clean_text.txt
shift-pages scanned.pdf                     # Remove blank pages

Batch process documents:

shift-convert documents/ --batch --from docx --to pdf
shift-compress *.pdf --batch

Real-World Examples

Research Paper Workflow:

# Convert markdown to formatted PDF
shift-convert paper.md --to pdf --css professional.css

# If too large for submission
shift-compress paper.pdf --quality printer

Business Document Processing:

# Convert presentations and compress for email
shift-convert *.pptx --to pdf
shift-compress *.pdf --quality ebook --batch

Legal Document Management:

# OCR scanned contracts  
shift-ocr contracts/ --batch --output text_versions/

# Remove sensitive pages
shift-pages contract.pdf --remove 3,7-9

🤝 Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Commit changes: git commit -am 'Add feature'
Push to branch: git push origin feature-name
Submit a Pull Request

📝 License

MIT License - see LICENSE file for details

🐛 Issues

Report bugs and request features at: https://github.com/adamn1225/shift/issues

Made with ❤️ for document processing efficiency

Two-Step Workflow for Very Large PDFs

First, remove heavy pages:

shift-pages large_file.pdf --analyze        # See page breakdown
shift-pages large_file.pdf                  # Interactive page removal

Then compress the result:

shift-compress edited_file.pdf --dual       # Compress the page-reduced version

Example Results:

Original: 47MB → Page-reduced: 32MB → Final: 13MB ✓
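The figures above can be expressed as a percentage reduction for each stage. This small calculation uses only the numbers reported in the example. Note that the 13 MB final file is still above the 9.5 MB email target described for `shift-compress`.

```python
def reduction(before_mb: float, after_mb: float) -> float:
    """Percentage by which a file shrank from before_mb to after_mb."""
    return 100 * (before_mb - after_mb) / before_mb


# Sizes in MB from the example above.
stages = [("Original", 47), ("Page-reduced", 32), ("Final", 13)]
for (prev_name, prev_size), (name, size) in zip(stages, stages[1:]):
    print(f"{prev_name} -> {name}: {reduction(prev_size, size):.1f}% smaller")
print(f"Overall: {reduction(47, 13):.1f}% smaller")
```

This prints reductions of 31.9% for page removal, 59.4% for compression, and 72.3% overall.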

PDF Page Management

Analyze PDF structure and remove pages to reduce file size:

shift-pages document.pdf --analyze            # Show page breakdown
shift-pages document.pdf                      # Interactive page removal
shift-pages document.pdf --remove 1,3,5-7     # Remove specific pages

The analyzer shows which pages have the most images and estimated size impact.

Document Conversion

Convert a Word document to PDF:

shift-convert document.docx --to pdf

Convert a Markdown file to HTML with a custom stylesheet:

shift-convert report.md --to html --css style.css

Extract text from a PDF file:

shift-convert file.pdf --to text --output extracted.txt

Batch convert all Word documents in a folder to PDF:

shift-convert folder/ --batch --from docx --to pdf --output converted/

Summary

You now have a complete PDF management toolkit:

For regular PDFs: Use shift-compress --dual to create both a high-quality version and an email-sized version
For large PDFs: Use shift-pages first to remove heavy pages, then compress
For document conversion: Use shift-convert to convert between formats

All tools work from anywhere in your terminal and provide detailed help with -h or --help.

Release history

1.0.6 (Sep 10, 2025)
1.0.5 (Sep 10, 2025)
1.0.4 (Sep 10, 2025), this version

Download files

Source Distribution

shift_cli-1.0.4.tar.gz (34.8 kB)

Uploaded Sep 10, 2025 Source

Built Distribution

shift_cli-1.0.4-py3-none-any.whl (34.9 kB)

Uploaded Sep 10, 2025 Python 3

File details

Details for the file shift_cli-1.0.4.tar.gz.

File metadata

Download URL: shift_cli-1.0.4.tar.gz
Upload date: Sep 10, 2025
Size: 34.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for shift_cli-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`ba626c3ccdef7c448d7297781ed215dae46515b9277e801db090487a8ec966c1`
MD5	`239c33d7da0fbbc054f281f782235d13`
BLAKE2b-256	`fa44d6e4df2e5f18e89bf05f731c50e12a898d6ab06f369fa3a539fe2293999c`
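To check a downloaded archive against the SHA256 digest listed above, hash it in chunks and compare the result. This is a minimal sketch with the standard `hashlib` module, assuming the local file keeps its published name. pip can also enforce the check itself in hash-checking mode, with a requirements line such as `shift-cli==1.0.4 --hash=sha256:<digest>`.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for shift_cli-1.0.4.tar.gz.
EXPECTED_SHA256 = "ba626c3ccdef7c448d7297781ed215dae46515b9277e801db090487a8ec966c1"


def sha256_of_chunks(chunks) -> str:
    """Hex SHA256 digest of an iterable of byte strings."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so it never has to fit in memory."""
    with path.open("rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


if __name__ == "__main__":
    archive = Path("shift_cli-1.0.4.tar.gz")
    if archive.exists():
        ok = sha256_of_file(archive) == EXPECTED_SHA256
        print("hash OK" if ok else "hash MISMATCH: do not install this file")
```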

File details

Details for the file shift_cli-1.0.4-py3-none-any.whl.

File metadata

Download URL: shift_cli-1.0.4-py3-none-any.whl
Upload date: Sep 10, 2025
Size: 34.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for shift_cli-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`59bfbaea641a9429a2e96b20815928cb5b1727a26eb11619d49c6003c61b2e9c`
MD5	`7a055abbde7b245821e9b3351a99f481`
BLAKE2b-256	`3b88bfa7ee4d37e6572f8e8944e909dffb57130409cf0102cedc981060f781d6`

shift-cli 1.0.4
