doc-page-extractor

Document page extraction tool powered by DeepSeek-OCR.

Installation

⚠️ Important: This package requires PyTorch with CUDA support (GPU Required). PyTorch is NOT automatically installed - you must install it manually first.

Step 1: Install PyTorch with CUDA

Choose the command that matches your CUDA version:

# For CUDA 12.1 (recommended for most users)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# For CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.6
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126

💡 Don't know your CUDA version? Run nvidia-smi and read the "CUDA Version" field in its header, which is the newest CUDA version your driver supports. If in doubt, try CUDA 12.1 (works with most recent drivers).
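The version check above can also be scripted. The following is a minimal sketch, not part of doc-page-extractor: the helper names parse_cuda_version and pick_index_url are hypothetical, and the policy of picking the newest listed wheel index that does not exceed the driver's CUDA version is an assumption (the note above recommends CUDA 12.1 for most users).

```python
import re
import subprocess
from typing import Optional, Tuple

# Wheel indexes listed in Step 1, keyed by (major, minor) CUDA version.
WHEEL_INDEXES = {
    (11, 8): "https://download.pytorch.org/whl/cu118",
    (12, 1): "https://download.pytorch.org/whl/cu121",
    (12, 6): "https://download.pytorch.org/whl/cu126",
}


def parse_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract the 'CUDA Version: X.Y' field from the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pick_index_url(driver_cuda: Tuple[int, int]) -> Optional[str]:
    """Assumed policy: newest listed index not exceeding the driver's version."""
    candidates = [version for version in WHEEL_INDEXES if version <= driver_cuda]
    return WHEEL_INDEXES[max(candidates)] if candidates else None


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi failed; check the NVIDIA driver installation.")
    else:
        version = parse_cuda_version(result.stdout)
        url = pick_index_url(version) if version else None
        if url:
            print(f"pip install torch torchvision --index-url {url}")
        else:
            print("No listed wheel index matches this driver.")
```

The script only prints a suggested command; it does not install anything.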

Step 2: Install doc-page-extractor

pip install doc-page-extractor

Verify Installation

Check if everything is working:

python -c "import doc_page_extractor; import torch; print('✓ Installation successful!'); print('✓ CUDA available:', torch.cuda.is_available())"

Expected output:

✓ Installation successful!
✓ CUDA available: True

If CUDA shows False, see the troubleshooting section below.
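For a more detailed report than the one-liner gives, a short script along these lines may help. It is a sketch under assumptions: collect_report is a hypothetical helper, and it only looks up doc_page_extractor without importing it, so the report still works when PyTorch is missing.

```python
import importlib.util


def collect_report() -> dict:
    """Gather installation facts without failing when torch is absent."""
    report = {
        # find_spec locates the package without importing it.
        "doc_page_extractor": importlib.util.find_spec("doc_page_extractor") is not None,
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
        "device": None,
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

A cuda_build of None together with cuda_available: False points to a CPU-only PyTorch install.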

Usage

from doc_page_extractor import PageExtractor

# Your code here

Troubleshooting

"PyTorch is required but not installed!"

Install PyTorch first:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

"CUDA is not available!"

Check your GPU driver:

nvidia-smi

If the command fails, you need to install the NVIDIA driver for your GPU.

If it succeeds, you might have CPU-only PyTorch. Reinstall with CUDA:

pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
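The decision steps above can be written as one small function. This is an illustrative sketch, not the package's own check: diagnose is a hypothetical name, and the final branch (driver too old for the installed build) is a plausible cause rather than a confirmed one.

```python
import shutil
import subprocess
from typing import Optional


def diagnose(nvidia_smi_ok: bool, torch_cuda_build: Optional[str], cuda_available: bool) -> str:
    """Map the three observable facts to the next troubleshooting step."""
    if cuda_available:
        return "CUDA is available; no action needed."
    if not nvidia_smi_ok:
        return "nvidia-smi failed: install or repair the NVIDIA driver."
    if torch_cuda_build is None:
        return "CPU-only PyTorch detected: reinstall torch from a CUDA wheel index."
    # Assumption: a remaining common cause is a driver older than the build needs.
    return "Driver and CUDA build found, but CUDA is unavailable: the driver may be too old."


def nvidia_smi_works() -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; install it first.")
    else:
        print(diagnose(nvidia_smi_works(), torch.version.cuda, torch.cuda.is_available()))
```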

Requirements

  • Python >= 3.10, < 3.14
  • NVIDIA GPU with CUDA support (Required); the installation steps above cover CUDA 11.8, 12.1, and 12.6
  • Sufficient GPU memory (recommended: 4GB+ VRAM)
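
These requirements can be checked programmatically. The sketch below uses hypothetical helper names and assumes "4GB" means 4 × 10^9 bytes; the VRAM check runs only if PyTorch is installed and sees a GPU.

```python
import sys

MIN_PYTHON = (3, 10)            # inclusive lower bound from the list above
MAX_PYTHON_EXCLUSIVE = (3, 14)  # exclusive upper bound from the list above
RECOMMENDED_VRAM_BYTES = 4 * 10**9  # assumption: "4GB" read as 4e9 bytes


def python_supported(version: tuple) -> bool:
    """True if (major, minor) falls within >= 3.10 and < 3.14."""
    return MIN_PYTHON <= tuple(version[:2]) < MAX_PYTHON_EXCLUSIVE


def vram_sufficient(total_bytes: int) -> bool:
    """True if the GPU's total memory meets the recommended amount."""
    return total_bytes >= RECOMMENDED_VRAM_BYTES


if __name__ == "__main__":
    print("Python supported:", python_supported(sys.version_info))
    try:
        import torch
    except ImportError:
        print("PyTorch not installed; skipping VRAM check.")
    else:
        if torch.cuda.is_available():
            total = torch.cuda.get_device_properties(0).total_memory
            print("VRAM sufficient:", vram_sufficient(total))
        else:
            print("CUDA not available; skipping VRAM check.")
```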

Development

For contributors and developers, see Development Guide for:

  • Running tests
  • Running lint checks
  • Building the package
