Document page extraction tool powered by DeepSeek-OCR

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License

Project description

doc-page-extractor

Document page extraction tool powered by DeepSeek-OCR.

Installation

⚠️ Important: This package requires PyTorch with CUDA support (GPU Required). PyTorch is NOT automatically installed - you must install it manually first.

Step 1: Install PyTorch with CUDA

Choose the command that matches your CUDA version:

# For CUDA 12.1 (recommended for most users)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# For CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.6
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126

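💡 Don't know your CUDA version? Run nvidia-smi and read the "CUDA Version" field in its header, which is the highest CUDA version your driver supports. If in doubt, try CUDA 12.1, which works with most recent drivers.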
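
The choice among the three commands above can be sketched as a small helper. This is only an illustration under stated assumptions: the name `pick_index_url` is hypothetical and not part of doc-page-extractor or PyTorch, and the rule it applies (pick the newest listed build that does not exceed the CUDA version reported by the driver) is an assumption about how the commands are meant to be chosen. The index URLs are the ones listed in Step 1.

```python
# Hypothetical helper: not part of doc-page-extractor or PyTorch.
# Maps the CUDA version reported by nvidia-smi to one of the wheel
# indexes listed in Step 1.

# (major, minor) -> wheel index URL, copied from the install commands.
WHEEL_INDEXES = {
    (11, 8): "https://download.pytorch.org/whl/cu118",
    (12, 1): "https://download.pytorch.org/whl/cu121",
    (12, 6): "https://download.pytorch.org/whl/cu126",
}


def pick_index_url(driver_cuda: str) -> str:
    """Return the newest listed index whose CUDA version is <= the driver's.

    `driver_cuda` is expected in "major.minor" form, e.g. "12.4".
    """
    major, minor = (int(part) for part in driver_cuda.split(".")[:2])
    candidates = [version for version in WHEEL_INDEXES if version <= (major, minor)]
    if not candidates:
        raise ValueError(f"No listed PyTorch build supports CUDA {driver_cuda}")
    return WHEEL_INDEXES[max(candidates)]


if __name__ == "__main__":
    url = pick_index_url("12.4")
    print(f"pip install torch torchvision --index-url {url}")
```

With a driver that reports CUDA 12.4, the sketch selects the cu121 index, which matches the "just try CUDA 12.1" advice.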

Step 2: Install doc-page-extractor

pip install doc-page-extractor

Verify Installation

Check if everything is working:

python -c "import doc_page_extractor; import torch; print('✓ Installation successful!'); print('✓ CUDA available:', torch.cuda.is_available())"

Expected output:

✓ Installation successful!
✓ CUDA available: True

If CUDA shows False, see the troubleshooting section below.
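
The one-liner above stops with an ImportError if either package is missing. A slightly more forgiving check might look like the sketch below, which reports each condition separately. The function name `check_environment` and the dictionary keys are hypothetical; only `importlib.util.find_spec` and `torch.cuda.is_available()` are real calls.

```python
import importlib.util


def check_environment() -> dict:
    """Report whether the package and a CUDA-enabled PyTorch are usable."""
    status = {
        "package_installed": importlib.util.find_spec("doc_page_extractor") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if status["torch_installed"]:
        # Imported lazily so the check still runs when PyTorch is absent.
        import torch

        status["cuda_available"] = bool(torch.cuda.is_available())
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'✓' if ok else '✗'} {name}")
```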

Usage

from doc_page_extractor import PageExtractor

# Your code here

Troubleshooting

"PyTorch is required but not installed!"

Install PyTorch first:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

"CUDA is not available!"

Check your GPU driver:

nvidia-smi

If the command fails, install the NVIDIA drivers from https://www.nvidia.com/download/index.aspx

If it succeeds, you might have CPU-only PyTorch. Reinstall with CUDA:

pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
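
The decision steps in this section can be sketched as a short script. It is an illustration, not part of the package: `diagnose` and `driver_works` are hypothetical names. The script relies on `torch.version.cuda`, which is `None` in CPU-only builds of PyTorch, to tell a CPU-only install apart from a driver problem.

```python
import shutil
import subprocess
from typing import Optional


def diagnose(driver_ok: bool, torch_cuda_build: Optional[str], cuda_available: bool) -> str:
    """Map three observations to the fix described in this section."""
    if cuda_available:
        return "CUDA is available; nothing to fix."
    if not driver_ok:
        return "nvidia-smi failed: install or repair the NVIDIA driver."
    if torch_cuda_build is None:
        return "PyTorch is a CPU-only build: reinstall it from a CUDA wheel index."
    return f"Driver and CUDA build found; check that the driver supports CUDA {torch_cuda_build}."


def driver_works() -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    return subprocess.run([exe], capture_output=True).returncode == 0


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; install it first.")
    else:
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        print(diagnose(driver_works(), torch.version.cuda, torch.cuda.is_available()))
```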

Requirements

Python >= 3.10, < 3.14
NVIDIA GPU with CUDA support (required); the installation steps above cover CUDA 11.8, 12.1, and 12.6
Sufficient GPU memory (recommended: 4 GB+ VRAM)
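
These requirements can be checked programmatically. The sketch below is an assumption-laden illustration: the constants and the functions `python_supported` and `vram_sufficient` are hypothetical names, and "4 GB" is interpreted as 4 × 10^9 bytes, which the requirements do not specify. The VRAM query uses `torch.cuda.get_device_properties(0).total_memory` and only runs when PyTorch and a GPU are present.

```python
import sys

MIN_PYTHON = (3, 10)  # inclusive lower bound from the requirements
MAX_PYTHON = (3, 14)  # exclusive upper bound from the requirements
MIN_VRAM_BYTES = 4 * 10**9  # assumption: "4 GB" read as decimal gigabytes


def python_supported(version=sys.version_info) -> bool:
    """True if the (major, minor) version lies in [3.10, 3.14)."""
    return MIN_PYTHON <= tuple(version[:2]) < MAX_PYTHON


def vram_sufficient(total_bytes: int) -> bool:
    """True if the GPU reports at least the recommended amount of memory."""
    return total_bytes >= MIN_VRAM_BYTES


if __name__ == "__main__":
    print("Python supported:", python_supported())
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        total = torch.cuda.get_device_properties(0).total_memory
        print("VRAM sufficient:", vram_sufficient(total))
```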

Dependencies & Licenses

This project is licensed under the MIT License. It depends on the DeepSeek-OCR model, which uses easydict (LGPLv3) for configuration management.

Development

For contributors and developers, the Development Guide covers:

Running tests
Running lint checks
Building the package

Release history

1.0.12 (Dec 20, 2025)
1.0.11 (Dec 15, 2025), this version
1.0.10 (Dec 11, 2025)
1.0.9 (Dec 1, 2025)
1.0.8 (Nov 28, 2025)
1.0.7 (Nov 21, 2025)
1.0.6 (Nov 17, 2025)
1.0.5 (Nov 15, 2025)
1.0.4 (Nov 15, 2025)
1.0.3 (Nov 12, 2025)
1.0.2 (Nov 11, 2025)
1.0.1 (Nov 6, 2025)
1.0.0 (Nov 4, 2025)
0.2.4 (Jul 16, 2025)
0.2.3 (Jun 24, 2025)
0.2.2 (Jun 6, 2025)
0.2.1 (Jun 3, 2025)
0.2.0 (May 13, 2025)
0.1.2 (May 7, 2025)
0.1.1 (Apr 22, 2025)
0.1.0 (Apr 12, 2025)
0.0.10 (Apr 11, 2025)
0.0.9 (Apr 8, 2025)
0.0.8 (Apr 5, 2025)
0.0.7 (Mar 28, 2025)
0.0.6 (Mar 12, 2025)
0.0.5 (Mar 11, 2025)
0.0.2 (Feb 19, 2025)
0.0.1 (Feb 18, 2025)

Download files

Source Distribution

doc_page_extractor-1.0.11.tar.gz (12.3 kB)

Uploaded Dec 15, 2025 Source

Built Distribution

doc_page_extractor-1.0.11-py3-none-any.whl (15.2 kB)

Uploaded Dec 15, 2025 Python 3

File details

Details for the file doc_page_extractor-1.0.11.tar.gz.

File metadata

Download URL: doc_page_extractor-1.0.11.tar.gz
Upload date: Dec 15, 2025
Size: 12.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.13.4 Darwin/25.1.0

File hashes

Hashes for doc_page_extractor-1.0.11.tar.gz
Algorithm	Hash digest
SHA256	`7a0b29e0d1ea76dcc20b3e6d8075b59e4f826e8cb937edff58cbd3456748f855`
MD5	`09d56eefa09efcdf9c11895ae78a5b59`
BLAKE2b-256	`99bcd21236f602be9904f6aaabb99c1433e0c4535f3440b8f6a4fbc88b163adc`

File details

Details for the file doc_page_extractor-1.0.11-py3-none-any.whl.

File metadata

Download URL: doc_page_extractor-1.0.11-py3-none-any.whl
Upload date: Dec 15, 2025
Size: 15.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.13.4 Darwin/25.1.0

File hashes

Hashes for doc_page_extractor-1.0.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a09de0c36f72a83afd17593244d571679c8d495549ec634fbcd4fa2c7a97834e`
MD5	`4388907e3fb32e3055c7175db97c9444`
BLAKE2b-256	`9e4fe6204c2a5ccc8ecf9cc452468fe27c88b202e85c4469b3cfb7396834a7df`
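
The SHA256 digests listed above can be used to verify a downloaded file. The following is a minimal sketch using Python's standard `hashlib`; the function names `sha256_of` and `matches` are hypothetical, and the script assumes the file path and the expected digest are passed on the command line.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Usage: python verify_hash.py <downloaded file> <expected SHA256>
    if len(sys.argv) == 3:
        ok = matches(Path(sys.argv[1]), sys.argv[2])
        print("OK" if ok else "MISMATCH")
```

For example, passing the path to doc_page_extractor-1.0.11.tar.gz together with the SHA256 value from its table should print OK if the download is intact.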

doc-page-extractor 1.0.11
