DeepSeek OCR Ollama

This is a simple script that uses the Ollama API to extract Markdown text from a PDF or image file using the DeepSeek-OCR model.

Usage

Install the Requirements

To install the necessary requirements, run the following command:

pip install deepseek-ocr-ollama

The script requires Ollama, with the deepseek-ocr model installed:

ollama pull deepseek-ocr

Typical Usage

deepseek-ocr-ollama paper.pdf
deepseek-ocr-ollama paper.pdf --dpi 200
deepseek-ocr-ollama paper.pdf -o revision
deepseek-ocr-ollama paper.pdf -e
deepseek-ocr-ollama paper.pdf -m FULL
deepseek-ocr-ollama page74.jpg -e
OLLAMA_HOST=http://gauss:11434 deepseek-ocr-ollama receipt.pdf -e
deepseek-ocr-ollama -j paper.json
deepseek-ocr-ollama -j paper.json -m TEXT_NO_PAGES -n
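
The --dpi option shown above controls how large each rasterized PDF page becomes. As a rough guide, the sketch below applies the general relationship between PDF points (1/72 inch) and pixels. It does not reproduce how this script performs the conversion; the function name and the US Letter page size are illustrative assumptions.

```python
# Sketch: estimate the raster size of a PDF page at a given DPI.
# PDF page sizes are expressed in points, where 1 point = 1/72 inch.
# raster_size() is a hypothetical helper, not part of deepseek-ocr-ollama.

def raster_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a page rendered at `dpi`."""
    scale = dpi / 72.0
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches); used here only as an example.
LETTER = (612, 792)

for dpi in (200, 600):
    w, h = raster_size(*LETTER, dpi)
    print(f"{dpi} DPI -> {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
```

Tripling the DPI multiplies the pixel count by nine, so a lower value such as --dpi 200 may be worth trying when processing is slow or memory is tight.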

Arguments

| Argument | Description |
| --- | --- |
| input | Input PDF or image file |
| -d DPI, --dpi DPI | DPI (dots per inch) setting for the PDF to image conversion. Defaults to 600 |
| -o OUTPUT, --output OUTPUT | Output directory path. If not set, a directory will be created in the current working directory using the same stem (filename without extension) as the input file |
| -j JSON_OCR_RESPONSE, --json-ocr-response JSON_OCR_RESPONSE | Path from which to load a pre-existing JSON OCR response (any input file will be ignored) |
| -m MODE, --mode MODE | Mode of operation: either the name or numerical value of the mode. Defaults to FULL_NO_PAGES |
| -s PAGE_SEPARATOR, --page-separator PAGE_SEPARATOR | Page separator to use when writing the Markdown file. Defaults to \n |
| -n, --no-json | Do not write the JSON OCR response to a file. By default, the response is written |
| -e, --load-dot-env | Load the .env file from the current directory using python-dotenv, to retrieve the Ollama environment variables |
| -E LOAD_PATH_DOT_ENV, --load-path-dot-env LOAD_PATH_DOT_ENV | Load the .env file from the specified path using python-dotenv, to retrieve the Ollama environment variables. Defaults to ~/.deepseek_ocr_ollama.env |
| -M MODEL_NAME, --model-name MODEL_NAME | Name of the Ollama model to use for OCR. Defaults to deepseek-ocr |
| -H HINT, --hint HINT | Hint to provide to the OCR model to improve recognition accuracy. The hint is a short instruction that will be mixed in with the main prompt. Ignored if a raw prompt is set |
| -R RAW_PROMPT, --raw-prompt RAW_PROMPT | Raw prompt to provide to the OCR model, overriding the default prompt. The hint is ignored if this is set |
| -V VERBOSE, --verbose VERBOSE | Verbosity level: 0 = silent, 1 = normal, 2 = debug. Defaults to 1 |
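
To make the option names and defaults listed above concrete, here is a minimal argparse sketch that mirrors them. It is a hypothetical reconstruction for illustration, not the package's actual source; in particular, making `input` optional (so that -j can be used alone) and restricting --verbose to 0, 1, or 2 are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the package's code)."""
    p = argparse.ArgumentParser(prog="deepseek-ocr-ollama")
    # Assumption: input is optional because -j can replace it.
    p.add_argument("input", nargs="?", help="input PDF or image file")
    p.add_argument("-d", "--dpi", type=int, default=600)
    p.add_argument("-o", "--output", default=None)
    p.add_argument("-j", "--json-ocr-response", default=None)
    p.add_argument("-m", "--mode", default="FULL_NO_PAGES")
    p.add_argument("-s", "--page-separator", default="\n")
    p.add_argument("-n", "--no-json", action="store_true")
    p.add_argument("-e", "--load-dot-env", action="store_true")
    p.add_argument("-E", "--load-path-dot-env", default="~/.deepseek_ocr_ollama.env")
    p.add_argument("-M", "--model-name", default="deepseek-ocr")
    p.add_argument("-H", "--hint", default=None)
    p.add_argument("-R", "--raw-prompt", default=None)
    p.add_argument("-V", "--verbose", type=int, choices=(0, 1, 2), default=1)
    return p


# Parse one of the invocations from "Typical Usage" without running any OCR.
args = build_parser().parse_args(["paper.pdf", "--dpi", "200", "-m", "FULL"])
print(args.input, args.dpi, args.mode, args.model_name)
```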

Modes

| Value | Name |
| --- | --- |
| 0 | FULL |
| 1 | FULL_ALT |
| 2 | FULL_NO_DIR |
| 3 | FULL_NO_PAGES |
| 4 | TEXT |
| 5 | TEXT_NO_PAGES |
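
Since --mode accepts either the name or the numerical value, the mapping above can be modeled as an enumeration. The sketch below is one possible way to do that; the `Mode` class and `parse_mode` helper are hypothetical names, and accepting lowercase names is an assumption rather than documented behavior.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Mode values and names as listed in the Modes table."""
    FULL = 0
    FULL_ALT = 1
    FULL_NO_DIR = 2
    FULL_NO_PAGES = 3
    TEXT = 4
    TEXT_NO_PAGES = 5


def parse_mode(value: str) -> Mode:
    """Accept either a mode name ("TEXT") or its numeric value ("4")."""
    text = value.strip()
    if text.isdigit():
        return Mode(int(text))  # raises ValueError for unknown numbers
    try:
        # Assumption: names are matched case-insensitively.
        return Mode[text.upper()]
    except KeyError:
        raise ValueError(f"unknown mode: {value!r}") from None


print(parse_mode("FULL").value, parse_mode("5").name)
```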

Given the input file paper.pdf, the directory structure for each mode is shown below:

0 - FULL

Structure

paper
├── full
│   ├── image1.png
│   ├── image2.png
│   ├── image3.png
│   └── paper.md
├── page_0
│   ├── image1.png
│   └── paper.md
├── page_1
│   ├── image2.png
│   └── paper.md
└── page_2
    ├── image3.png
    └── paper.md

1 - FULL_ALT

Structure

paper
├── image1.png
├── image2.png
├── image3.png
├── paper.md
├── page_0
│   ├── image1.png
│   └── paper.md
├── page_1
│   ├── image2.png
│   └── paper.md
└── page_2
    ├── image3.png
    └── paper.md

2 - FULL_NO_DIR

Structure

paper
├── image1.png
├── image2.png
├── image3.png
├── paper.md
├── paper0.md
├── paper1.md
└── paper2.md

3 - FULL_NO_PAGES (default)

Structure

paper
├── image1.png
├── image2.png
├── image3.png
└── paper.md

4 - TEXT

Structure

paper
├── paper.md
├── paper0.md
├── paper1.md
└── paper2.md

5 - TEXT_NO_PAGES

Structure

paper
└── paper.md
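
The directory trees above can be summarized as a small lookup that lists the Markdown files each mode produces. This sketch is derived only from those trees; `markdown_files` is a hypothetical helper, and image files are left out because their names and count depend on the document.

```python
def markdown_files(stem: str, pages: int, mode: str) -> list[str]:
    """Return Markdown paths, relative to the output directory, for a mode.

    Hypothetical helper based on the documented directory structures.
    """
    per_page_dirs = [f"page_{i}/{stem}.md" for i in range(pages)]
    per_page_flat = [f"{stem}{i}.md" for i in range(pages)]
    combined = f"{stem}.md"
    layouts = {
        "FULL": [f"full/{stem}.md"] + per_page_dirs,
        "FULL_ALT": [combined] + per_page_dirs,
        "FULL_NO_DIR": [combined] + per_page_flat,
        "FULL_NO_PAGES": [combined],
        "TEXT": [combined] + per_page_flat,
        "TEXT_NO_PAGES": [combined],
    }
    return layouts[mode]


for name in ("FULL", "FULL_NO_DIR", "TEXT_NO_PAGES"):
    print(name, markdown_files("paper", 3, name))
```

In the trees, the TEXT modes differ from their FULL counterparts only by the absence of image files, which is why their Markdown file lists coincide here.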

By default, the JSON response from the DeepSeek-OCR model is saved in the output directory. To disable JSON output, use the -n or --no-json argument. To experiment with a different mode without making additional OCR calls, pass an existing JSON response with -j (--json-ocr-response) instead of the original input file.
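
As an illustration of reusing a saved response, the sketch below builds one command line per mode from a single JSON file, using only the documented -j, -m, -o, and -n options. The helper name, the `paper.json` path, and the output directory names are placeholders, and combining -o with -j is assumed to work as the argument table suggests. The commands are only printed, not executed.

```python
import shlex

MODES = ("FULL", "FULL_ALT", "FULL_NO_DIR", "FULL_NO_PAGES", "TEXT", "TEXT_NO_PAGES")


def rerun_command(json_path: str, mode: str, output_dir: str) -> list[str]:
    """Build an argv list that re-renders a saved OCR response in another mode.

    Hypothetical helper; -n avoids writing the JSON response again.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["deepseek-ocr-ollama", "-j", json_path, "-m", mode, "-o", output_dir, "-n"]


# Print one command per mode, each writing to its own directory.
for mode in MODES:
    print(shlex.join(rerun_command("paper.json", mode, f"paper_{mode.lower()}")))
```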

Ollama's Environment Variables

The connection to the Ollama server can be configured using the environment variables supported by the Ollama Python API:

  • OLLAMA_HOST : Ollama server host
  • OLLAMA_API_KEY : Used as Bearer authorization token

To avoid using -e to load the .env file, you can create one at $HOME/.deepseek_ocr_ollama.env (where $HOME is your home directory). It will then be automatically loaded at the start of the script

For example, for a user called vavilov, the path would look like this:

  • Linux

    /home/vavilov/.deepseek_ocr_ollama.env  
    
  • macOS

    /Users/vavilov/.deepseek_ocr_ollama.env  
    
  • Windows

    C:\Users\vavilov\.deepseek_ocr_ollama.env  
    

The content of the file would look something like this:

OLLAMA_HOST=http://gauss:11434
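
To show what loading such a file amounts to, here is a simplified, standard-library-only stand-in for python-dotenv. It is not the script's actual loading code: `read_env_file` and `apply_env` are hypothetical helpers, the parser handles only plain KEY=VALUE lines, and letting already-set variables take precedence is an assumption modeled on python-dotenv's default behavior.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional

# Default location described above; Path.home() resolves per platform.
DEFAULT_ENV_PATH = Path.home() / ".deepseek_ocr_ollama.env"


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values: dict[str, str],
              environ: Optional[MutableMapping[str, str]] = None) -> None:
    """Set variables that are not already defined (existing values win)."""
    target = os.environ if environ is None else environ
    for key, value in values.items():
        target.setdefault(key, value)


print(DEFAULT_ENV_PATH)
```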
