Use AI vision to OCR PDF and image files to markdown.

PAR AI OCR

Runs on Linux | macOS | Windows. Architectures: x86-64 | ARM | Apple Silicon

PAR AI OCR is a command-line tool that uses artificial intelligence to perform Optical Character Recognition (OCR) on PDF files and images. It extracts text from the input files and generates markdown output.

Features

  • Extracts text from PDFs and images to Markdown while preserving as much formatting as possible.
  • Works with most providers and vision models (quality will vary depending on provider and model used)
  • Uses my PAR AI Core

Known Issues

  • Providers other than OpenAI and Anthropic are hit-and-miss depending on provider / model / data being extracted.

Prerequisites

Install poppler (used for PDF processing)

Linux

apt install poppler-utils

Mac

brew install poppler

Windows

scoop install poppler

uv is recommended

Linux and Mac

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Installation

Installation From Source

Clone the repository and install the package:

git clone https://github.com/paulrobello/par_ocr.git
cd par_ocr
uv sync

From PyPI

uv tool install par_ocr

Usage

Basic usage from source:

uv run par_ocr

Basic usage if installed:

par_ocr

Command Line Parameters

  • --ai-provider, -a: AI provider to use for processing [Ollama|LlamaCpp|OpenAI|Groq|XAI|Anthropic|Google|Bedrock|Github|Mistral] (default: OpenAI)
  • --model, -m: AI model to use for processing (default: provider-specific)
  • --ai-base-url, -b: Override the base URL for the AI provider
  • --system-prompt-file, -p: File containing custom system prompt, if you want to use one other than the default
  • --input-file, -i: File to process, supported extensions: .pdf, .png, .jpg
  • --pricing, -p: Configure pricing summary display [none|price|details] (default: price)
  • --pages: Comma-separated page numbers or hyphen-separated range (e.g., '1,3,5-7')
  • --output, -o: Output directory for markdown files (default: same folder as input file)
  • --debug, -D: Output extra debug info (default: false)
  • --version, -v: Show version information and exit
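
The --pages format described above ('1,3,5-7') mixes single page numbers with inclusive ranges. The following is a minimal sketch of how such a string could be expanded into page numbers. The function name parse_pages is hypothetical and this is not necessarily how par_ocr parses the option internally.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as '1,3,5-7' into a sorted list of page numbers.

    Hypothetical helper for illustration only; not taken from par_ocr itself.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as '1,,3'
        if "-" in part:
            # A hyphen marks an inclusive range, e.g. '5-7' -> 5, 6, 7
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("1,3,5-7"))  # [1, 3, 5, 6, 7]
```
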

Examples

Note: If running from source, prepend "uv run" to each of the example commands.

  1. Process a PDF file using the default settings:

    par_ocr --input-file path/to/your/file.pdf
    
  2. Use a specific AI provider and model:

    par_ocr --ai-provider ANTHROPIC --model claude-3-5-sonnet-20241022 --input-file path/to/your/file.pdf
    
  3. Process specific pages of a PDF:

    par_ocr --input-file path/to/your/file.pdf --pages 1,3,5-7
    
  4. Specify an output directory:

    par_ocr --input-file path/to/your/file.pdf --output path/to/output/directory
    
  5. Enable pricing details:

    par_ocr --pricing details --input-file path/to/your/file.pdf 
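
The example commands above can also be assembled from a script, for instance to process every PDF in a folder. The sketch below uses only the flags listed under Command Line Parameters. The helper names build_command and ocr_folder are hypothetical, and it assumes par_ocr is installed and on your PATH.

```python
import subprocess
from pathlib import Path


def build_command(input_file, provider=None, model=None, pages=None, output=None):
    """Build the par_ocr argument list from the documented flags.

    Hypothetical helper; options left as None are simply omitted.
    """
    cmd = ["par_ocr", "--input-file", str(input_file)]
    if provider:
        cmd += ["--ai-provider", provider]
    if model:
        cmd += ["--model", model]
    if pages:
        cmd += ["--pages", pages]
    if output:
        cmd += ["--output", str(output)]
    return cmd


def ocr_folder(folder, **options):
    """Run par_ocr once per PDF found directly inside `folder`."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # check=True raises CalledProcessError if par_ocr exits with an error
        subprocess.run(build_command(pdf, **options), check=True)
```

For example, ocr_folder("path/to/pdfs", pages="1,3,5-7") would run the command from example 3 against each PDF in that folder.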
    

Note

Make sure to set the appropriate environment variables for the AI provider you're using (e.g., OPENAI_API_KEY for OpenAI). You may also create a file ~/.par_ocr_config containing your API keys, such as:

# AI API KEYS
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
GROQ_API_KEY=
XAI_API_KEY=
GOOGLE_API_KEY=
MISTRAL_API_KEY=
GITHUB_TOKEN=
OPENROUTER_API_KEY=
# Used by Bedrock
AWS_PROFILE=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=


### Tracing (optional)
LANGCHAIN_TRACING_V2=false
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_API_KEY=
LANGCHAIN_PROJECT=par_ocr
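
The file above is a list of KEY=VALUE lines, with lines starting with # treated as comments. As a rough sketch of that format, the snippet below reads such a file and reports which keys have a value. The names parse_config and configured_keys are hypothetical, and par_ocr's own loader may behave differently (for example around quoting).

```python
from pathlib import Path


def parse_config(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, '#' comments and empty values.

    Illustrative only; this is not par_ocr's actual config loader.
    """
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and value.strip():
            values[key.strip()] = value.strip()
    return values


def configured_keys(path=None) -> list[str]:
    """Return the names of keys that have a value in the config file."""
    path = Path(path) if path is not None else Path.home() / ".par_ocr_config"
    if not path.exists():
        return []
    return sorted(parse_config(path.read_text(encoding="utf-8")))
```

Calling configured_keys() is a quick way to check which provider keys are filled in without printing the secrets themselves.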

AI API KEYS

OpenAI Compatible Providers

If a specific provider is not listed but has an OpenAI-compatible endpoint, you can use the following combination of environment variables:

  • PARAI_AI_PROVIDER=OpenAI
  • PARAI_MODEL=Your selected model
  • PARAI_AI_BASE_URL=The provider's OpenAI endpoint URL
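
As a sketch of how these three variables might be set for a single run, the snippet below builds an environment and passes it to par_ocr. The helper names, the model name, and the URL in the usage note are placeholders rather than real values, and it assumes par_ocr is on your PATH and that any API key the endpoint needs is already set.

```python
import os
import subprocess


def openai_compatible_env(model: str, base_url: str, base_env=None) -> dict[str, str]:
    """Return a copy of the environment with the three PARAI_* variables set.

    Hypothetical helper; `base_env` defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PARAI_AI_PROVIDER"] = "OpenAI"
    env["PARAI_MODEL"] = model
    env["PARAI_AI_BASE_URL"] = base_url
    return env


def run_with_compatible_provider(input_file: str, model: str, base_url: str) -> None:
    """Run par_ocr against an OpenAI-compatible endpoint for one input file."""
    subprocess.run(
        ["par_ocr", "--input-file", input_file],
        env=openai_compatible_env(model, base_url),
        check=True,  # raise if par_ocr exits with an error
    )
```

A call might look like run_with_compatible_provider("path/to/your/file.pdf", "your-selected-model", "https://example.invalid/v1"), where both the model and the URL must be replaced with your provider's values.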

What's New

  • Version 0.2.0:
    • Updated ai lib and other dependencies
    • Added debug flag
  • Version 0.1.1:
    • Updated ai lib
    • Fixed markdown fences
  • Version 0.1.0:
    • Initial release

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Paul Robello - probello@gmail.com
