

Mistral OCR CLI


Modern, polished CLI to extract text from PDFs using the Mistral OCR API.

Features

  • Elegant TUI with progress bars and rich output
  • Single file or batch processing
  • Output in text, JSON, or Markdown
  • Parallel batch processing with --jobs
  • Config helper and .env support

Quickstart

  1. Install
uv tool install mistral-ocr-cli  # isolated tool install, like pipx
# or
uv pip install mistral-ocr-cli   # into the current environment
  2. Configure your API key
export MISTRAL_API_KEY=your_key_here
# or
echo "MISTRAL_API_KEY=your_key_here" >> .env
  3. Extract text
ocr extract file.pdf -o out.txt  # single file
ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4  # batch: 4 parallel jobs, results in outputs/
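
Step 2 offers two places for the key: the process environment or a .env file. The sketch below shows one way such a lookup could work with only the standard library. The helper names `read_dotenv` and `resolve_api_key`, the precedence (environment variable before .env), and the simple KEY=value parsing are assumptions for illustration, not the tool's actual implementation.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="abc" or KEY='abc'
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_api_key(env_file: Path = Path(".env")) -> str | None:
    """Hypothetical lookup order: process environment first, then the .env file."""
    key = os.environ.get("MISTRAL_API_KEY")
    if key:
        return key
    return read_dotenv(env_file).get("MISTRAL_API_KEY")
```

If neither source defines the key, the sketch returns None; a CLI would typically stop with an error at that point rather than call the API.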

Usage

ocr extract [OPTIONS] FILES...

Options:
  -o, --output PATH            Output file (single-file mode)
  -f, --format [text|json|markdown]
                               Output format
  -b, --batch                  Enable batch mode
  -O, --output-dir PATH        Directory for batch outputs
  -j, --jobs INTEGER RANGE     Number of parallel jobs in batch mode [default: 1]
  -v, --verbose                Enable verbose logging
  -q, --quiet                  Show errors only
  --version                    Show the version and exit
  --help                       Show this help and exit
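
When the CLI is driven from another Python program (a build script, for example), the documented options can be assembled into an argument list and passed to `subprocess.run`. This is a sketch: `build_extract_command` and `run_extract` are hypothetical helpers, and only the flags listed above are used.

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence


def build_extract_command(
    files: Sequence[str],
    output: str | None = None,
    output_dir: str | None = None,
    fmt: str = "text",
    jobs: int = 1,
) -> list[str]:
    """Assemble an `ocr extract` argv from the documented options."""
    if not files:
        raise ValueError("at least one PDF is required")
    if fmt not in {"text", "json", "markdown"}:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["ocr", "extract", "--format", fmt]
    if len(files) == 1:
        # Single-file mode: --output names the destination file
        if output is not None:
            cmd += ["--output", output]
    else:
        # Batch mode: --jobs sets parallelism, --output-dir collects results
        cmd += ["--batch", "--jobs", str(jobs)]
        if output_dir is not None:
            cmd += ["--output-dir", output_dir]
    return cmd + list(files)


def run_extract(files: Sequence[str], **kwargs) -> subprocess.CompletedProcess:
    # check=True raises CalledProcessError if the CLI exits non-zero
    return subprocess.run(build_extract_command(files, **kwargs), check=True)
```

Passing a list instead of a single shell string avoids quoting problems with file names that contain spaces.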

Programmatic use

from ocr.pdf2text import pdf_to_text

text = pdf_to_text("/path/file.pdf")
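
`pdf_to_text` processes one file per call. To handle many PDFs from Python with bounded parallelism, similar in spirit to the CLI's --jobs option, the calls can be spread across a thread pool. This is a sketch: `extract_many` is a hypothetical helper, and it assumes `pdf_to_text` takes a path string and returns a str, as the example above suggests. Threads fit because each call presumably spends most of its time waiting on the remote API. The extractor is passed in as a parameter so the helper can be exercised without an API key.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_many(
    pdfs: Iterable[str | Path],
    extract: Callable[[str], str],
    output_dir: str | Path,
    jobs: int = 4,
) -> dict[Path, Path]:
    """Run `extract` on each PDF with up to `jobs` threads; write <stem>.txt files."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    def work(pdf: Path) -> Path:
        text = extract(str(pdf))
        # Note: inputs sharing a stem (a/x.pdf, b/x.pdf) would overwrite each other
        target = out / f"{pdf.stem}.txt"
        target.write_text(text, encoding="utf-8")
        return target

    paths = [Path(p) for p in pdfs]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map yields results in input order; a worker exception is re-raised here
        results = list(pool.map(work, paths))
    return dict(zip(paths, results))
```

With the package installed, a call might look like `extract_many(["a.pdf", "b.pdf"], pdf_to_text, "outputs", jobs=4)`.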

Development

uv pip install -e ".[dev]"   # quoted so shells like zsh don't glob the brackets
uv run pre-commit install
uv run pytest -q

Releasing is handled via standard tags and GitHub Releases.

License

MIT

Test coverage

# Terminal report
make coverage

# HTML report in htmlcov/
make coverhtml



Download files


Source Distribution

upspawn_ocr_cli-0.1.0b1.tar.gz (13.2 kB)

Uploaded Source

Built Distribution


upspawn_ocr_cli-0.1.0b1-py3-none-any.whl (9.6 kB)

Uploaded Python 3

File details

Details for the file upspawn_ocr_cli-0.1.0b1.tar.gz.

File metadata

  • Download URL: upspawn_ocr_cli-0.1.0b1.tar.gz
  • Upload date:
  • Size: 13.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for upspawn_ocr_cli-0.1.0b1.tar.gz
Algorithm Hash digest
SHA256 8d4a4652d2270ae2760184b9285c909ffa82f6e4b67ee0eb17125f33e623e1c3
MD5 6824ad9497b3289870f80dfcbf4a1ca2
BLAKE2b-256 d145fa82b45f00dc87dabc2a775d071c2ef3852192296effc18dcd15611bd339
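
To confirm that a downloaded archive matches its published SHA256 digest, hash the file locally and compare. A minimal standard-library sketch; the helper names `sha256_of` and `matches` are illustrative.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel stops at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str | Path, expected_hex: str) -> bool:
    # Normalize the expected digest so upper-case or padded input still compares
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("upspawn_ocr_cli-0.1.0b1.tar.gz", "8d4a4652d2270ae2760184b9285c909ffa82f6e4b67ee0eb17125f33e623e1c3")` should be True for an intact download of the source distribution.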


File details

Details for the file upspawn_ocr_cli-0.1.0b1-py3-none-any.whl.

File metadata

File hashes

Hashes for upspawn_ocr_cli-0.1.0b1-py3-none-any.whl
Algorithm Hash digest
SHA256 f05b00af70ec32046e278ce10239402cd55869cdec3018e197872a8a6f40054d
MD5 98d109981b23cdbf22be1f3857d35afe
BLAKE2b-256 f706bb2ea972343df02947aa7455743467bb3f48506a517d702ead1db214c532

