Modern CLI to extract text from PDFs using Mistral cloud or local Ollama models (glm-ocr, deepseek-ocr, LightOnOCR-2).

Mistral OCR CLI

Modern, polished CLI to extract text from PDFs using the Mistral OCR API.

Features

  • Elegant TUI with progress bars and rich output
  • Single file or batch processing
  • Output in text, JSON, or Markdown
  • Parallel batch processing with --jobs
  • Config helper and .env support

Quickstart

  1. Install
uv tool install mistral-ocr-cli  # isolated tool install, similar to pipx
# or
uv pip install mistral-ocr-cli   # into the current environment
  2. Configure API key
export MISTRAL_API_KEY=your_key_here
# or
echo "MISTRAL_API_KEY=your_key_here" >> .env
  3. Extract text
ocr extract file.pdf -o out.txt
ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4
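
The key lookup in step 2 can be pictured with a small Python sketch. This is not the CLI's actual loader: the helper names `read_env_file` and `resolve_api_key` are hypothetical, and the precedence shown (an exported variable wins over `.env`) is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env_path: str = ".env") -> Optional[str]:
    """Return MISTRAL_API_KEY from the environment, else from the .env file."""
    # Assumed precedence: the exported variable takes priority over .env.
    return os.environ.get("MISTRAL_API_KEY") or read_env_file(env_path).get(
        "MISTRAL_API_KEY"
    )
```

If `resolve_api_key()` returns `None`, neither source defines the key and step 2 has not taken effect in the current shell or directory.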

Usage

ocr extract [OPTIONS] FILES...

Options:
  -o, --output PATH            Output file (single-file mode)
  -f, --format [text|json|markdown]
                               Output format
  -b, --batch                  Enable batch mode
  -O, --output-dir PATH        Directory for batch outputs
  -j, --jobs INTEGER RANGE     Parallel jobs for batch [default: 1]
  -v, --verbose                Verbose logs
  -q, --quiet                  Only errors
  --version                    Show version
  --help                       Show help
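
To drive the CLI from a script, the options above can be assembled into an argument list and passed to `subprocess`. The sketch below uses only the flags listed in this section. The helper names `build_ocr_command` and `run_ocr` are hypothetical, and the sketch assumes the `ocr` executable is on `PATH`.

```python
import subprocess
from typing import List, Optional, Sequence


def build_ocr_command(
    files: Sequence[str],
    output: Optional[str] = None,
    fmt: Optional[str] = None,
    output_dir: Optional[str] = None,
    jobs: Optional[int] = None,
) -> List[str]:
    """Build an `ocr extract` argument list from the documented options."""
    cmd = ["ocr", "extract", *files]
    if output_dir is not None:
        # Batch mode: one output per input file, written to output_dir.
        cmd += ["--batch", "--output-dir", output_dir]
        if jobs is not None:
            cmd += ["--jobs", str(jobs)]
    elif output is not None:
        # Single-file mode.
        cmd += ["--output", output]
    if fmt is not None:
        cmd += ["--format", fmt]
    return cmd


def run_ocr(cmd: Sequence[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(list(cmd), check=True)
```

For example, `build_ocr_command(["file1.pdf", "file2.pdf"], output_dir="outputs", jobs=4)` produces the same arguments as the batch command in the Quickstart.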

Programmatic use

from ocr.pdf2text import pdf_to_text

text = pdf_to_text("/path/file.pdf")
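
For several files, a thin wrapper can fan the calls out over a thread pool, loosely mirroring what `--jobs` does on the command line. This is a sketch, not the package's own batch implementation: `batch_extract` is a hypothetical helper, and it assumes the extractor takes a path and returns a string, as `pdf_to_text` does in the example above. Whether `pdf_to_text` is safe to call from several threads is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def batch_extract(
    files: Iterable[str],
    output_dir: str,
    extract: Callable[[str], str],
    jobs: int = 1,
) -> List[Path]:
    """Run `extract` on each file and write <stem>.txt into output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    def process(pdf: str) -> Path:
        target = out / (Path(pdf).stem + ".txt")
        target.write_text(extract(pdf), encoding="utf-8")
        return target

    # Threads fit this workload if extraction is network-bound (an assumption).
    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        # pool.map yields results in input order.
        return list(pool.map(process, files))
```

With the package installed, a call might look like `batch_extract(["a.pdf", "b.pdf"], "outputs", pdf_to_text, jobs=4)`. Passing the extractor in as an argument keeps the helper testable without network access.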

Development

uv pip install -e ".[dev]"  # quoted so shells such as zsh do not glob the brackets
uv run pre-commit install
uv run pytest -q

Releasing is handled via standard tags and GitHub Releases.

License

MIT

Test coverage

# Terminal report
make coverage

# HTML report in htmlcov/
make coverhtml
