Mistral OCR CLI

Modern, polished CLI to extract text from PDFs using the Mistral OCR API.

Features

  • Elegant TUI with progress bars and rich output
  • Single file or batch processing
  • Output in text, JSON, or Markdown
  • Parallel batch processing with --jobs
  • Config helper and .env support

Quickstart

  1. Install
uv tool install mistral-ocr-cli  # via pipx-like tool install
# or
uv pip install mistral-ocr-cli   # into current environment
  2. Configure API key
export MISTRAL_API_KEY=your_key_here
# or
echo "MISTRAL_API_KEY=your_key_here" >> .env
  3. Extract text
ocr extract file.pdf -o out.txt
ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4
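
For scripts that wrap the CLI, the key lookup from step 2 can be mirrored in a few lines of Python. This is a minimal sketch rather than the tool's own loader: `read_api_key` is a hypothetical helper, and it assumes a plain `KEY=value` .env format in which the exported environment variable takes precedence over the file.

```python
import os
from pathlib import Path


def read_api_key(env_file=".env", environ=None):
    """Return MISTRAL_API_KEY from the environment, falling back to a .env file.

    Hypothetical helper for illustration; the CLI's own lookup may differ.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("MISTRAL_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if not path.is_file():
        return None

    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "MISTRAL_API_KEY":
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None
```

Calling `read_api_key()` before invoking `ocr` lets a wrapper script fail early with a clear message when no key is configured.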

Usage

ocr extract [OPTIONS] FILES...

Options:
  -o, --output PATH            Output file (single-file mode)
  -f, --format [text|json|markdown]
                               Output format
  -b, --batch                  Enable batch mode
  -O, --output-dir PATH        Directory for batch outputs
  -j, --jobs INTEGER RANGE     Parallel jobs for batch [default: 1]
  -v, --verbose                Verbose logs
  -q, --quiet                  Only errors
  --version                    Show version
  --help                       Show help
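
When the CLI is driven from another program, the options above map naturally onto an argument list. The following is a rough sketch that uses only the flags listed here; `build_extract_command` is a hypothetical helper, not part of the package, and it places options after the file names as the Quickstart example does.

```python
import shlex


def build_extract_command(files, output=None, output_dir=None, fmt=None, jobs=None):
    """Assemble an `ocr extract` argv list from the documented options."""
    if not files:
        raise ValueError("at least one input file is required")
    cmd = ["ocr", "extract", *map(str, files)]
    if fmt is not None:
        cmd += ["--format", fmt]
    if output_dir is not None:
        # Batch mode, mirroring the Quickstart batch example.
        cmd += ["--batch", "--output-dir", str(output_dir)]
        if jobs is not None:
            cmd += ["--jobs", str(jobs)]
    elif output is not None:
        # Single-file mode.
        cmd += ["--output", str(output)]
    return cmd


cmd = build_extract_command(["file1.pdf", "file2.pdf"], output_dir="outputs", jobs=4)
print(shlex.join(cmd))
# prints: ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.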

Programmatic use

from ocr.pdf2text import pdf_to_text

text = pdf_to_text("/path/file.pdf")
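
The same call can be fanned out over several files, roughly as `--jobs` does on the command line. This is only a sketch under stated assumptions: `extract_many` is a hypothetical helper, and it assumes `pdf_to_text` accepts a path, returns a string, and is safe to call from multiple threads. The extractor is passed in as an argument so the helper does not depend on the package's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_many(paths, extract, jobs=4):
    """Run `extract` over `paths` in parallel threads.

    Returns (results, errors): two dicts keyed by path, so one failing
    file does not abort the rest of the batch.
    """
    results, errors = {}, {}

    def run(path):
        try:
            return path, extract(path), None
        except Exception as exc:  # collect the failure instead of raising
            return path, None, exc

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for path, text, exc in pool.map(run, paths):
            if exc is None:
                results[path] = text
            else:
                errors[path] = exc
    return results, errors


# Intended use (assumes the package is installed and an API key is configured):
#   from ocr.pdf2text import pdf_to_text
#   results, errors = extract_many(["a.pdf", "b.pdf"], pdf_to_text, jobs=2)
```

Threads suit this workload because each call presumably spends most of its time waiting on the OCR API rather than on local computation.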

Development

uv pip install -e ".[dev]"   # quoted so shells such as zsh do not glob the brackets
uv run pre-commit install
uv run pytest -q

Releasing is handled via standard tags and GitHub Releases.

License

MIT

Test coverage

# Terminal report
make coverage

# HTML report in htmlcov/
make coverhtml
