
A simple configuration manager with Pydantic and JSON export.


OCR & LLM Parser

A powerful Python package for parsing and processing documents using multiple providers:

  • Mistral OCR — Extracts text from PDFs and images with high accuracy.
  • LangChain — Processes or summarizes text using LLMs.
  • Llama Parser — Advanced parsing with Markdown or text output.
  • HuggingFace — OCR and document question answering with transformer models.

The package provides a unified interface so you can switch between providers easily using a factory pattern.
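To illustrate the factory idea described above, here is a minimal, self-contained sketch. It is not the package's actual implementation: `BaseParser`, `EchoParser`, `UpperParser`, and `SketchFactory` are hypothetical names invented for this example, and the stand-in parsers only format a string instead of calling a real provider.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseParser(ABC):
    """Hypothetical common interface; the real base class may differ."""

    def __init__(self, result_type: str = "text") -> None:
        self.result_type = result_type

    @abstractmethod
    def parse(self, path: Path) -> str:
        """Return the parsed content of the file at `path`."""


class EchoParser(BaseParser):
    """Stand-in provider that only reports what it would parse."""

    def parse(self, path: Path) -> str:
        return f"[{self.result_type}] parsed {path.name}"


class UpperParser(BaseParser):
    """Second stand-in provider with a different output style."""

    def parse(self, path: Path) -> str:
        return f"[{self.result_type}] PARSED {path.name.upper()}"


class SketchFactory:
    """Maps a provider name to a parser class (illustrative registry)."""

    _registry = {"echo": EchoParser, "upper": UpperParser}

    @classmethod
    def get_parser(cls, name: str, **kwargs) -> BaseParser:
        key = name.lower()
        if key not in cls._registry:
            raise ValueError(f"Unknown provider: {name!r}")
        # Keyword arguments are forwarded to the chosen parser's constructor.
        return cls._registry[key](**kwargs)


parser = SketchFactory.get_parser("echo", result_type="md")
print(parser.parse(Path("document.pdf")))
```

Because callers only depend on `get_parser` and `parse`, switching providers in this sketch means changing the name string and nothing else.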


🚀 Features

  • Extract text from PDFs or images
  • Summarize or process text using LLMs
  • Support for Markdown or plain text output
  • Plug-and-play factory to switch providers without changing much code
  • Handles environment variable loading for API keys automatically

🔑 Tokens

Create a .env file in your project root and add the API keys for the services you want to use.

Mistral OCR

MISTRAL-OCR-API-TOKEN=your_mistral_api_key

Llama Parser

LLAMA-PARSER-API-TOKEN=your_llama_parser_api_key

HuggingFace

HF-API-TOKEN=your_huggingface_api_key

Only include the keys for the providers you plan to use.
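The package is described as loading these keys automatically. If you want to check which tokens are visible before calling a parser, the sketch below shows one way to read a .env-style file with the standard library only. `load_env_file` and `get_token` are hypothetical helpers written for this example, not part of the package, and the loader handles only simple KEY=value lines.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Read simple KEY=value lines from a .env-style file into a dict."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_token(name: str, env_file: Path = Path(".env")) -> str:
    """Return a token from the process environment, else from the file."""
    token = os.environ.get(name) or load_env_file(env_file).get(name)
    if not token:
        raise RuntimeError(f"Missing API token: {name}")
    return token
```

For example, `get_token("MISTRAL-OCR-API-TOKEN")` returns the value set in the environment or in .env, and raises a `RuntimeError` if neither defines it.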


🛠️ Usage

from HowdenParser import ParserFactory

from pathlib import Path

parser = ParserFactory.get_parser("mistralocr:", result_type="md")
text = parser.parse(Path("document.pdf"))
print(text)

If you are using the HowdenConfig package, pass its parameters to the factory as keyword arguments:

parser = ParserFactory.get_parser("mistralocr:", **config.parameter.dump_model())

text = parser.parse(Path("document.pdf"))
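The `**` in the call above unpacks a dictionary of settings into keyword arguments. The sketch below shows that mechanism in isolation, using a standard-library dataclass in place of a real configuration object. `ParserSettings` and `get_parser_stub` are hypothetical stand-ins for this example, and the single `result_type` field is an assumption taken from the earlier usage snippet.

```python
from dataclasses import asdict, dataclass


@dataclass
class ParserSettings:
    """Hypothetical settings object; the field name is illustrative."""

    result_type: str = "md"


def get_parser_stub(name: str, **kwargs) -> dict:
    """Stand-in for a factory call; returns the arguments it received."""
    return {"provider": name, **kwargs}


settings = ParserSettings()
# asdict() turns the dataclass into {"result_type": "md"}, and ** spreads
# that dict into keyword arguments, as if result_type="md" were written out.
received = get_parser_stub("mistralocr:", **asdict(settings))
print(received)
```

Keeping parser options in one settings object means a new option is added in one place rather than at every call site.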
