A simple configuration manager with Pydantic and JSON export.
OCR & LLM Parser
A powerful Python package for parsing and processing documents using multiple providers:
- Mistral OCR — Extracts text from PDFs and images with high accuracy.
- LangChain — Processes or summarizes text using LLMs.
- Llama Parser — Advanced parsing with Markdown or text output.
- HuggingFace — OCR and document question answering with transformer models.
The package provides a unified interface so you can switch between providers easily using a factory pattern.
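The factory idea described above can be sketched roughly as follows. This is only an illustration of the pattern, not HowdenParser's actual implementation: `DocumentParser`, `EchoParser`, and `SketchFactory` are hypothetical names invented for this example, and the only thing borrowed from the package is the general shape of `get_parser(name, result_type=...)` and `parse(path)`.

```python
from pathlib import Path
from typing import Callable, Dict, Protocol


class DocumentParser(Protocol):
    """Hypothetical common interface: every provider exposes parse()."""

    def parse(self, path: Path) -> str: ...


class EchoParser:
    """Stand-in provider used only for this illustration (no real OCR)."""

    def __init__(self, result_type: str = "text") -> None:
        self.result_type = result_type

    def parse(self, path: Path) -> str:
        return f"[{self.result_type}] parsed {path.name}"


class SketchFactory:
    """Maps provider names to constructors (illustrative only)."""

    _registry: Dict[str, Callable[..., DocumentParser]] = {}

    @classmethod
    def register(cls, name: str, builder: Callable[..., DocumentParser]) -> None:
        # Store names in lower case so lookups are case-insensitive.
        cls._registry[name.lower()] = builder

    @classmethod
    def get_parser(cls, name: str, **kwargs) -> DocumentParser:
        try:
            builder = cls._registry[name.lower()]
        except KeyError:
            raise ValueError(f"Unknown provider: {name!r}") from None
        return builder(**kwargs)


SketchFactory.register("echo", EchoParser)
parser = SketchFactory.get_parser("echo", result_type="md")
print(parser.parse(Path("document.pdf")))
```

Because callers only depend on the `parse()` method, swapping providers in a design like this comes down to changing the name passed to the factory.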
🚀 Features
- Extract text from PDFs or images
- Summarize or process text using LLMs
- Support for Markdown or plain text output
- Plug-and-play factory to switch providers without changing much code
- Handles environment variable loading for API keys automatically
🔑 Tokens
Create a .env file in your project root and add the API keys for the services you want to use.
Mistral OCR
MISTRAL-OCR-API-TOKEN=your_mistral_api_key
Llama Parser
LLAMA-PARSER-API-TOKEN=your_llama_parser_api_key
HuggingFace
HF-API-TOKEN=your_huggingface_api_key
Only include the keys for the providers you plan to use.
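The package is described as loading these variables automatically, so the following is optional. It is a minimal standard-library sketch for checking which of the token names listed above are present in a .env file before running a parser. The helper names `parse_env_text`, `read_env_file`, and `configured_tokens` are made up for this example and are not part of HowdenParser.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Token names exactly as listed in this README.
TOKEN_NAMES = ("MISTRAL-OCR-API-TOKEN", "LLAMA-PARSER-API-TOKEN", "HF-API-TOKEN")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path: Path) -> Dict[str, str]:
    """Return the parsed contents of a .env file, or an empty dict if it is missing."""
    if not path.is_file():
        return {}
    return parse_env_text(path.read_text(encoding="utf-8"))


def configured_tokens(env: Dict[str, str], names: Iterable[str] = TOKEN_NAMES) -> List[str]:
    """List the token names that have a non-empty value."""
    return [name for name in names if env.get(name)]


print(configured_tokens(read_env_file(Path(".env"))))
```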
🛠️ Usage
from HowdenParser import ParserFactory
from pathlib import Path
parser = ParserFactory.get_parser("mistralocr:", result_type="md")
text = parser.parse(Path("document.pdf"))
print(text)
If you are using the HowdenConfig package:
config: Config = Config(parameter=Parameter())
parser = Parser.create(config.parameter)
text = parser.parse(Path("document.pdf"))
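When several documents need processing, a small wrapper around `parse()` can collect results and failures separately. The sketch below assumes only what the usage above shows, namely that a parser object has a `parse(Path)` method returning text. `parse_many` and `UppercaseStub` are hypothetical helpers written for this example; the stub stands in for a real provider so the snippet runs without API keys.

```python
from pathlib import Path
from typing import Dict, Iterable, Protocol, Tuple


class SupportsParse(Protocol):
    """Anything with a parse(path) method that returns text."""

    def parse(self, path: Path) -> str: ...


def parse_many(
    parser: SupportsParse, paths: Iterable[Path]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Parse each path; return (texts, errors), both keyed by the path as a string."""
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            texts[str(path)] = parser.parse(path)
        except Exception as exc:  # keep going if one document fails
            errors[str(path)] = f"{type(exc).__name__}: {exc}"
    return texts, errors


class UppercaseStub:
    """Stand-in parser for the demo; a real provider would call an OCR service."""

    def parse(self, path: Path) -> str:
        if path.suffix != ".pdf":
            raise ValueError(f"unsupported file type: {path.suffix}")
        return path.stem.upper()


texts, errors = parse_many(UppercaseStub(), [Path("a.pdf"), Path("b.txt")])
print(texts)
print(errors)
```

With a real parser, the stub would be replaced by the object returned from the factory, and the paths could come from something like `Path("docs").glob("*.pdf")`.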
File details
Details for the file howdenparser-2.0.3.tar.gz.
File metadata
- Download URL: howdenparser-2.0.3.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.10.11 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 471fd62b396a9f8fd64d8cf87ac60c5c19cc8dd1ced89b2fd92d0350984a73e8 |
| MD5 | 46a7d29f6ec194fe88c3003b2ef18470 |
| BLAKE2b-256 | ae85a59b58d6f325c0b6ad403f077730b30f809d8ec2a9c5e58a5f026fa846f4 |
File details
Details for the file howdenparser-2.0.3-py3-none-any.whl.
File metadata
- Download URL: howdenparser-2.0.3-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.10.11 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3270d75c42db7d0a17520f3ec3f4a89b0fe83867a1d1aeb38a84ddcc30c5be10 |
| MD5 | afd7b926d35a20d6cd58f4626267768d |
| BLAKE2b-256 | 904e57ae6b7ac666657200e0b17d6a322aa28d8cf2de0e54768255a8d274a659 |