Modular, vision-LLM-powered chain to convert image and PDF documents into clean Markdown.

Project description

langchain_ocr_lib

langchain_ocr_lib is the OCR processing engine behind LangChain-OCR. It provides a modular, vision-LLM-powered chain that converts image and PDF documents into clean Markdown. It is designed for direct CLI usage or for integration into larger applications.

Overview
Features
Installation
1. Prerequisites
2. Environment Setup
Usage
1. CLI
2. Python Module
3. Docker
Architecture
Testing
License

1. Overview

This package offers the core functionality to extract text from documents using vision LLMs and convert it into Markdown. It is configured through environment variables, and its design is based on dependency injection, which allows you to swap out components easily. The package is designed to be used as a library, but it also provides a command-line interface (CLI) for local execution.

2. Features

Vision-Language OCR: Supports Ollama, vLLM, and OpenAI (and other OpenAI-compatible providers). Other LLM providers can be integrated easily.
CLI Interface: Simple local execution via command line or container
Highly Configurable: Use environment variables to configure the OCR
Dependency Injection: Easily swap out components for custom implementations
LangChain: Integrates with LangChain
Markdown Output: Outputs well-formatted Markdown text

3. Installation

3.1 Prerequisites

Python: 3.11+
Poetry: Install Poetry
Docker: For containerized CLI usage (optional)
Ollama: Follow instructions here (other LLM providers can be used as well, see here)
Langfuse: Different options for self hosting, see here (optional, for observability)

3.2 Environment Setup

The package is published on PyPI, so you can install it directly with pip:

pip install langchain-ocr-lib

However, if you want to run the latest version or contribute to the project, you can clone the repository and install it locally.

git clone https://github.com/a-klos/langchain-ocr.git
cd langchain-ocr/langchain_ocr_lib
poetry install --with dev

You can configure the package by setting environment variables. Configuration options are shown in the .env.template file.
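
As a rough illustration of environment-based configuration, the sketch below loads `KEY=VALUE` pairs from a dotenv-style file into `os.environ` using only the standard library, so the values are in place before the library is configured. The helpers `parse_env_lines` and `load_env_file` are hypothetical and not part of langchain_ocr_lib; the real variable names are the ones listed in .env.template.

```python
import os
from pathlib import Path


def parse_env_lines(text: str) -> dict[str, str]:
    """Parse dotenv-style text into a dict, skipping blanks, comments, and lines without '='."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Load a dotenv-style file into os.environ and return the parsed pairs.

    Existing environment variables win unless override=True.
    """
    values = parse_env_lines(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Calling `load_env_file(".env")` before configuring the library would make the file's settings visible to it, assuming the library reads its configuration from the process environment at that point.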

4. Usage

Remember that you need to pull the configured LLM model first. With Ollama, you can do this with:

ollama pull <model_name>

For example, to pull the gemma3:4b-it-q4_K_M model, run:

ollama pull gemma3:4b-it-q4_K_M

4.1 CLI

Run OCR locally from the terminal:

langchain-ocr <input_file>

Supports:

.jpg, .jpeg, .png, and .pdf inputs
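
When the CLI is driven from a script, it can help to check the file type before launching it. The following is a minimal sketch, not part of the package: the helper names are hypothetical, matching extensions case-insensitively is an assumption, and so is the idea that the CLI writes its Markdown to standard output.

```python
import subprocess
from pathlib import Path

# Extensions listed as supported inputs for the CLI.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the supported input types."""
    # Assumption: extensions are compared case-insensitively.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def build_command(path: str) -> list[str]:
    """Build the argument list for one `langchain-ocr` invocation."""
    if not is_supported(path):
        raise ValueError(f"Unsupported file type: {path}")
    return ["langchain-ocr", path]


def run_ocr(path: str) -> str:
    """Run the CLI on one file and return its standard output.

    Assumption: the CLI prints the resulting Markdown to stdout.
    """
    result = subprocess.run(
        build_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.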

4.2 Python Module

Use the library programmatically:

import inject

from langchain_ocr_lib.di_config import configure_di
from langchain_ocr_lib.di_binding_keys.binding_keys import PdfConverterKey
from langchain_ocr_lib.impl.converter.pdf_converter import Pdf2MarkdownConverter


configure_di()  # Set up the dependency injection container


class Converter:
    # Resolved from the container when the attribute is accessed
    _converter: Pdf2MarkdownConverter = inject.attr(PdfConverterKey)

    def convert(self, filename: str) -> str:
        return self._converter.convert2markdown(filename=filename)


converter = Converter()
markdown = converter.convert("../docs/invoice.pdf")  # Adjust the file path as needed
print(markdown)

The configure_di() function sets up the dependency injection for the library. Dependencies can be swapped out, and new bindings can be added alongside the defaults. See ../api/src/langchain_ocr/di_config.py for more details on how to add new dependencies.

Swapping out the dependencies can be done as follows:

import inject
from inject import Binder

from langchain_ocr_lib.di_config import lib_di_config
from langchain_ocr_lib.di_binding_keys.binding_keys import PdfConverterKey
from langchain_ocr_lib.impl.converter.pdf_converter import Pdf2MarkdownConverter


class MyPdfConverter(Pdf2MarkdownConverter):
    def convert(self, filename: str) -> None:
        markdown = self.convert2markdown(filename=filename)
        print(markdown)

def _api_specific_config(binder: Binder):
    binder.install(lib_di_config)  # Install all default bindings
    binder.bind(PdfConverterKey, MyPdfConverter())  # Then override PdfConverter

def configure():
    """Configure the dependency injection container."""
    inject.configure(_api_specific_config, allow_override=True, clear=True)

configure()

class Converter:
    _converter: MyPdfConverter = inject.attr(PdfConverterKey)
    def convert(self, filename: str) -> None:
        self._converter.convert(filename=filename)

converter = Converter()
converter.convert("../docs/invoice.pdf") # Adjust the file path as needed
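
Because the converter is used only through `convert2markdown(filename=...)`, code built on top of it can depend on that one method rather than on a concrete class. The sketch below is an illustration under that assumption: `MarkdownConverter`, `convert_directory`, and `FakeConverter` are hypothetical names, and the default of processing only `.pdf` files is a choice made for this example.

```python
from pathlib import Path
from typing import Protocol


class MarkdownConverter(Protocol):
    """Any object exposing convert2markdown(filename=...) -> str."""

    def convert2markdown(self, filename: str) -> str: ...


def convert_directory(
    converter: MarkdownConverter,
    source_dir: str,
    target_dir: str,
    extensions: tuple[str, ...] = (".pdf",),
) -> list[Path]:
    """Convert each matching file in source_dir and write one .md file per input."""
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for source in sorted(Path(source_dir).iterdir()):
        if not source.is_file() or source.suffix.lower() not in extensions:
            continue
        markdown = converter.convert2markdown(filename=str(source))
        target = out_dir / f"{source.stem}.md"
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written


class FakeConverter:
    """Stand-in for tests: returns a fixed heading instead of calling an LLM."""

    def convert2markdown(self, filename: str) -> str:
        return f"# {Path(filename).name}\n"
```

An injected converter could be passed in place of `FakeConverter()`; the fake makes it possible to test the file handling without a running model.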

4.3 Docker

Run OCR via Docker without local Python setup:

docker build -t ocr -f langchain_ocr_lib/Dockerfile .
docker run --net=host -it --rm -v ./docs:/app/docs:ro ocr docs/invoice.png
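
For scripted use, the `docker run` line above can be assembled programmatically. This is a sketch only: `docker_ocr_command` is a hypothetical helper, it resolves the host directory to an absolute path for the bind mount, and it omits `-it` on the assumption that no interactive terminal is needed in a script.

```python
from pathlib import Path


def docker_ocr_command(docs_dir: str, filename: str, image: str = "ocr") -> list[str]:
    """Build the `docker run` argument list for OCR on one file inside docs_dir."""
    # Resolve to an absolute path so the bind mount does not depend on the
    # current working directory.
    host_dir = Path(docs_dir).resolve()
    return [
        "docker", "run", "--net=host", "--rm",
        "-v", f"{host_dir}:/app/docs:ro",  # read-only mount, as in the command above
        image,
        f"docs/{filename}",  # path as seen inside the container
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` once the image has been built.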

Release history

0.4.1: Apr 21, 2025
0.4.0: Apr 20, 2025
0.3.3: Apr 19, 2025
0.3.2: Apr 13, 2025
0.3.1: Apr 12, 2025 (this version)
0.3.0: Apr 10, 2025
0.2.0: Apr 6, 2025

Download files

Source Distribution

langchain_ocr_lib-0.3.1.tar.gz (15.6 kB)

Uploaded Apr 12, 2025 Source

Built Distribution

langchain_ocr_lib-0.3.1-py3-none-any.whl (23.9 kB)

Uploaded Apr 12, 2025 Python 3

File details

Details for the file langchain_ocr_lib-0.3.1.tar.gz.

File metadata

Download URL: langchain_ocr_lib-0.3.1.tar.gz
Upload date: Apr 12, 2025
Size: 15.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.11.5 Linux/6.8.0-57-generic

File hashes

Hashes for langchain_ocr_lib-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`2a0056d72c0be53009c7b16c23020077c8c9c4208ac7b3696088719809e858c2`
MD5	`6aa14fbe794a767262ee1158658e8d01`
BLAKE2b-256	`a1d1ac11eaaf37033d6f25687d26093a341068f3f9455388dfa4b4af80aa75c6`

File details

Details for the file langchain_ocr_lib-0.3.1-py3-none-any.whl.

File metadata

Download URL: langchain_ocr_lib-0.3.1-py3-none-any.whl
Upload date: Apr 12, 2025
Size: 23.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.11.5 Linux/6.8.0-57-generic

File hashes

Hashes for langchain_ocr_lib-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f3fc27e126c4a53c4ab8007426f9e237be65cf2b1e3a3203857449eba4ccbedc`
MD5	`2e10b3ee383b089addcdfb561c93fae8`
BLAKE2b-256	`49b9121b1e6693eb80c508462945b2625a5f2accc6aa98167fbe723292093024`
