Reason this release was yanked:

Broken build version

LLMAIx (v2) Library

The llmaix library contains the core functionality of the LLMAIx framework.

Caution: The interface of the library is still in development and may change in the future. The library is not yet ready for production use.

Features

Preprocessing: The library provides tools for extracting text from various file formats, including PDF, DOCX, and TXT. It can apply OCR to images and PDFs using tesseract, surya-ocr, or VLMs via docling.
Information Extraction: The library provides a wrapper that helps you obtain a JSON response from an LLM. All models served through an OpenAI-compatible API are supported.

Installation

pip install llmaix

To install dependencies for docling:

pip install "llmaix[docling]"

Available dependency groups: surya, docling

To install all dependencies:

pip install "llmaix[all]"

Usage

CLI

llmaix --help

Python

Preprocessing a PDF file without OCR:

from llmaix import preprocess_file

filename = "tests/testfiles/987462_text.pdf"

extracted_text = preprocess_file(filename)

Preprocessing a PDF file with OCR:

from llmaix import preprocess_file

filename = "tests/testfiles/987462_notext.pdf"

extracted_text = preprocess_file(filename, use_ocr=True, ocr_backend="ocrmypdf")

OCR Backends	Comment
ocrmypdf	Uses tesseract, which must be installed on the system first.
surya-ocr	Uses surya-ocr. Runs models locally via the transformers library.
doclingvlm	Uses docling to perform OCR with a VLM. Configure the model in the same way as for information extraction.
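
Because the ocrmypdf backend relies on a system-wide tesseract installation, it can help to check for the binary before requesting OCR. The following is a minimal sketch using only the standard library. The helper name `missing_tools` is hypothetical and not part of llmaix, and it assumes the binary is called `tesseract` and is reachable through PATH, which may not hold on every system.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper, not an llmaix API. `which` is injectable so
    the check can be exercised without depending on the real system.
    """
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools(["tesseract"])` returns an empty list when the binary is found, and `["tesseract"]` otherwise.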

PDF Backends	Comment
pymupdf4llm	Uses pymupdf to extract text as markdown from PDF files.
markitdown	Uses markitdown to extract text as markdown from PDF files.
docling	Uses docling to extract text as markdown from PDF files. Caution: docling itself might apply OCR even if you don't specify it.
ocr_backend	Directly use the text output from the OCR backend. Incompatible with ocrmypdf.
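
The two tables imply one constraint: the `ocr_backend` PDF backend reuses the text output of the OCR backend, and that is incompatible with `ocrmypdf`. A minimal sketch of that rule as a pre-flight check follows. The function `check_backends` and its parameter names are hypothetical illustrations, not llmaix API, and how the library itself reports this conflict is not described here.

```python
OCR_BACKENDS = {"ocrmypdf", "surya-ocr", "doclingvlm"}
PDF_BACKENDS = {"pymupdf4llm", "markitdown", "docling", "ocr_backend"}


def check_backends(pdf_backend, ocr_backend=None):
    """Raise ValueError for combinations the backend tables rule out.

    Hypothetical helper: names and behaviour are assumptions made for
    illustration, based only on the two backend tables.
    """
    if pdf_backend not in PDF_BACKENDS:
        raise ValueError(f"unknown PDF backend: {pdf_backend!r}")
    if ocr_backend is not None and ocr_backend not in OCR_BACKENDS:
        raise ValueError(f"unknown OCR backend: {ocr_backend!r}")
    if pdf_backend == "ocr_backend":
        # Assumption: reusing OCR text only makes sense when OCR is enabled.
        if ocr_backend is None:
            raise ValueError("'ocr_backend' needs an OCR backend to take text from")
        if ocr_backend == "ocrmypdf":
            raise ValueError("'ocr_backend' is incompatible with ocrmypdf")
    return pdf_backend, ocr_backend
```

Running such a check before a long batch job surfaces an invalid combination immediately rather than partway through processing.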

Extracting information from a text:

Provide a .env file with your OpenAI API key:

echo "OPENAI_API_KEY=your_openai_api_key" > .env

(Optional) To use a custom base URL, set the OPENAI_API_BASE environment variable:

echo "OPENAI_API_BASE=https://your_custom_base_url/v1" >> .env

(Optional) Configure the model in the .env file:

echo "OPENAI_MODEL=gpt-4o-2024-08-06" >> .env
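
To confirm that the .env file contains what you expect, the variables can be read back with a few lines of standard-library Python. This is a sketch only: `read_env_file` is a hypothetical helper, it handles just simple KEY=value lines, and it makes no claim about how llmaix itself loads the file.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blank lines and # comments.

    Hypothetical helper for inspection only, not an llmaix API.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

For example, `read_env_file().get("OPENAI_MODEL")` returns the configured model name, or `None` if the optional line was not written.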

Use the extract_info function to extract information from a text. In this example, a pydantic model is used to define the expected output format. The output will be a JSON object.

from llmaix import extract_info
from pydantic import BaseModel

extracted_text = "The KatherLab is a research group at the University of Technology Dresden, led by Prof. Jakob N. Kather."

class LabInformation(BaseModel):
    name: str
    location: str
    lead: str

extracted_info = extract_info(
    prompt=f"Extract the name, location and lead of the lab from the following text: {extracted_text}",
    llm_model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    pydantic_model=LabInformation,
)
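
Since the result is a JSON object shaped by the pydantic model, downstream code can check it before use. The sketch below uses only the standard library as a stand-in for pydantic validation. `LabInformationRecord` and `parse_lab_information` are hypothetical names, and because the exact return type of `extract_info` is not specified here, the example starts from a plain JSON string.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class LabInformationRecord:
    # Mirrors the fields of the LabInformation model (stdlib stand-in).
    name: str
    location: str
    lead: str


def parse_lab_information(raw_json):
    """Turn a JSON string into a record, rejecting missing or non-string fields."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = [f.name for f in fields(LabInformationRecord)]
    missing = [key for key in expected if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for key in expected:
        if not isinstance(data[key], str):
            raise ValueError(f"field {key!r} must be a string")
    return LabInformationRecord(**{key: data[key] for key in expected})
```

A check like this fails early when a model omits a field, instead of letting an incomplete record propagate.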

Clone the repository and install the dependencies:

git clone https://github.com/KatherLab/LLMAIx-v2.git
cd LLMAIx-v2
uv sync

Tests

Run the tests using the following command:

uv run pytest

To run only the preprocessing tests with the ocrmypdf backend:

uv run pytest tests/test_preprocess.py --ocr-backend ocrmypdf

Release history

Version	Release date
0.0.26	Aug 28, 2025
0.0.24	Aug 5, 2025
0.0.23	Aug 1, 2025
0.0.22	Jul 31, 2025
0.0.21	Jul 29, 2025
0.0.20	Jul 28, 2025
0.0.19	Jul 28, 2025
0.0.18	Jul 28, 2025
0.0.17	Jul 28, 2025
0.0.16	Jul 28, 2025
0.0.14	Jul 28, 2025
0.0.12	Jul 10, 2025
0.0.11	Jun 30, 2025
0.0.10 (yanked)	Jun 30, 2025
0.0.9	Jun 18, 2025
0.0.8	Jun 16, 2025
0.0.7	Jun 6, 2025
0.0.6	Jun 2, 2025
0.0.5	May 12, 2025
0.0.3	May 9, 2025

This page describes version 0.0.10. Reason this release was yanked: Broken build version

Download files

Source Distribution

llmaix-0.0.10.tar.gz (1.6 MB)

Uploaded Jun 30, 2025 Source

Built Distribution

llmaix-0.0.10-py3-none-any.whl (16.6 kB)

Uploaded Jun 30, 2025 Python 3

File details

Details for the file llmaix-0.0.10.tar.gz.

File metadata

Download URL: llmaix-0.0.10.tar.gz
Upload date: Jun 30, 2025
Size: 1.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llmaix-0.0.10.tar.gz
Algorithm	Hash digest
SHA256	`d117f4728df66b2b3017bf9a3843bdc6a925e43116123f19f8568469e5c50ce0`
MD5	`27d017dd8c5dbb7eadf1677f5c260655`
BLAKE2b-256	`a1b3df33c55d13a36e83b5be8a0b4006d8682d18a313321489729a7d2704b195`
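
A downloaded file can be checked against the SHA256 digest listed for it using the standard library. This is a minimal sketch: `sha256_of` and `matches` are hypothetical helper names, and the local file path is an assumption about where you saved the download.

```python
import hashlib

# SHA256 digest listed for llmaix-0.0.10.tar.gz.
EXPECTED_SHA256 = "d117f4728df66b2b3017bf9a3843bdc6a925e43116123f19f8568469e5c50ce0"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

With the archive in the current directory, `matches("llmaix-0.0.10.tar.gz")` should return `True` for an intact download.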

Provenance

The following attestation bundles were made for llmaix-0.0.10.tar.gz:

Publisher: python-publish.yml on KatherLab/llmaixlib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmaix-0.0.10.tar.gz
- Subject digest: d117f4728df66b2b3017bf9a3843bdc6a925e43116123f19f8568469e5c50ce0
- Sigstore transparency entry: 255926944
- Sigstore integration time: Jun 30, 2025
Source repository:
- Permalink: KatherLab/llmaixlib@434bb25c3a181702b5db128c4013c3268aadc0f2
- Branch / Tag: refs/tags/v0.0.10
- Owner: https://github.com/KatherLab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@434bb25c3a181702b5db128c4013c3268aadc0f2
- Trigger Event: release

File details

Details for the file llmaix-0.0.10-py3-none-any.whl.

File metadata

Download URL: llmaix-0.0.10-py3-none-any.whl
Upload date: Jun 30, 2025
Size: 16.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llmaix-0.0.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1b956d28c6d7bb3c361475be02c0dc821d8a73ce2447ea802342210cd7b64c59`
MD5	`7d9dffe8721c01e10d1eec9841d6cc32`
BLAKE2b-256	`0631f83da71014779a58e08780b6c3e6d9e06282c4d3ca9a445abd8bd540d585`

Provenance

The following attestation bundles were made for llmaix-0.0.10-py3-none-any.whl:

Publisher: python-publish.yml on KatherLab/llmaixlib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmaix-0.0.10-py3-none-any.whl
- Subject digest: 1b956d28c6d7bb3c361475be02c0dc821d8a73ce2447ea802342210cd7b64c59
- Sigstore transparency entry: 255926953
- Sigstore integration time: Jun 30, 2025
Source repository:
- Permalink: KatherLab/llmaixlib@434bb25c3a181702b5db128c4013c3268aadc0f2
- Branch / Tag: refs/tags/v0.0.10
- Owner: https://github.com/KatherLab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@434bb25c3a181702b5db128c4013c3268aadc0f2
- Trigger Event: release
