LLMAIx (v2) Library

The llmaix library contains the core functionality of the LLMAIx framework.

Caution: The interface of the library is still in development and may change in the future. The library is not yet ready for production use.

Features

  • Preprocessing: The library provides tools for extracting text from various file formats, including PDF, DOCX, and TXT. It can apply OCR to images and PDFs using backends such as tesseract and surya-ocr.

  • Information Extraction: The library provides a wrapper that helps you get a JSON response from an LLM. All OpenAI-API-compatible models are supported.

Installation

pip install llmaix

Usage

CLI

llmaix --help

Python

Preprocessing a PDF file without OCR:

from llmaix import preprocess_file

filename = "tests/testfiles/987462_text.pdf"

extracted_text = preprocess_file(filename)

Preprocessing a PDF file with OCR:

from llmaix import preprocess_file

filename = "tests/testfiles/987462_notext.pdf"

extracted_text = preprocess_file(filename, use_ocr=True)
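
When it is not known in advance whether a PDF has a text layer, one option is to try plain extraction first and fall back to OCR only if little or no text comes back. The sketch below is a minimal illustration under stated assumptions, not part of the library: `extract_with_ocr_fallback` and `min_chars` are hypothetical names, and it assumes that `preprocess_file` returns a string that is empty or nearly empty for a scanned PDF. The extraction function is passed in as an argument, so the helper only relies on the two call forms shown above.

```python
def extract_with_ocr_fallback(filename, preprocess, min_chars=20):
    """Try plain extraction first and retry with OCR if too little text is found.

    `preprocess` is any callable with the call forms used above:
    preprocess(filename) and preprocess(filename, use_ocr=True).
    `min_chars` is an arbitrary threshold chosen for this illustration.
    """
    # First pass: no OCR. Treat a None result like an empty string.
    text = preprocess(filename) or ""
    if len(text.strip()) >= min_chars:
        return text
    # Second pass: too little text was found, so run the OCR path instead.
    return preprocess(filename, use_ocr=True)
```

With the library installed, this could be called as extract_with_ocr_fallback("tests/testfiles/987462_notext.pdf", preprocess_file). OCR is usually much slower than reading an existing text layer, which is the reason for trying the plain path first.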

Extracting information from a text:

  1. Provide a .env file with your OpenAI API key:
echo "OPENAI_API_KEY=your_openai_api_key" > .env
  2. To use a custom base URL, set the OPENAI_API_BASE environment variable:
echo "OPENAI_API_BASE=https://your_custom_base_url/v1" >> .env
  3. Use the extract_info function to extract information from a text. In this example, a pydantic model defines the expected output format. The output will be a JSON object.
from llmaix import extract_info
from pydantic import BaseModel

extracted_text = "The KatherLab is a research group at the University of Technology Dresden, led by Prof. Jakob N. Kather."

class LabInformation(BaseModel):
    name: str
    location: str
    lead: str

extracted_info = extract_info(
    prompt=f"Extract the name, location and lead of the lab from the following text: {extracted_text}",
    llm_model="Llama-4-Maverick-17B-128E-Instruct-FP8",
    pydantic_model=LabInformation,
)
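
Before calling extract_info, it can help to confirm that the variables from steps 1 and 2 are actually present. The following is a minimal, standard-library-only sketch and not the library's own loading mechanism: `read_env_file` and `check_openai_settings` are hypothetical helpers, and the parser only handles plain KEY=VALUE lines like the ones written by the echo commands above.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_openai_settings(path=".env"):
    """Return (api_key, base_url); variables already in os.environ take precedence."""
    settings = {**read_env_file(path), **os.environ}
    api_key = settings.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # OPENAI_API_BASE is optional; None means no custom base URL was configured.
    return api_key, settings.get("OPENAI_API_BASE")
```

Calling check_openai_settings() in the directory that contains the .env file raises a RuntimeError if the key is missing, which tends to be easier to diagnose than an authentication error from the API.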

Clone the repository and install the dependencies:

git clone https://github.com/KatherLab/LLMAIx-v2.git
cd LLMAIx-v2
uv sync

Tests

Run the tests using the following command:

uv run pytest

To run only the preprocessing tests with the ocrmypdf backend:

uv run pytest tests/test_preprocess.py --ocr-backend ocrmypdf
