Skip to main content

An extension library for invenio-checks

Project description

OARepo Checks

An extension for invenio-checks that adds LLM-powered validation checks for Invenio records.

Features

This library provides:

  • LLM-powered validation checks - Validate records using configurable Large Language Models
  • Jinja2 templates - Define prompts using Jinja2 templates (see TEMPLATES.md)
  • Service components - Two components for integrating checks into your Invenio application:
    • OARepoChecksComponents - Triggers checks on record creation
    • RegisterCheckComponent - Automatically creates and updates check configurations when communities are created or modified
  • CLI tool - Command-line interface for managing LLM checks across communities

Configuration

1. Define LLM Clients

Configure one or more LLM clients in your Invenio application configuration:

from oarepo_checks.llm_client import ChatEInfraClient

# In your invenio.cfg or app configuration
OAREPO_CHECKS_LLM_CLIENTS = {
    "chat_einfra": ChatEInfraClient(
        api_token="your-api-token",
        api_url="https://llm.ai.e-infra.cz/v1/chat/completions",  # optional, this is default
        model="gpt-oss-120b"  # optional, this is default
    )
}

# Set the default client to use
OAREPO_CHECKS_DEFAULT_LLM_CLIENT = "chat_einfra"

2. Creating Custom LLM Clients

You can create custom clients by inheriting from BaseLLMClient:

from oarepo_checks.llm_client import BaseLLMClient
import requests

class CustomLLMClient(BaseLLMClient):
    def __init__(self, api_key: str, endpoint: str):
        self.api_key = api_key
        self.endpoint = endpoint

    def chat_completion(self, prompt: str, **kwargs) -> str:
        """
        Send a prompt to your LLM API and return JSON response.

        Returns:
            str: A valid JSON string with validation results
        """
        # Your implementation here
        ...

# Register in configuration
OAREPO_CHECKS_LLM_CLIENTS = {
    "custom": CustomLLMClient(
        api_key="your-key",
        endpoint="https://your-llm-api.com/chat"
    )
}

3. Manually Configure the Check

The LLM check uses Jinja2 templates for flexible prompt configuration. You can either use the default templates or create custom ones.

Using Default Templates

from invenio_checks.models import CheckConfig, Severity
from invenio_db import db

check_config_llm = CheckConfig(
    community_id=community.id,  # Community ID where to add check to
    check_id="llm",  # State that we would like to use the LLM check
    severity=Severity.WARN,  # Since LLM make mistakes, we would like to keep them as warnings
    enabled=True,
    params={
        "prompt": "Some very good prompt to check for mistakes",
    },
)
db.session.add(check_config_llm)
db.session.commit()

Using the Prompt Creation Utility

You can also create prompts programmatically:

from oarepo_checks import create_prompt
import json

# Create prompt from templates
prompt = create_prompt(
    record_serialized=json.dumps(dict(record)),
    community=community, # Community record (optional)
    # Optionally override default templates:
    # prompt_template="custom_templates/my_prompt.jinja2",
)

The prompt should instruct the LLM to return structured JSON with errors organized by sections (e.g., metadata, authors, files, license).

This component will trigger validation checks immediately when a new record/draft is created.

Service Components

This library provides two service components to integrate checks into your Invenio application:

1. OARepoChecksComponents

This component triggers LLM checks when records are created and is built on top of Invenio ChecksComponent. Furthermore it returns generic community ID on record without communities which enables to run checks on records/drafts without predefined community.

You need to replace Invenio ChecksComponents with OARepoChecksComponent in RDM_RECORDS_SERVICE_COMPONENTS

2. RegisterCheckComponent

This component automatically creates and updates LLM check configurations when communities are created or modified. It generates community-specific prompts using Jinja2 templates. By default all LLM checks are enabled. You can disable/enable them by using CLI commands (see below). Add it to your communities service:

from invenio_communities.services.components import DefaultCommunityComponents
from oarepo_checks.services.components.register_check_config import RegisterCheckComponent

# In your invenio.cfg or app configuration
app_config["COMMUNITIES_SERVICE_COMPONENTS"] = [
    *DefaultCommunityComponents,
    RegisterCheckComponent
]

When a community is created, this component:

  • Automatically creates a CheckConfig for the LLM check
  • Generates a prompt with community-specific rules using templates
  • Sets the check severity to WARN by default

When a community is updated, it regenerates the prompt to reflect any changes to community metadata.

CLI Commands

The library includes a CLI tool for managing LLM checks across communities:

Enable/Disable LLM checks

# Disable LLM check for a specific community
oarepo checks disable-llm-check <community-slug>

# Enable LLM check for a specific community
oarepo checks enable-llm-check <community-slug>

Update prompts

# Update prompts for all communities (regenerates with latest templates)
oarepo checks update-prompts

# Update prompt for a specific community only
oarepo checks update-prompts --community-slug <community-slug>

This is useful when:

  • You've updated your Jinja2 templates and want to apply changes to existing communities
  • Community metadata has been modified outside the normal update workflow
  • You need to batch-regenerate prompts after configuration changes

Usage

Once configured, the LLM check integrates with invenio-checks. It will:

  1. Serialize the record to JSON
  2. Send it to the configured LLM with your prompt
  3. Parse the LLM response for validation errors
  4. Return structured error messages organized by field/section

The check runs automatically when records are created or updated, based on your invenio-checks configuration.

Expected LLM Response Format

The LLM should return JSON in similar structure:

{
  "metadata.title": {                                                   # path for that specific field
    "section_empty": false,                                             # LLM found some errors
    "errors": [
      {
        "error_short": "Brief error description",                       # provide a short and long description
        "error_long": "Detailed explanation and suggestions for fix",
        "manual_check_needed": false                                    # additional flag that can be used later
      }
    ]
  },
  "metadata.license": {
    "section_empty": true,                                              # if no errors are found by the LLM, then it set section_empty = True to know that LLM still checked this section
    "errors": []
  }
}

Requirements

  • Python >= 3.13
  • invenio-checks >= 2.0.0
  • oarepo >= 14.0.0

License

MIT License - see LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

oarepo_checks-2.1.0.tar.gz (22.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

oarepo_checks-2.1.0-py3-none-any.whl (35.0 kB view details)

Uploaded Python 3

File details

Details for the file oarepo_checks-2.1.0.tar.gz.

File metadata

  • Download URL: oarepo_checks-2.1.0.tar.gz
  • Upload date:
  • Size: 22.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oarepo_checks-2.1.0.tar.gz
Algorithm Hash digest
SHA256 b333f6196d84ed2cba172513ff5f1e6002ecbb8c56a8387cd2296ebc8ffd3bd5
MD5 f81f37fdf5e3921ba92a7e5d6b3cf93b
BLAKE2b-256 c6c28476f4151c9403e0d2f88cd335410099d40ce4d5176517f3eb696b075f04

See more details on using hashes here.

File details

Details for the file oarepo_checks-2.1.0-py3-none-any.whl.

File metadata

  • Download URL: oarepo_checks-2.1.0-py3-none-any.whl
  • Upload date:
  • Size: 35.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oarepo_checks-2.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dd3798b89faa71a5234384559c4a3450513f1cd2da7485dc747fb2b4969b0d33
MD5 38beb9cbe08af34f5cb07668e7057f1b
BLAKE2b-256 a9869cb433be2840f476a5fb39e2fc18f69b3c69baae4a076fafee7bfb3c0c19

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page