An extension library for invenio-checks
Project description
OARepo Checks
An extension for invenio-checks that adds LLM-powered validation checks for Invenio records.
Features
This library provides:
- LLM-powered validation checks - Validate records using configurable Large Language Models
- Jinja2 templates - Define prompts using Jinja2 templates (see TEMPLATES.md)
- Service components - Two components for integrating checks into your Invenio application:
OARepoChecksComponents- Triggers checks on record creationRegisterCheckComponent- Automatically creates and updates check configurations when communities are created or modified
- CLI tool - Command-line interface for managing LLM checks across communities
Configuration
1. Define LLM Clients
Configure one or more LLM clients in your Invenio application configuration:
from oarepo_checks.llm_client import ChatEInfraClient
# In your invenio.cfg or app configuration
OAREPO_CHECKS_LLM_CLIENTS = {
"chat_einfra": ChatEInfraClient(
api_token="your-api-token",
api_url="https://llm.ai.e-infra.cz/v1/chat/completions", # optional, this is default
model="gpt-oss-120b" # optional, this is default
)
}
# Set the default client to use
OAREPO_CHECKS_DEFAULT_LLM_CLIENT = "chat_einfra"
2. Creating Custom LLM Clients
You can create custom clients by inheriting from BaseLLMClient:
from oarepo_checks.llm_client import BaseLLMClient
import requests
class CustomLLMClient(BaseLLMClient):
def __init__(self, api_key: str, endpoint: str):
self.api_key = api_key
self.endpoint = endpoint
def chat_completion(self, prompt: str, **kwargs) -> str:
"""
Send a prompt to your LLM API and return JSON response.
Returns:
str: A valid JSON string with validation results
"""
# Your implementation here
...
# Register in configuration
OAREPO_CHECKS_LLM_CLIENTS = {
"custom": CustomLLMClient(
api_key="your-key",
endpoint="https://your-llm-api.com/chat"
)
}
3. Manually Configure the Check
The LLM check uses Jinja2 templates for flexible prompt configuration. You can either use the default templates or create custom ones.
Using Default Templates
from invenio_checks.models import CheckConfig, Severity
from invenio_db import db
check_config_llm = CheckConfig(
community_id=community.id, # Community ID where to add check to
check_id="llm", # State that we would like to use the LLM check
severity=Severity.WARN, # Since LLM make mistakes, we would like to keep them as warnings
enabled=True,
params={
"prompt": "Some very good prompt to check for mistakes",
},
)
db.session.add(check_config_llm)
db.session.commit()
Using the Prompt Creation Utility
You can also create prompts programmatically:
from oarepo_checks import create_prompt
import json
# Create prompt from templates
prompt = create_prompt(
record_serialized=json.dumps(dict(record)),
community=community, # Community record (optional)
# Optionally override default templates:
# prompt_template="custom_templates/my_prompt.jinja2",
)
The prompt should instruct the LLM to return structured JSON with errors organized by sections (e.g., metadata, authors, files, license).
This component will trigger validation checks immediately when a new record/draft is created.
Service Components
This library provides two service components to integrate checks into your Invenio application:
1. OARepoChecksComponents
This component triggers LLM checks when records are created and is built on top of Invenio ChecksComponent. Furthermore it returns generic community ID on record without communities which enables to run checks on records/drafts without predefined community.
You need to replace Invenio ChecksComponents with OARepoChecksComponent in RDM_RECORDS_SERVICE_COMPONENTS
2. RegisterCheckComponent
This component automatically creates and updates LLM check configurations when communities are created or modified. It generates community-specific prompts using Jinja2 templates. By default all LLM checks are enabled. You can disable/enable them by using CLI commands (see below). Add it to your communities service:
from invenio_communities.services.components import DefaultCommunityComponents
from oarepo_checks.services.components.register_check_config import RegisterCheckComponent
# In your invenio.cfg or app configuration
app_config["COMMUNITIES_SERVICE_COMPONENTS"] = [
*DefaultCommunityComponents,
RegisterCheckComponent
]
When a community is created, this component:
- Automatically creates a
CheckConfigfor the LLM check - Generates a prompt with community-specific rules using templates
- Sets the check severity to
WARNby default
When a community is updated, it regenerates the prompt to reflect any changes to community metadata.
CLI Commands
The library includes a CLI tool for managing LLM checks across communities:
Enable/Disable LLM checks
# Disable LLM check for a specific community
oarepo checks disable-llm-check <community-slug>
# Enable LLM check for a specific community
oarepo checks enable-llm-check <community-slug>
Update prompts
# Update prompts for all communities (regenerates with latest templates)
oarepo checks update-prompts
# Update prompt for a specific community only
oarepo checks update-prompts --community-slug <community-slug>
This is useful when:
- You've updated your Jinja2 templates and want to apply changes to existing communities
- Community metadata has been modified outside the normal update workflow
- You need to batch-regenerate prompts after configuration changes
Usage
Once configured, the LLM check integrates with invenio-checks. It will:
- Serialize the record to JSON
- Send it to the configured LLM with your prompt
- Parse the LLM response for validation errors
- Return structured error messages organized by field/section
The check runs automatically when records are created or updated, based on your invenio-checks configuration.
Expected LLM Response Format
The LLM should return JSON in similar structure:
{
"metadata.title": { # path for that specific field
"section_empty": false, # LLM found some errors
"errors": [
{
"error_short": "Brief error description", # provide a short and long description
"error_long": "Detailed explanation and suggestions for fix",
"manual_check_needed": false # additional flag that can be used later
}
]
},
"metadata.license": {
"section_empty": true, # if no errors are found by the LLM, then it set section_empty = True to know that LLM still checked this section
"errors": []
}
}
Requirements
- Python >= 3.13
- invenio-checks >= 2.0.0
- oarepo >= 14.0.0
License
MIT License - see LICENSE file for details.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file oarepo_checks-2.1.0.tar.gz.
File metadata
- Download URL: oarepo_checks-2.1.0.tar.gz
- Upload date:
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b333f6196d84ed2cba172513ff5f1e6002ecbb8c56a8387cd2296ebc8ffd3bd5
|
|
| MD5 |
f81f37fdf5e3921ba92a7e5d6b3cf93b
|
|
| BLAKE2b-256 |
c6c28476f4151c9403e0d2f88cd335410099d40ce4d5176517f3eb696b075f04
|
File details
Details for the file oarepo_checks-2.1.0-py3-none-any.whl.
File metadata
- Download URL: oarepo_checks-2.1.0-py3-none-any.whl
- Upload date:
- Size: 35.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
dd3798b89faa71a5234384559c4a3450513f1cd2da7485dc747fb2b4969b0d33
|
|
| MD5 |
38beb9cbe08af34f5cb07668e7057f1b
|
|
| BLAKE2b-256 |
a9869cb433be2840f476a5fb39e2fc18f69b3c69baae4a076fafee7bfb3c0c19
|