An easy-to-extend LLM annotator for robust, resumable data annotation.

Maintainers

BramVanroy

Project description

Robust, resumable LLM dataset annotation

llm-annotator is a Python 3.12+ library for robust, resumable LLM-driven dataset annotation and generation.

It supports multiple providers through pluggable clients:

vLLM offline inference: VLLMOfflineClient
vLLM server API: VLLMClient
OpenAI API: OpenAIClient
Anthropic API: ClaudeClient

Key capabilities:

Resumable processing with JSONL checkpoints.
Annotation of existing datasets and generation from scratch.
Structured outputs via JSON schema.
Retry and validation hooks for robust pipelines.
Optional Hugging Face Hub upload cadence.
Context-manager cleanup of client resources.
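To make the idea of resumable processing with JSONL checkpoints concrete, here is a minimal, generic sketch in plain Python. It is not llm-annotator's implementation: the function names (`load_done_ids`, `process_resumable`), the `id` and `output` field names, and the assumption that every checkpoint line is a complete JSON object are all hypothetical choices made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_done_ids(checkpoint: Path) -> set[str]:
    """Collect the ids of items already stored in the JSONL checkpoint."""
    if not checkpoint.exists():
        return set()
    with checkpoint.open(encoding="utf-8") as fh:
        # Assumption: every non-empty line is a complete JSON object with an "id".
        return {json.loads(line)["id"] for line in fh if line.strip()}


def process_resumable(
    items: Iterable[dict],
    annotate: Callable[[dict], str],
    checkpoint: Path,
) -> int:
    """Annotate only items missing from the checkpoint; return how many were processed."""
    done = load_done_ids(checkpoint)
    processed = 0
    # Append mode keeps the results of earlier runs intact.
    with checkpoint.open("a", encoding="utf-8") as fh:
        for item in items:
            if item["id"] in done:
                continue  # finished in a previous run
            record = {"id": item["id"], "output": annotate(item)}
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            fh.flush()  # hand each record to the OS as soon as it is ready
            done.add(item["id"])
            processed += 1
    return processed
```

Calling `process_resumable` a second time with the same checkpoint path skips every item whose id is already in the file, which is the behaviour that makes an interrupted run cheap to restart.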

It is not intended for parallel, multi-node, or multi-instance generation. If that is what you need, datatrove may be a better fit.

Documentation

Read the full documentation at bramvanroy.github.io/llm-annotator.

Provider setup reference: docs/provider-info.md

Installation

Recommended (with uv):

uv add llm-annotator

Alternatively, with pip:

pip install llm-annotator

Install provider extras as needed:

uv add "llm-annotator[vllm]"
uv add "llm-annotator[openai]"
uv add "llm-annotator[anthropic]"

See docs/provider-info.md for auth environment variables and provider-specific setup notes.
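Before starting a long run, it can help to check that credentials are present. The sketch below is a hypothetical pre-flight helper, not part of llm-annotator. The variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the defaults read by the official OpenAI and Anthropic Python SDKs; that llm-annotator relies on the same names is an assumption here, and docs/provider-info.md remains the authoritative reference.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumption: these are the SDK default variable names; check docs/provider-info.md.
PROVIDER_ENV_VARS: dict[str, list[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_env_vars(provider: str, environ: Mapping[str, str] | None = None) -> list[str]:
    """Return the expected variables for `provider` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in PROVIDER_ENV_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in PROVIDER_ENV_VARS:
        missing = missing_env_vars(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```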

For local vLLM runs, install flashinfer for your CUDA version.

uv pip install flashinfer-python flashinfer-cubin
# JIT cache package (replace cu128 with your CUDA variant)
uv pip install flashinfer-jit-cache --index-url https://flashinfer.ai/whl/cu128

Usage

Annotate an existing dataset:

from llm_annotator import Annotator, VLLMOfflineClient

# Use a local vLLM model
client = VLLMOfflineClient(
    model="meta-llama/Llama-3.2-3B-Instruct",
    max_model_len=4096,
)

with Annotator(client=client, verbose=True) as anno:
    ds = anno.annotate_dataset(
        output_dir="outputs/sentiment",
        prompt_template="Classify the sentiment of this text: {text}",
        dataset_name="stanfordnlp/imdb",
        dataset_split="test",
        max_num_samples=100,
    )
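
In the example above, the `{text}` placeholder in `prompt_template` matches the `text` column of the `stanfordnlp/imdb` dataset. The following sketch shows one way such a template could be filled from a dataset row using only the standard library. Whether llm-annotator uses `str.format` internally is an assumption, and `template_fields` and `render_prompt` are hypothetical helper names.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_prompt(template: str, row: dict) -> str:
    """Fill the template from one dataset row, failing early on missing columns."""
    missing = template_fields(template) - set(row)
    if missing:
        raise KeyError(f"row is missing columns required by the template: {sorted(missing)}")
    # Extra columns in the row (e.g. "label") are ignored by str.format.
    return template.format(**row)


template = "Classify the sentiment of this text: {text}"
row = {"text": "A moving, beautifully shot film.", "label": 1}
print(render_prompt(template, row))
```

Checking the placeholders up front turns a column-name typo into an immediate error instead of a failure partway through a long run.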

Generate a dataset from scratch:

from llm_annotator import Annotator, OpenAIClient

client = OpenAIClient(model="gpt-4o-mini")

with Annotator(client=client) as anno:
    ds = anno.generate_dataset(
        output_dir="outputs/generated-qa",
        prompts="Write a short geography quiz question with answer.",
        max_num_samples=200,
    )

See the documentation for more examples, including:

Structured output with JSON schemas
Custom validation and post-processing
Large-scale streaming annotation
Generating datasets from scratch
Multi-GPU support

Or check out the examples/ directory for complete working examples.
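
As a rough illustration of the validation and retry idea listed above, here is a generic sketch that does not use llm-annotator's hook API (see the documentation for the real interface). The names `validate_sentiment` and `generate_validated`, the `label` field, and its allowed values are hypothetical.

```python
import json
from typing import Callable


def validate_sentiment(raw: str) -> dict:
    """Parse a model response and check that it has the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict) or data.get("label") not in {"positive", "negative"}:
        raise ValueError(f"unexpected payload: {raw!r}")
    return data


def generate_validated(
    generate: Callable[[], str],
    validate: Callable[[str], dict],
    max_attempts: int = 3,
) -> dict:
    """Call `generate` until `validate` accepts the output or attempts run out."""
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return validate(generate())
        except ValueError as exc:
            last_error = exc  # remember why this attempt was rejected
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error


# Stand-in for a model call: two bad responses followed by a valid one.
responses = iter(["not json", '{"label": "maybe"}', '{"label": "positive"}'])
print(generate_validated(lambda: next(responses), validate_sentiment))
```

Separating the validator from the retry loop keeps the acceptance rule testable on its own, without a model in the loop.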

Testing

Install development dependencies first:

uv sync --dev

Run the default checks:

make style
make quality
make test
make typecheck

Pytest marker targets:

# Fast tests (same as `make test`)
make test-fast

# Slow tests only
make test-slow

# Integration tests only
make test-integration

# Entire suite (fast + slow)
make test-all

You can also run markers directly with pytest:

uv run pytest -m "not slow"
uv run pytest -m "slow"
uv run pytest -m "integration"

Slow and integration tests may load local models, require more runtime, or depend on optional components.

Building documentation

Local versioned docs preview (uses mike on a temporary local branch):

make serve-docs

Override version metadata when needed:

make serve-docs DOCS_VERSION=0.4.0 DOCS_ALIAS=latest DOCS_SOURCE_REF=v0.4.0

Docs are published with mike on release tags through .github/workflows/docs.yml.

Release history

0.10.7

Jun 2, 2026

0.10.6

Jun 2, 2026

0.10.5

Jun 2, 2026

0.10.4

Jun 2, 2026

0.10.3

Jun 2, 2026

0.10.2

Jun 2, 2026

0.10.1

Jun 2, 2026

0.10.0

May 29, 2026

0.9.2

May 26, 2026

0.9.1 (this version)

May 26, 2026

0.9.0

May 25, 2026

0.8.1

May 25, 2026

0.8.0

May 23, 2026

0.7.2

May 22, 2026

0.7.0

May 22, 2026

0.6.0

Dec 2, 2025

0.4.0

Nov 5, 2025

0.3.4

Nov 5, 2025

0.3.3

Nov 2, 2025

0.3.2

Oct 30, 2025

0.3.1

Oct 22, 2025

0.3.0.post1

Oct 22, 2025

0.3.0

Oct 22, 2025

0.2.9

Oct 22, 2025

0.2.8

Oct 8, 2025

0.2.7

Oct 7, 2025

0.2.6

Oct 4, 2025

0.2.5

Oct 4, 2025

0.2.4

Oct 4, 2025

0.2.3

Oct 4, 2025

0.2.2

Oct 4, 2025

0.2.1

Oct 3, 2025

0.2.0

Oct 3, 2025

0.1.1

Oct 2, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_annotator-0.9.1.tar.gz (324.5 kB)

Uploaded May 26, 2026 Source

Built Distribution

llm_annotator-0.9.1-py3-none-any.whl (80.5 kB)

Uploaded May 26, 2026 Python 3

File details

Details for the file llm_annotator-0.9.1.tar.gz.

File metadata

Download URL: llm_annotator-0.9.1.tar.gz
Upload date: May 26, 2026
Size: 324.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for llm_annotator-0.9.1.tar.gz
Algorithm	Hash digest
SHA256	`b0dab7eebbed34e313c9add65afc53bdd5d8429559ed72204ee5e4fdb9c2d134`
MD5	`2078e673351fbb434fa96e2a9dbe9c23`
BLAKE2b-256	`68d6b2c0ed50d1e6a7a5d313b552d069b0bea73870a4f57bbf4b5c6ff215a653`

File details

Details for the file llm_annotator-0.9.1-py3-none-any.whl.

File metadata

Download URL: llm_annotator-0.9.1-py3-none-any.whl
Upload date: May 26, 2026
Size: 80.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for llm_annotator-0.9.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7c4bec5f66747fa41150083a5a04c038466cfbbd4cace08303ed7e4af14acec4`
MD5	`3408987f9e590aed86cc75f08be7d35d`
BLAKE2b-256	`5d42e6eb9d423f4cd230e4c0d6b6425d0993c6aa9c8e92e71cbe4c321c0570a5`
