An easy-to-extend LLM annotator for robust, resumable data annotation.

A simple, extensible LLM-based dataset generator and annotator

This repository provides a small, resumable framework for annotating datasets with LLMs (via vLLM).

Documentation

📚 Read the full documentation for detailed guides, API reference, and examples.

Installation

Recommended:

uv add llm-annotator

or

pip install llm-annotator

Installing FlashInfer for your CUDA version (e.g. CUDA 12.8):

uv pip install flashinfer-python flashinfer-cubin
# JIT cache package (replace cu128 in the URL with your CUDA version: cu128, cu129, or cu130)
uv pip install flashinfer-jit-cache --index-url https://flashinfer.ai/whl/cu128
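
If you are unsure which suffix to use, the index URL above appears to follow a simple pattern: "cu" followed by the CUDA major and minor version without the dot. The sketch below builds the URL from a version string such as "12.8" (for instance, the value PyTorch reports in torch.version.cuda). The helper names cuda_wheel_tag and flashinfer_index_url are hypothetical and are not part of llm-annotator or FlashInfer; check that the resulting tag is one of the published ones (cu128, cu129, cu130) before using it.

```python
# Hypothetical helpers for illustration; not part of llm-annotator or FlashInfer.

def cuda_wheel_tag(cuda_version: str) -> str:
    """Turn a CUDA version string like "12.8" into a wheel tag like "cu128"."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def flashinfer_index_url(cuda_version: str) -> str:
    """Build the index URL using the pattern shown in the install command above."""
    return f"https://flashinfer.ai/whl/{cuda_wheel_tag(cuda_version)}"


if __name__ == "__main__":
    # Example: pass the value of torch.version.cuda on your machine.
    print(flashinfer_index_url("12.8"))
```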

Usage

Quick example:

from llm_annotator import Annotator

# Annotate a dataset with sentiment classification
with Annotator(model="meta-llama/Llama-3.2-3B-Instruct", max_model_len=4096) as anno:
    ds = anno.annotate_dataset(
        output_dir="outputs/sentiment",
        full_prompt_template="Classify the sentiment: {text}",
        dataset_name="stanfordnlp/imdb",
        dataset_split="test",
        max_num_samples=100,
    )
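
The {text} placeholder in full_prompt_template presumably refers to a column of the dataset (stanfordnlp/imdb has a text column). As a rough mental model only, and assuming the template is filled per row in the style of Python's str.format, prompt construction could look like the sketch below. render_prompt is a hypothetical helper, not the library's actual implementation; consult the documentation for the exact templating rules.

```python
# Hypothetical illustration of per-row prompt rendering; llm-annotator's
# internal templating may differ.

def render_prompt(template: str, row: dict) -> str:
    """Fill {column} placeholders in the template with values from one dataset row."""
    return template.format(**row)


template = "Classify the sentiment: {text}"
row = {"text": "A moving, beautifully shot film.", "label": 1}

# Columns that the template does not mention (here: label) are simply ignored.
prompt = render_prompt(template, row)
print(prompt)
```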

See the documentation for more examples, including:

  • Structured output with JSON schemas
  • Custom validation and postprocessing
  • Large-scale streaming annotation
  • Generating datasets from scratch
  • Multi-GPU support

Or check out the examples/ directory for complete working examples.

Testing

make test

make test runs the fast suite and skips tests marked as slow.

Additional test targets:

# Fast tests (same as `make test`)
make test-fast

# Slow tests only
make test-slow

# Integration tests only
make test-integration

# Entire suite (fast + slow)
make test-all

You can also run markers directly with pytest:

uv run pytest -m "not slow"
uv run pytest -m "slow"
uv run pytest -m "integration"

Slow and integration tests may load local models, require more runtime, or depend on optional components.
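
For contributors adding tests, the -m selections above rely on pytest markers attached to test functions. The sketch below shows how a test might be tagged so that make test skips it. The marker names slow and integration come from the commands above; the word_count function and the test names are made up for illustration, and the project presumably registers its markers in its pytest configuration.

```python
import pytest


def word_count(text: str) -> int:
    """Toy function under test (hypothetical, for illustration only)."""
    return len(text.split())


def test_word_count_fast():
    # Unmarked: selected by `pytest -m "not slow"`.
    assert word_count("a b c") == 3


@pytest.mark.slow
def test_word_count_large_input():
    # Marked slow: selected by `pytest -m "slow"`, skipped by `-m "not slow"`.
    assert word_count("token " * 100_000) == 100_000


@pytest.mark.integration
def test_word_count_multiline():
    # Marked integration: selected by `pytest -m "integration"`.
    assert word_count("one two\nthree") == 3
```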

Building documentation

Build the documentation locally:

make docs

Serve the documentation locally (at http://localhost:8000):

make docs-serve

The documentation is automatically built and deployed to GitHub Pages when changes are pushed to the main branch. If docstrings or documentation files have changed, the pre-commit hook checks that the documentation builds successfully before allowing a push.
