An easy-to-extend LLM annotator for robust, resumable data annotation.
A simple, extensible LLM-based dataset generator and annotator
This repository provides a small, resumable framework for annotating datasets with LLMs (via vLLM).
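The README does not say how resumption is implemented. One common way to make annotation resumable, shown here as an assumption about the general technique and not as llm-annotator's internals, is to append each result to a JSONL file keyed by a record id and skip ids that are already present on restart. The helpers annotate_resumable, load_done_ids and ends_with_newline, and the id/annotation record layout, are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Set


def load_done_ids(out_path: Path) -> Set[Any]:
    """Collect the ids already written to a JSONL results file (hypothetical layout)."""
    done: Set[Any] = set()
    if not out_path.exists():
        return done
    with out_path.open(encoding="utf-8") as f:
        for line in f:
            try:
                done.add(json.loads(line)["id"])
            except (json.JSONDecodeError, KeyError, TypeError):
                # Blank or truncated line (e.g. from an interrupted run): that record is redone.
                continue
    return done


def ends_with_newline(path: Path) -> bool:
    """True if the file is missing, empty, or its last byte is a newline."""
    if not path.exists() or path.stat().st_size == 0:
        return True
    with path.open("rb") as f:
        f.seek(-1, 2)  # jump to the last byte of the file
        return f.read(1) == b"\n"


def annotate_resumable(
    records: Iterable[Dict[str, Any]],
    annotate: Callable[[Dict[str, Any]], Any],
    out_path,
) -> int:
    """Annotate records whose id is not yet in out_path; return how many were written."""
    out_path = Path(out_path)
    done = load_done_ids(out_path)
    needs_newline = not ends_with_newline(out_path)  # check before "a" creates the file
    written = 0
    with out_path.open("a", encoding="utf-8") as f:
        if needs_newline:
            f.write("\n")  # terminate a partial last line so new results start cleanly
        for record in records:
            if record["id"] in done:
                continue
            f.write(json.dumps({"id": record["id"], "annotation": annotate(record)}) + "\n")
            f.flush()  # hand each result to the OS so an interrupted process loses little work
            done.add(record["id"])
            written += 1
    return written
```

Running annotate_resumable twice over the same records writes everything on the first call and nothing on the second; a crash mid-run only costs the records that had not been flushed yet.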
Documentation
📚 Read the full documentation for detailed guides, API reference, and examples.
Installation
Recommended:
uv add llm-annotator
or
pip install llm-annotator
Install FlashInfer for your CUDA version (e.g., CUDA 12.8):
uv pip install flashinfer-python flashinfer-cubin
# JIT cache package (replace cu128 with your CUDA version: cu128, cu129, or cu130)
uv pip install flashinfer-jit-cache --index-url https://flashinfer.ai/whl/cu128
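To pick the right cuXYZ index, you can derive the tag from a CUDA version string. This sketch assumes the JIT cache wheel should match the CUDA version PyTorch (pulled in by vLLM) was built against, and that only the three tags named in the comment above exist; the helper names are hypothetical. torch.version.cuda is PyTorch's own attribute and is None on CPU-only builds.

```python
import importlib.util
from typing import Optional

# Wheel tags named in the install comment; other CUDA versions are assumed unavailable.
SUPPORTED_TAGS = ("cu128", "cu129", "cu130")
INDEX_BASE = "https://flashinfer.ai/whl/"


def cuda_tag(cuda_version: str) -> str:
    """Turn a CUDA version like '12.8' or '12.8.1' into a wheel tag like 'cu128'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def flashinfer_index_url(cuda_version: str) -> str:
    """Index URL for flashinfer-jit-cache, rejecting tags not listed above."""
    tag = cuda_tag(cuda_version)
    if tag not in SUPPORTED_TAGS:
        raise ValueError(f"CUDA {cuda_version} maps to {tag}, expected one of {SUPPORTED_TAGS}")
    return INDEX_BASE + tag


def torch_cuda_version() -> Optional[str]:
    """CUDA version PyTorch was built with, or None if torch is absent or CPU-only."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return torch.version.cuda


if __name__ == "__main__":
    version = torch_cuda_version()
    if version is None:
        print("PyTorch with CUDA not found; choose the cuXYZ tag manually.")
    else:
        try:
            print("uv pip install flashinfer-jit-cache --index-url", flashinfer_index_url(version))
        except ValueError as err:
            print(err)
```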
Usage
Quick example:
from llm_annotator import Annotator
# Annotate a dataset with sentiment classification
with Annotator(model="meta-llama/Llama-3.2-3B-Instruct", max_model_len=4096) as anno:
    ds = anno.annotate_dataset(
        output_dir="outputs/sentiment",
        full_prompt_template="Classify the sentiment: {text}",
        dataset_name="stanfordnlp/imdb",
        dataset_split="test",
        max_num_samples=100,
    )
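The quick example passes a plain string with a {text} placeholder, and the stanfordnlp/imdb dataset has a text column. Assuming placeholders are filled from same-named columns of each row (the example implies this but the README does not state it), the substitution amounts to Python's str.format. template_fields and render_prompt are hypothetical helpers for illustration, not part of llm-annotator.

```python
from string import Formatter
from typing import Any, Mapping, Set


def template_fields(template: str) -> Set[str]:
    """Names of the {placeholders} used in a prompt template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def render_prompt(template: str, row: Mapping[str, Any]) -> str:
    """Fill a template from one dataset row, failing loudly on missing columns."""
    missing = template_fields(template) - set(row)
    if missing:
        raise KeyError(f"template needs columns not in the row: {sorted(missing)}")
    return template.format(**row)


row = {"text": "Great movie!", "label": 1}
print(render_prompt("Classify the sentiment: {text}", row))
```

Checking the placeholders up front turns a typo such as {review} into an immediate error instead of a failure partway through a long run.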
See the documentation for more examples, including:
- Structured output with JSON schemas
- Custom validation and postprocessing
- Large-scale streaming annotation
- Generating datasets from scratch
- Multi-GPU support
Or check out the examples/ directory for complete working examples.
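How llm-annotator wires schemas, validators, and postprocessors together is covered in its documentation and not reproduced here. As a library-independent illustration of the idea, this sketch pairs a JSON Schema (the format vLLM's structured-output feature can use to constrain generation) with a validation and postprocessing step that normalizes the label or rejects the output. SENTIMENT_SCHEMA and parse_sentiment are hypothetical names, and returning None to signal "reject or retry" is an assumed convention, not the library's hook signature.

```python
import json
from typing import Optional

# Hypothetical schema: the field name and label set are illustrative only.
SENTIMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative"]},
    },
    "required": ["sentiment"],
}
ALLOWED = set(SENTIMENT_SCHEMA["properties"]["sentiment"]["enum"])


def parse_sentiment(raw: str) -> Optional[str]:
    """Return a normalized label, or None if the model output should be rejected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON at all
    if not isinstance(data, dict):
        return None  # valid JSON, wrong shape
    label = data.get("sentiment")
    if not isinstance(label, str):
        return None
    label = label.strip().lower()  # postprocess: tolerate case and whitespace drift
    return label if label in ALLOWED else None


print(parse_sentiment('{"sentiment": " Positive "}'))
```

Even with constrained decoding, a cheap validation pass like this catches truncated generations and keeps labels consistent before they are written out.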
Testing
make test
make test runs the fast suite and skips tests marked as slow.
Additional test targets:
# Fast tests (same as `make test`)
make test-fast
# Slow tests only
make test-slow
# Integration tests only
make test-integration
# Entire suite (fast + slow)
make test-all
You can also select tests by marker directly with pytest:
uv run pytest -m "not slow"
uv run pytest -m "slow"
uv run pytest -m "integration"
Slow and integration tests may load local models, require more runtime, or depend on optional components.
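For -m "slow" and -m "integration" to work without unknown-marker warnings, pytest needs the markers registered. This is a hypothetical conftest.py sketch using pytest's documented pytest_configure hook and config.addinivalue_line; the project may instead declare its markers in pyproject.toml under [tool.pytest.ini_options], and the descriptions here are assumptions.

```python
# conftest.py (hypothetical sketch; the project's actual registration may differ)

MARKERS = {
    "slow": "long-running tests, skipped by make test",
    "integration": "integration tests that may load models or optional components",
}


def pytest_configure(config):
    """Register custom markers so pytest -m can select them cleanly."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test opts in by carrying the decorator, for example @pytest.mark.slow above its def, after which uv run pytest -m "not slow" deselects it.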
Building documentation
Build the documentation locally:
make docs
Serve the documentation locally (at http://localhost:8000):
make docs-serve
The documentation is automatically built and deployed to GitHub Pages when changes are pushed to the main branch. If docstrings or documentation files have changed, the pre-commit hook checks that the documentation builds successfully before allowing a push.
Download files
File details
Details for the file llm_annotator-0.7.0.tar.gz.
File metadata
- Download URL: llm_annotator-0.7.0.tar.gz
- Upload date:
- Size: 282.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.16 (uv publish, CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67c4ce17f60671c5ea7e2764dffc03f5e9db70aa08585426c55a75a72871a34e |
| MD5 | 449a7a4ba20992f587d0151d6dfdd353 |
| BLAKE2b-256 | 88199a9205643c851ea6a2e9de62c551a93bab43e9d045a3f8752b47fefb698e |
File details
Details for the file llm_annotator-0.7.0-py3-none-any.whl.
File metadata
- Download URL: llm_annotator-0.7.0-py3-none-any.whl
- Upload date:
- Size: 44.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.16 (uv publish, CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 371c350f0714fc9558bc8d824d1660ab8480a7246c002ca88603535e3935f66b |
| MD5 | 1735c51c8a6669df4445202f950b7712 |
| BLAKE2b-256 | 85896d58b04980c731a6a270c212d6123ef6efac77443c2f6eae70fa7bd4815a |
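To check a manually downloaded file against the published SHA256 digests, the standard-library hashlib module is enough. This is a minimal sketch; EXPECTED_SHA256 holds the two 0.7.0 digests from the hash tables for this release, and sha256_of and verify_download are illustrative helper names. The file is read in chunks so large downloads need not fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Optional

# SHA256 digests from the file-hash tables for llm-annotator 0.7.0.
EXPECTED_SHA256 = {
    "llm_annotator-0.7.0.tar.gz": "67c4ce17f60671c5ea7e2764dffc03f5e9db70aa08585426c55a75a72871a34e",
    "llm_annotator-0.7.0-py3-none-any.whl": "371c350f0714fc9558bc8d824d1660ab8480a7246c002ca88603535e3935f66b",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: Optional[str] = None) -> bool:
    """Compare a file's SHA256 with an explicit digest or the table entry for its name."""
    path = Path(path)
    if expected is None:
        expected = EXPECTED_SHA256[path.name]  # KeyError for files not listed above
    return sha256_of(path) == expected.lower()
```

For example, verify_download("llm_annotator-0.7.0-py3-none-any.whl") returns True only for an unmodified wheel. pip can enforce the same check at install time through its hash-checking mode (--require-hashes).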