A flexible, stand-alone, web-based platform for text annotation tasks
Potato: The Portable Annotation Tool
Potato is a free, self-hosted annotation platform for NLP, agentic AI, and GenAI research. Annotate text, audio, video, images, documents, agent traces, and more, all configured entirely through YAML. No coding required.
Try the live demo on HuggingFace Spaces — no installation needed.
Quick Start
```shell
pip install potato-annotation

# List available templates
potato list all

# Get a template and start annotating
potato get sentiment_analysis
potato start sentiment_analysis
```
Or run from source:
```shell
git clone https://github.com/davidjurgens/potato.git
cd potato && pip install -r requirements.txt
python potato/flask_server.py start examples/classification/single-choice/config.yaml -p 8000
```
Open http://localhost:8000 and start annotating.
What Can You Annotate?
Potato handles the full spectrum of annotation tasks — from traditional NLP labeling to evaluating the latest AI agent systems.
Data Types
| Modality | Capabilities |
|---|---|
| Text | Classification, span labeling, entity linking, coreference, pairwise comparison (docs) |
| Agent Traces | Step-by-step evaluation of LLM agents, tool calls, ReAct chains, and multi-agent systems (docs) |
| Web Agents | Screenshot-based review with SVG click/scroll overlays, or live browsing with automatic trace recording (docs) |
| RAG Pipelines | Retrieval relevance, answer faithfulness, citation accuracy, hallucination detection |
| Audio | Waveform visualization, segment labeling, ELAN-style tiered annotation (docs) |
| Video | Frame-by-frame labeling, temporal segments, playback sync (docs) |
| Images | Bounding boxes, polygons, landmarks, classification (docs) |
| Dialogue | Turn-level annotation, conversation trees, interactive chat evaluation |
| Documents | PDF, Word, Markdown, code, and spreadsheets with coordinate mapping (docs) |
Annotation Schemes
| Scheme | Use Case |
|---|---|
| Radio / Checkbox / Likert | Classification, multi-label, rating scales |
| Span annotation | NER, highlighting, hallucination marking |
| Pairwise comparison | A/B testing, best-worst scaling |
| Per-step ratings | Evaluate individual agent actions or dialogue turns |
| Free text | Open-ended responses with validation |
| Triage | Rapid accept/reject/skip curation (docs) |
| Conditional logic | Adaptive forms that respond to prior answers (docs) |
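Since Potato is configured through YAML, a scheme such as conditional logic is declared rather than coded. The sketch below is illustrative only: the `condition` block and its keys are hypothetical, so consult the linked docs for the actual configuration schema.

```yaml
# Hypothetical sketch of a conditional annotation scheme.
# The "condition" block and its keys are illustrative, not
# Potato's actual configuration syntax; see the docs.
annotation_schemes:
  - annotation_type: radio
    name: is_toxic
    description: "Is this text toxic?"
    labels: ["yes", "no"]
  - annotation_type: checkbox
    name: toxicity_type
    description: "What kind of toxicity?"
    labels: ["insult", "threat", "profanity"]
    # Shown only when the prior answer was "yes"
    condition:
      field: is_toxic
      equals: "yes"
```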
Agent & LLM Evaluation
Potato provides purpose-built tooling for evaluating AI agents at every level of granularity.
Trace Formats
Import traces from any major agent framework with the built-in converter:
```shell
python -m potato.trace_converter --input traces.json --input-format openai --output data.jsonl
```
Supported formats: OpenAI, Anthropic/Claude, ReAct, LangChain, LangFuse, WebArena, SWE-bench, OpenTelemetry, CrewAI/AutoGen/LangGraph, MCP, and more. Auto-detection is available with --auto-detect.
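Conceptually, a converter like this flattens a framework-specific transcript into one JSONL record per agent step. The sketch below is a generic illustration of that idea, not Potato's actual output schema; the record fields are hypothetical.

```python
import json

# Generic illustration: flatten an OpenAI-style chat transcript into
# one JSONL record per step. The record fields ("id", "role", "text",
# "tool_calls") are hypothetical, not Potato's actual schema.
trace = [
    {"role": "user", "content": "Find the weather in Ann Arbor."},
    {"role": "assistant", "content": None,
     "tool_calls": [{"function": {"name": "get_weather",
                                  "arguments": '{"city": "Ann Arbor"}'}}]},
    {"role": "tool", "content": '{"temp_f": 41}'},
    {"role": "assistant", "content": "It is 41F in Ann Arbor."},
]

records = []
for i, msg in enumerate(trace):
    records.append({
        "id": f"step-{i}",
        "role": msg["role"],
        "text": msg.get("content") or "",
        "tool_calls": msg.get("tool_calls", []),
    })

jsonl = "\n".join(json.dumps(r) for r in records)
```

Each line of the resulting file is then one annotatable step, which is what makes per-step ratings possible.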
Evaluation Levels
| Level | What You Annotate | Example |
|---|---|---|
| Trajectory | Overall task success, efficiency, safety | "Did the agent complete the task?" |
| Step | Individual action correctness, reasoning quality | Per-turn Likert ratings on each agent step |
| Span | Specific text segments within agent output | Highlight hallucinated claims, factual errors |
| Comparison | Side-by-side A/B agent evaluation | "Which agent performed better?" |
Web Agent Viewer
An interactive viewer for GUI agent traces — navigate step-by-step through screenshots with SVG overlays showing clicks, bounding boxes, mouse paths, and scroll actions. Annotators rate each step with inline controls while a filmstrip bar provides quick navigation.
Ready-to-Use Agent Examples
| Example | What It Evaluates |
|---|---|
| agent-trace-evaluation | Text agent traces with MAST error taxonomy + hallucination spans |
| visual-agent-evaluation | GUI agents with screenshot grounding accuracy |
| agent-comparison | Side-by-side A/B agent comparison |
| rag-evaluation | RAG retrieval relevance and citation accuracy |
| openai-evaluation | OpenAI Chat API traces with tool calls |
| anthropic-evaluation | Claude messages with tool_use blocks |
| swebench-evaluation | Coding agents with patch correctness ratings |
| multi-agent-evaluation | Multi-agent coordination (CrewAI, AutoGen, LangGraph) |
| web-agent-review | Pre-recorded web traces with step-by-step overlay viewer |
| web-agent-creation | Live web browsing with automatic trace recording |
AI-Powered Annotation
LLM Label Suggestions
Integrate any LLM provider to pre-annotate instances and suggest labels. Annotators review and correct — dramatically faster than labeling from scratch.
Supported backends: OpenAI, Anthropic, Ollama, vLLM, Gemini, HuggingFace, OpenRouter
Active Learning
Potato reorders your annotation queue based on model uncertainty so annotators label the most informative instances first. Supports uncertainty sampling, BADGE, BALD, diversity, and hybrid strategies (docs).
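The core idea of uncertainty sampling is easy to sketch: order instances so the ones the model is least confident about come first. The snippet below is a standalone illustration of that strategy, not Potato's internal API; the instance names and probabilities are made up.

```python
import math

# Standalone uncertainty-sampling sketch (not Potato's internal API):
# rank instances by the entropy of the model's label distribution,
# highest entropy (most uncertain) first.
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical model confidences over three labels per instance
instances = {
    "doc-1": [0.98, 0.01, 0.01],   # confident -> labeled last
    "doc-2": [0.34, 0.33, 0.33],   # near-uniform -> labeled first
    "doc-3": [0.70, 0.20, 0.10],
}

queue = sorted(instances, key=lambda k: entropy(instances[k]), reverse=True)
print(queue)  # ['doc-2', 'doc-3', 'doc-1']
```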
Solo Mode
A human-LLM collaborative workflow where the system learns from annotator feedback and progressively transitions to autonomous LLM labeling as agreement improves (docs).
Chat Assistant
An LLM-powered sidebar where annotators can ask questions about difficult instances. The AI provides guidance informed by your task description and annotation guidelines — helping annotators think through decisions without auto-labeling (docs).
Quality Control & Workflows
Quality Assurance
| Feature | Description |
|---|---|
| Attention checks | Automatically inserted known-answer items to verify engagement |
| Gold standards | Track annotator accuracy against expert labels |
| Inter-annotator agreement | Built-in Krippendorff's alpha and Cohen's kappa |
| Training phase | Practice annotations with feedback before the real task |
| Behavioral tracking | Timing, click patterns, and annotation change history |
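For intuition about the agreement metrics above, Cohen's kappa for two annotators takes only a few lines. This is a self-contained sketch for illustration, not Potato's implementation; the example labels are made up.

```python
from collections import Counter

# Standalone Cohen's kappa sketch for two annotators (illustrative,
# not Potato's implementation): observed agreement corrected for the
# agreement expected by chance.
def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
ann2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.333
```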
Annotation Workflows
| Workflow | Description |
|---|---|
| Multi-annotator | Multiple annotators per item with overlap control and agreement metrics |
| Adjudication | Expert review of annotator disagreements to produce gold labels (docs) |
| Solo mode | Human-LLM collaboration with progressive automation (docs) |
| Crowdsourcing | Prolific and MTurk integration with platform-specific auth (docs) |
| Triage | Rapid accept/reject/skip for data curation (docs) |
Authentication & Deployment
Potato supports multiple authentication methods, from passwordless quick-start to enterprise SSO:
| Method | Use Case |
|---|---|
| In-memory | Local development, quick studies |
| Password + file persistence | Team annotation with shared credential files (docs) |
| Database | Production deployments with SQLite or PostgreSQL (docs) |
| OAuth / SSO | Google, GitHub, or institutional OIDC login (docs) |
| Passwordless | Low-stakes tasks where ease of access matters (docs) |
Passwords are hashed with PBKDF2-SHA256 using a per-user salt. Admins can reset passwords via the CLI (potato reset-password) or the REST API; self-service token-based reset is also available.
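The salted PBKDF2-SHA256 scheme can be reproduced with Python's standard library. The sketch below shows the general approach; the iteration count and salt size are illustrative, not Potato's exact settings.

```python
import hashlib
import hmac
import os

# Minimal PBKDF2-SHA256 sketch with a per-user salt. Iteration count
# and salt size are illustrative, not Potato's exact settings.
def hash_password(password, salt=None, iterations=600_000):
    salt = salt if salt is not None else os.urandom(16)  # unique per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, digest, iterations=600_000):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time compare

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))  # True
print(verify_password("wrong", salt, digest))    # False
```

Because each user gets a fresh random salt, identical passwords produce different digests, which defeats precomputed rainbow-table attacks.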
Example Projects
Ready-to-use templates organized by type in examples/:
| Category | Examples |
|---|---|
| Classification | Radio, checkbox, Likert, slider, pairwise comparison |
| Span | NER, span linking, coreference, entity linking |
| Agent Traces | LLM agents, web agents, RAG, multi-agent, code agents |
| Audio | Waveform annotation, classification, ELAN-style tiered |
| Video | Frame-level labeling, temporal segments |
| Image | Bounding boxes, PDF/document annotation |
| Advanced | Solo mode, adjudication, quality control, conditional logic |
| AI-Assisted | LLM suggestions, Ollama integration |
| Custom Layouts | Content moderation, dialogue QA, medical review |
Research Showcase
The Potato Showcase contains annotation projects from published research — sentiment analysis, dialogue evaluation, summarization, and more.
```shell
potato list all       # Browse available projects
potato get <project>  # Download one
```
Documentation
| Topic | Link |
|---|---|
| Quick Start | docs/quick-start.md |
| Configuration Reference | docs/configuration.md |
| Schema Gallery | docs/schemas_and_templates.md |
| Agent Trace Evaluation | docs/agent_traces.md |
| Web Agent Annotation | docs/web_agent_annotation.md |
| AI Support | docs/ai_support.md |
| Active Learning | docs/active_learning_guide.md |
| Solo Mode | docs/solo_mode.md |
| Quality Control | docs/quality_control.md |
| Password Management | docs/password_management.md |
| SSO & OAuth | docs/sso_authentication.md |
| Admin Dashboard | docs/admin_dashboard.md |
| Crowdsourcing | docs/crowdsourcing.md |
| Export Formats | docs/export_formats.md |
| Full Documentation Index | docs/index.md |
Development
```shell
# Run tests
pytest tests/ -v

# By category
pytest tests/unit/ -v      # Unit tests (fast)
pytest tests/server/ -v    # Integration tests
pytest tests/selenium/ -v  # Browser tests

# With coverage
pytest --cov=potato --cov-report=html
```
Support
- Issues: GitHub Issues
- Questions: jurgens@umich.edu
- Docs: potatoannotator.readthedocs.io
License
Potato is licensed under Polyform Shield. Non-commercial users may use Potato without restriction. Commercial users may use Potato for annotation as much as they want, but may not integrate it into a commercial product.
License FAQ
| Use Case | Allowed? |
|---|---|
| Academic research | Yes |
| Company annotation | Yes |
| Fork for personal development | Yes |
| Integration in open-source pipelines | Yes |
| Commercial annotation service | Contact us |
| Competing annotation platform | Contact us |
Citation
```bibtex
@inproceedings{pei2022potato,
  title={POTATO: The Portable Text Annotation Tool},
  author={Pei, Jiaxin and Ananthasubramaniam, Aparna and Wang, Xingyao and Zhou, Naitian and Dedeloudis, Apostolos and Sargent, Jackson and Jurgens, David},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year={2022}
}
```
Download files
File details
Details for the file potato_annotation-2.4.1.tar.gz.

File metadata
- Download URL: potato_annotation-2.4.1.tar.gz
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5a5d4a00adad6d358ec6e1114938e0b5451922c3568ad1527001ac1af327d29 |
| MD5 | 6d76f13989bb382efe76cbf28759d802 |
| BLAKE2b-256 | 1ce308ce73a40c42126f78e2e83a4e974ba68fdf1bb390410fab5f03dc202a08 |
File details
Details for the file potato_annotation-2.4.1-py3-none-any.whl.

File metadata
- Download URL: potato_annotation-2.4.1-py3-none-any.whl
- Size: 1.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7a2d19eaaeb93a0c4ecb19d34a5903b3293d915f01e12d2b66a5bdb6e1e02ee |
| MD5 | ce83f6daeabbdc7fd6e31024eb371697 |
| BLAKE2b-256 | f11b9afe418daaf4541384086b2aa50d7cb0f4e5a6286a4e2107fac5369a2bfb |