A flexible, stand-alone, web-based platform for text annotation tasks

Project description

Potato: The Portable Annotation Tool

Potato is a free, self-hosted annotation platform for NLP, agentic-AI, and GenAI research. Annotate text, audio, video, images, documents, agent traces, and more, configured entirely through YAML. No coding required.

Try the live demo on HuggingFace Spaces — no installation needed.


Quick Start

pip install potato-annotation

# List available templates
potato list all

# Get a template and start annotating
potato get sentiment_analysis
potato start sentiment_analysis

Or run from source:

git clone https://github.com/davidjurgens/potato.git
cd potato && pip install -r requirements.txt
python potato/flask_server.py start examples/classification/single-choice/config.yaml -p 8000

Open http://localhost:8000 and start annotating.
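
Every task is driven by a single YAML config. As a rough sketch of what that file contains (key names follow the bundled `examples/` configs; consult `docs/configuration.md` for the authoritative schema):

```yaml
# Minimal classification task sketch -- illustrative, not exhaustive
port: 8000
annotation_task_name: "Sentiment Analysis"

data_files:
  - data/instances.jsonl        # one JSON object per line

item_properties:
  id_key: id                    # field holding each instance's ID
  text_key: text                # field holding the text to display

annotation_schemes:
  - annotation_type: radio
    name: sentiment
    description: "What is the sentiment of this text?"
    labels: ["positive", "neutral", "negative"]

output_annotation_dir: annotation_output/
```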


What Can You Annotate?

Potato handles the full spectrum of annotation tasks — from traditional NLP labeling to evaluating the latest AI agent systems.

Data Types

| Modality | Capabilities |
| --- | --- |
| Text | Classification, span labeling, entity linking, coreference, pairwise comparison (docs) |
| Agent Traces | Step-by-step evaluation of LLM agents, tool calls, ReAct chains, and multi-agent systems (docs) |
| Web Agents | Screenshot-based review with SVG click/scroll overlays, or live browsing with automatic trace recording (docs) |
| RAG Pipelines | Retrieval relevance, answer faithfulness, citation accuracy, hallucination detection |
| Audio | Waveform visualization, segment labeling, ELAN-style tiered annotation (docs) |
| Video | Frame-by-frame labeling, temporal segments, playback sync (docs) |
| Images | Bounding boxes, polygons, landmarks, classification (docs) |
| Dialogue | Turn-level annotation, conversation trees, interactive chat evaluation |
| Documents | PDF, Word, Markdown, code, and spreadsheets with coordinate mapping (docs) |

Annotation Schemes

| Scheme | Use Case |
| --- | --- |
| Radio / Checkbox / Likert | Classification, multi-label, rating scales |
| Span annotation | NER, highlighting, hallucination marking |
| Pairwise comparison | A/B testing, best-worst scaling |
| Per-step ratings | Evaluate individual agent actions or dialogue turns |
| Free text | Open-ended responses with validation |
| Triage | Rapid accept/reject/skip curation (docs) |
| Conditional logic | Adaptive forms that respond to prior answers (docs) |
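
Each scheme maps to an `annotation_schemes` entry in the task config. A hedged sketch of two common schemes (exact keys can vary per scheme type; see the schema gallery for the full reference):

```yaml
annotation_schemes:
  # Span highlighting, e.g. for marking hallucinated claims
  - annotation_type: highlight
    name: hallucination
    labels: ["hallucination", "factual_error"]

  # 5-point Likert rating with endpoint labels
  - annotation_type: likert
    name: quality
    description: "Rate the overall response quality"
    size: 5
    min_label: "Very poor"
    max_label: "Excellent"
```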

Agent & LLM Evaluation

Potato provides purpose-built tooling for evaluating AI agents at every level of granularity.

Trace Formats

Import traces from any major agent framework with the built-in converter:

python -m potato.trace_converter --input traces.json --input-format openai --output data.jsonl

Supported formats: OpenAI, Anthropic/Claude, ReAct, LangChain, LangFuse, WebArena, SWE-bench, OpenTelemetry, CrewAI/AutoGen/LangGraph, MCP, and more. Auto-detection is available with --auto-detect.

Evaluation Levels

| Level | What You Annotate | Example |
| --- | --- | --- |
| Trajectory | Overall task success, efficiency, safety | "Did the agent complete the task?" |
| Step | Individual action correctness, reasoning quality | Per-turn Likert ratings on each agent step |
| Span | Specific text segments within agent output | Highlight hallucinated claims, factual errors |
| Comparison | Side-by-side A/B agent evaluation | "Which agent performed better?" |

Web Agent Viewer

An interactive viewer for GUI agent traces — navigate step-by-step through screenshots with SVG overlays showing clicks, bounding boxes, mouse paths, and scroll actions. Annotators rate each step with inline controls while a filmstrip bar provides quick navigation.

Ready-to-Use Agent Examples

| Example | What It Evaluates |
| --- | --- |
| `agent-trace-evaluation` | Text agent traces with MAST error taxonomy + hallucination spans |
| `visual-agent-evaluation` | GUI agents with screenshot grounding accuracy |
| `agent-comparison` | Side-by-side A/B agent comparison |
| `rag-evaluation` | RAG retrieval relevance and citation accuracy |
| `openai-evaluation` | OpenAI Chat API traces with tool calls |
| `anthropic-evaluation` | Claude messages with tool_use blocks |
| `swebench-evaluation` | Coding agents with patch correctness ratings |
| `multi-agent-evaluation` | Multi-agent coordination (CrewAI, AutoGen, LangGraph) |
| `web-agent-review` | Pre-recorded web traces with step-by-step overlay viewer |
| `web-agent-creation` | Live web browsing with automatic trace recording |

AI-Powered Annotation

LLM Label Suggestions

Integrate any LLM provider to pre-annotate instances and suggest labels. Annotators review and correct — dramatically faster than labeling from scratch.

Supported backends: OpenAI, Anthropic, Ollama, vLLM, Gemini, HuggingFace, OpenRouter

Active Learning

Potato reorders your annotation queue based on model uncertainty so annotators label the most informative instances first. Supports uncertainty sampling, BADGE, BALD, diversity, and hybrid strategies (docs).
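
The core idea behind uncertainty sampling is easy to illustrate. The sketch below is not Potato's internal code, just the underlying technique: given per-instance class probabilities from a model, queue the highest-entropy (most uncertain) instances first.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def reorder_by_uncertainty(instances, model_probs):
    """Sort instances so the most uncertain (highest-entropy) come first."""
    ranked = sorted(zip(instances, model_probs),
                    key=lambda pair: entropy(pair[1]),
                    reverse=True)
    return [inst for inst, _ in ranked]

# A near-50/50 prediction ("b") carries the least information for the model,
# so it is the most valuable instance to have a human label next.
queue = reorder_by_uncertainty(
    ["a", "b", "c"],
    [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]],  # per-class probabilities
)
# queue == ["b", "c", "a"]
```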

Solo Mode

A human-LLM collaborative workflow where the system learns from annotator feedback and progressively transitions to autonomous LLM labeling as agreement improves (docs).

Chat Assistant

An LLM-powered sidebar where annotators can ask questions about difficult instances. The AI provides guidance informed by your task description and annotation guidelines — helping annotators think through decisions without auto-labeling (docs).


Quality Control & Workflows

Quality Assurance

| Feature | Description |
| --- | --- |
| Attention checks | Automatically inserted known-answer items to verify engagement |
| Gold standards | Track annotator accuracy against expert labels |
| Inter-annotator agreement | Built-in Krippendorff's alpha and Cohen's kappa |
| Training phase | Practice annotations with feedback before the real task |
| Behavioral tracking | Timing, click patterns, and annotation change history |
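
For two annotators, Cohen's kappa is simple enough to compute by hand. This standalone sketch shows the calculation (observed agreement corrected for chance agreement); it is independent of Potato's own implementation:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["pos", "neg", "pos"], ["pos", "neg", "pos"]))  # 1.0 (perfect)
```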

Annotation Workflows

| Workflow | Description |
| --- | --- |
| Multi-annotator | Multiple annotators per item with overlap control and agreement metrics |
| Adjudication | Expert review of annotator disagreements to produce gold labels (docs) |
| Solo mode | Human-LLM collaboration with progressive automation (docs) |
| Crowdsourcing | Prolific and MTurk integration with platform-specific auth (docs) |
| Triage | Rapid accept/reject/skip for data curation (docs) |

Authentication & Deployment

Potato supports multiple authentication methods, from passwordless quick-start to enterprise SSO:

| Method | Use Case |
| --- | --- |
| In-memory | Local development, quick studies |
| Password + file persistence | Team annotation with shared credential files (docs) |
| Database | Production deployments with SQLite or PostgreSQL (docs) |
| OAuth / SSO | Google, GitHub, or institutional OIDC login (docs) |
| Passwordless | Low-stakes tasks where ease of access matters (docs) |

Passwords are hashed with per-user PBKDF2-SHA256 salts. Admins can reset passwords via CLI (potato reset-password) or REST API. Self-service token-based reset is also available.
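
The hashing scheme described above (fresh per-user salt + PBKDF2-HMAC-SHA256) can be sketched with the Python standard library alone. This is illustrative of the technique, not Potato's actual storage code:

```python
import hashlib
import hmac
import os

# Iteration count is illustrative; tune upward for production workloads.
ITERATIONS = 100_000

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a fresh 16-byte salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("hunter2")
assert verify_password("hunter2", salt, digest)
assert not verify_password("wrong", salt, digest)
```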


Example Projects

Ready-to-use templates organized by type in examples/:

| Category | Examples |
| --- | --- |
| Classification | Radio, checkbox, Likert, slider, pairwise comparison |
| Span | NER, span linking, coreference, entity linking |
| Agent Traces | LLM agents, web agents, RAG, multi-agent, code agents |
| Audio | Waveform annotation, classification, ELAN-style tiered |
| Video | Frame-level labeling, temporal segments |
| Image | Bounding boxes, PDF/document annotation |
| Advanced | Solo mode, adjudication, quality control, conditional logic |
| AI-Assisted | LLM suggestions, Ollama integration |
| Custom Layouts | Content moderation, dialogue QA, medical review |

Research Showcase

The Potato Showcase contains annotation projects from published research — sentiment analysis, dialogue evaluation, summarization, and more.

potato list all          # Browse available projects
potato get <project>     # Download one

Documentation

| Topic | Link |
| --- | --- |
| Quick Start | `docs/quick-start.md` |
| Configuration Reference | `docs/configuration.md` |
| Schema Gallery | `docs/schemas_and_templates.md` |
| Agent Trace Evaluation | `docs/agent_traces.md` |
| Web Agent Annotation | `docs/web_agent_annotation.md` |
| AI Support | `docs/ai_support.md` |
| Active Learning | `docs/active_learning_guide.md` |
| Solo Mode | `docs/solo_mode.md` |
| Quality Control | `docs/quality_control.md` |
| Password Management | `docs/password_management.md` |
| SSO & OAuth | `docs/sso_authentication.md` |
| Admin Dashboard | `docs/admin_dashboard.md` |
| Crowdsourcing | `docs/crowdsourcing.md` |
| Export Formats | `docs/export_formats.md` |
| Full Documentation Index | `docs/index.md` |

Development

# Run tests
pytest tests/ -v

# By category
pytest tests/unit/ -v        # Unit tests (fast)
pytest tests/server/ -v      # Integration tests
pytest tests/selenium/ -v    # Browser tests

# With coverage
pytest --cov=potato --cov-report=html

License

Potato is licensed under Polyform Shield. Non-commercial users can use Potato without restriction. Commercial users can use Potato for annotation as much as they like, but cannot integrate Potato into a competing commercial product.

License FAQ
| Use Case | Allowed? |
| --- | --- |
| Academic research | Yes |
| Company annotation | Yes |
| Fork for personal development | Yes |
| Integration in open-source pipelines | Yes |
| Commercial annotation service | Contact us |
| Competing annotation platform | Contact us |

Citation

@inproceedings{pei2022potato,
  title={POTATO: The Portable Text Annotation Tool},
  author={Pei, Jiaxin and Ananthasubramaniam, Aparna and Wang, Xingyao and Zhou, Naitian and Dedeloudis, Apostolos and Sargent, Jackson and Jurgens, David},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year={2022}
}


Download files

Download the file for your platform.

Source Distribution

potato_annotation-2.4.1.tar.gz (1.5 MB)

Uploaded Source

Built Distribution

potato_annotation-2.4.1-py3-none-any.whl (1.7 MB)

Uploaded Python 3

File details

Details for the file potato_annotation-2.4.1.tar.gz.

File metadata

  • Download URL: potato_annotation-2.4.1.tar.gz
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for potato_annotation-2.4.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b5a5d4a00adad6d358ec6e1114938e0b5451922c3568ad1527001ac1af327d29 |
| MD5 | 6d76f13989bb382efe76cbf28759d802 |
| BLAKE2b-256 | 1ce308ce73a40c42126f78e2e83a4e974ba68fdf1bb390410fab5f03dc202a08 |


File details

Details for the file potato_annotation-2.4.1-py3-none-any.whl.

File hashes

Hashes for potato_annotation-2.4.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f7a2d19eaaeb93a0c4ecb19d34a5903b3293d915f01e12d2b66a5bdb6e1e02ee |
| MD5 | ce83f6daeabbdc7fd6e31024eb371697 |
| BLAKE2b-256 | f11b9afe418daaf4541384086b2aa50d7cb0f4e5a6286a4e2107fac5369a2bfb |

