
A flexible, stand-alone, web-based platform for text annotation tasks


Potato: The Portable Text Annotation Tool


Potato is a lightweight, configuration-driven annotation tool for NLP research. Go from zero to annotating in minutes—no coding required.

Why Potato?

| Feature | Potato | Other Tools |
|---|---|---|
| Setup time | Minutes (YAML config) | Hours/days (custom code) |
| Coding required | None | Often extensive |
| Self-hosted | Yes (full data control) | Often cloud-only |
| AI assistance | Built-in LLM support | Rarely available |
| Cost | Free | Often paid |

Key Features

Multi-Modal Annotation

Potato supports annotation across multiple data types:

| Modality | Features |
|---|---|
| Text | Classification, span labeling, pairwise comparison, free-form responses |
| Audio | Waveform visualization, segment labeling, playback controls |
| Video | Frame-by-frame annotation, temporal segments, playback sync |
| Images | Region labeling, classification, comparison tasks |
| Dialogue | Turn-level annotation, conversation threading |

Annotation Schemes

  • Classification: Radio buttons, checkboxes, Likert scales
  • Span Annotation: Highlight and label text spans with keyboard shortcuts
  • Pairwise Comparison: Side-by-side comparisons, best-worst scaling
  • Free Text: Text boxes with validation and character limits
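
As an illustration of how best-worst scaling judgments are commonly turned into item scores, here is a minimal sketch of the standard counting method: each item's score is the number of times it was chosen as best, minus the number of times it was chosen as worst, divided by the number of times it was shown. The function name, the tuple layout, and the sample judgments are hypothetical and do not reflect Potato's output format.

```python
from collections import Counter


def bws_scores(judgments):
    """Score items as (times best - times worst) / times shown.

    Each judgment is a tuple (items_shown, best_item, worst_item).
    This layout is an assumption made for the example.
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, best_item, worst_item in judgments:
        shown.update(items)
        best[best_item] += 1
        worst[worst_item] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Hypothetical judgments over four candidate summaries.
judgments = [
    (("s1", "s2", "s3", "s4"), "s1", "s4"),
    (("s1", "s2", "s3", "s4"), "s1", "s3"),
    (("s1", "s2", "s3", "s4"), "s2", "s4"),
]
scores = bws_scores(judgments)
for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {score:+.2f}")
```

Scores fall between -1 (always chosen as worst) and +1 (always chosen as best), so they can be compared across items that were shown a different number of times.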

AI-Powered Assistance

  • Label Suggestions: LLM-powered pre-annotations to speed up work
  • Active Learning: Prioritize uncertain instances for efficient labeling
  • Multiple Backends: OpenAI, Anthropic, Ollama, vLLM, and more
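
To make "prioritize uncertain instances" concrete, the sketch below ranks instances by the entropy of a model's predicted label distribution, which is one common form of uncertainty sampling. It is not Potato's implementation; the function names and the sample probabilities are assumptions made for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(predictions):
    """Return instance ids ordered from most to least uncertain.

    `predictions` maps an instance id to the model's label probabilities.
    """
    return sorted(predictions, key=lambda iid: entropy(predictions[iid]), reverse=True)


# Hypothetical model outputs over three labels (positive, neutral, negative).
predictions = {
    "doc-1": [0.98, 0.01, 0.01],  # confident prediction
    "doc-2": [0.34, 0.33, 0.33],  # nearly uniform, so most uncertain
    "doc-3": [0.70, 0.20, 0.10],
}
queue = rank_by_uncertainty(predictions)
print(queue)
```

Annotators would then see the instances at the front of the queue first, where a human label is likely to be most informative for the model.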

Quality Control

  • Attention Checks: Automatically inserted validation questions
  • Gold Standards: Track annotator accuracy against known answers
  • Inter-Annotator Agreement: Built-in Krippendorff's alpha calculation
  • Time Tracking: Monitor annotation speed per instance
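
For readers unfamiliar with Krippendorff's alpha, here is a minimal sketch of the standard formulation for nominal labels, computed from a coincidence matrix. It is an independent illustration rather than Potato's built-in code; the function name, the input layout (one list of labels per instance), and the sample ratings are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists; each inner list holds the labels that
    different annotators assigned to one instance (missing labels omitted).
    """
    # Build the coincidence matrix: every ordered pair of labels within a
    # unit contributes 1 / (m - 1), where m is the number of labels in it.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label cannot be paired with anything
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and the total number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable labels")

    # Observed and expected disagreement, both scaled by n so it cancels.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    return 1 - observed / expected


# Hypothetical ratings: four instances, two or three annotators each.
ratings = [
    ["pos", "pos", "pos"],
    ["neg", "neg"],
    ["pos", "neg", "pos"],
    ["neu", "neu", "neu"],
]
alpha = krippendorff_alpha_nominal(ratings)
print(round(alpha, 3))
```

An alpha of 1 means perfect agreement, 0 means agreement no better than chance, and negative values indicate systematic disagreement.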

Productivity

  • Keyboard Shortcuts: Full keyboard navigation and labeling
  • Dynamic Highlighting: Smart keyword highlighting based on labels
  • Tooltips: Hover descriptions for complex label schemes
  • Progress Tracking: Real-time completion statistics

Deployment Options

  • Local Development: Single command startup
  • Team Annotation: Multi-user with authentication
  • Crowdsourcing: Prolific and MTurk integration
  • Enterprise: MySQL backend for large-scale deployments

Quick Start

Option 1: Install from PyPI (Recommended)

pip install potato-annotation

# List available templates
potato list all

# Get a template project
potato get sentiment_analysis

# Start annotating
potato start sentiment_analysis

Option 2: Run from Source

git clone https://github.com/davidjurgens/potato.git
cd potato
pip install -r requirements.txt

# Start a simple annotation task
python potato/flask_server.py start project-hub/simple_examples/configs/simple-check-box.yaml -p 8000

Then open http://localhost:8000 in your browser.


Documentation

| Topic | Description |
|---|---|
| Getting Started | Installation and first project setup |
| Configuration Guide | YAML configuration options |
| Annotation Schemas | Radio, checkbox, span, likert, and more |
| Data Formats | Input/output data specifications |
| AI Support | LLM integration for label suggestions |
| Quality Control | Attention checks and gold standards |
| Active Learning | ML-based instance prioritization |
| Admin Dashboard | Monitoring and analytics |
| Crowdsourcing | Prolific and MTurk setup |
| User Simulator | Testing and load simulation |

Example Projects

Ready-to-use annotation setups in project-hub/:

| Project | Description | Config |
|---|---|---|
| Sentiment Analysis | Document-level sentiment classification | Radio buttons |
| Dialogue Analysis | Span labeling in conversations | Span annotation |
| Summarization Eval | Compare and rate summaries | Likert + pairwise |
| Question Answering | Extract answer spans | Span + checkbox |
| Simple Examples | Minimal configs for each schema type | Various |

Annotation Guidelines Showcase

Looking for real-world examples? The Potato Showcase contains a curated gallery of annotation guidelines and configurations from published research projects. Browse examples of:

  • Annotation codebooks and instructions
  • Complex multi-schema configurations
  • Quality control setups
  • Custom UI configurations

See all example projects in the documentation.


What's New in v2.0

  • AI Support: Integrated LLM assistance with OpenAI, Anthropic, Gemini, Ollama, vLLM
  • Audio Annotation: Waveform-based segmentation with Peaks.js
  • Video Annotation: Frame-by-frame labeling with playback controls
  • Active Learning: Uncertainty sampling for efficient annotation
  • Training Phase: Practice annotations with feedback
  • Quality Control: Attention checks, gold standards, agreement metrics
  • User Simulator: Automated testing with configurable annotator behaviors
  • Database Backend: MySQL support for large-scale deployments
  • Debug Mode: Skip to specific phases, selective logging

See CHANGELOG.md for full release history.


Architecture

potato/
├── flask_server.py      # Main application server
├── routes.py            # HTTP endpoints
├── templates/           # Jinja2 HTML templates
├── static/              # JavaScript, CSS
├── server_utils/
│   └── schemas/         # Annotation type implementations
├── ai/                  # LLM endpoint integrations
├── simulator/           # User simulation for testing
└── quality_control.py   # QC validation logic

project-hub/             # Example annotation projects
tests/                   # Test suite
docs/                    # Documentation

Development

# Run tests
pytest tests/ -v

# Run specific test categories
pytest tests/unit/ -v        # Unit tests
pytest tests/simulator/ -v   # Simulator tests
pytest tests/server/ -v      # Integration tests

# Run with coverage
pytest --cov=potato --cov-report=html

License

Potato is dual-licensed: the Polyform Shield license covers non-commercial use, and a separate commercial license is available. Contact jurgens@umich.edu for details.

License FAQ
| Use Case | Allowed? |
|---|---|
| Academic research | Yes |
| Internal company annotation | Yes |
| Fork for personal development | Yes |
| Integration in open-source pipelines | Yes |
| Commercial annotation service | Contact us |
| Competing annotation platform | Contact us |

Citation

@inproceedings{pei2022potato,
  title={POTATO: The Portable Text Annotation Tool},
  author={Pei, Jiaxin and Ananthasubramaniam, Aparna and Wang, Xingyao and Zhou, Naitian and Dedeloudis, Apostolos and Sargent, Jackson and Jurgens, David},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year={2022}
}

