A flexible, stand-alone, web-based platform for text annotation tasks
Potato: The Portable Text Annotation Tool
Potato is a lightweight, configuration-driven annotation tool for NLP research. Go from zero to annotating in minutes—no coding required.
Why Potato?
| Feature | Potato | Other Tools |
|---|---|---|
| Setup time | Minutes (YAML config) | Hours/days (custom code) |
| Coding required | None | Often extensive |
| Self-hosted | Yes (full data control) | Often cloud-only |
| AI assistance | Built-in LLM support | Rarely available |
| Cost | Free | Often paid |
Key Features
Multi-Modal Annotation
Potato supports annotation across multiple data types:
| Modality | Features |
|---|---|
| Text | Classification, span labeling, pairwise comparison, free-form responses |
| Audio | Waveform visualization, segment labeling, playback controls |
| Video | Frame-by-frame annotation, temporal segments, playback sync |
| Images | Region labeling, classification, comparison tasks |
| Dialogue | Turn-level annotation, conversation threading |
Annotation Schemes
- Classification: Radio buttons, checkboxes, Likert scales
- Span Annotation: Highlight and label text spans with keyboard shortcuts
- Pairwise Comparison: Side-by-side comparisons, best-worst scaling
- Free Text: Text boxes with validation and character limits
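To make the best-worst scaling scheme concrete, here is a minimal sketch of the usual counting-based scoring: each item's score is the number of times it was picked as best, minus the number of times it was picked as worst, divided by the number of times it was shown. The function name `best_worst_scores` and the `(items, best, worst)` tuple layout are assumptions made for illustration; they are not Potato's output format or API.

```python
from collections import Counter


def best_worst_scores(judgments):
    """Score items from best-worst judgments.

    `judgments` is an iterable of (items, best, worst) tuples, a
    hypothetical layout chosen for this example. Scores fall in [-1, 1].
    """
    best = Counter()
    worst = Counter()
    seen = Counter()
    for items, picked_best, picked_worst in judgments:
        if picked_best not in items or picked_worst not in items:
            raise ValueError("best and worst must be among the shown items")
        seen.update(items)          # count how often each item was shown
        best[picked_best] += 1
        worst[picked_worst] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


judgments = [
    (("a", "b", "c", "d"), "a", "d"),
    (("a", "b", "c", "d"), "a", "c"),
    (("a", "b", "c", "d"), "b", "d"),
]
scores = best_worst_scores(judgments)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting by score yields a full ranking from a small number of tuple-level judgments, which is the main appeal of this scheme over rating each item in isolation.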
AI-Powered Assistance
- Label Suggestions: LLM-powered pre-annotations to speed up work
- Active Learning: Prioritize uncertain instances for efficient labeling
- Multiple Backends: OpenAI, Anthropic, Ollama, vLLM, and more
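The idea behind prioritizing uncertain instances can be sketched as entropy-based uncertainty sampling: items whose predicted label distribution is closest to uniform are queued first. This is an illustration only. The function names and the `predictions` dictionary are hypothetical, and Potato's own active-learning component may use a different model or uncertainty measure.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(predictions):
    """Return instance ids ordered from most to least uncertain.

    `predictions` maps an instance id to the class probabilities produced
    by some classifier (assumed to exist outside this sketch).
    """
    return sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)


predictions = {
    "doc1": [0.98, 0.01, 0.01],  # model is confident
    "doc2": [0.40, 0.35, 0.25],  # model is unsure
    "doc3": [0.70, 0.20, 0.10],
}
queue = rank_by_uncertainty(predictions)
```

Here `doc2` would be shown to annotators first, since a label for it is expected to be the most informative for the model.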
Quality Control
- Attention Checks: Automatically inserted validation questions
- Gold Standards: Track annotator accuracy against known answers
- Inter-Annotator Agreement: Built-in Krippendorff's alpha calculation
- Time Tracking: Monitor annotation speed per instance
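For readers unfamiliar with the agreement metric, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It is not Potato's implementation, which may also handle other distance metrics. The function name and the input layout (one list of labels per item, with missing annotations simply omitted) are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists; each inner list holds the labels that
    different annotators assigned to one item.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label cannot be paired
        # Every ordered pair of labels from different annotators counts,
        # weighted by 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1 - observed / expected


units = [["pos", "pos"], ["neg", "neg"], ["pos", "neg"], ["neg", "neg"]]
alpha = krippendorff_alpha_nominal(units)
```

An alpha of 1 means perfect agreement and 0 means agreement at chance level; the small example above, with one disagreement in four items, lands between the two.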
Productivity
- Keyboard Shortcuts: Full keyboard navigation and labeling
- Dynamic Highlighting: Smart keyword highlighting based on labels
- Tooltips: Hover descriptions for complex label schemes
- Progress Tracking: Real-time completion statistics
Deployment Options
- Local Development: Single command startup
- Team Annotation: Multi-user with authentication
- Crowdsourcing: Prolific and MTurk integration
- Enterprise: MySQL backend for large-scale deployments
Quick Start
Option 1: Install from PyPI (Recommended)
```bash
pip install potato-annotation

# List available templates
potato list all

# Get a template project
potato get sentiment_analysis

# Start annotating
potato start sentiment_analysis
```
Option 2: Run from Source
```bash
git clone https://github.com/davidjurgens/potato.git
cd potato
pip install -r requirements.txt

# Start a simple annotation task
python potato/flask_server.py start project-hub/simple_examples/configs/simple-check-box.yaml -p 8000
```
Then open http://localhost:8000 in your browser.
Documentation
| Topic | Description |
|---|---|
| Getting Started | Installation and first project setup |
| Configuration Guide | YAML configuration options |
| Annotation Schemas | Radio, checkbox, span, likert, and more |
| Data Formats | Input/output data specifications |
| AI Support | LLM integration for label suggestions |
| Quality Control | Attention checks and gold standards |
| Active Learning | ML-based instance prioritization |
| Admin Dashboard | Monitoring and analytics |
| Crowdsourcing | Prolific and MTurk setup |
| User Simulator | Testing and load simulation |
Example Projects
Ready-to-use annotation setups in project-hub/:
| Project | Description | Config |
|---|---|---|
| Sentiment Analysis | Document-level sentiment classification | Radio buttons |
| Dialogue Analysis | Span labeling in conversations | Span annotation |
| Summarization Eval | Compare and rate summaries | Likert + pairwise |
| Question Answering | Extract answer spans | Span + checkbox |
| Simple Examples | Minimal configs for each schema type | Various |
Annotation Guidelines Showcase
Looking for real-world examples? The Potato Showcase contains a curated gallery of annotation guidelines and configurations from published research projects. Browse examples of:
- Annotation codebooks and instructions
- Complex multi-schema configurations
- Quality control setups
- Custom UI configurations
See all example projects in the documentation.
What's New in v2.0
- AI Support: Integrated LLM assistance with OpenAI, Anthropic, Gemini, Ollama, vLLM
- Audio Annotation: Waveform-based segmentation with Peaks.js
- Video Annotation: Frame-by-frame labeling with playback controls
- Active Learning: Uncertainty sampling for efficient annotation
- Training Phase: Practice annotations with feedback
- Quality Control: Attention checks, gold standards, agreement metrics
- User Simulator: Automated testing with configurable annotator behaviors
- Database Backend: MySQL support for large-scale deployments
- Debug Mode: Skip to specific phases, selective logging
See CHANGELOG.md for full release history.
Architecture
```
potato/
├── flask_server.py        # Main application server
├── routes.py              # HTTP endpoints
├── templates/             # Jinja2 HTML templates
├── static/                # JavaScript, CSS
├── server_utils/
│   └── schemas/           # Annotation type implementations
├── ai/                    # LLM endpoint integrations
├── simulator/             # User simulation for testing
└── quality_control.py     # QC validation logic
project-hub/               # Example annotation projects
tests/                     # Test suite
docs/                      # Documentation
```
Development
```bash
# Run tests
pytest tests/ -v

# Run specific test categories
pytest tests/unit/ -v        # Unit tests
pytest tests/simulator/ -v   # Simulator tests
pytest tests/server/ -v      # Integration tests

# Run with coverage
pytest --cov=potato --cov-report=html
```
Support
- Issues: GitHub Issues
- Questions: pedropei@umich.edu or jurgens@umich.edu
- Documentation: potatoannotator.readthedocs.io
License
Potato is dual-licensed: it is available under the Polyform Shield license for non-commercial use, and commercial licensing is available on request. Contact jurgens@umich.edu for details.
License FAQ
| Use Case | Allowed? |
|---|---|
| Academic research | Yes |
| Internal company annotation | Yes |
| Fork for personal development | Yes |
| Integration in open-source pipelines | Yes |
| Commercial annotation service | Contact us |
| Competing annotation platform | Contact us |
Citation
```bibtex
@inproceedings{pei2022potato,
  title={POTATO: The Portable Text Annotation Tool},
  author={Pei, Jiaxin and Ananthasubramaniam, Aparna and Wang, Xingyao and Zhou, Naitian and Dedeloudis, Apostolos and Sargent, Jackson and Jurgens, David},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year={2022}
}
```
File details
Details for the file potato_annotation-2.1.0.tar.gz.
File metadata
- Download URL: potato_annotation-2.1.0.tar.gz
- Upload date:
- Size: 889.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62c8a31ff45cc32ea8c5e22c442b6260816e601adc3b0c1ee3b61504bb1b634c |
| MD5 | 9b2b4087ff3f518304a436d935d7d544 |
| BLAKE2b-256 | da6f9cbdf02d601b51cff07851d960ef43c316facedf91bb0b9cf851e60c0103 |
File details
Details for the file potato_annotation-2.1.0-py3-none-any.whl.
File metadata
- Download URL: potato_annotation-2.1.0-py3-none-any.whl
- Upload date:
- Size: 975.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4c092a3709d5455c6edc6a660253ac2638efdb27e63734d43d2630dbc3ba2d69 |
| MD5 | 4d0503ab99f2c3707d077f663cc139f6 |
| BLAKE2b-256 | 52ea3ea2b245e164556a6695a3d05017d8096a1d117ef4fa2235a45957db808f |