ChatTag
A generic filter that uses ChatGPT Vision API for image annotation and analysis across diverse datasets and domains.
Features
- Multi-domain Support: Supports any domain requiring image classification and annotation (food, pets, medical, industrial, etc.)
- Configurable Prompts: Customizable prompts for different annotation tasks
- Standardized Output: Versioned JSON contract for frames and labels.jsonl (docs/output_contract.md)
- Image Optimization: Automatic image resizing to reduce API costs
- Fault Tolerant: Logs and skips malformed data instead of crashing
- Real-time Processing: Processes video streams in real-time
- Web Visualization: Includes web interface for viewing results
- Pipeline Integration: Works with OpenFilter pipeline architecture
- Environment Configuration: Full configuration through environment variables
- Frame Persistence: Optional saving of JSON results per frame
- Topic Filtering: Process specific topics or exclude unwanted ones
- Topic Forwarding: Preserve main topic alongside processed results for pipeline compatibility
- Cost Optimization: Configurable image size and quality settings
Architecture
The filter follows the OpenFilter pattern with three main stages:
Stage Responsibilities
| Stage | Responsibility |
|---|---|
| setup() | Parse and validate configuration; initialize ChatGPT client; load prompt file |
| process() | Core operation: send images to ChatGPT Vision API, parse, validate, attach result |
| shutdown() | Clean up resources (close connections) when filter stops |
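The three stages above can be sketched as a small stand-alone class. This is only an illustration under stated assumptions: `AnnotatorSketch` is a hypothetical name, it does not inherit from the real OpenFilter base class, and the `client` argument is an injected callable standing in for the ChatGPT client.

```python
import json
from typing import Callable, Optional


class AnnotatorSketch:
    """Hypothetical stand-in for the filter; not the OpenFilter base class."""

    def __init__(self) -> None:
        self.prompt: Optional[str] = None
        self.client: Optional[Callable[[str, bytes], str]] = None

    def setup(self, prompt: str, client: Callable[[str, bytes], str]) -> None:
        # Validate configuration up front so failures surface before any frame arrives.
        if not prompt.strip():
            raise ValueError("prompt must not be empty")
        self.prompt = prompt
        self.client = client

    def process(self, image: bytes, data: dict) -> dict:
        # Attach either parsed annotations or an error; a bad reply never raises.
        if self.client is None or self.prompt is None:
            raise RuntimeError("setup() must be called before process()")
        result = dict(data)
        try:
            annotations = json.loads(self.client(self.prompt, image))
            result["meta"] = {"chatgpt_annotator": {"annotations": annotations}}
        except (json.JSONDecodeError, TypeError) as exc:
            result["meta"] = {"chatgpt_annotator": {"error": str(exc)}}
        return result

    def shutdown(self) -> None:
        # Release the client reference when the filter stops.
        self.client = None
```

Recording the failure under an `error` key instead of raising mirrors the fault-tolerant behavior listed under Features.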
Data Signature
The filter returns processed frames with the following data structure:
Main Frame Data:
- Original frame data preserved
- Processing results under meta.chatgpt_annotator (see docs/output_contract.md):
  - schema_version: Contract version string (e.g. "1.0")
  - annotations: Dict with item_name -> {"present": bool, "confidence": float}
  - usage: Dict with token usage information
  - processing_time, timestamp, model, frame_id
  - error: Present when processing failed
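A downstream consumer might read these results roughly as follows. This is a minimal sketch that assumes the frame data is available as a plain dict with the keys listed above; `present_items` is a hypothetical helper, not part of the package.

```python
def present_items(frame_data: dict) -> list[str]:
    """Return the names of items marked present, or [] if processing failed."""
    result = frame_data.get("meta", {}).get("chatgpt_annotator", {})
    if "error" in result:
        # The error key is only present when processing failed.
        return []
    annotations = result.get("annotations", {})
    return sorted(name for name, ann in annotations.items() if ann.get("present"))


frame_data = {
    "meta": {
        "chatgpt_annotator": {
            "schema_version": "1.0",
            "annotations": {
                "cat": {"present": True, "confidence": 0.92},
                "dog": {"present": False, "confidence": 0.15},
            },
        }
    }
}
print(present_items(frame_data))
```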
Topic Forwarding:
The forward_main parameter controls whether the main topic from input frames is forwarded to the output:
- forward_main=True: The main topic from input frames is preserved and forwarded to the output alongside processed results
- forward_main=False: Only processed frames are returned (no main topic forwarding)
This is useful in pipeline scenarios where you want to preserve the original main frame alongside processed results for downstream filters.
Installation
# Install with development dependencies
make install
Configuration
- Copy the example environment file:
cp env.example .env
- Edit the .env file with your configuration:
# Required: OpenAI API Key
FILTER_CHATGPT_API_KEY=your_openai_api_key_here
# Required: Path to prompt file
FILTER_PROMPT=./prompts/annotation_prompt.txt
# Optional: ChatGPT model (default: gpt-4o-mini)
FILTER_CHATGPT_MODEL=gpt-4o-mini
# Optional: API parameters
FILTER_MAX_TOKENS=1000
FILTER_TEMPERATURE=0.1
# Optional: Image processing
FILTER_MAX_IMAGE_SIZE=512
FILTER_IMAGE_QUALITY=85
# Optional: Output configuration
FILTER_SAVE_FRAMES=false
FILTER_OUTPUT_DIR=./output_frames
# Optional: Output schema (JSON string)
FILTER_OUTPUT_SCHEMA={"item1": {"present": false, "confidence": 0.0}, "item2": {"present": false, "confidence": 0.0}}
# Optional: Topic filtering
FILTER_TOPIC_PATTERN=.*
FILTER_EXCLUDE_TOPICS=debug,test
# Optional: Topic forwarding (preserve main topic alongside processed results)
FILTER_FORWARD_MAIN=false
# Optional: No-ops mode (skip API calls for testing)
FILTER_NO_OPS=false
Configuration Matrix
| Variable | Type | Default | Required | Notes |
|---|---|---|---|---|
| chatgpt_model | string | "gpt-4o-mini" | Yes | Model name |
| chatgpt_api_key | string | "" | Yes | API key |
| prompt | string | "" | Yes | Path to prompt file (.txt) |
| output_schema | dict | {} | No | Defines expected labels and defaults |
| max_tokens | int | 1000 | No | Max response tokens |
| temperature | float | 0.1 | No | Controls randomness |
| max_image_size | int | 0 | No | Max image size (0 = keep original) |
| image_quality | int | 85 | No | JPEG quality (1-100) |
| save_frames | bool | true | No | Save JSON per frame |
| output_dir | string | "./output_frames" | No | Where to save JSON output |
| forward_main | bool | false | No | Forward main topic to output |
| no_ops | bool | false | No | Skip API calls for testing |
| confidence_threshold | float | 0.9 | No | Confidence threshold for positive classification (0.0-1.0) |
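To make the mapping from FILTER_* environment variables to typed values concrete, here is a minimal sketch. It is not the filter's own loader: `load_config` and `_as_bool` are hypothetical helpers, the accepted boolean spellings are an assumption, and the defaults are copied from the matrix above.

```python
import json
import os


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; the real filter may accept others.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env=None) -> dict:
    """Read FILTER_* variables into a typed dict, applying the documented defaults."""
    env = os.environ if env is None else env
    config = {
        "chatgpt_api_key": env.get("FILTER_CHATGPT_API_KEY", ""),
        "prompt": env.get("FILTER_PROMPT", ""),
        "chatgpt_model": env.get("FILTER_CHATGPT_MODEL", "gpt-4o-mini"),
        "output_schema": json.loads(env.get("FILTER_OUTPUT_SCHEMA", "{}")),
        "max_tokens": int(env.get("FILTER_MAX_TOKENS", "1000")),
        "temperature": float(env.get("FILTER_TEMPERATURE", "0.1")),
        "max_image_size": int(env.get("FILTER_MAX_IMAGE_SIZE", "0")),
        "image_quality": int(env.get("FILTER_IMAGE_QUALITY", "85")),
        "save_frames": _as_bool(env.get("FILTER_SAVE_FRAMES", "true")),
        "output_dir": env.get("FILTER_OUTPUT_DIR", "./output_frames"),
        "forward_main": _as_bool(env.get("FILTER_FORWARD_MAIN", "false")),
        "no_ops": _as_bool(env.get("FILTER_NO_OPS", "false")),
        "confidence_threshold": float(env.get("FILTER_CONFIDENCE_THRESHOLD", "0.9")),
    }
    # The matrix marks the API key and the prompt path as required.
    for key in ("chatgpt_api_key", "prompt"):
        if not config[key]:
            raise ValueError(f"missing required setting: {key}")
    return config
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.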
Usage
No-Ops Mode (Testing)
For testing and development, you can enable no-ops mode to skip API calls:
# Enable no-ops mode
export FILTER_NO_OPS=true
# Run the filter (will skip API calls and use default annotations)
python scripts/filter_annotation_batch.py
In no-ops mode:
- ✅ Images are still processed and resized
- ✅ JSON files are still generated with default annotations
- ✅ Binary datasets are still created on shutdown
- ❌ No API calls are made to ChatGPT
- ❌ No API costs are incurred
This is useful for:
- Testing the pipeline without API costs
- Validating image processing and file generation
- Development and debugging
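The no-ops behavior can be pictured with a short sketch. It assumes the "default annotations" are simply a copy of the configured output schema, which the text above suggests but does not state outright; `annotate` and `call_api` are hypothetical names.

```python
import copy
from typing import Callable, Optional


def annotate(image: bytes, schema: dict, no_ops: bool,
             call_api: Optional[Callable[[bytes], dict]] = None) -> dict:
    """Return annotations for one image, skipping the API entirely in no-ops mode."""
    if no_ops:
        # Deep copy so later edits to the result never mutate the shared schema.
        return copy.deepcopy(schema)
    if call_api is None:
        raise ValueError("call_api is required when no_ops is False")
    return call_api(image)
```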
Image Size Configuration
The max_image_size parameter controls image resizing for API cost optimization:
# Keep original image size (highest quality, highest cost)
export FILTER_MAX_IMAGE_SIZE=0
# Resize to 512px (good quality, moderate cost)
export FILTER_MAX_IMAGE_SIZE=512
# Resize to 256px (lower quality, lowest cost)
export FILTER_MAX_IMAGE_SIZE=256
Cost Impact:
- 0 (original): ~$0.15/image (high quality)
- 512px: ~$0.01/image (good quality)
- 256px: ~$0.005/image (lower quality)
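As a rough illustration of the resize setting and the figures above, the sketch below computes target dimensions and a batch cost estimate. It assumes max_image_size bounds the longest side while preserving aspect ratio, which the source does not confirm; `scaled_size` and `estimate_batch_cost` are hypothetical helpers, and the per-image costs are the approximate values quoted above, not current pricing.

```python
# Approximate per-image costs quoted in this README, keyed by max_image_size.
APPROX_COST_PER_IMAGE = {0: 0.15, 512: 0.01, 256: 0.005}


def scaled_size(width: int, height: int, max_image_size: int) -> tuple[int, int]:
    """Shrink (width, height) so the longest side fits max_image_size; 0 keeps the original."""
    longest = max(width, height)
    if max_image_size <= 0 or longest <= max_image_size:
        return width, height
    scale = max_image_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimate_batch_cost(num_frames: int, max_image_size: int) -> float:
    """Multiply the quoted per-image cost by the number of frames."""
    return num_frames * APPROX_COST_PER_IMAGE[max_image_size]


print(scaled_size(1920, 1080, 512))
print(estimate_batch_cost(1000, 512))
```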
Topic Forwarding Configuration
The forward_main parameter controls whether the main topic from input frames is forwarded to the output:
# Forward main topic to preserve original frame (recommended for pipelines)
export FILTER_FORWARD_MAIN=true
# Don't forward main topic (only processed results)
export FILTER_FORWARD_MAIN=false
Use Cases:
- Pipeline Processing: When you want to preserve the original main frame for downstream filters
- Multi-topic Processing: When processing specific topics but want to keep the main frame intact
- Data Preservation: When you need both processed results and original frame data
Output Behavior:
- With forward_main=True: Output includes both processed topics and the original main topic
- With forward_main=False: Output includes only processed topics
Example Output Structure:
# With forward_main=True
{
"main": Frame(original_image, original_data, "BGR"), # Original main frame
"processed_topic_1": Frame(image, results_metadata, "BGR"), # Processed frame
"processed_topic_2": Frame(image, results_metadata, "BGR") # Processed frame
}
# With forward_main=False
{
"processed_topic_1": Frame(image, results_metadata, "BGR"), # Processed frame
"processed_topic_2": Frame(image, results_metadata, "BGR") # Processed frame
}
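The forwarding rule can be expressed in a few lines. This sketch uses plain dicts in place of Frame objects, and `build_output` is a hypothetical helper; how the real filter behaves when "main" is itself a processed topic is not stated, so the sketch simply leaves a processed "main" untouched.

```python
def build_output(inputs: dict, processed: dict, forward_main: bool) -> dict:
    """Combine processed topics with the original main topic when forwarding is enabled."""
    output = dict(processed)
    if forward_main and "main" in inputs and "main" not in output:
        # Forward the original main frame unchanged for downstream filters.
        output["main"] = inputs["main"]
    return output
```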
Save Frames Configuration
The save_frames parameter controls whether to save individual JSON files:
# Save JSON files (default - recommended)
export FILTER_SAVE_FRAMES=true
# Don't save files (only show in web interface)
export FILTER_SAVE_FRAMES=false
Benefits of saving frames:
- ✅ Processed images - Images saved in data/ subfolder with unique names
- ✅ JSONL dataset - Results saved in dataset_langchain format
- ✅ Binary datasets - Automatically generated for ML training
- ✅ Debugging - Can inspect individual frame results and images
- ✅ Batch processing - Results available after pipeline ends
When to disable:
- Quick testing without file clutter
- Web visualization only
- Temporary analysis
Confidence Threshold Configuration
The confidence_threshold parameter controls the minimum confidence score required to classify an item as "present" in the generated datasets:
# Default: 90% confidence required
export FILTER_CONFIDENCE_THRESHOLD=0.9
# More lenient: 70% confidence required
export FILTER_CONFIDENCE_THRESHOLD=0.7
# Very strict: 95% confidence required
export FILTER_CONFIDENCE_THRESHOLD=0.95
How it works:
- Confidence ≥ threshold → Item classified as PRESENT (positive class)
- Confidence < threshold → Item classified as ABSENT (negative class)
Examples:
{
"avocado": {
"present": true,
"confidence": 0.92 // ✅ 92% ≥ 90% → "avocado" (with threshold=0.9)
},
"tomato": {
"present": true,
"confidence": 0.85 // ❌ 85% < 90% → "absent" (with threshold=0.9)
}
}
Recommended values:
- 0.9 (90%) - Default, high precision
- 0.8 (80%) - Balanced precision/recall
- 0.7 (70%) - Higher recall, more lenient
- 0.95 (95%) - Very high precision, strict
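The thresholding rule from "How it works" can be sketched as below. The function follows the rule exactly as stated, comparing only the confidence score; whether the real filter also consults the `present` flag is not specified here. `classify` is a hypothetical helper.

```python
def classify(annotations: dict, threshold: float = 0.9) -> dict:
    """Map each item to True (PRESENT) when confidence >= threshold, else False (ABSENT)."""
    return {
        name: float(ann.get("confidence", 0.0)) >= threshold
        for name, ann in annotations.items()
    }


annotations = {
    "avocado": {"present": True, "confidence": 0.92},
    "tomato": {"present": True, "confidence": 0.85},
}
print(classify(annotations))
print(classify(annotations, threshold=0.7))
```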
Output Structure
When save_frames=true, the following structure is created:
./output_frames/
├── data/ # Processed images subfolder
│   ├── 0_1758035382121.jpg # Frame 0 with timestamp
│   ├── 1_1758035382122.jpg # Frame 1 with timestamp
│   └── 2_1758035382123.jpg # Frame 2 with timestamp
├── labels.jsonl # One JSON line per frame (see docs/output_contract.md)
├── binary_datasets/ # Generated automatically on shutdown (overwrites existing)
│   ├── item1_labels.json
│   ├── item2_labels.json
│   ├── item3_labels.json
│   ├── item4_labels.json
│   └── _summary_report.json
├── binary_datasets_balanced/ # Balanced datasets (equal class representation)
│   ├── item1_labels.json
│   ├── item2_labels.json
│   ├── item3_labels.json
│   ├── item4_labels.json
│   └── _summary_report.json # Summary report (highlighted with underscore)
└── multilabel_datasets/ # When multiple labels in schema: COCO-style annotations.json
    ├── annotations.json
    └── _summary_report.json
Important Notes:
- Binary datasets are overwritten on each run to ensure they reflect the latest processing results
- Images are saved incrementally during processing (append mode)
- JSONL file is appended during processing, not overwritten
- Summary report is regenerated on each shutdown
- Balanced datasets are generated automatically
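One common way to obtain equal class representation is to downsample the majority class. The sketch below shows that technique only; the source does not say how the filter builds its balanced datasets, and `balance` is a hypothetical helper operating on a simple image-name to label mapping.

```python
import random


def balance(labels: dict[str, bool], seed: int = 0) -> dict[str, bool]:
    """Downsample the larger class so positives and negatives are equally represented."""
    positives = [name for name, value in labels.items() if value]
    negatives = [name for name, value in labels.items() if not value]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    keep = rng.sample(positives, n) + rng.sample(negatives, n)
    return {name: labels[name] for name in sorted(keep)}
```

If one class is empty, the result is empty, which makes the imbalance obvious instead of silently producing a one-class dataset.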
Basic Pipeline
Run the complete annotation pipeline:
python scripts/filter_food_annotation.py
This will:
- Load video from VIDEO_PATH environment variable
- Process frames with ChatGPT Vision API using the specified prompt
- Display results in web interface at http://localhost:8000
Using Makefile
# Run with example video
make run-example
# Run with custom video
VIDEO_PATH=/path/to/video.mp4 make run-custom
# Check environment
make check-env
# Run tests
make test
Usage Scenarios
1. Example Dataset (Food Analysis)
Detect items with confidence levels (example):
export FILTER_PROMPT="./prompts/food_annotation_prompt.txt"
export FILTER_OUTPUT_SCHEMA='{"lettuce": {"present": false, "confidence": 0.0}, "tomato": {"present": false, "confidence": 0.0}}'
python scripts/filter_food_annotation.py
2. Pet Classification
Detect presence of cats/dogs:
export FILTER_PROMPT="./prompts/pet_classification_prompt.txt"
export FILTER_OUTPUT_SCHEMA='{"cat": {"present": false, "confidence": 0.0}, "dog": {"present": false, "confidence": 0.0}}'
python scripts/filter_pet_classification.py
3. Medical Imaging
Detect medical conditions (research/educational only):
export FILTER_PROMPT="./prompts/medical_imaging_prompt.txt"
export FILTER_OUTPUT_SCHEMA='{"tumor": {"present": false, "confidence": 0.0}, "calcification": {"present": false, "confidence": 0.0}}'
python scripts/filter_medical_imaging.py
4. Industrial Quality
Detect defects in assembly line images:
export FILTER_PROMPT="./prompts/industrial_quality_prompt.txt"
export FILTER_SAVE_FRAMES="true"
export FILTER_OUTPUT_DIR="./quality_results"
python scripts/filter_industrial_quality.py
5. Pipeline Integration with Topic Forwarding
Preserve main topic for downstream processing:
export FILTER_PROMPT="./prompts/annotation_prompt.txt"
export FILTER_FORWARD_MAIN="true" # Preserve main topic
export FILTER_OUTPUT_SCHEMA='{"item1": {"present": false, "confidence": 0.0}, "item2": {"present": false, "confidence": 0.0}}'
python scripts/filter_annotation.py
This configuration ensures that:
- The original main frame is preserved for downstream filters
- Processed results are available alongside the original data
- Pipeline compatibility is maintained
Prompt Format & Importance
The prompt format is critical for annotation quality. Prompts must:
- Define the exact list of items to check
- Enforce output as strict JSON only (no extra text)
- Provide clear rules for uncertainty and confidence scoring
Example Prompt (Generic Dataset)
You are a vision analyst. Given an image, determine whether each of the following items is visibly present.
Return ONLY valid JSON with keys: "present" (boolean) and "confidence" (0-1).
ITEMS = ["item1", "item2", "item3", "item4", "item5", ...]
Example Prompt (Pets Dataset)
You are a vision analyst. Given an image, determine whether it contains a cat or a dog.
Return ONLY valid JSON with:
{
"cat": {"present": <true|false>, "confidence": <0-1>},
"dog": {"present": <true|false>, "confidence": <0-1>}
}
Rules:
- If unsure, set present=false and confidence ≤0.3.
- Base decision only on visible image content.
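A prompt in this style could be generated from a list of item names so that the prompt and the output schema stay in sync. This is a sketch only: `build_prompt` is a hypothetical helper, and the wording is adapted from the example prompts above.

```python
import json


def build_prompt(items: list[str]) -> str:
    """Render a JSON-only annotation prompt covering every item in the list."""
    if not items:
        raise ValueError("at least one item is required")
    # json.dumps quotes and escapes each item name safely for use as a JSON key.
    rows = [
        f'  {json.dumps(name)}: {{"present": <true|false>, "confidence": <0-1>}}'
        for name in items
    ]
    return "\n".join([
        "You are a vision analyst. Given an image, determine whether each of "
        "the following items is visibly present.",
        "Return ONLY valid JSON with:",
        "{",
        ",\n".join(rows),
        "}",
        "Rules:",
        "- If unsure, set present=false and confidence <= 0.3.",
        "- Base decision only on visible image content.",
    ])


print(build_prompt(["cat", "dog"]))
```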
Standard Output Format
All annotations follow this standardized format:
{
"item_name": {
"present": true|false,
"confidence": 0.0-1.0
}
}
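A reply in this format might be checked before use along the following lines. The sketch is not the filter's actual validation: `parse_annotations` is a hypothetical helper that keeps only the schema's items, falls back to schema defaults for missing or malformed entries, and clamps confidence to the 0.0-1.0 range.

```python
import json


def parse_annotations(raw: str, schema: dict) -> dict:
    """Parse a model reply into the standard format, using schema defaults as fallback."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    result = {}
    for name, default in schema.items():
        entry = data.get(name)
        if not isinstance(entry, dict):
            result[name] = dict(default)
            continue
        present = entry.get("present")
        if not isinstance(present, bool):
            present = default["present"]
        try:
            confidence = float(entry.get("confidence", default["confidence"]))
        except (TypeError, ValueError):
            confidence = default["confidence"]
        # Clamp out-of-range scores instead of rejecting the whole reply.
        result[name] = {"present": present, "confidence": min(1.0, max(0.0, confidence))}
    return result
```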
Example saved JSONL line
{
"schema_version": "1.0",
"image": "001.png",
"labels": {
"cat": {"present": true, "confidence": 0.92},
"dog": {"present": false, "confidence": 0.15}
},
"usage": {
"input_tokens": 26288,
"output_tokens": 414,
"total_tokens": 26702
},
"prompt_used": "pet_classification_prompt.txt"
}
Full contract: docs/output_contract.md.
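Records of this shape can be consumed line by line. The sketch below tallies how often each label is marked present; it assumes only the `labels` key shown in the example record, and `count_present` is a hypothetical helper that skips blank or malformed lines.

```python
import json
from collections import Counter
from typing import Iterable


def count_present(lines: Iterable[str]) -> Counter:
    """Count, per label, how many JSONL records mark it as present."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the whole read
        if not isinstance(record, dict):
            continue
        for name, ann in record.get("labels", {}).items():
            if isinstance(ann, dict) and ann.get("present"):
                counts[name] += 1
    return counts
```

An open file handle can be passed directly, for example `count_present(open("output_frames/labels.jsonl"))`, since iterating over a text file yields one line at a time.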
Available Scripts
The scripts/ directory contains example implementations for different use cases:
- filter_food_annotation.py: Example food item detection
- filter_pet_classification.py: Cat/dog classification
- filter_medical_imaging.py: Medical image analysis (research only)
- filter_industrial_quality.py: Quality inspection and defect detection
See scripts/README.md for detailed usage instructions.
Cost Optimization
Image Processing
- Resize Images: Use FILTER_MAX_IMAGE_SIZE=256 for faster processing
- Quality Settings: Lower FILTER_IMAGE_QUALITY to reduce token usage
- Model Selection: Use gpt-4o-mini for cost-effective processing
Token Management
- Token Limits: Reduce FILTER_MAX_TOKENS for simpler tasks
- Prompt Optimization: Keep prompts concise and focused
- Batch Processing: Process multiple frames efficiently
Development
Project Structure
filter-chatgpt-annotator/
├── filter_chatgpt_annotator/
│ └── filter.py # Main filter implementation
├── scripts/ # Example usage scripts
│ ├── filter_food_annotation.py
│ ├── filter_pet_classification.py
│ ├── filter_medical_imaging.py
│ ├── filter_industrial_quality.py
│ └── README.md
├── prompts/ # Example prompt files
│ ├── food_annotation_prompt.txt
│ ├── pet_classification_prompt.txt
│ ├── medical_imaging_prompt.txt
│ └── industrial_quality_prompt.txt
├── tests/ # Test files
├── env.example # Environment configuration example
└── pyproject.toml # Project dependencies
Key Dependencies
- openai>=1.0.0 - ChatGPT Vision API client
- openfilter[all]>=0.1.0 - Filter framework
- opencv-python>=4.8.0 - Image processing
- pillow>=9.0.0 - Image manipulation
- python-dotenv>=1.0.0 - Environment configuration
Testing
# Run tests
make test
# Run tests with coverage
make test-cov
# Check code quality
make lint
# Format code
make format
Troubleshooting
API Key Issues
If you get API key errors:
- Check that FILTER_CHATGPT_API_KEY is set correctly in .env
- Verify your OpenAI API key is valid and has sufficient credits
- Ensure the key has access to the Vision API
Prompt File Not Found
If you get prompt file errors:
- Check that FILTER_PROMPT points to an existing file
- Verify the prompt file contains valid text
- Ensure the prompt instructs the model to return valid JSON
JSON Parse Errors
If ChatGPT returns invalid JSON:
- Review your prompt to ensure it enforces JSON-only output
- Add validation rules in the prompt
- Check the filter logs for the raw response
Performance Issues
If processing is slow:
- Reduce FILTER_MAX_IMAGE_SIZE to 256 or 128
- Lower FILTER_IMAGE_QUALITY to 70-80
- Use gpt-4o-mini instead of gpt-4o
- Reduce FILTER_MAX_TOKENS for simpler tasks
Cost Optimization
To reduce API costs:
- Use smaller image sizes (FILTER_MAX_IMAGE_SIZE=256)
- Lower image quality (FILTER_IMAGE_QUALITY=70)
- Optimize prompts to be more concise
- Use gpt-4o-mini model
- Set appropriate token limits
Open Questions & Next Steps
- Should the filter enforce JSON Schema validation instead of simple type casting?
- Should prompts be standardized into a prompt library by domain?
- Should batch multi-image requests be supported for efficiency?
- What metrics (tokens, cost, latency) should be exposed for monitoring?
- Should we allow provider abstraction (Gemini, Claude) in the next iteration?
Documentation
For more detailed information, configuration examples, and advanced usage scenarios, see the comprehensive documentation.
License
See LICENSE file for details.