A Python-based command-line system for managing a philatelic (stamp) collection.

Philately Collection Management System

Save the days or weeks of tedious data entry otherwise spent with tweezers and a magnifying glass.

Overview

This project is a Python-based command-line system for managing a philatelic (stamp) collection. It uses AI vision and text models to process entire directories of stamp album images, extract detailed metadata, and generate a queryable inventory. Model calls go through litellm, so the pipeline supports multiple providers (e.g., Google Gemini, xAI Grok) and lets you trade cost against accuracy by choosing a different model for each phase.

Key features include:

  • Multi-Model AI Processing: Analyzes stamp images to extract details like country, year, and condition using a two-pass system with configurable "low-cost" and "high-cost" vision models.
  • Data Enrichment: Uses powerful text models to enrich the initial data with estimated values, historical context, and philatelic remarks.
  • False Positive Detection: Includes a dedicated phase to re-examine high-value items and automatically flag illustrations or other non-stamp entities.
  • Persistent, Auditable Storage: Maintains a master inventory in master_inventory.csv that includes all processed stamps, deacquired items, and verification results.
  • Comprehensive Reporting: Generates detailed JSON summaries, high-value reports, and content-ready CSVs for platforms like Substack.
  • Modular, Phase-Based Execution: Allows you to run the entire pipeline or specific phases (e.g., analysis, enrichment, reporting) independently.
  • Command-Line and GUI Interfaces: Provides both a command-line tool (philately) for automated processing and a Streamlit-based GUI (philately-ui) for interactive use.
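
The two-pass idea from the first feature can be sketched as follows. This is a minimal illustration, not the package's actual code: the `Analysis` dataclass, the `VisionModel` callable type, and `two_pass_analyze` are hypothetical names, and the real model calls are replaced by plain callables. It assumes re-analysis is triggered when confidence is strictly below the threshold, as the `--confidence-threshold` description suggests.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Analysis:
    """Hypothetical result of one vision-model call on one image."""
    country: str
    year: int
    condition: str
    confidence: int  # 1-7 scale, as used by --confidence-threshold


# A "model" here is any callable mapping an image path to an Analysis.
# In the real tool this would be a vision-model call; that part is not shown.
VisionModel = Callable[[str], Analysis]


def two_pass_analyze(
    image_path: str,
    low_cost: VisionModel,
    high_cost: VisionModel,
    confidence_threshold: int = 5,
) -> tuple[Analysis, str]:
    """Try the low-cost model first; escalate only on low confidence."""
    first = low_cost(image_path)
    if first.confidence >= confidence_threshold:
        return first, "low-cost"
    # Confidence fell below the threshold: re-analyze with the high-cost model.
    return high_cost(image_path), "high-cost"
```

The point of the design is that the expensive model is only paid for on the images the cheap model is unsure about.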

Prerequisites

  • Python: Version 3.13 or higher.
  • API Keys: At least one API key for a supported provider (e.g., Google, xAI). These should be set in a .env file.
  • System Dependencies:
    • On Ubuntu/Debian: sudo apt-get install libopencv-dev
    • On macOS: brew install opencv
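
A quick way to confirm the first two prerequisites before a long run is a small preflight check. The sketch below is an assumption-laden illustration: `preflight` and `PROVIDER_KEYS` are hypothetical names, only the two key names shown in this README are checked, and it does not read a .env file itself (it inspects whatever mapping you pass in, such as `os.environ`).

```python
import sys
from collections.abc import Mapping

# Environment variable names as they appear in this README's setup steps.
PROVIDER_KEYS = ("GOOGLE_API_KEY", "XAI_API_KEY")


def preflight(
    env: Mapping[str, str],
    version: tuple[int, int] = sys.version_info[:2],
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if version < (3, 13):
        problems.append(f"Python 3.13+ required, found {version[0]}.{version[1]}")
    # At least one provider key must be present and non-empty.
    if not any(env.get(name) for name in PROVIDER_KEYS):
        problems.append("No provider API key set; expected one of: " + ", ".join(PROVIDER_KEYS))
    return problems
```

Calling `preflight(os.environ)` after exporting your keys should return an empty list on a supported interpreter.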

Installation

  1. Clone the Repository:

    git clone <repository-url>
    cd <repository-directory>
    
  2. Create a Virtual Environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install the Package:

    pip install .
    
  4. Set Up Environment Variables:

    Create a .env file in the project root and add your API key(s):

    echo "GOOGLE_API_KEY=your-google-api-key" > .env
    echo "XAI_API_KEY=your-xai-api-key" >> .env
    
  5. Prepare Directory Structure:

    • Place stamp images in a directory (e.g., stamps/), organized into subdirectories for each album (e.g., stamps/Isle of Man/).
    • The output directory will be created automatically to store all generated files.

Usage

This package provides two primary entry points: a command-line interface (CLI) and a graphical user interface (GUI).

Command-Line Interface (CLI)

The philately command allows you to run the entire pipeline or specific phases using command-line flags.

All available flags are listed in the Command-Line Flags table below.

Graphical User Interface (GUI)

The philately-ui command launches a Streamlit-based web interface that allows you to configure and run the processing pipeline interactively.

philately-ui

Example Commands (CLI)

1. Run the full pipeline on all images:

philately --image-dir ./stamps --output-dir ./output

2. Run only the image analysis phase on the first 10 images:

philately --run-analysis --max-images 10

3. Run the false-positive check on the top 3 most valuable stamps with debug logging:

philately --run-false-positive-check --false-positive-check-limit 3 --debug

4. Generate a Substack export with the top 20 most valuable items:

philately --run-substack-export --substack-items 20

5. Re-run only the enrichment and summary phases:

philately --run-enrichment --run-summaries
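
The examples above suggest a simple rule: with no --run-* flag the whole pipeline runs, and otherwise only the requested phases run. The sketch below illustrates that rule under stated assumptions. `select_phases` and `PHASE_FLAGS` are hypothetical names, the phase order is a guess, and `--run-collection-summary-only` is left out because its interaction with the other flags is not documented here.

```python
# Phase flags documented for the philately CLI, in an assumed execution order.
PHASE_FLAGS = [
    "run_analysis",
    "run_enrichment",
    "run_false_positive_check",
    "run_summaries",
    "run_high_value_report",
    "run_substack_export",
]


def select_phases(flags: dict[str, bool]) -> list[str]:
    """Return the phases to run: the requested ones, or all if none requested."""
    requested = [name for name in PHASE_FLAGS if flags.get(name, False)]
    # An empty selection is treated as "run everything" (assumption).
    return requested or list(PHASE_FLAGS)
```

Under this rule, example 5 corresponds to `select_phases({"run_enrichment": True, "run_summaries": True})`, which selects just those two phases.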

Command-Line Flags

Flag | Default | Description
--image-dir | stamps | Directory containing stamp images organized in album folders.
--output-dir | output | Directory to save all outputs.
--confidence-threshold | 5 | Confidence score (1-7) below which to trigger re-analysis with a high-cost model.
--max-images | None | Limit the number of images to process for testing.
--high-value-threshold | 1000 | USD threshold to consider a stamp as high-value for reporting.
--debug | False | Enable debug-level logging for verbose output, including API payloads.
--low-cost-model | gemini/gemini-1.5-flash-latest | The vision model for the initial, low-cost pass.
--high-cost-model | gemini/gemini-1.5-pro-latest | The vision model for the re-analysis pass, used when confidence falls below the threshold.
--narrative-model | gemini/gemini-1.5-pro-latest | The text model for enrichment and summaries.
--collection-summary-model | gemini/gemini-1.5-pro-latest | The high-context model for the final collection-wide summary.
--run-analysis | False | Run only the image analysis phase.
--run-enrichment | False | Run only the philatelic enrichment phase.
--run-summaries | False | Run the full clustering and summary phase.
--run-high-value-report | False | Run only the high-value stamp report generation phase.
--run-collection-summary-only | False | Run only the final collection-wide summary generation.
--run-false-positive-check | False | Run a re-examination of high-value stamps to find false positives.
--false-positive-check-limit | 5 | Limit the number of stamps to check in the false-positive phase (0 for all).
--run-substack-export | False | Generate a CSV export formatted for Substack posts.
--substack-items | 10 | Number of top items to include in the Substack export (0 for all).
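
For readers who want to see how such a table maps onto code, here is a sketch of a parser for a subset of these flags using the standard argparse module. It is not the package's actual parser: `build_parser` is a hypothetical name, the argument types are assumptions inferred from the defaults, and restricting `--confidence-threshold` to 1-7 is an illustrative choice.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring a subset of the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="philately")
    parser.add_argument("--image-dir", default="stamps")
    parser.add_argument("--output-dir", default="output")
    # The table gives a 1-7 scale; enforcing it here is an assumption.
    parser.add_argument("--confidence-threshold", type=int, default=5, choices=range(1, 8))
    parser.add_argument("--max-images", type=int, default=None)
    parser.add_argument("--high-value-threshold", type=float, default=1000)
    parser.add_argument("--false-positive-check-limit", type=int, default=5)
    # Boolean flags default to False and become True when present.
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--run-analysis", action="store_true")
    return parser
```

Parsing `["--run-analysis", "--max-images", "10"]` with this parser yields `run_analysis=True` and `max_images=10`, with every other option at its table default.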

Output Files

All outputs are saved to the directory specified by --output-dir.

  • master_inventory.csv: The master database of all stamps, including detailed analysis and verification data.
  • stamp_inventory.json: A structured JSON file containing all data, including collection-wide statistics and narrative summaries.
  • false_positive_check_report.csv: A summary of high-value items that were checked for authenticity.
  • high_value_summary.csv: A CSV listing all stamps identified as high-value.
  • substack_export.csv: A CSV formatted for easy import into content platforms like Substack.
  • cropped_entities/: Directory of cropped images for each identified stamp.
  • thumbnails/: Directory of 100x100px thumbnails for each stamp.
  • high_value_reports/: Individual Markdown reports for each high-value stamp.
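
Because master_inventory.csv is a plain CSV, it can be queried with the standard library alone. The sketch below filters high-value rows; it assumes the `estimated_value_high` column shown in the example record, treats the threshold as inclusive (an assumption), and uses a hypothetical function name, `high_value_rows`. It takes CSV text so that it can be tried without a real inventory file.

```python
import csv
import io


def high_value_rows(csv_text: str, threshold: float = 1000.0) -> list[dict]:
    """Return rows whose estimated_value_high meets the threshold, highest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        try:
            value = float(row["estimated_value_high"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable estimate
        if value >= threshold:
            rows.append(row)
    return sorted(rows, key=lambda r: float(r["estimated_value_high"]), reverse=True)
```

With a real file you would pass `Path("output/master_inventory.csv").read_text()` as the first argument.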

Example Data Records

1. Master Inventory Record (master_inventory.csv)

A single row contains the complete data for one stamp.

stamp_id | album | page_filename | common_name | nationality | year | face_value | condition | confidence | estimated_value_high | is_verified_real | verification_reason
a1b2c3d4-... | Isle of Man | IMG_1172.jpeg | 1973 Manx Cat | Isle of Man | 1973 | 3p | Mint | 7 | 15 | True | This appears to be a genuine, mounted stamp with clear perforations and color.

2. Cluster Summary (stamp_inventory.json)

Summaries provide statistics and a narrative for a specific group of stamps (e.g., an album).

{
    "album_Isle_of_Man": {
        "statistics": {
            "item_count": 58,
            "album_count": 1,
            "countries_represented": 1,
            "year_range": "1973 - 1998",
            "total_value_low": 150,
            "total_value_high": 450,
            "condition_distribution": {
                "Mint": 45,
                "Used": 13
            }
        },
        "narrative_summary": "This cluster from the 'Isle of Man' album represents a strong collection of modern issues, primarily from the 1970s and 1980s. The thematic focus is on local culture, transportation, and fauna, with the 'Manx Cat' and 'TT Races' series being prominent highlights. The overall condition is excellent, with a majority of items in mint condition. A notable gap is the absence of earlier Victorian-era issues."
    }
}
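
The "statistics" block above is the kind of aggregate that can be derived directly from inventory records. The following sketch shows one way to compute a few of those fields. `cluster_statistics` is a hypothetical name, the record keys follow the master inventory columns shown earlier, and the value totals are omitted because how the tool derives them is not documented here.

```python
from collections import Counter


def cluster_statistics(records: list[dict]) -> dict:
    """Compute summary statistics for one cluster of stamp records."""
    years = [r["year"] for r in records if r.get("year") is not None]
    return {
        "item_count": len(records),
        "album_count": len({r["album"] for r in records}),
        "countries_represented": len({r["nationality"] for r in records}),
        # Same "first - last" format as the example JSON.
        "year_range": f"{min(years)} - {max(years)}" if years else "",
        "condition_distribution": dict(Counter(r["condition"] for r in records)),
    }
```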

3. False Positive Check Report (false_positive_check_report.csv)

This report provides a clear audit trail for the verification process.

stamp_id | common_name | estimated_value_high | page_filename | is_verified_real | cropped_image_path | verification_reason | action_taken
e5f6g7h8-... | Penny Black | 2500 | IMG_1245.JPG | False | cropped_entities/e5f6g7h8-..._cropped.jpg | The image is a black and white printed illustration, lacking color and physical depth. | Marked as deacquired (illustration)
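
The selection and bookkeeping behind such a report can be sketched in a few lines. Both functions below are hypothetical illustrations rather than the package's code: they assume candidates are the stamps at or above the high-value threshold, ranked by `estimated_value_high`, with a limit of 0 meaning "check all" as the `--false-positive-check-limit` description states. The verification itself (a model call) is not shown, and the empty `action_taken` for genuine stamps is an assumption.

```python
def select_for_check(records: list[dict], high_value_threshold: float = 1000.0, limit: int = 5) -> list[dict]:
    """Pick the most valuable stamps to re-examine; limit=0 means no limit."""
    candidates = [r for r in records if r["estimated_value_high"] >= high_value_threshold]
    candidates.sort(key=lambda r: r["estimated_value_high"], reverse=True)
    return candidates if limit == 0 else candidates[:limit]


def apply_verdict(record: dict, is_real: bool, reason: str) -> dict:
    """Return a copy of the record with the verification outcome recorded."""
    updated = dict(record, is_verified_real=is_real, verification_reason=reason)
    # Label copied from the example row above; other labels may exist.
    updated["action_taken"] = "" if is_real else "Marked as deacquired (illustration)"
    return updated
```

Returning a copy instead of mutating the record keeps the original row available for the audit trail.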

Download files

Source Distribution

philately_will_get_you_everywhere-0.1.0.tar.gz (39.0 kB)

Built Distribution

philately_will_get_you_everywhere-0.1.0-py3-none-any.whl

File details

Details for the file philately_will_get_you_everywhere-0.1.0.tar.gz.

File hashes

Hashes for philately_will_get_you_everywhere-0.1.0.tar.gz

Algorithm | Hash digest
SHA256 | 8df3a029a6a61c5af212d55393d714965bf4a08ad45e6133dfd9e3c17aa45200
MD5 | 830b34a446699fb095aeb4c2ccaa9712
BLAKE2b-256 | 188615b496056169408771cfd0063b8cc7af5ea4095295bce36748111797abe6

File details

Details for the file philately_will_get_you_everywhere-0.1.0-py3-none-any.whl.

File hashes

Hashes for philately_will_get_you_everywhere-0.1.0-py3-none-any.whl

Algorithm | Hash digest
SHA256 | 44ec7bd7c5635ab418eb6ff865b0d062be2c51ee93756ccce2dd5e2e8d58382f
MD5 | 3ccbc030976709f296b7a4c3c9292de8
BLAKE2b-256 | c48d719009a0757d1b53a4752cb9a929448fa65f3379b867cb11c826a40e4e2e
