
Automated mail intelligence pipeline: Gmail → image extraction → Gemini AI analysis


mailsense

Automated mail intelligence pipeline. Gmail → .mbox → image extraction → Gemini AI analysis — all from one CLI.

Python 3.10+ · License: Apache 2.0



What is mailsense?

mailsense turns your USPS Informed Delivery emails (or any image-heavy Gmail label) into clean, structured JSON — automatically.

The pipeline has three stages:


Gmail label
    │
    ▼  mailsense download
.mbox files
    │
    ▼  mailsense extract
images + metadata.json
    │
    ▼  mailsense analyze
structured JSON (sender, recipient, postage, document type, summary…)

Run each stage independently, or run all three in sequence with mailsense pipeline.
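If you prefer to drive the stages from a script rather than from mailsense pipeline, the chain can be sketched in Python as follows. This is a minimal sketch, assuming the mailsense executable is on your PATH; it uses only the flags documented in this README, and the helper names stage_commands and run_stages are hypothetical.

```python
import subprocess
from pathlib import Path


def stage_commands(work_dir, label=None):
    """Build the argv lists for download, extract and analyze (hypothetical helper)."""
    root = Path(work_dir)
    mbox, images, analyzed = root / "mbox", root / "images", root / "analyzed"

    download = ["mailsense", "download", "--output-dir", str(mbox)]
    if label:  # when omitted, gmail_label from the config is used
        download += ["--label", label]

    return [
        download,
        ["mailsense", "extract", "--input-dir", str(mbox), "--output-dir", str(images)],
        ["mailsense", "analyze", "--input-dir", str(images), "--output-dir", str(analyzed)],
    ]


def run_stages(work_dir, label=None):
    """Run the three stages in order, stopping at the first failure."""
    for argv in stage_commands(work_dir, label):
        subprocess.run(argv, check=True)
```

Calling run_stages("my_mail") would produce the same mbox/, images/ and analyzed/ layout described under Quick Start, provided credentials are already saved in the config.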


Installation

From PyPI

pip install mailsense

From source

git clone https://github.com/samapriya/mailsense.git
cd mailsense
pip install -e .

Build a wheel for local distribution

pip install build
python -m build          # produces dist/mailsense-*.whl and dist/mailsense-*.tar.gz
pip install dist/mailsense-*.whl

Prerequisites

| Requirement | How to get it |
|---|---|
| Python 3.10+ | python.org |
| Gmail IMAP enabled | Gmail → Settings → See all settings → Forwarding and POP/IMAP → Enable IMAP |
| Gmail App Password | myaccount.google.com → Security → 2-Step Verification → App Passwords |
| Gemini API key | aistudio.google.com/app/apikey (free tier: 15 RPM / 1500 RPD) |

Quick Start

1. Store your credentials once

Run the interactive setup wizard — it walks through every setting, masks sensitive input with * as you type (paste works too), and lets you press Enter to keep the current or default value:

mailsense config configure

Or set values individually:

mailsense config set gmail_email     you@gmail.com
mailsense config set gmail_password  YOUR_APP_PASSWORD
mailsense config set gmail_label     "USPS Informed Delivery"
mailsense config set gemini_api_key  YOUR_GEMINI_KEY

Credentials are stored in ~/.mailsense (mode 0600: readable and writable by the owner only).
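To confirm that the stored credentials really are private, you can inspect the permission bits yourself. The sketch below assumes a POSIX system; is_private is a hypothetical helper, not part of mailsense.

```python
import os
import stat
from pathlib import Path


def is_private(path):
    """Return True if neither group nor others have any permission on the path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    config_path = Path.home() / ".mailsense"
    if config_path.exists():
        state = "private" if is_private(config_path) else "accessible to other users"
        print(f"{config_path}: {state}")
```

A mode of 0600 passes this check; a mode such as 0644 does not.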

2. Run the full pipeline

# If gmail_label is saved in config, --label can be omitted:
mailsense pipeline --work-dir ./my_mail

# Or specify the label explicitly:
mailsense pipeline --label "USPS Informed Delivery" --work-dir ./my_mail

Results land in:

my_mail/
  mbox/          ← grouped .mbox files from Gmail
  images/        ← extracted mail images + metadata.json
  analyzed/      ← one .json per image with AI-extracted data

Commands

mailsense config

Manage credentials and defaults stored in ~/.mailsense.

# Interactive wizard — prompts for all settings with masked input for secrets
mailsense config configure

# Reconfigure specific keys only
mailsense config configure gmail_email gmail_label

# Show current config (secrets masked)
mailsense config show

# Set a single value
mailsense config set gmail_label     "USPS Informed Delivery"
mailsense config set api_delay       2

# Remove a value
mailsense config unset gmail_password

# List all recognised keys and descriptions
mailsense config keys

Available config keys:

| Key | Description |
|---|---|
| gmail_email | Gmail address used for IMAP |
| gmail_password | Gmail App Password |
| gmail_label | Gmail label to download (e.g. USPS Informed Delivery) |
| gemini_api_key | Google Gemini API key |
| sender_filter | From-header substring filter used during extraction (default: usps) |
| gemini_model | Gemini model name (default: gemini-2.0-flash) |
| api_delay | Seconds between Gemini requests (default: 4) |

gmail_label vs sender_filter: gmail_label is the Gmail label you download from. sender_filter is a substring matched against the From: header of each downloaded email to pick out the right messages; the default, usps, matches addresses like InformedDelivery@informeddelivery.usps.com. You rarely need to change sender_filter.


mailsense download

Download emails from a Gmail label to .mbox files.

# Uses gmail_label from config if --label is not specified
mailsense download --output-dir ./mbox

# Override the label explicitly
mailsense download --label "USPS Informed Delivery" --output-dir ./mbox

# Last 90 days only, grouped by week
mailsense download --start 90d --group-by week

# Specific date range, 14-day chunks
mailsense download \
  --start 01-01-2025 \
  --end   12-31-2025 \
  --group-by days \
  --days-per-file 14

# List all available Gmail labels
mailsense download --list-labels

Options:

| Flag | Default | Description |
|---|---|---|
| --label, -l | config gmail_label / prompted | Gmail label to download |
| --email, -e | config gmail_email / prompted | Gmail address |
| --password, -p | config gmail_password / prompted (masked) | App Password |
| --output-dir, -o | mbox_export | Directory for .mbox files |
| --start | | Start date (MM-DD-YYYY, YYYY-MM-DD, MM/DD/YYYY, or 90d) |
| --end | | End date (inclusive) |
| --group-by | month | month / week / days / single / individual |
| --days-per-file | 7 | Window size when --group-by=days |
| --list-labels | | Print all Gmail labels and exit |
| --no-resume | | Re-download everything, ignoring previous state |

Date formats: 02-25-2026 · 2026-02-25 · 02/25/2026 · 90d (relative)
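For reference, the four accepted spellings can be parsed along these lines. This is a minimal sketch, not mailsense's own parser: parse_date is a hypothetical name, and reading 90d as "90 days before today" is an assumption based on the word "relative".

```python
from datetime import date, datetime, timedelta

# Absolute formats listed above, tried in order.
_FORMATS = ("%m-%d-%Y", "%Y-%m-%d", "%m/%d/%Y")


def parse_date(text, today=None):
    """Parse MM-DD-YYYY, YYYY-MM-DD, MM/DD/YYYY or a relative value such as 90d."""
    today = today or date.today()
    text = text.strip()

    # Relative form: N days before today (assumed meaning of "90d").
    if text.endswith("d") and text[:-1].isdigit():
        return today - timedelta(days=int(text[:-1]))

    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")
```

The three absolute formats cannot be confused with one another, because the four-digit year sits in a different position or the separator differs, so the order in which they are tried does not matter.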


mailsense extract

Extract images from .mbox files, writing metadata.json.

# Single .mbox file
mailsense extract --input-dir inbox.mbox --output-dir ./images

# Entire folder of .mbox files (batch mode — one subdirectory per mbox)
mailsense extract --input-dir ./mbox_export --output-dir ./images

# Dry run — scan without writing
mailsense extract --input-dir ./mbox_export --dry-run

Options:

| Flag | Default | Description |
|---|---|---|
| --input-dir, -i | required | .mbox file or directory of .mbox files |
| --output-dir, -o | img_extracts | Root output directory |
| --dry-run, -n | | Scan without writing |
| --log-level | INFO | DEBUG / INFO / WARNING / ERROR |
| --log-file | | Write full debug log to a file |

The sender filter is read from config (sender_filter, default usps) and applied automatically — no flag needed.
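To get a feel for what extraction involves, the sketch below walks an .mbox file with Python's standard mailbox module and yields the image parts of messages whose From: header contains the filter string. It is an illustration only: iter_image_parts is a hypothetical name, and the case-insensitive comparison is an assumption about how the filter behaves.

```python
import mailbox


def iter_image_parts(mbox_path, sender_filter="usps"):
    """Yield (filename, raw_bytes) for image parts of messages matching the sender filter."""
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            sender = str(message.get("From", ""))
            # Assumption: the substring match ignores case.
            if sender_filter.lower() not in sender.lower():
                continue
            for part in message.walk():
                if part.get_content_maintype() == "image":
                    # decode=True undoes the base64 transfer encoding.
                    yield part.get_filename(), part.get_payload(decode=True)
    finally:
        box.close()
```

Each yielded pair can then be checked against the image filtering rules before anything is written to disk.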

Image filtering rules (applied in order):

  1. Extension must be one of: .jpg, .jpeg, .png, .gif, .webp, .bmp, .tiff
  2. Filename must not start with content, mailer, or ra (email chrome/trackers)
  3. File must be ≥ 1 KB (eliminates tracking pixels)
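The three rules can be expressed as a single predicate. The following is a sketch of the rules as stated, not the code mailsense ships: keep_image is a hypothetical name, and it assumes that 1 KB means 1024 bytes and that names are compared case-insensitively.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}
SKIPPED_PREFIXES = ("content", "mailer", "ra")  # email chrome and trackers
MIN_BYTES = 1024  # assumption: "1 KB" means 1024 bytes


def keep_image(filename, size_bytes):
    """Apply the extension, prefix and size rules in that order."""
    name = Path(filename).name.lower()
    if Path(name).suffix not in IMAGE_EXTENSIONS:
        return False
    if name.startswith(SKIPPED_PREFIXES):
        return False
    return size_bytes >= MIN_BYTES
```

Note that the ra prefix rule is broad: any filename beginning with those two letters is skipped, whatever follows.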

mailsense analyze

Analyze mail images with Gemini AI, producing structured JSON.

mailsense analyze \
  --input-dir  ./images \
  --output-dir ./analyzed

# Custom model and delay (paid tier — faster)
mailsense analyze \
  --input-dir  ./images \
  --output-dir ./analyzed \
  --model gemini-1.5-pro \
  --delay 1

# Dry run
mailsense analyze --input-dir ./images --output-dir ./analyzed --dry-run

Options:

| Flag | Default | Description |
|---|---|---|
| --input-dir, -i | required | Output directory from the extract stage |
| --output-dir, -o | required | Directory for analyzed JSON files |
| --api-key, -k | config gemini_api_key / env | Gemini API key |
| --model, -m | config gemini_model (gemini-2.0-flash) | Gemini model |
| --delay, -d | config api_delay (4) | Seconds between API calls |
| --dry-run, -n | | Show what would be processed |

Rate limits:

| Tier | RPM | RPD | Recommended --delay |
|---|---|---|---|
| Free | 15 | 1500 | 4 (default) |
| Paid | 1000+ | | 1 or lower |
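The recommended delay for the free tier follows directly from the per-minute limit: 15 requests per minute allows one request every 60 / 15 = 4 seconds. The helper names below are hypothetical, and the arithmetic ignores the time each request itself takes.

```python
def min_delay_seconds(requests_per_minute):
    """Smallest delay between requests that stays within a per-minute limit."""
    return 60.0 / requests_per_minute


def minutes_to_reach_daily_cap(requests_per_day, delay_seconds):
    """How long continuous processing takes to use up the daily quota."""
    return requests_per_day * delay_seconds / 60.0


print(min_delay_seconds(15))                # 4.0 seconds between requests
print(minutes_to_reach_daily_cap(1500, 4))  # 100.0 minutes of continuous analysis
```

In other words, at the default delay a free-tier key can analyze at most 1500 images per day, and a large backlog would reach that cap after roughly 100 minutes.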

Output format (per image):

{
  "status": "Processed",
  "is_marketing": false,
  "sender": {
    "name": "Capital One",
    "organization": "Capital One Bank",
    "address": { "street": "PO Box 30281", "city": "Salt Lake City", "state": "UT", "zip_code": "84130" }
  },
  "recipient": { "name": "Jane Smith", "address": { ... } },
  "postage_details": { "type": "First Class Mail", "status": "Delivered" },
  "document_info": { "document_type": "Credit Card Statement", "reference_numbers": ["..."] },
  "content_summary": "Monthly credit card statement showing account balance and minimum payment due.",
  "filename": "image_3_a1b2c3d4.jpg",
  "mail_metadata": {
    "date": "Thu, 12 Feb 2026 08:00:00 -0500",
    "subject": "Your USPS Informed Delivery Daily Digest",
    "from": "USPS Informed Delivery <InformedDelivery@informeddelivery.usps.com>"
  }
}
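Because each result is plain JSON, downstream processing needs nothing beyond the standard library. The sketch below loads every .json file under an analyzed directory and counts non-marketing items per sender, using only fields shown in the sample above; the function names are hypothetical, and searching subdirectories recursively is an assumption about the output layout.

```python
import json
from collections import Counter
from pathlib import Path


def load_results(analyzed_dir):
    """Load every .json file found under analyzed_dir (searched recursively)."""
    results = []
    for path in sorted(Path(analyzed_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results.append(json.load(fh))
    return results


def sender_counts(results, include_marketing=False):
    """Count mail pieces per sender name, skipping marketing mail by default."""
    counts = Counter()
    for item in results:
        if item.get("is_marketing") and not include_marketing:
            continue
        name = (item.get("sender") or {}).get("name") or "unknown"
        counts[name] += 1
    return counts
```

For example, sender_counts(load_results("./analyzed")).most_common(5) would list the five most frequent non-marketing senders.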

mailsense pipeline

Run all three stages end-to-end from a single command.

# Uses gmail_label from config — no --label needed
mailsense pipeline --work-dir ./mail_run

# Override the label explicitly
mailsense pipeline --label "USPS Informed Delivery" --work-dir ./mail_run

# Last 90 days, skip re-downloading if mbox already exists
mailsense pipeline \
  --start    90d \
  --work-dir ./mail_run \
  --skip-download

# Dry run of the entire pipeline
mailsense pipeline --dry-run

Workflow output structure:

mail_run/
  mbox/         ← grouped .mbox files
  images/       ← images + metadata.json per mbox
  analyzed/     ← one .json per image

Resume behaviour: Each stage is individually resumable — already-processed files are skipped automatically on re-runs. Use --no-resume to force a clean download, or --skip-download / --skip-extract / --skip-analyze to selectively re-run only the stages you need.
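One way to picture resumability for the analyze stage is a check for images that do not yet have a result file. The sketch below is purely illustrative and may not match how mailsense tracks its state: pending_images is a hypothetical name, and it assumes each result is a .json file named after the image it describes.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}


def pending_images(images_dir, analyzed_dir):
    """List images with no matching <stem>.json under analyzed_dir (assumed naming)."""
    done = {p.stem for p in Path(analyzed_dir).rglob("*.json")}
    return [
        p
        for p in sorted(Path(images_dir).rglob("*"))
        if p.suffix.lower() in IMAGE_EXTENSIONS and p.stem not in done
    ]
```

On a re-run, only the images returned by such a check would need to be sent to Gemini, which is what keeps repeated runs cheap.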

All options:

| Flag | Default | Description |
|---|---|---|
| --work-dir, -w | mailsense_run | Root directory for all pipeline outputs |
| --label, -l | config gmail_label / prompted | Gmail label to download |
| --email, -e | config gmail_email / prompted | Gmail address |
| --password, -p | config gmail_password / prompted | App Password |
| --start | | Start date |
| --end | | End date (inclusive) |
| --group-by | month | .mbox grouping strategy |
| --days-per-file | 7 | Window size when --group-by=days |
| --no-resume | | Re-download everything |
| --api-key, -k | config / env | Gemini API key |
| --model | config / gemini-2.0-flash | Gemini model |
| --delay | config / 4 | Seconds between Gemini requests |
| --dry-run, -n | | Dry run all stages |
| --skip-download | | Skip download, use existing .mbox files |
| --skip-extract | | Skip extract, use existing images |
| --skip-analyze | | Skip analyze |

Building for Distribution

# Install build tools
pip install build twine

# Build sdist + wheel
python -m build

# Inspect the wheel contents
python -m zipfile -l dist/mailsense-*.whl

# Upload to PyPI
twine upload dist/*

# Upload to TestPyPI first (recommended)
twine upload --repository testpypi dist/*

Project Structure

mailsense/
├── mailsense/
│   ├── __init__.py          # version, metadata
│   ├── cli.py               # argparse root + dispatcher
│   ├── config.py            # ~/.mailsense read/write + interactive wizard
│   └── commands/
│       ├── __init__.py
│       ├── config_cmd.py    # mailsense config
│       ├── download.py      # mailsense download — Gmail IMAP → .mbox
│       ├── extract.py       # mailsense extract — .mbox → images
│       ├── analyze.py       # mailsense analyze — images → JSON
│       └── pipeline.py      # mailsense pipeline — end-to-end
├── tests/
│   └── test_config.py
├── pyproject.toml
├── LICENSE
└── README.md

Environment Variables

| Variable | Equivalent config key |
|---|---|
| GEMINI_API_KEY | gemini_api_key |

CLI flags always take precedence over both config file and environment variables.
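The precedence rule can be sketched as a simple first-match lookup. Only "CLI flag first" is stated above; checking the config file before the environment variable is an assumption made for this illustration, and resolve_api_key is a hypothetical name.

```python
import os


def resolve_api_key(cli_value=None, config=None, environ=None):
    """Return the first non-empty value: CLI flag, then config, then environment.

    The config-before-environment order is an assumption; only the CLI flag's
    priority is documented.
    """
    config = config or {}
    environ = os.environ if environ is None else environ
    candidates = (
        cli_value,
        config.get("gemini_api_key"),
        environ.get("GEMINI_API_KEY"),
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None
```

With this shape, passing --api-key on the command line always wins, and a missing key in every source yields None so the caller can report a clear error.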


Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Make your changes and add tests
  4. Run tests: pytest
  5. Submit a pull request

License

Copyright 2026 Samapriya Roy. Licensed under the Apache License 2.0.
