Automated mail intelligence pipeline: Gmail → image extraction → Gemini AI analysis
mailsense
Automated mail intelligence pipeline. Gmail → .mbox → image extraction → Gemini AI analysis, all from one CLI.
What is mailsense?
mailsense turns your USPS Informed Delivery emails (or any image-heavy Gmail label) into clean, structured JSON — automatically.
The pipeline has three stages:
Gmail label
│
▼ mailsense download
.mbox files
│
▼ mailsense extract
images + metadata.json
│
▼ mailsense analyze
structured JSON (sender, recipient, postage, document type, summary…)
Run each stage independently, or fire them all at once with mailsense pipeline.
Installation
From PyPI
pip install mailsense
From source
git clone https://github.com/samapriya/mailsense.git
cd mailsense
pip install -e .
Build a wheel for local distribution
pip install build
python -m build # produces dist/mailsense-*.whl and dist/mailsense-*.tar.gz
pip install dist/mailsense-*.whl
Prerequisites
| Requirement | How to get it |
|---|---|
| Python 3.10+ | python.org |
| Gmail IMAP enabled | Gmail → Settings → See all settings → Forwarding and POP/IMAP → Enable IMAP |
| Gmail App Password | myaccount.google.com → Security → 2-Step Verification → App Passwords |
| Gemini API key | aistudio.google.com/app/apikey — free tier: 15 RPM / 1500 RPD |
Quick Start
1. Store your credentials once
Run the interactive setup wizard — it walks through every setting, masks sensitive input with * as you type (paste works too), and lets you press Enter to keep the current or default value:
mailsense config configure
Or set values individually:
mailsense config set gmail_email you@gmail.com
mailsense config set gmail_password YOUR_APP_PASSWORD
mailsense config set gmail_label "USPS Informed Delivery"
mailsense config set gemini_api_key YOUR_GEMINI_KEY
Credentials are stored in ~/.mailsense (mode 0600, owner-readable only).
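For illustration, owner-only permissions like these can be enforced in Python by creating the file with mode 0600 and then re-applying chmod, because the mode passed to os.open is ignored when the file already exists. This is a minimal sketch: the on-disk format of ~/.mailsense is not documented here, so JSON is an assumption, and save_config is a hypothetical helper rather than mailsense's own function.

```python
import json
import os
import stat
from pathlib import Path


def save_config(values: dict, path: Path = Path.home() / ".mailsense") -> None:
    """Write settings (JSON is an assumed format) readable only by the owner."""
    # Creating with mode 0o600 makes a new file owner-only; the umask can
    # only remove permission bits, never add them.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(values, fh, indent=2)
    # os.open ignores the mode for an existing file, so tighten it explicitly.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```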
2. Run the full pipeline
# If gmail_label is saved in config, --label can be omitted:
mailsense pipeline --work-dir ./my_mail
# Or specify the label explicitly:
mailsense pipeline --label "USPS Informed Delivery" --work-dir ./my_mail
Results land in:
my_mail/
mbox/ ← grouped .mbox files from Gmail
images/ ← extracted mail images + metadata.json
analyzed/ ← one .json per image with AI-extracted data
Commands
mailsense config
Manage credentials and defaults stored in ~/.mailsense.
# Interactive wizard — prompts for all settings with masked input for secrets
mailsense config configure
# Reconfigure specific keys only
mailsense config configure gmail_email gmail_label
# Show current config (secrets masked)
mailsense config show
# Set a single value
mailsense config set gmail_label "USPS Informed Delivery"
mailsense config set api_delay 2
# Remove a value
mailsense config unset gmail_password
# List all recognised keys and descriptions
mailsense config keys
Available config keys:
| Key | Description |
|---|---|
| gmail_email | Gmail address used for IMAP |
| gmail_password | Gmail App Password |
| gmail_label | Gmail label to download (e.g. USPS Informed Delivery) |
| gemini_api_key | Google Gemini API key |
| sender_filter | From-header substring filter used during extraction (default: usps) |
| gemini_model | Gemini model name (default: gemini-2.0-flash) |
| api_delay | Seconds between Gemini requests (default: 4) |
gmail_label vs sender_filter: gmail_label is the Gmail folder you download from. sender_filter is a substring matched against the From: header of each email within those downloads to identify the right messages; the default usps matches addresses like informeddelivery@usps.com. You rarely need to change sender_filter.
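The sender_filter check amounts to a substring test on the From: header. The sketch below shows that idea with Python's standard email package; matches_sender is a hypothetical name, and case-insensitive matching is an assumption, not documented mailsense behaviour.

```python
from email import message_from_bytes
from email.message import Message


def matches_sender(msg: Message, sender_filter: str = "usps") -> bool:
    """Return True if sender_filter occurs in the From: header.

    Case-insensitive comparison is an assumption made for this sketch.
    """
    from_header = str(msg.get("From", ""))
    return sender_filter.lower() in from_header.lower()


raw = (b"From: USPS Informed Delivery <InformedDelivery@informeddelivery.usps.com>\r\n"
       b"Subject: Your USPS Informed Delivery Daily Digest\r\n\r\nbody\r\n")
print(matches_sender(message_from_bytes(raw)))
```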
mailsense download
Download emails from a Gmail label to .mbox files.
# Uses gmail_label from config if --label is not specified
mailsense download --output-dir ./mbox
# Override the label explicitly
mailsense download --label "USPS Informed Delivery" --output-dir ./mbox
# Last 90 days only, grouped by week
mailsense download --start 90d --group-by week
# Specific date range, 14-day chunks
mailsense download \
--start 01-01-2025 \
--end 12-31-2025 \
--group-by days \
--days-per-file 14
# List all available Gmail labels
mailsense download --list-labels
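mailsense's own download code (resume state, grouping, label listing) is not shown here. As a rough sketch of the underlying mechanism, Python's standard imaplib and mailbox modules can fetch messages from a Gmail label and write them to an .mbox file. fetch_label and write_mbox are hypothetical names, and filtering with the IMAP SINCE criterion is an assumption about how a start date might be applied.

```python
import imaplib
import mailbox
from pathlib import Path


def fetch_label(email_addr: str, app_password: str, label: str, since: str) -> list[bytes]:
    """Fetch raw RFC 822 messages from a Gmail label (requires network access).

    `since` uses the IMAP date format, e.g. "01-Jan-2025".
    """
    raw_messages = []
    with imaplib.IMAP4_SSL("imap.gmail.com") as imap:
        imap.login(email_addr, app_password)
        # Gmail exposes labels as IMAP folders; quote names that contain spaces.
        typ, _ = imap.select(f'"{label}"', readonly=True)
        if typ != "OK":
            raise RuntimeError(f"cannot select label {label!r}")
        _, data = imap.search(None, "SINCE", since)
        for num in data[0].split():
            _, parts = imap.fetch(num, "(RFC822)")
            raw_messages.append(parts[0][1])  # (envelope, raw bytes) tuple
    return raw_messages


def write_mbox(raw_messages: list[bytes], path: Path) -> int:
    """Append raw messages to an .mbox file and return how many were written."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        for raw in raw_messages:
            box.add(raw)  # mailbox accepts raw bytes directly
        box.flush()
    finally:
        box.unlock()
        box.close()
    return len(raw_messages)
```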
Options:
| Flag | Default | Description |
|---|---|---|
| --label, -l | config gmail_label / prompted | Gmail label to download |
| --email, -e | config gmail_email / prompted | Gmail address |
| --password, -p | config gmail_password / prompted (masked) | App Password |
| --output-dir, -o | mbox_export | Directory for .mbox files |
| --start | — | Start date (MM-DD-YYYY, YYYY-MM-DD, or 90d) |
| --end | — | End date (inclusive) |
| --group-by | month | month / week / days / single / individual |
| --days-per-file | 7 | Window size when --group-by=days |
| --list-labels | — | Print all Gmail labels and exit |
| --no-resume | — | Re-download everything, ignoring previous state |
Date formats: 02-25-2026 · 2026-02-25 · 02/25/2026 · 90d (relative)
mailsense extract
Extract images from .mbox files, writing metadata.json.
# Single .mbox file
mailsense extract --input-dir inbox.mbox --output-dir ./images
# Entire folder of .mbox files (batch mode — one subdirectory per mbox)
mailsense extract --input-dir ./mbox_export --output-dir ./images
# Dry run — scan without writing
mailsense extract --input-dir ./mbox_export --dry-run
Options:
| Flag | Default | Description |
|---|---|---|
| --input-dir, -i | required | .mbox file or directory of .mbox files |
| --output-dir, -o | img_extracts | Root output directory |
| --dry-run, -n | — | Scan without writing |
| --log-level | INFO | DEBUG / INFO / WARNING / ERROR |
| --log-file | — | Write full debug log to a file |
The sender filter is read from config (sender_filter, default usps) and applied automatically — no flag needed.
Image filtering rules (applied in order):
- Extension must be .jpg, .jpeg, .png, .gif, .webp, .bmp, or .tiff
- Filename must not start with content, mailer, or ra (email chrome/trackers)
- File must be ≥ 1 KB (eliminates tracking pixels)
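The three rules translate directly into a predicate. This sketch applies them in the stated order; keep_image is a hypothetical name, case-insensitive comparison and 1 KB = 1024 bytes are assumptions, and the prefix rule is applied literally, so any name beginning with ra is rejected.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}
BLOCKED_PREFIXES = ("content", "mailer", "ra")  # email chrome / trackers
MIN_BYTES = 1024  # 1 KB threshold for tracking pixels


def keep_image(filename: str, payload: bytes) -> bool:
    """Return True if an attachment passes all three filtering rules."""
    name = Path(filename).name
    # Rule 1: allowed image extension.
    if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
        return False
    # Rule 2: not an email-chrome or tracker filename.
    if name.lower().startswith(BLOCKED_PREFIXES):
        return False
    # Rule 3: large enough to be real content.
    return len(payload) >= MIN_BYTES
```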
mailsense analyze
Analyze mail images with Gemini AI, producing structured JSON.
mailsense analyze \
--input-dir ./images \
--output-dir ./analyzed
# Custom model and delay (paid tier — faster)
mailsense analyze \
--input-dir ./images \
--output-dir ./analyzed \
--model gemini-1.5-pro \
--delay 1
# Dry run
mailsense analyze --input-dir ./images --output-dir ./analyzed --dry-run
Options:
| Flag | Default | Description |
|---|---|---|
| --input-dir, -i | required | Output directory from the extract stage |
| --output-dir, -o | required | Directory for analyzed JSON files |
| --api-key, -k | config gemini_api_key / env | Gemini API key |
| --model, -m | config gemini_model (gemini-2.0-flash) | Gemini model |
| --delay, -d | config api_delay (4) | Seconds between API calls |
| --dry-run, -n | — | Show what would be processed |
Rate limits:
| Tier | RPM | RPD | Recommended --delay |
|---|---|---|---|
| Free | 15 | 1500 | 4 (default) |
| Paid | 1000+ | — | 1 or lower |
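The default --delay of 4 follows from the free-tier limit: 60 s ÷ 15 requests per minute = 4 s between requests. At that pace the 1500-requests-per-day cap is reached after 1500 × 4 s = 6000 s, roughly 100 minutes of continuous analysis. A minimal throttling loop is sketched below; min_delay and throttled are hypothetical helpers rather than mailsense's API, and the sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def min_delay(requests_per_minute: int) -> float:
    """Smallest spacing in seconds that stays within a per-minute quota."""
    return 60.0 / requests_per_minute


def throttled(items: Iterable[T], call: Callable[[T], R], delay: float,
              sleep: Callable[[float], None] = time.sleep) -> list[R]:
    """Apply `call` to each item, sleeping `delay` seconds between calls."""
    results = []
    for i, item in enumerate(items):
        if i:  # no wait before the first request
            sleep(delay)
        results.append(call(item))
    return results
```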
Output format (per image):
{
"status": "Processed",
"is_marketing": false,
"sender": {
"name": "Capital One",
"organization": "Capital One Bank",
"address": { "street": "PO Box 30281", "city": "Salt Lake City", "state": "UT", "zip_code": "84130" }
},
"recipient": { "name": "Jane Smith", "address": { ... } },
"postage_details": { "type": "First Class Mail", "status": "Delivered" },
"document_info": { "document_type": "Credit Card Statement", "reference_numbers": ["..."] },
"content_summary": "Monthly credit card statement showing account balance and minimum payment due.",
"filename": "image_3_a1b2c3d4.jpg",
"mail_metadata": {
"date": "Thu, 12 Feb 2026 08:00:00 -0500",
"subject": "Your USPS Informed Delivery Daily Digest",
"from": "USPS Informed Delivery <InformedDelivery@informeddelivery.usps.com>"
}
}
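Because each image yields one JSON file, results are easy to post-process. This sketch tallies non-marketing mail by document_type using only field names from the sample above; summarize is a hypothetical helper, and it assumes every *.json under the analyzed directory (searched recursively) is a per-image result.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(analyzed_dir: Path) -> Counter:
    """Count processed, non-marketing pieces by document type."""
    counts: Counter = Counter()
    for path in sorted(analyzed_dir.rglob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        # Skip failed analyses and marketing mail.
        if record.get("status") != "Processed" or record.get("is_marketing"):
            continue
        doc_type = (record.get("document_info") or {}).get("document_type", "Unknown")
        counts[doc_type] += 1
    return counts
```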
mailsense pipeline
Run all three stages end-to-end from a single command.
# Uses gmail_label from config — no --label needed
mailsense pipeline --work-dir ./mail_run
# Override the label explicitly
mailsense pipeline --label "USPS Informed Delivery" --work-dir ./mail_run
# Last 90 days, skipping the download stage and reusing existing .mbox files
mailsense pipeline \
--start 90d \
--work-dir ./mail_run \
--skip-download
# Dry run of the entire pipeline
mailsense pipeline --dry-run
Workflow output structure:
mail_run/
mbox/ ← grouped .mbox files
images/ ← images + metadata.json per mbox
analyzed/ ← one .json per image
Resume behaviour: Each stage is individually resumable — already-processed files are skipped automatically on re-runs. Use --no-resume to force a clean download, or --skip-download / --skip-extract / --skip-analyze to selectively re-run only the stages you need.
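How mailsense decides that a file is already processed is not documented here. One plausible approach for the analyze stage, shown as a sketch, is to skip any image whose stem already has a matching .json in the output directory; pending_images and the stem-matching rule are assumptions. A real implementation would also need to guard against identically named images in different batches.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}


def pending_images(images_dir: Path, analyzed_dir: Path) -> list[Path]:
    """Return images that do not yet have an analyzed JSON counterpart."""
    # Stems of results written on earlier runs (empty on a fresh run).
    done = ({p.stem for p in analyzed_dir.rglob("*.json")}
            if analyzed_dir.is_dir() else set())
    return [
        p for p in sorted(images_dir.rglob("*"))
        if p.suffix.lower() in IMAGE_EXTENSIONS and p.stem not in done
    ]
```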
All options:
| Flag | Default | Description |
|---|---|---|
| --work-dir, -w | mailsense_run | Root directory for all pipeline outputs |
| --label, -l | config gmail_label / prompted | Gmail label to download |
| --email, -e | config gmail_email / prompted | Gmail address |
| --password, -p | config gmail_password / prompted | App Password |
| --start | — | Start date |
| --end | — | End date (inclusive) |
| --group-by | month | .mbox grouping strategy |
| --days-per-file | 7 | Window size when --group-by=days |
| --no-resume | — | Re-download everything |
| --api-key, -k | config / env | Gemini API key |
| --model | config / gemini-2.0-flash | Gemini model |
| --delay | config / 4 | Seconds between Gemini requests |
| --dry-run, -n | — | Dry run all stages |
| --skip-download | — | Skip download, use existing .mbox files |
| --skip-extract | — | Skip extract, use existing images |
| --skip-analyze | — | Skip analyze |
Building for Distribution
# Install build tools
pip install build twine
# Build sdist + wheel
python -m build
# Inspect the wheel contents
python -m zipfile -l dist/mailsense-0.1.0-py3-none-any.whl
# Upload to PyPI
twine upload dist/*
# Upload to TestPyPI first (recommended)
twine upload --repository testpypi dist/*
Project Structure
mailsense/
├── mailsense/
│ ├── __init__.py # version, metadata
│ ├── cli.py # argparse root + dispatcher
│ ├── config.py # ~/.mailsense read/write + interactive wizard
│ └── commands/
│ ├── __init__.py
│ ├── config_cmd.py # mailsense config
│ ├── download.py # mailsense download — Gmail IMAP → .mbox
│ ├── extract.py # mailsense extract — .mbox → images
│ ├── analyze.py # mailsense analyze — images → JSON
│ └── pipeline.py # mailsense pipeline — end-to-end
├── tests/
│ └── test_config.py
├── pyproject.toml
├── LICENSE
└── README.md
Environment Variables
| Variable | Equivalent config key |
|---|---|
| GEMINI_API_KEY | gemini_api_key |
CLI flags always take precedence over both config file and environment variables.
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Make your changes and add tests
- Run tests: pytest
- Submit a pull request
License
Copyright 2026 Samapriya Roy. Licensed under the Apache License 2.0.
Download files
File details
Details for the file mailsense-0.1.0.tar.gz.
File metadata
- Download URL: mailsense-0.1.0.tar.gz
- Upload date:
- Size: 30.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1ce458eab7895a150a8bdb50881eecb8fbbb57fabe35ab1b772d559a8895d05f |
| MD5 | 75f408895bc22d0f8ca95c9f7a5baa91 |
| BLAKE2b-256 | 900b23e5c7002f0bccfdb43f76f63252502d5c1bfc00843cae6084d8f8b7ad02 |
File details
Details for the file mailsense-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mailsense-0.1.0-py3-none-any.whl
- Upload date:
- Size: 28.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ccfdc3b521a1f3acc7f5f56e4867dc565cfd6abf84aa8c1b295b6f7e65872792 |
| MD5 | 5268c786ab2bdea83fcf5aaeca74dd51 |
| BLAKE2b-256 | fc1159979b060249f6faf3498206357810433b62ba1a7485e7195b6ddbcf200e |
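To check a downloaded archive against the SHA256 digests listed above, the standard hashlib module is enough. sha256_of and verify are illustrative helper names. pip can also enforce a digest itself in hash-checking mode, when a requirements file pins the package with --hash=sha256:<digest>.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()
```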