Email-to-EML Secure Archiver

Email-to-EML Secure Archiver (EESA)

A Python-based command-line utility to programmatically retrieve emails from Gmail and Microsoft 365 and save them as RFC822-compliant .eml files.

✨ Features

  • 🔐 Secure OAuth2 Authentication - Browser-based authentication with 2FA support
  • 📧 Multi-Provider Support - Gmail and Microsoft 365 (Outlook)
  • 🧠 AI-Powered Classification - Automatically categorize emails and skip promotions (v0.3.0+)
  • 📊 Advanced Extraction - Extract structured data like summaries, action items, and invoices (v0.5.0+)
  • 🖥️ Premium Web UI - Optional dashboard with high-end dark mode and real-time stats (v0.6.0+)
  • 🏠 Local LLM Support - Connect to Ollama, LM Studio, or llama.cpp (v0.4.0+)
  • 🔍 Advanced Filtering - Date-based, incremental sync, custom queries
  • 🪝 Webhook Integration - Automatically send downloaded emails to webhook endpoints
  • 💾 Incremental Checkpointing - Resume interrupted downloads
  • 📦 Modern Package Management - UV/UVX support for easy installation and execution
  • 🛡️ Sandbox Support - Run in restricted/read-only environments using EESA_DATA_DIR (v0.8.3+)
  • 🏷️ Smart Renaming & Embedding - Clean filenames and X-Header metadata for CRM integration (v0.8.4+)

🚀 Quick Start

Installation

Using uvx (recommended - no installation needed):

# To run the CLI
uvx email-archiver --help

# To run the Web UI (installs optional dependencies automatically)
uvx --with "email-archiver[ui]" email-archiver --ui

Using pip:

# For CLI only
pip install email-archiver

# For CLI + Web UI
pip install "email-archiver[ui]"

From source:

git clone https://github.com/therealtimex/email-archiver
cd email-archiver
uv sync
uv run email-archiver --help

Basic Usage

# Launch the Web Dashboard
email-archiver --ui

# Download emails from Gmail since a specific date
email-archiver --provider gmail --since 2024-12-01

# Incremental sync (resume from last checkpoint)
email-archiver --provider gmail --incremental

# AI Classification (OpenAI)
email-archiver --provider gmail --classify --openai-api-key "sk-..." --skip-promotional

# AI Classification (Local LLM via Ollama)
email-archiver --provider gmail --classify --llm-provider ollama --llm-model "llama3"

# With webhook integration
email-archiver --provider gmail --since 2024-12-23 \
  --webhook-url https://your-webhook.com/endpoint \
  --webhook-secret "Bearer your-token"

📖 Documentation

🎯 Common Use Cases

Daily Email Backup

email-archiver --provider gmail --incremental

Archive Specific Emails

# Emails with attachments
email-archiver --provider gmail --query "has:attachment" --since 2024-01-01

# From specific sender
email-archiver --provider gmail --query "from:important@example.com"

# Specific single email by ID
email-archiver --provider gmail --message-id 18e876a43b21

Webhook Integration

# Send emails to processing endpoint
email-archiver --provider gmail --incremental \
  --webhook-url https://api.example.com/emails \
  --webhook-secret "Bearer sk_live_abc123"

Custom Download Directory

# Save to specific folder
email-archiver --provider gmail --since 2024-12-01 \
  --download-dir /path/to/backup/emails
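
Once a run has finished, the archived files can be inspected with Python's standard library alone. The following is a minimal sketch, not part of EESA: the helper name summarize_eml_dir is hypothetical, and it assumes only that the download directory contains RFC822 .eml files, possibly in subfolders.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def summarize_eml_dir(download_dir):
    """Return (filename, sender, subject) for every .eml file under download_dir.

    Hypothetical helper for illustration; EESA does not ship this function.
    """
    rows = []
    for path in sorted(Path(download_dir).rglob("*.eml")):
        with path.open("rb") as fh:
            # headersonly=True skips body parsing, which is enough for an index.
            msg = BytesParser(policy=policy.default).parse(fh, headersonly=True)
        rows.append((path.name, str(msg.get("From", "")), str(msg.get("Subject", ""))))
    return rows
```

Calling summarize_eml_dir("/path/to/backup/emails") returns one tuple per archived message, which is usually enough to spot-check a backup.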

⚙️ Configuration

Gmail Setup

  1. Create a project in Google Cloud Console
  2. Enable Gmail API
  3. Create OAuth 2.0 credentials (Desktop App)
  4. Save credentials as config/client_secret.json

Microsoft 365 Setup

  1. Register app in Azure Portal
  2. Add Mail.Read permission
  3. Update config/settings.yaml with your Client ID

See Quick Start Guide for detailed instructions.

🪝 Webhook Integration

EESA can automatically POST downloaded .eml files to a webhook endpoint:

Via CLI:

email-archiver --provider gmail --since 2024-12-01 \
  --webhook-url https://webhook.site/your-id \
  --webhook-secret "Bearer token"

Via Configuration:

# config/settings.yaml
webhook:
  url: "https://your-webhook.com/endpoint"
  enabled: true
  headers:
    Authorization: "Bearer your-token"
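
For testing the receiving side, a small handler built on Python's standard library may help. This is a sketch under stated assumptions, since the request format is not specified here: it assumes the POST body is the raw .eml bytes and that the configured secret arrives verbatim in the Authorization header. The names EXPECTED_AUTH, is_authorized, parse_eml, and WebhookHandler are hypothetical.

```python
import hmac
from email import policy
from email.parser import BytesParser
from http.server import BaseHTTPRequestHandler

# Assumption: must equal the value passed via --webhook-secret / headers.Authorization.
EXPECTED_AUTH = "Bearer your-token"


def is_authorized(headers, expected=EXPECTED_AUTH):
    """Compare the Authorization header to the secret in constant time."""
    supplied = headers.get("Authorization", "")
    return hmac.compare_digest(supplied.encode(), expected.encode())


def parse_eml(body):
    """Parse raw RFC822 bytes into an EmailMessage."""
    return BytesParser(policy=policy.default).parsebytes(body)


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not is_authorized(self.headers):
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        # Assumption: the body is the .eml file itself, not a JSON or multipart wrapper.
        msg = parse_eml(self.rfile.read(length))
        print("received:", msg["Subject"])
        self.send_response(204)
        self.end_headers()
```

To try it locally, run http.server.HTTPServer(("127.0.0.1", 8080), WebhookHandler).serve_forever() and point --webhook-url at that address. Plain HTTP is suitable for local testing only; production endpoints should use HTTPS.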

🛡️ Sandboxed & Restricted Environments

EESA supports running in restricted environments (Docker, Lambda, etc.) by using environment variables to control where data is stored:

Environment Variable   Description                            Default
EESA_DATA_DIR          Base directory for all data            ~/.email-archiver
EESA_CONFIG_PATH       Path to settings.yaml                  data_dir/config/settings.yaml
EESA_DB_PATH           Path to SQLite database                data_dir/email_archiver.sqlite
EESA_LOG_FILE          Path to log file (or stdout/stderr)    data_dir/sync.log
EESA_AUTH_DIR          Directory for OAuth tokens             data_dir/auth
EESA_DOWNLOAD_DIR      Default download directory             data_dir/downloads
LLM_API_KEY            API key for LLM provider               -
LLM_BASE_URL           Base URL for LLM API                   -
LLM_MODEL              Model name to use                      gpt-4o-mini

Example (Lambda/Read-only FS):

# Store all data in /tmp
export EESA_DATA_DIR=/tmp
# Log directly to stdout
export EESA_LOG_FILE=stdout
# Run the archiver
email-archiver --provider gmail --incremental
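
The fallback rules in the table can be sketched in a few lines of Python. This is an illustration of the documented defaults only, not EESA's actual resolution code; the function name resolve_paths is hypothetical.

```python
import os
from pathlib import Path


def resolve_paths(env=None):
    """Mirror the documented defaults: each override falls back to a path under data_dir.

    Hypothetical helper; EESA's internal logic may differ in detail.
    """
    env = os.environ if env is None else env
    data_dir = Path(env.get("EESA_DATA_DIR", "~/.email-archiver")).expanduser()
    defaults = {
        "EESA_CONFIG_PATH": data_dir / "config" / "settings.yaml",
        "EESA_DB_PATH": data_dir / "email_archiver.sqlite",
        "EESA_LOG_FILE": data_dir / "sync.log",
        "EESA_AUTH_DIR": data_dir / "auth",
        "EESA_DOWNLOAD_DIR": data_dir / "downloads",
    }
    resolved = {"EESA_DATA_DIR": str(data_dir)}
    for name, default in defaults.items():
        # An explicitly set variable wins over the data_dir-derived default.
        resolved[name] = env.get(name, str(default))
    return resolved
```

With EESA_DATA_DIR=/tmp and EESA_LOG_FILE=stdout, as in the Lambda example, every path lands under /tmp except the log target, which stays stdout.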

📋 Command-Line Arguments

Argument                   Description
--provider {gmail,m365}    Email provider (required)
--since YYYY-MM-DD         Download emails since date
--incremental              Resume from last checkpoint
--query STRING             Custom search query
--webhook-url URL          Webhook endpoint URL
--webhook-secret SECRET    Authorization header for webhook
--download-dir PATH        Custom download directory
--classify                 Enable AI email classification
--openai-api-key KEY       OpenAI API key
--skip-promotional         Skip promotional/social emails
--metadata-output PATH     Path to save JSONL metadata
--llm-provider ID          LLM provider (openai, ollama, etc.)
--llm-base-url URL         Base URL for LLM API
--llm-api-key KEY          API key for LLM provider
--llm-model NAME           Model name (e.g., gpt-4o-mini, llama3)
--extract                  Enable advanced metadata extraction
--rename                   Intelligently rename .eml files to clean slugs
--embed                    Embed AI metadata directly into .eml headers
--ui                       Launch the Web Dashboard
--reset                    Factory Reset: Wipe all data (DB, logs, downloads)

See API Reference for complete documentation.
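
The file written by --metadata-output is JSONL, meaning one JSON object per line, so it can be read without extra dependencies. The sketch below makes no assumption about which fields EESA writes; load_metadata and count_by are hypothetical helper names, and any key passed to count_by must be checked against the real output.

```python
import json
from collections import Counter


def load_metadata(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def count_by(records, key):
    """Tally records by the value of `key`; records lacking the key count as 'unknown'."""
    return Counter(str(r.get(key, "unknown")) for r in records)
```

For example, count_by(load_metadata("metadata.jsonl"), "category") would tally messages per category, assuming the records carry a field with that name.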

🔧 Requirements

  • Python 3.9+
  • Gmail API credentials (for Gmail)
  • Azure AD app registration (for M365)

📁 Project Structure

email-archiver/
├── email_archiver/         # Main package
│   ├── main.py            # CLI entry point
│   └── core/              # Core modules
│       ├── gmail_handler.py
│       ├── graph_handler.py
│       └── utils.py
├── config/
│   ├── settings.yaml      # Configuration file
│   ├── checkpoint.json    # Incremental sync state
│   └── client_secret.json # OAuth credentials (git-ignored)
├── auth/                  # OAuth tokens (git-ignored)
├── downloads/             # Downloaded .eml files
├── docs/                  # Documentation
└── pyproject.toml         # Package configuration

🔒 Security

  • OAuth2 Only - No password storage
  • Read-Only Scopes - gmail.readonly and Mail.Read
  • Token Protection - Tokens stored with restricted permissions (chmod 600)
  • HTTPS Webhooks - Always use HTTPS for webhook endpoints
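
The token-permission point can be checked independently on POSIX systems. This is a small audit sketch, not EESA code: is_owner_only and loose_files are hypothetical names, and the directory to scan is whatever EESA_AUTH_DIR resolves to in your setup.

```python
import os
import stat
from pathlib import Path


def is_owner_only(path):
    """True if group and others have no permission bits set (for example mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def loose_files(auth_dir):
    """List regular files under auth_dir that group or others can access."""
    return [
        str(p)
        for p in sorted(Path(auth_dir).rglob("*"))
        if p.is_file() and not is_owner_only(p)
    ]
```

An empty list from loose_files means every token file is readable by its owner only. Any path it returns can be tightened with chmod 600.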

🤝 Contributing

This project follows the specification in docs/SPECIFICATION.md.

📄 License

See LICENSE file for details.

🆘 Support

For issues or questions:

  1. Check the documentation
  2. Review examples
  3. Check logs in sync.log
  4. Open an issue on GitHub

🎓 Examples

Automation with Cron

# Daily backup at 2 AM
0 2 * * * email-archiver --provider gmail --incremental

Python Integration

import subprocess

subprocess.run([
    "email-archiver",
    "--provider", "gmail",
    "--since", "2024-12-01"
])
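
For scripts that call the CLI repeatedly, it can help to build the argument list in one place and fail loudly on errors. The wrapper below is a sketch that uses only flags documented in this README. build_command and run_archiver are hypothetical names, and the error handling assumes the CLI exits with a non-zero status on failure.

```python
import subprocess


def build_command(provider, since=None, incremental=False, download_dir=None):
    """Assemble an email-archiver argument list from documented flags only."""
    cmd = ["email-archiver", "--provider", provider]
    if since:
        cmd += ["--since", since]
    if incremental:
        cmd.append("--incremental")
    if download_dir:
        cmd += ["--download-dir", download_dir]
    return cmd


def run_archiver(**kwargs):
    """Run the CLI and capture its output.

    check=True raises subprocess.CalledProcessError on a non-zero exit status
    (assumption: the CLI signals failure that way).
    """
    return subprocess.run(
        build_command(**kwargs), check=True, capture_output=True, text=True
    )
```

Passing the arguments as a list avoids shell quoting issues, and run_archiver(provider="gmail", since="2024-12-01") is equivalent to the subprocess call shown above.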

Using uvx (no installation)

# Run directly without installing
uvx email-archiver --provider gmail --since 2024-12-01

# Works from any directory
uvx email-archiver --help

See EXAMPLES.md for 21 more examples!


👥 Author & Credits

Author: Trung Le
Team: RealTimeX.ai
Repository: https://github.com/therealtimex/email-archiver


Built with ❤️ for secure email archiving
