A CLI tool to archive old Gmail messages to local mbox files with validation and safe deletion

GMailArchiver

A professional-grade email archival, search, and management solution for Gmail - Archive, compress, search, extract, and maintain your email history with confidence.

🎉 What's New in v1.4.5 - Performance Fix

Version 1.4.5 includes v1.4.3's complete fix for the O(n²) performance bottleneck plus CI/publishing fixes:

  • 500-1000x Faster - Batch archiving with O(n) complexity (was O(n²))
  • 🔧 Root Cause Fixed - Single mbox open/close cycle per batch (not per-message)
  • 📊 Progress Tracking - Real-time callbacks during batch operations
  • ⏸️ Graceful Interrupts - Ctrl+C saves progress for resumable operations
  • 1569 Tests Passing - All tests pass including CI environment
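As a rough illustration of the batch approach described above, the sketch below appends a whole batch through one `mailbox.mbox` handle from Python's standard library, reports progress through a callback, and still flushes what was written if the loop is interrupted. The names `append_batch` and `make_message` and the callback signature are hypothetical; this is not GMailArchiver's actual implementation.

```python
import mailbox
from email.message import EmailMessage
from pathlib import Path
from typing import Callable, Iterable, Optional


def append_batch(
    mbox_path: Path,
    messages: Iterable[EmailMessage],
    on_progress: Optional[Callable[[int], None]] = None,
) -> int:
    """Append all messages through a single mbox open/close cycle."""
    box = mailbox.mbox(str(mbox_path))  # opened once for the whole batch
    written = 0
    try:
        for msg in messages:
            box.add(msg)
            written += 1
            if on_progress is not None:
                on_progress(written)
    except KeyboardInterrupt:
        # Ctrl+C: stop early; the finally block still flushes what was added.
        pass
    finally:
        box.close()  # flushes pending writes, then closes the file
    return written


def make_message(i: int) -> EmailMessage:
    """Build a small sample message (illustration only)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["Subject"] = f"Message {i}"
    msg["Message-ID"] = f"<{i}@example.com>"
    msg.set_content(f"Body {i}")
    return msg
```

Opening the file once per batch avoids repeating the per-open work for every message, and the returned count tells a caller how far an interrupted run got.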

Recent Major Features

  • 🎨 Unified Rich Output - Beautiful terminal output with progress bars, ETA, and rate tracking
  • 📤 Message Extraction - Retrieve messages from search results
  • 🏥 Health Check Command - One command to check everything
  • Automated Scheduling - Set up periodic health checks
  • 🗜️ Post-hoc Compression - Compress existing archives anytime
  • 🩺 Doctor Command - Comprehensive diagnostics
  • 🔍 Search Enhancements - Preview and interactive modes
  • 📊 JSON Mode - All commands support --json for scripting

See full changelog

Why Gmail Archiver?

Gmail offers 15GB of free storage shared across Google services, but that space fills up quickly with years of emails, attachments, and files. While Gmail provides basic search and labels, it lacks:

  • Local backup and control: Your emails are only in Google's cloud
  • Long-term archival: No built-in way to archive and compress old emails while keeping them searchable
  • Data portability: Difficult to export and search emails outside Gmail
  • Storage optimization: No automatic compression or deduplication
  • Fast local search: Gmail search can be slow for large mailboxes

Gmail Archiver solves these problems by providing a professional-grade archival solution that:

  1. Archives old emails to portable mbox files (industry standard format)
  2. Searches archived emails with Gmail-style syntax (faster than Gmail itself!)
  3. Extracts individual messages from compressed archives
  4. Compresses archives with modern algorithms (zstd, lzma, gzip)
  5. Validates archives before deletion with multi-layer verification
  6. Manages your email history with deduplication and consolidation
  7. Automates maintenance with scheduled health checks
  8. Protects your data with atomic transactions and safe deletion workflows

Key Benefits

  • Reclaim Gmail storage: Archive old emails and safely delete them from Gmail
  • Keep emails searchable: Fast local search (0.85ms for a metadata search and 45ms for a full-text search over 1,000 messages)
  • Extract on demand: Retrieve individual messages from compressed archives
  • Maintain data sovereignty: Your emails, your local storage, your control
  • Automate maintenance: Set-and-forget health checks
  • Future-proof format: mbox is a 40+ year old format that is widely supported by email clients
  • Production-ready: 1569 automated tests, 94% code coverage, strict type safety

✨ Core Features

📥 Archiving & Deletion

  • Smart Archiving: Archive emails older than a specified threshold (e.g., "3y", "6m", "30d")
  • Incremental Mode: Skip already-archived messages for efficient recurring runs
  • Safe Deletion Workflow: Archive-only (default) → Trash (reversible) → Permanent (confirmed)
  • Batch Operations: Efficient API usage with automatic rate limiting
  • Progress Tracking: Real-time progress bars with ETA and processing rate
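To make the threshold notation concrete, here is a minimal sketch of how a value such as "3y", "6m", "30d", or an exact date could be turned into a cutoff date. The function `cutoff_date` is hypothetical and the rounding rules (calendar months, day clamped to the end of shorter months) are assumptions; the tool's own parsing may differ.

```python
import calendar
import re
from datetime import date, timedelta


def cutoff_date(spec: str, today: date) -> date:
    """Turn "3y", "6m", "30d", or "YYYY-MM-DD" into a cutoff date."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", spec):
        return date.fromisoformat(spec)
    match = re.fullmatch(r"(\d+)([ymd])", spec)
    if match is None:
        raise ValueError(f"unrecognized threshold: {spec!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit == "d":
        return today - timedelta(days=amount)
    months = amount * 12 if unit == "y" else amount
    # Count months since year 0 so borrowing across years is automatic.
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day for shorter months (e.g. Feb 29 becomes Feb 28).
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)
```

For example, `cutoff_date("3y", date(2025, 1, 23))` gives `date(2022, 1, 23)`; messages older than that date would be candidates for archiving.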

🔍 Search & Retrieval

  • Full-Text Search (FTS5): Gmail-style query syntax with BM25 ranking
  • Lightning Fast: 0.85ms for a metadata search over 1,000 messages (118x faster than target); full-text search over the same set takes 45ms
  • Message Extraction: Extract individual messages by ID or from search results
  • Interactive Search: Browse search results with a menu interface
  • Preview Mode: See message snippets in search results
  • JSON Output: All commands support --json for scripting
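The combination of Gmail-style queries and BM25 ranking can be sketched with Python's built-in `sqlite3` module, assuming the SQLite build includes FTS5. The table name `messages_fts`, its columns, and the `from:` to `sender` mapping are hypothetical stand-ins, not the project's real schema or query translator.

```python
import sqlite3

# Hypothetical mapping from Gmail-style fields to FTS5 columns.
FIELD_TO_COLUMN = {"from": "sender", "subject": "subject"}


def to_fts5_query(gmail_query: str) -> str:
    """Translate e.g. 'from:alice@example.com meeting' into FTS5 MATCH syntax."""
    parts = []
    for token in gmail_query.split():
        field, sep, value = token.partition(":")
        column = FIELD_TO_COLUMN.get(field.lower()) if sep else None
        text = value if column else token
        if not text:
            continue
        # Quote every term so punctuation such as '@' is treated as text.
        phrase = '"' + text.replace('"', '""') + '"'
        parts.append(f"{column}:{phrase}" if column else phrase)
    return " AND ".join(parts)


def build_index(rows: list[tuple[str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index of (sender, subject, body) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE messages_fts USING fts5(sender, subject, body)"
    )
    conn.executemany("INSERT INTO messages_fts VALUES (?, ?, ?)", rows)
    return conn


def search(conn: sqlite3.Connection, gmail_query: str) -> list[str]:
    """Return matching subjects, best BM25 score first."""
    rows = conn.execute(
        "SELECT subject FROM messages_fts WHERE messages_fts MATCH ? "
        "ORDER BY bm25(messages_fts)",
        (to_fts5_query(gmail_query),),
    )
    return [row[0] for row in rows]
```

SQLite's `bm25()` returns smaller values for better matches, which is why the results are sorted in ascending order.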

🗜️ Compression & Storage

  • Modern Compression: zstd (fastest), lzma (smallest), gzip (compatible)
  • Post-hoc Compression: Compress existing archives anytime
  • Transparent Decompression: Read compressed archives without extraction
  • Smart Deduplication: Remove duplicates across archives (100% precision)
  • Archive Consolidation: Merge multiple archives with automatic sorting
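Transparent decompression can be pictured as choosing a file opener from the archive's suffix, as in the sketch below. The suffix mapping and the helper `open_archive` are assumptions for illustration (only the `.mbox.zst` name appears in this document); zstd support is registered only where the standard library's `compression.zstd` module (Python 3.14+) is available.

```python
import gzip
import lzma
from pathlib import Path
from typing import BinaryIO

# Assumed suffix-to-opener mapping; unknown suffixes fall back to plain open().
OPENERS = {".gz": gzip.open, ".xz": lzma.open}

try:
    from compression import zstd  # standard library in Python 3.14+

    OPENERS[".zst"] = zstd.open
except ImportError:
    pass  # older interpreters: zstd archives are not handled by this sketch


def open_archive(path: Path) -> BinaryIO:
    """Open an archive for reading bytes, decompressing on the fly if needed."""
    opener = OPENERS.get(path.suffix, open)
    return opener(path, "rb")
```

A caller can then read `archive.mbox`, `archive.mbox.gz`, or `archive.mbox.xz` through the same interface without first extracting to disk.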

🛡️ Safety & Validation

  • Multi-Layer Validation: Message count, database cross-check, content integrity, spot-checks
  • Unified Health Check: One command checks database, archives, auth, performance
  • Auto-Repair: Automatic database repair with rollback support
  • Atomic Operations: All writes are transactional (succeed or rollback)
  • Audit Trail: Complete history of all operations
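The idea behind layered validation can be sketched as two checks: compare the number of messages found in the archive with the number expected, then compare a content digest per message. In this sketch the `expected` dictionary stands in for the database cross-check, and the function names and the choice of SHA-256 over the decoded body are assumptions rather than the tool's actual checks.

```python
import hashlib
import mailbox
from pathlib import Path


def fingerprint(mbox_path: Path) -> dict[str, str]:
    """Map each Message-ID in the archive to a SHA-256 digest of its body."""
    found: dict[str, str] = {}
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for msg in box:
            # Multipart messages return None here; a fuller check would walk parts.
            body = msg.get_payload(decode=True) or b""
            found[msg["Message-ID"]] = hashlib.sha256(body).hexdigest()
    finally:
        box.close()
    return found


def validate_archive(mbox_path: Path, expected: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    found = fingerprint(mbox_path)
    problems = []
    if len(found) != len(expected):
        problems.append(
            f"count mismatch: archive has {len(found)}, expected {len(expected)}"
        )
    for message_id, digest in expected.items():
        if message_id not in found:
            problems.append(f"missing message: {message_id}")
        elif found[message_id] != digest:
            problems.append(f"content mismatch: {message_id}")
    return problems
```

Only when the returned list is empty would it be reasonable to move on to the trash or delete steps.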

⚙️ Automation & Maintenance

  • Scheduled Health Checks: Platform-native cron/Task Scheduler integration
  • Automatic Migration: v1.0 → v1.1 → v1.2 schema upgrades with backup
  • Comprehensive Diagnostics: Doctor command analyzes everything
  • Auto-Verification: Optional validation after import/consolidate/dedupe
  • Performance Metrics: Track search latency, database size, vacuum status

📦 Installation

Prerequisites

  • Python 3.14+
  • Gmail Account with email you want to archive

Note: OAuth2 credentials are bundled with the application. No manual Google Cloud setup required!

Install from PyPI (Recommended)

pip install gmail-archiver-cli

Or use pipx for isolated installation:

pipx install gmail-archiver-cli

Verify Installation

gmailarchiver --version
gmailarchiver --help

🚀 Quick Start

First Run - OAuth2 Authorization

On first run, Gmail Archiver will automatically:

  1. Open your browser to Google's authorization page
  2. Ask you to sign in with your Google Account
  3. Request permission to access Gmail
  4. Save an authorization token to ~/.config/gmailarchiver/token.json
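For scripts that need to locate the saved token, the path in step 4 can be resolved as in the sketch below. It assumes the usual XDG rule (use `XDG_CONFIG_HOME` when set, otherwise `~/.config`); whether the tool honors that variable is an assumption, and `token_path` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def token_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the token location following the XDG config-directory rule."""
    env = os.environ if env is None else env
    # XDG: an unset or empty XDG_CONFIG_HOME falls back to ~/.config.
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "gmailarchiver" / "token.json"
```

With no override this resolves to `~/.config/gmailarchiver/token.json`, matching the location given above.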

Basic Workflow

# 1. Preview what will be archived (dry run)
gmailarchiver archive 3y --dry-run

# 2. Archive emails older than 3 years with compression
gmailarchiver archive 3y --compress zstd
# → Creates: archive_20250123.mbox.zst

# Or use exact dates (v1.3.0+)
gmailarchiver archive 2024-01-01 --compress zstd
# → Archives all emails before January 1, 2024

# 3. Validate the archive
gmailarchiver validate archive_20250123.mbox.zst

# 4. Search your archives
gmailarchiver search "from:alice@example.com meeting"

# 5. Extract a message
gmailarchiver extract msg_123abc --output message.eml

# 6. Check overall health
gmailarchiver check

# 7. (Optional) Delete from Gmail after verification
gmailarchiver archive 3y --trash  # Reversible (30 days)

📖 Command Reference

For complete command documentation with all options, see docs/USAGE.md.

Quick Reference

Category      Commands
Archiving     archive, import, consolidate, compress
Search        search, extract
Health        check, doctor, verify-integrity, repair
Maintenance   dedupe, status, schedule
Auth          auth-reset, migrate, rollback

Key Commands

# Archive emails older than 3 years with compression
gmailarchiver archive 3y --compress zstd

# Search archived messages
gmailarchiver search "from:alice@example.com subject:meeting"

# Extract a specific message
gmailarchiver extract msg_123abc --output message.eml

# Run all health checks
gmailarchiver check

# Show status with database info
gmailarchiver status --verbose

# Full diagnostics
gmailarchiver doctor

All commands support --json for scripting and --help for detailed options.
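A script can consume the `--json` output with a small wrapper like the one below. It assumes the JSON document is written to standard output and that a non-zero exit code signals failure; the output schema is not described here, so the result is returned as parsed but otherwise uninterpreted data. `run_json` is a hypothetical helper.

```python
import json
import subprocess
from typing import Any, Sequence


def run_json(command: Sequence[str]) -> Any:
    """Run a command with --json appended and parse its stdout as JSON."""
    result = subprocess.run(
        [*command, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

For example, `run_json(["gmailarchiver", "status"])` would return whatever structure the status command emits in JSON mode.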

📊 Performance

Operation            Dataset           Time     Rate
Search (metadata)    1,000 messages    0.85ms   1.2M msg/s
Search (full-text)   1,000 messages    45ms     22K msg/s
Import               10,000 messages   <1s      10K+ msg/s
Consolidate          10,000 messages   3.57s    2.8K msg/s
Extract              Single message    <10ms    N/A
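The Rate column follows from dividing the dataset size by the elapsed time. The short sketch below reproduces that arithmetic for the rows with exact timings; `rate_per_second` is a hypothetical helper used only for this check.

```python
def rate_per_second(messages: int, seconds: float) -> float:
    """Messages processed per second for a given dataset size and duration."""
    return messages / seconds


metadata_rate = rate_per_second(1_000, 0.85e-3)   # about 1.18 million msg/s
full_text_rate = rate_per_second(1_000, 45e-3)    # about 22 thousand msg/s
consolidate_rate = rate_per_second(10_000, 3.57)  # about 2.8 thousand msg/s
```

These values round to the 1.2M, 22K, and 2.8K msg/s figures shown in the table.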

🔒 Security & Privacy

  • OAuth2 Flow: Industry-standard authentication
  • Scopes: Minimum required permissions (gmail.modify for deletion)
  • Token Storage: XDG-compliant paths (~/.config/gmailarchiver/)
  • Local Storage: All data stored locally, no cloud dependencies
  • Audit Trail: Complete operation history in database
  • Safe Deletion: Trash-first workflow with 30-day recovery window

📜 Version History

v1.4.5 (2025-12-05) - Performance Fix + CI/Publishing Repairs

Complete Fix for O(n²) Performance Bottleneck (from v1.4.3):

  • 500-1000x faster for large archives (O(n) complexity instead of O(n²))
  • Single mbox open/close cycle per batch (not per-message)
  • Removed deprecated archive_message() method to prevent future misuse
  • Progress callbacks for real-time tracking during batch operations
  • Graceful interrupt handling (Ctrl+C saves progress for resumable operations)

CI/Publishing Fixes:

  • Configured PyPI Trusted Publishing for secure releases
  • Fixed doctor diagnostics test mock target
  • Fixed session logger cleanup file ordering

Quality: 1569 tests, 94% coverage (all passing in CI)

v1.4.2 (2025-12-01) - Performance & Architecture

Performance:

  • 2x faster archiving (batch_delay: 1.0s → 0.5s)
  • Optimized Gmail API batching for 10-15 msg/sec practical throughput
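The effect of `batch_delay` can be pictured with a simple pacing loop: messages are processed in fixed-size batches with a pause between them. Only the `batch_delay` name and its 0.5s value come from the notes above; `batch_size`, the helper name, and the injectable `sleep` parameter are assumptions for illustration.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches_with_delay(
    items: Sequence[T],
    batch_size: int,
    batch_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Sequence[T]]:
    """Yield consecutive batches, pausing batch_delay seconds between them."""
    for start in range(0, len(items), batch_size):
        if start:
            sleep(batch_delay)  # no pause before the first batch
        yield items[start:start + batch_size]
```

Halving the delay roughly doubles throughput only when the pause dominates the time spent on each batch; once request time dominates, the gain is smaller.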

Architecture:

  • Completed facade pattern migration for all CLI commands
  • Removed 9 legacy module files
  • All tests updated to use facade APIs

Bug Fixes:

  • Fixed progress bars not updating during import and verify-integrity commands

Quality: 1570 tests, 94% coverage (+182 tests from v1.4.1)

v1.3.2 (2025-11-24) - Critical Bug Fix

Bug Fixes:

  • Fixed UNIQUE constraint failures during archiving (messages with duplicate RFC Message-IDs)
  • Improved duplicate detection to check rfc_message_id before writing to mbox
  • Eliminated orphaned messages in mbox files
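The ordering described in this fix (check `rfc_message_id` first, write to the mbox only for new messages) can be sketched as follows. The `messages` table, its single column, and the `write_to_mbox` callback are hypothetical; only the `rfc_message_id` name comes from the notes above.

```python
import sqlite3
from typing import Callable


def archive_if_new(
    conn: sqlite3.Connection,
    rfc_message_id: str,
    write_to_mbox: Callable[[], None],
) -> bool:
    """Record and write a message only if its Message-ID is not yet known."""
    exists = conn.execute(
        "SELECT 1 FROM messages WHERE rfc_message_id = ?", (rfc_message_id,)
    ).fetchone()
    if exists:
        return False  # duplicate: nothing is written to the mbox
    with conn:  # commit on success, roll back if anything below raises
        conn.execute(
            "INSERT INTO messages (rfc_message_id) VALUES (?)", (rfc_message_id,)
        )
        write_to_mbox()
    return True
```

Checking before writing means a duplicate never reaches the mbox file, and a failed write leaves no database row behind. The reverse case, a write that succeeds just before the commit fails, would still need separate handling.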

Quality: 1072 tests, 93% coverage maintained

v1.3.1 (2025-11-24) - Live Layout Infrastructure

Internal Features:

  • Added LogBuffer, SessionLogger, and LiveLayoutContext for flicker-free progress tracking
  • Enhanced OutputManager with live layout support

Quality: 1071 tests, 93% coverage

v1.2.0 (2025-11-23) - Ergonomics & Automation

Major Features:

  • Unified OutputManager with Rich output and JSON mode
  • 5 new commands: extract, check, schedule, compress, doctor
  • Progress bars with ETA and rate tracking
  • Search enhancements (--with-preview, --interactive)
  • Auto-verification flags (--auto-verify)
  • Cleanup options (--remove-sources)

Test Coverage: 989 tests, 93% coverage

v1.1.0 (2025-11-15) - Search & Management

Major Features:

  • FTS5 full-text search with Gmail-style syntax
  • Import existing archives (10K+ msg/s)
  • Deduplication (100% precision)
  • Archive consolidation
  • Enhanced database schema (v1.1)

Test Coverage: 650 tests, 96% coverage

v1.0.x - Initial Releases

  • Core archiving functionality
  • Multi-layer validation
  • Safe deletion workflows
  • Compression support (gzip, lzma, zstd)

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for:

  • Development environment setup
  • Testing guidelines
  • Code quality standards
  • Pull request process

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with Python 3.14 and modern type checking
  • Uses Gmail API for email access
  • Rich library for beautiful terminal output
  • SQLite FTS5 for full-text search
  • Python's mailbox module for mbox archive handling


Made with ❤️ for email power users who value privacy, control, and local data ownership.

Download files

Source Distribution

gmail_archiver_cli-1.4.5.tar.gz (510.1 kB)

Uploaded Source

Built Distribution

gmail_archiver_cli-1.4.5-py3-none-any.whl (203.5 kB)

Uploaded Python 3

File details

Details for the file gmail_archiver_cli-1.4.5.tar.gz.

File metadata

  • Download URL: gmail_archiver_cli-1.4.5.tar.gz
  • Size: 510.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gmail_archiver_cli-1.4.5.tar.gz
Algorithm     Hash digest
SHA256        0c130fdce09c8475849a9fedba71e314f189a7488a250cd2b5acc82bac2d4970
MD5           7d309885d33f06c9fcff12ee80af9c37
BLAKE2b-256   756a8c78b925bbdc9da089c355deff6e10836e4e58ff81ee3a74de80c561a272

Provenance

The following attestation bundles were made for gmail_archiver_cli-1.4.5.tar.gz:

Publisher: release-and-publish.yml on tumma72/GMailArchiver

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gmail_archiver_cli-1.4.5-py3-none-any.whl.

File hashes

Hashes for gmail_archiver_cli-1.4.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        d9f43fb174e8f13901a8d429469bb7dcdf9d7a7097cf04110c6a59ee8384dbcd
MD5           5cea41ec65649806664e89846213199d
BLAKE2b-256   f98ea6c58bb17378762725267778a8504c2647108d6e58b8fe2ba1407ac24453

Provenance

The following attestation bundles were made for gmail_archiver_cli-1.4.5-py3-none-any.whl:

Publisher: release-and-publish.yml on tumma72/GMailArchiver

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
