Email attachment processor with IMAP support
Email Attachment Processor
(YAML + keyring + per-day UID storage + password management + modular architecture)
Email Processor is a reliable, idempotent, and secure tool for automatic IMAP email processing:
- downloads attachments
- organizes them into folders based on subject
- archives processed emails
- stores processed email UIDs in separate files by date
- uses keyring for secure password storage
- supports a new command: --clear-passwords
- progress bar for long-running operations
- file extension filtering (whitelist/blacklist)
- disk space checking before downloads
- structured logging with file output
- dry-run mode for testing
Key Features
Secure IMAP Password Management
- Password is not stored in code or YAML
- Saved in system storage (Windows Credential Manager, macOS Keychain, Linux SecretService)
- On first run, the script will prompt for password and offer to save it
Configuration via config.yaml
- Download folder management
- Subject-based sorting rules (topic_mapping)
- Allowed sender management
- Archive settings
- Behavior options ("process / skip / archive")
- File extension filtering (whitelist/blacklist)
- Progress bar control
- Structured logging configuration
Fast Two-Phase IMAP Fetch
- Fast header fetch: FROM SUBJECT DATE UID
- The full email (RFC822) is loaded only if its headers pass the sender and subject rules
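The two-phase idea can be sketched with the standard-library imaplib module. This is only an illustration under assumptions, not the package's actual code: the helper name fetch_if_wanted and the wanted predicate are hypothetical, and the exact FETCH items the processor requests may differ.

```python
import email
import imaplib
from email.message import Message
from typing import Callable, Optional


def fetch_if_wanted(
    conn: imaplib.IMAP4,
    uid: str,
    wanted: Callable[[Message], bool],
) -> Optional[Message]:
    """Fetch headers first; download the full message only if `wanted` accepts them."""
    # Phase 1: small header-only fetch. BODY.PEEK avoids setting the \Seen flag.
    status, data = conn.uid(
        "FETCH", uid, "(BODY.PEEK[HEADER.FIELDS (FROM SUBJECT DATE)])"
    )
    if status != "OK" or not data or not isinstance(data[0], tuple):
        return None
    headers = email.message_from_bytes(data[0][1])
    if not wanted(headers):
        return None  # rejected on headers alone; the body is never downloaded

    # Phase 2: full message, only for emails that passed the header check.
    status, data = conn.uid("FETCH", uid, "(RFC822)")
    if status != "OK" or not data or not isinstance(data[0], tuple):
        return None
    return email.message_from_bytes(data[0][1])
```

The predicate would typically combine the allowed_senders and topic_mapping checks, so rejected emails cost only one small header request.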
Optimized Processed Email Storage
Each email's UID is saved in:
processed_uids/YYYY-MM-DD.txt
This ensures:
- fast lookup of already processed UIDs
- minimal memory usage
- no duplicate downloads
- convenient rotation of old records
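A minimal sketch of per-day UID storage, assuming one UID per line in each YYYY-MM-DD.txt file. The function names are hypothetical, and which date a UID is filed under (email date or processing date) is an assumption left to the caller.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def uid_file(processed_dir: Path, day: date) -> Path:
    """Path of the UID file for one day, e.g. processed_uids/2024-01-31.txt."""
    return processed_dir / f"{day.isoformat()}.txt"


def is_processed(processed_dir: Path, day: date, uid: str) -> bool:
    """Read only the file for the given day, so each lookup stays small."""
    path = uid_file(processed_dir, day)
    if not path.exists():
        return False
    return uid in path.read_text(encoding="utf-8").split()


def mark_processed(processed_dir: Path, day: date, uid: str) -> None:
    """Append the UID to the day's file unless it is already recorded."""
    if is_processed(processed_dir, day, uid):
        return
    processed_dir.mkdir(parents=True, exist_ok=True)
    with uid_file(processed_dir, day).open("a", encoding="utf-8") as fh:
        fh.write(f"{uid}\n")


def prune_old(processed_dir: Path, keep_days: int, today: Optional[date] = None) -> int:
    """Delete day files older than keep_days; 0 means keep everything."""
    if keep_days <= 0 or not processed_dir.exists():
        return 0
    cutoff = (today or date.today()) - timedelta(days=keep_days)
    removed = 0
    for path in processed_dir.glob("*.txt"):
        try:
            file_day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a YYYY-MM-DD file; leave it alone
        if file_day < cutoff:
            path.unlink()
            removed += 1
    return removed
```

Because each day has its own file, rotating old records is just deleting files, which is the role keep_processed_days plays in the configuration.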
Usage
Running the Processor
Normal Mode
python -m email_processor
# or after installation:
email-processor
Custom Configuration File
python -m email_processor --config /path/to/custom_config.yaml
Note: By default, the processor uses config.yaml in the current directory. Use --config to specify a different configuration file path.
Dry-Run Mode (Test without downloading)
python -m email_processor --dry-run
Note: In dry-run mode, the processor connects to the IMAP server to retrieve and analyze the email list (to display statistics), but files are not downloaded and emails are not archived.
Dry-Run Mode with Mock Server (No connection)
python -m email_processor --dry-run-no-connect
Note: The --dry-run-no-connect mode uses a mocked IMAP server with test data. It requires neither a real mail server connection nor a password, which makes it useful for testing configuration without server access. It uses 3 test emails:
- Email from client1@example.com with subject "Roadmap Q1 2024" and attachment roadmap.pdf
- Email from finance@example.com with subject "Invoice #12345" and attachment invoice.pdf
- Email from spam@example.com with subject "Spam Subject" and attachment spam.exe (will be skipped if the sender is not in the allowed list)
Show Version
python -m email_processor --version
Clear Saved Passwords
python -m email_processor --clear-passwords
Create Default Configuration
python -m email_processor --create-config
Note: This command creates a default config.yaml file from config.yaml.example. If the file already exists, you'll be prompted to confirm overwriting it. You can combine it with --config to specify a custom path:
python -m email_processor --create-config --config /path/to/custom_config.yaml
Password Management Command
The --clear-passwords command:
- removes the saved password from keyring
- allows setting a new password on the next run
- is useful when:
  - the IMAP password expired or was changed
  - you are switching to a different email account
  - you need to reset authorization without opening Credential Manager
How --clear-passwords Works
- The script reads imap.user from config.yaml
- It requests confirmation:
  Do you really want to delete saved passwords? [y/N]:
- If the user answers y, the password entry email-vkh-processor / <user> is removed from keyring
- The script outputs a report:
  Done. Deleted entries: 1
- On the next normal-mode run, the script prompts for a new password.
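The flow above can be sketched as follows. This is a hypothetical helper, not the package's code: the name clear_saved_password and its injectable ask/delete parameters are assumptions made so the sketch can run without touching a real keyring. The only keyring call used is keyring.delete_password(service, username).

```python
from typing import Callable, Optional

SERVICE_NAME = "email-vkh-processor"  # keyring service name shown in this README


def clear_saved_password(
    user: str,
    ask: Callable[[str], str] = input,
    delete: Optional[Callable[[str, str], None]] = None,
) -> int:
    """Ask for confirmation, then delete the keyring entry. Returns entries deleted."""
    if delete is None:
        import keyring  # third-party package; imported lazily on purpose

        delete = keyring.delete_password
    answer = ask("Do you really want to delete saved passwords? [y/N]: ")
    if answer.strip().lower() != "y":
        return 0  # anything other than "y" counts as "no"
    try:
        delete(SERVICE_NAME, user)
        deleted = 1
    except Exception:
        # keyring raises an error when there is no entry to delete
        deleted = 0
    print(f"Done. Deleted entries: {deleted}")
    return deleted
```

Defaulting to "no" mirrors the [y/N] prompt: pressing Enter leaves the stored password untouched.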
Implementation Benefits
Time Savings
Duplicate emails are skipped instantly.
Reduced IMAP Server Load
Minimal IMAP operations, partial fetch.
No Duplicate Attachment Downloads
Each attachment is downloaded only once.
No File Duplicates
Automatic numbering is used: file_01.pdf, file_02.pdf.
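One way to get this kind of numbering is to probe for a free name before saving. The sketch below is an assumption about the behavior, not the package's code: the helper unique_path is hypothetical, and it assumes the first file keeps its original name while later collisions get _01, _02, and so on.

```python
from pathlib import Path


def unique_path(folder: Path, filename: str) -> Path:
    """Return a path in `folder` that does not exist yet, adding _01, _02, ... if needed."""
    candidate = folder / filename
    if not candidate.exists():
        return candidate
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while True:
        # Two-digit, zero-padded counter: file_01.pdf, file_02.pdf, ...
        candidate = folder / f"{stem}_{counter:02d}{suffix}"
        if not candidate.exists():
            return candidate
        counter += 1
```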
Absolute Idempotency
Can be run 20 times in a row: the result doesn't change.
Scalability
Per-day UID files keep lookups fast as the number of processed emails grows.
Example config.yaml
imap:
  server: "imap.example.com"
  user: "your_email@example.com"
  max_retries: 5
  retry_delay: 3
processing:
  start_days_back: 5
  archive_folder: "INBOX/Processed"
  processed_dir: "C:\\Users\\YourName\\AppData\\EmailProcessor\\processed_uids"
  keep_processed_days: 180
  archive_only_mapped: true
  skip_non_allowed_as_processed: true
  skip_unmapped_as_processed: true
  show_progress: true # Show progress bar during processing
  # Extension filtering (optional):
  # allowed_extensions: [".pdf", ".doc", ".docx", ".xls", ".xlsx", ".zip", ".txt"]
  # blocked_extensions: [".exe", ".bat", ".sh", ".scr", ".vbs", ".js"]
# Logging settings
logging:
  level: INFO # DEBUG, INFO, WARNING, ERROR, CRITICAL
  format: console # "console" (readable) or "json" (structured)
  format_file: json # Format for file logs (default: "json")
  file: logs # Optional: Directory for log files (rotated daily)
allowed_senders:
  - "client1@example.com"
  - "finance@example.com"
  - "boss@example.com"
topic_mapping:
  ".*Roadmap.*": "roadmap"
  "(Report).*": "reports"
  "(Invoice|Bill).*": "invoices"
  ".*": "default" # Last rule is used as default for unmatched emails
Note:
- All paths in topic_mapping can be either absolute or relative:
  - Absolute paths: "C:\\Documents\\Roadmaps" (Windows) or "/home/user/documents/reports" (Linux/macOS)
  - Relative paths: "roadmap" (relative to the script's working directory)
- The last rule in topic_mapping is used as default for all emails that don't match any of the previous patterns
- Both absolute and relative paths are supported for processed_dir:
  - Absolute paths: "C:\\Users\\AppData\\processed_uids" (Windows) or "/home/user/.cache/processed_uids" (Linux/macOS)
  - Relative paths: "processed_uids" (relative to the script's working directory)
Example with mixed paths:
topic_mapping:
  ".*Roadmap.*": "C:\\Documents\\Roadmaps" # Absolute path
  "(Report).*": "reports" # Relative path
  "(Invoice|Bill).*": "C:\\Finance\\Invoices" # Absolute path
  ".*": "default" # Default folder (relative path)
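To make the topic_mapping rules concrete, here is a small sketch of ordered, first-match resolution. It is an illustration under assumptions: the helper resolve_folder is hypothetical, and anchoring patterns at the start of the subject with re.match (rather than re.search) is a guess about how the processor applies them.

```python
import re
from typing import Dict, Optional


def resolve_folder(subject: str, topic_mapping: Dict[str, str]) -> Optional[str]:
    """Return the folder of the first pattern that matches the subject."""
    # Python dicts keep insertion order, so rules are tried top to bottom.
    for pattern, folder in topic_mapping.items():
        if re.match(pattern, subject):
            return folder
    return None  # only reached when there is no catch-all rule


mapping = {
    ".*Roadmap.*": "roadmap",
    "(Report).*": "reports",
    "(Invoice|Bill).*": "invoices",
    ".*": "default",  # catch-all, must stay last
}
```

With this mapping, a subject such as "Invoice #12345" resolves to "invoices", and anything that matches no earlier rule falls through to "default". Moving the ".*" rule to the top would shadow every other rule, which is why it has to stay last.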
Password Management (Complete Command Set)
Save Password (automatically)
python -m email_processor
Read Password
import keyring
keyring.get_password("email-vkh-processor", "your_email@example.com")
Delete Password
python -m email_processor --clear-passwords
Add Password Manually
import keyring
keyring.set_password(
    "email-vkh-processor",
    "your_email@example.com",
    "MY_PASSWORD"
)
Installation
- Install dependencies:
  pip install -r requirements.txt
- Copy the configuration template:
  cp config.yaml.example config.yaml
- Edit config.yaml with your IMAP settings
- Run the script:
  # As a module
  python -m email_processor
  # Or install and use as a command
  pip install -e .
  email-processor
  # To build a distributable package for pip install, see BUILD.md
Development Setup
For development, install additional tools:
pip install ruff mypy types-PyYAML
Code Quality Tools
- Ruff: Fast linter and formatter (replaces Black)
  ruff check .           # Check for issues
  ruff check --fix .     # Auto-fix issues
  ruff format .          # Format code
  ruff format --check .  # Check formatting
- MyPy: Type checker
  mypy email_processor   # Type check
See CONTRIBUTING.md for detailed development guidelines.
Configuration Options
IMAP Settings
- server: IMAP server address (required)
- user: Email address (required)
- max_retries: Maximum connection retry attempts (default: 5)
- retry_delay: Delay between retries in seconds (default: 3)
Processing Settings
- start_days_back: How many days back to process emails (default: 5)
- archive_folder: IMAP folder for archived emails (default: "INBOX/Processed")
- processed_dir: Directory for processed UID files (default: "processed_uids")
  - Supports absolute paths: "C:\\Users\\AppData\\processed_uids" or "/home/user/.cache/processed_uids"
  - Supports relative paths: "processed_uids" (relative to script directory)
- keep_processed_days: Days to keep processed UID files (0 = keep forever, default: 0)
- archive_only_mapped: Archive only emails matching topic_mapping (default: true)
- skip_non_allowed_as_processed: Mark non-allowed senders as processed (default: true)
- skip_unmapped_as_processed: Mark unmapped emails as processed (default: true)
- show_progress: Show progress bar during processing (default: true, requires tqdm)
- allowed_extensions: List of allowed file extensions (e.g., [".pdf", ".doc"])
  - If specified, only files with these extensions will be downloaded
  - Case-insensitive, dot prefix optional
- blocked_extensions: List of blocked file extensions (e.g., [".exe", ".bat"])
  - Takes priority over allowed_extensions
  - Files with these extensions will be skipped
  - Case-insensitive, dot prefix optional
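The extension rules above (blacklist wins, case-insensitive, dot optional) can be sketched in a few lines. The helper names normalize and is_allowed are hypothetical, and treating an empty allowed list the same as "not specified" is an assumption.

```python
from pathlib import Path
from typing import Iterable, Optional, Set


def normalize(extensions: Optional[Iterable[str]]) -> Set[str]:
    """Lower-case each extension and make sure it starts with exactly one dot."""
    return {"." + ext.lower().lstrip(".") for ext in (extensions or [])}


def is_allowed(
    filename: str,
    allowed: Optional[Iterable[str]] = None,
    blocked: Optional[Iterable[str]] = None,
) -> bool:
    """Decide whether an attachment should be downloaded."""
    ext = Path(filename).suffix.lower()
    # The blacklist is checked first, so it overrides the whitelist.
    if ext in normalize(blocked):
        return False
    allowed_set = normalize(allowed)
    # An empty or missing whitelist means "no whitelist restriction".
    if allowed_set and ext not in allowed_set:
        return False
    return True
```

For example, is_allowed("scan.PDF", allowed=["pdf"]) accepts the file, while is_allowed("tool.exe", allowed=[".exe"], blocked=["EXE"]) rejects it because the blacklist takes priority.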
Logging Settings
- level: Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL (default: "INFO")
- format: Console output format: "console" (readable) or "json" (structured) (default: "console")
- format_file: File log format: "console" or "json" (default: "json")
- file: Directory for log files (optional; files are named yyyy-mm-dd.log and rotated daily)
  - If not set, logs go to stdout only
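As a rough illustration of the difference between the "console" and "json" formats, the sketch below builds a stdout logger with the standard logging module. The JSON field names and the helper build_logger are assumptions; the package's real log schema may differ.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload, ensure_ascii=False)


def build_logger(name: str, level: str = "INFO", fmt: str = "console") -> logging.Logger:
    """Create a stdout logger using either the readable or the JSON format."""
    logger = logging.getLogger(name)
    logger.setLevel(level)  # accepts level names such as "DEBUG" or "INFO"
    handler = logging.StreamHandler(sys.stdout)
    if fmt == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    return logger
```

JSON lines are easier to parse with log tooling, which may be why it is the default for file output while the console stays human-readable.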
Allowed Senders
List of sender email addresses whose messages may be processed. If the list is empty, no emails are processed.
Topic Mapping
Dictionary of regex patterns to folder paths. Emails matching a pattern will be saved to the corresponding folder.
- The last rule in topic_mapping is used as default for all emails that don't match any of the previous patterns
- All paths can be absolute (e.g., "C:\\Documents\\Roadmaps") or relative (e.g., "roadmap")
- Patterns are checked in order, and the first match is used
Features & Improvements
v7.1 Features
- Modular architecture: Clean separation of concerns
- YAML configuration: Easy configuration management
- Keyring password storage: Secure credential management
- Per-day UID storage: Optimized performance
- Two-phase IMAP fetch: Efficient email processing
- Password management command: --clear-passwords option
- Configuration validation: Validates config on startup
- Structured logging: JSON and console formats with file output
- Configurable logging levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Enhanced error handling: Comprehensive error recovery
- Detailed processing statistics: File type statistics
- Progress bar: Visual progress indicator (tqdm)
- File extension filtering: Whitelist/blacklist support
- Disk space checking: Prevents out-of-space errors
- Dry-run mode: Test without downloading (--dry-run)
- Type hints: Full type annotation support
- Path traversal protection: Security hardening
- Attachment size validation: Prevents oversized downloads
Notes
- The script is idempotent: safe to run multiple times
- Processed UIDs are stored per day for optimal performance
- Passwords are securely stored in system keyring
- Configuration is validated on startup
- All errors are logged with appropriate detail levels
- Progress bar shows real-time statistics (processed, skipped, errors)
- File extension filtering helps prevent unwanted downloads
- Disk space is checked before each download (with 10MB buffer)
- Logs are automatically rotated daily when file logging is enabled
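The disk space check mentioned above can be approximated with shutil.disk_usage from the standard library. This is a sketch under assumptions: the helper has_free_space is hypothetical, and interpreting the 10MB buffer as 10 * 1024 * 1024 bytes is a guess.

```python
import shutil
from pathlib import Path
from typing import Union

BUFFER_BYTES = 10 * 1024 * 1024  # assumed meaning of the "10MB buffer"


def has_free_space(
    target_dir: Union[str, Path],
    needed_bytes: int,
    buffer_bytes: int = BUFFER_BYTES,
) -> bool:
    """True if the disk holding target_dir can take the file plus a safety buffer."""
    free = shutil.disk_usage(target_dir).free
    return free >= needed_bytes + buffer_bytes
```

Checking before each download, rather than once at startup, accounts for space consumed by earlier attachments in the same run.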
Architecture
The project uses a modular architecture for better maintainability:
email_processor/
├── config/ # Configuration loading and validation
├── logging/ # Structured logging setup
├── imap/ # IMAP operations (client, auth, archive)
├── processor/ # Email processing logic
├── storage/ # UID storage and file management
└── utils/ # Utility functions (email, path, disk, etc.)
See ARCHITECTURE_PROPOSAL.md for detailed architecture documentation.
Additional Documentation
- Testing Guide: See README_TESTS.md
- Building and Distribution: See BUILD.md (how to build a package for pip install)