SWECC Email Scraper
A Python CLI tool for analyzing email data in mbox format. This tool helps you extract insights and perform analysis on email archives.
Features
- 📧 Process mbox format email archives
- 📊 Extendable framework for building analysis pipelines
- 🎨 Rich command-line interface with progress reporting
- Coming soon: Actual analysis...
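For readers unfamiliar with the mbox format, the sketch below shows what reading such an archive looks like using Python's standard-library mailbox module. It is an illustration only and is not taken from this tool's source; the helper name count_senders and the placeholder key "(unknown)" are assumptions made for the example.

```python
import mailbox
from collections import Counter


def count_senders(mbox_path: str) -> Counter:
    """Count messages per From header in an mbox archive (hypothetical helper)."""
    counts: Counter = Counter()
    # create=False raises an error instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # Messages without a From header are grouped under a placeholder key.
            counts[message.get("From", "(unknown)")] += 1
    finally:
        box.close()
    return counts
```

Calling count_senders("path/to/mailbox.mbox") returns a Counter keyed by the raw From header of each message.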
Installation
From PyPI
pip install swecc-email-scraper
From Source
git clone https://github.com/swecc/email-scraper.git
cd email-scraper
pip install -e ".[dev]" # Install with development dependencies
# Run tests
pytest
Quick Start
- Basic usage with default statistics processor:
swecc-email-scraper process path/to/mailbox.mbox
- Use multiple processors and specify output format:
swecc-email-scraper process path/to/mailbox.mbox -p statistics -p headers -f json -o results.json
- List available processors:
swecc-email-scraper list-processors
- List available output formats:
swecc-email-scraper list-formats
Basic Example Usage
- Basic email statistics:
swecc-email-scraper process inbox.mbox
- Export analysis to a file:
swecc-email-scraper process inbox.mbox -o analysis.json
- Use multiple processors:
swecc-email-scraper process inbox.mbox -p statistics -p <processor_name>
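If you want to drive the CLI from a Python script, the following sketch assembles the same command lines shown above. It uses only the flags that appear in this README (-p, -f, -o); the helper names build_command and run_scraper are hypothetical and are not part of the package.

```python
import subprocess
from typing import List, Optional, Sequence


def build_command(
    mbox_path: str,
    processors: Sequence[str] = (),
    output_format: Optional[str] = None,
    output_path: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `swecc-email-scraper process` (hypothetical helper)."""
    cmd = ["swecc-email-scraper", "process", mbox_path]
    for name in processors:
        # -p is repeated once per processor, as in the examples above.
        cmd += ["-p", name]
    if output_format is not None:
        cmd += ["-f", output_format]
    if output_path is not None:
        cmd += ["-o", output_path]
    return cmd


def run_scraper(*args, **kwargs) -> None:
    """Run the CLI; assumes swecc-email-scraper is installed and on PATH."""
    subprocess.run(build_command(*args, **kwargs), check=True)
```

For example, run_scraper("inbox.mbox", ["statistics"], output_path="analysis.json") would invoke the tool with one processor and write the results to analysis.json.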
Extending the Tool
The tool is designed to be easily extensible. See CONTRIBUTING.md for detailed information on:
- Creating custom processors
- Adding new output formats
- Contributing to the project
- Development setup and guidelines
Architecture
The tool uses a pipeline architecture where:
- EmailData objects represent individual emails with parsed metadata
- Pipeline manages the flow of data through processors
- EmailProcessors transform or analyze the data
- OutputFormatters convert results to different formats
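The roles described above can be sketched in a few lines of Python. This is a minimal illustration of the pipeline pattern, not the package's actual code: the class names EmailData, EmailProcessor, and Pipeline come from this README, but every field, method name, and signature below (sender, subject, name, process, run, format) is an assumption, as are the CountBySender and JsonFormatter classes. See CONTRIBUTING.md for the real interfaces.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Protocol


@dataclass
class EmailData:
    """Hypothetical stand-in for one email with parsed metadata."""
    sender: str
    subject: str


class EmailProcessor(Protocol):
    """Assumed processor interface: a name plus a process() method."""
    name: str

    def process(self, emails: List[EmailData]) -> Dict[str, Any]: ...


class CountBySender:
    """Example processor (hypothetical): counts emails per sender."""
    name = "count_by_sender"

    def process(self, emails: List[EmailData]) -> Dict[str, Any]:
        counts: Dict[str, Any] = {}
        for email in emails:
            counts[email.sender] = counts.get(email.sender, 0) + 1
        return counts


class JsonFormatter:
    """Example output formatter (hypothetical): renders results as JSON."""

    def format(self, results: Dict[str, Any]) -> str:
        return json.dumps(results, indent=2, sort_keys=True)


class Pipeline:
    """Runs every processor over the same emails and collects results by name."""

    def __init__(self, processors: Iterable[EmailProcessor]) -> None:
        self.processors = list(processors)

    def run(self, emails: Iterable[EmailData]) -> Dict[str, Any]:
        emails = list(emails)
        return {p.name: p.process(emails) for p in self.processors}


emails = [
    EmailData("a@example.com", "Hello"),
    EmailData("b@example.com", "Re: Hello"),
    EmailData("a@example.com", "Follow-up"),
]
results = Pipeline([CountBySender()]).run(emails)
print(JsonFormatter().format(results))
```

Keeping processors and formatters behind small interfaces like these is what lets new analyses or output formats be added without changing the pipeline itself.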
License
MIT License - See LICENSE file for details.
Acknowledgments
Developed as part of SWECC Labs at the University of Washington.