SWECC Email Scraper

A Python CLI tool for analyzing email data in mbox format. This tool helps you extract insights and perform analysis on email archives.

Features

  • 📧 Process mbox format email archives
  • 📊 Extensible framework for building analysis pipelines
  • 🎨 Rich command-line interface with progress reporting
  • Coming soon: Actual analysis...

Installation

From PyPI

pip install swecc-email-scraper

From Source

git clone https://github.com/swecc/email-scraper.git
cd email-scraper
pip install -e ".[dev]"  # Install with development dependencies

# Run tests
pytest

Quick Start

  1. Basic usage with the default statistics processor:
swecc-email-scraper process path/to/mailbox.mbox
  2. Use multiple processors and specify an output format:
swecc-email-scraper process path/to/mailbox.mbox -p statistics -p headers -f json -o results.json
  3. List available processors:
swecc-email-scraper list-processors
  4. List available output formats:
swecc-email-scraper list-formats

Basic Example Usage

  1. Basic email statistics:
swecc-email-scraper process inbox.mbox
  2. Export analysis to a file:
swecc-email-scraper process inbox.mbox -o analysis.json
  3. Use multiple processors:
swecc-email-scraper process inbox.mbox -p statistics -p <processor_name>
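
To give a rough sense of what a statistics pass over an mbox archive involves, here is a minimal sketch that uses only Python's standard-library mailbox module. It is not the tool's own implementation: the function name basic_statistics and the shape of the returned dictionary are assumptions made for illustration.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def basic_statistics(mbox_path: str) -> dict:
    """Count messages and senders in an mbox file.

    Hypothetical helper for illustration; not part of swecc-email-scraper.
    """
    # create=False makes a missing file raise instead of silently creating one
    box = mailbox.mbox(mbox_path, create=False)
    senders: Counter = Counter()
    total = 0
    try:
        for message in box:
            total += 1
            # parseaddr splits "Name <addr>" into (name, addr)
            _, address = parseaddr(str(message.get("From", "")))
            senders[address.lower() or "(unknown)"] += 1
    finally:
        box.close()
    return {"total_messages": total, "top_senders": senders.most_common(5)}
```

Calling basic_statistics("inbox.mbox") would return the message count and the five most frequent sender addresses, lower-cased so that differently capitalized copies of one address are counted together.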

Extending the Tool

The tool is designed to be easily extensible. See CONTRIBUTING.md for detailed information on:

  • Creating custom processors
  • Adding new output formats
  • Contributing to the project
  • Development setup and guidelines

Architecture

The tool uses a pipeline architecture where:

  1. EmailData objects represent individual emails with parsed metadata
  2. Pipeline manages the flow of data through processors
  3. EmailProcessors transform or analyze the data
  4. OutputFormatters convert results to different formats
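
The four roles above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's actual API: the field names on EmailData, the process and run methods, the two example processors, and the format_json function are all hypothetical. See CONTRIBUTING.md for the real interfaces.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Protocol


@dataclass
class EmailData:
    """Hypothetical stand-in: one email with parsed metadata."""
    sender: str
    subject: str
    body: str = ""


class EmailProcessor(Protocol):
    """Hypothetical processor interface: a name plus a process() method."""
    name: str

    def process(self, emails: List[EmailData]) -> Dict[str, Any]: ...


class CountProcessor:
    name = "count"

    def process(self, emails: List[EmailData]) -> Dict[str, Any]:
        return {"total": len(emails)}


class SubjectLengthProcessor:
    name = "subject_length"

    def process(self, emails: List[EmailData]) -> Dict[str, Any]:
        if not emails:
            return {"mean": 0.0}
        return {"mean": sum(len(e.subject) for e in emails) / len(emails)}


class Pipeline:
    """Runs every processor over the same emails, keyed by processor name."""

    def __init__(self, processors: Iterable[EmailProcessor]) -> None:
        self.processors = list(processors)

    def run(self, emails: Iterable[EmailData]) -> Dict[str, Dict[str, Any]]:
        batch = list(emails)  # materialize once so each processor sees all emails
        return {p.name: p.process(batch) for p in self.processors}


def format_json(results: Dict[str, Any]) -> str:
    """Hypothetical output formatter: serialize results as indented JSON."""
    return json.dumps(results, indent=2, sort_keys=True)


if __name__ == "__main__":
    emails = [EmailData("a@example.com", "hi"), EmailData("b@example.com", "hello!")]
    pipeline = Pipeline([CountProcessor(), SubjectLengthProcessor()])
    print(format_json(pipeline.run(emails)))
```

Keeping processors independent of one another and of the output format is what would let a command line such as -p statistics -p headers -f json compose them freely.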

License

MIT License - See LICENSE file for details.

Acknowledgments

Developed as part of SWECC Labs at the University of Washington.
