A tool for preserving email in multiple preservation formats.

Mailbagit

A tool for creating and managing Mailbags, packages for preserving email in multiple formats. The project includes an open specification for mailbags, as well as the mailbagit and mailbagit-gui tools for packaging email exports into mailbags.

mailbagit converts native email formats, such as PST, MSG, EML, and MBOX, into PDF, HTML, WARC, and other formats, and combines them into stable packages for preservation.

Installation

pip install mailbagit
  • To install PST dependencies: pip install mailbagit[pst]
  • To install mailbagit-gui: pip install mailbagit[gui]

Docker setup

You can also run mailbagit using a Docker image.

docker pull ualbanyarchives/mailbagit
wget https://raw.githubusercontent.com/UAlbanyArchives/mailbagit/main/docker-compose.yml
docker compose run mailbagit
mailbagit -v

Quick start

Examples:

MSG files to PDF, EML, and WARC

mailbagit path/to/messages -i msg --derivatives eml pdf warc --mailbag_name my_mailbag

MBOX to PDF and plain text (dry run)

mailbagit path/to/mbox_dir -i mbox -d txt pdf-chrome -m my_mailbag -r

PST to PDF, MBOX, EML, and WARC

mailbagit path/to/export.pst -i pst -d mbox eml pdf warc -m my_mailbag

EML to PDF and WARC in another directory

mailbagit path/to/messages -i eml -d pdf warc -m /path/to/my_mailbag
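To package many PST exports in one go, you can call mailbagit from a script. The following is a minimal, hypothetical sketch and is not part of mailbagit. It uses only the -i, -d, and -m arguments shown above and builds one mailbag per .pst file. The helper names are illustrative. The code also assumes that a bare mailbag name places the mailbag next to its source file.

```python
import subprocess
from pathlib import Path

# Derivative formats for every mailbag, copied from the PST quick-start example.
DERIVATIVES = ("mbox", "eml", "pdf", "warc")


def pst_argv(pst_file: Path, derivatives=DERIVATIVES) -> list[str]:
    """Build the mailbagit command line for one PST export (hypothetical helper)."""
    # A bare mailbag name (the file stem) asks mailbagit to create the mailbag
    # in the same location as the source export.
    return ["mailbagit", str(pst_file), "-i", "pst", "-d", *derivatives,
            "-m", pst_file.stem]


def package_all_pst(export_dir: str) -> None:
    """Run mailbagit once per *.pst file directly inside export_dir."""
    # On POSIX systems the pattern is case-sensitive, so *.PST files are not matched.
    for pst_file in sorted(Path(export_dir).glob("*.pst")):
        # Assumed mailbag location: next to the export, named after its stem.
        target = pst_file.with_suffix("")
        if target.exists():
            # mailbagit requires that the mailbag directory not already exist.
            print(f"skipping {pst_file.name}: {target} already exists")
            continue
        # check=True raises CalledProcessError if mailbagit exits non-zero.
        subprocess.run(pst_argv(pst_file), check=True)
```

For example, package_all_pst("path/to/exports") creates one mailbag per PST file. By default, mailbagit moves source files into the mailbag. If the originals must stay where they are, consider adding -k (described below).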

See the documentation for more details.

Arguments

The arguments listed below can be entered on the command line when using mailbagit, or in the corresponding fields in mailbagit-gui.

Mandatory Arguments

  • path:

A path to email to be packaged into a mailbag. This can be a single file or a directory containing a number of email exports.

  • -m --mailbag:

A new directory for the mailbag, such as /path/to/my_mailbag, or just my_mailbag to use the same location as the source email. Must be a valid directory or file name and must not already exist.

  • -i --input:

File format to use as input for the mailbag. The argument takes a single value, e.g. -i imap or -i pst

  • -d --derivatives:

One or more derivative formats that mailbagit will create and package into the mailbag. The argument takes multiple values, e.g. -d eml pdf warc

Mailbagit Optional Arguments

  • -v --version

Reports the version number and exits.

  • -r --dry-run

Performs a test run that does not alter any files other than writing an error report. With this flag, mailbagit parses all the email it is given and generates derivatives as far as it can, without writing anything to disk. If there are any errors or warnings, it writes an error report containing an errors.csv file that lists all issues and a .txt file with the full stack trace.

  • -k --keep

Keeps the source files in place and copies them into the mailbag instead of moving them.

  • --css

Path to a CSS file that overrides the included CSS when creating PDF or HTML derivatives. The argument takes a single file path.

  • -c --compress

Compresses the mailbag as a ZIP, TAR, or TAR.GZ file, e.g. -c zip or -c tar.gz

  • -f --companion_files

Allows companion metadata files to be packaged alongside the email export files. When this option is used, mailbagit recursively includes all files in the provided directory in the mailbag.
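The optional flags combine naturally into a cautious workflow. First run with -r to surface parsing problems, then repeat the same command without it. The following is a minimal sketch, assuming you drive mailbagit from Python, and it uses only the flags documented above. The helper names are hypothetical. The sketch also assumes that a failed dry run ends with a non-zero exit status. That assumption is not confirmed here, so check the error report as well.

```python
import subprocess

COMPRESSION_FORMATS = ("zip", "tar", "tar.gz")


def build_argv(source, input_format, derivatives, mailbag, *, dry_run=False,
               keep=False, compress=None, css=None, companion_files=False):
    """Assemble a mailbagit command line from the documented arguments."""
    argv = ["mailbagit", source, "-i", input_format, "-d", *derivatives,
            "-m", mailbag]
    if dry_run:
        argv.append("-r")  # parse and derive, but write nothing except an error report
    if keep:
        argv.append("-k")  # copy the source files instead of moving them
    if companion_files:
        argv.append("-f")  # also package the other files in the source directory
    if compress is not None:
        if compress not in COMPRESSION_FORMATS:
            raise ValueError(f"unsupported compression: {compress!r}")
        argv += ["-c", compress]
    if css is not None:
        argv += ["--css", css]  # override the bundled CSS for PDF/HTML output
    return argv


def dry_run_then_package(source, input_format, derivatives, mailbag, **options):
    """Do a dry run first, then build the real mailbag with the same options."""
    # Assumption: a failed dry run ends with a non-zero exit status, which
    # check=True turns into CalledProcessError. Warnings may not, so still read
    # the error report before relying on the mailbag.
    subprocess.run(build_argv(source, input_format, derivatives, mailbag,
                              dry_run=True, **options), check=True)
    subprocess.run(build_argv(source, input_format, derivatives, mailbag,
                              **options), check=True)
```

For example, dry_run_then_package("path/to/messages", "eml", ["pdf", "warc"], "my_mailbag", keep=True, compress="zip") checks the messages first and then builds a zipped mailbag, leaving the source files in place.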

Bagit-python arguments

Mailbagit also accepts most bagit-python arguments, so you can provide options such as --processes 2 or metadata arguments such as --source-organization "University at Albany, SUNY".

The only bagit-python arguments that mailbagit does not support are --log, --quiet, --validate, --fast, and --completeness-only.
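Metadata values that contain spaces, such as the organization name above, must reach mailbagit as a single argument. The following is a minimal sketch, assuming you assemble the command in Python. When you pass a list to subprocess, no quoting is needed. shlex.join (Python 3.8+) prints the equivalent shell-quoted line for pasting into a terminal. The source path and mailbag name are placeholders.

```python
import shlex

argv = [
    "mailbagit", "path/to/export.pst", "-i", "pst", "-d", "pdf", "warc",
    "-m", "my_mailbag",
    # bagit-python options are passed through to bagit-python
    "--processes", "2",
    "--source-organization", "University at Albany, SUNY",
]

# Each list element stays one argument; shlex.join quotes only the value with spaces.
print(shlex.join(argv))
# mailbagit path/to/export.pst -i pst -d pdf warc -m my_mailbag --processes 2 --source-organization 'University at Albany, SUNY'
```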

To validate a mailbag, use bagit-python, which is installed along with mailbagit:

bagit.py --validate /path/to/mailbag
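bagit.py --validate is the tool to use for real validation. The following simplified, stdlib-only sketch illustrates what a payload fixity check involves. It is based on the BagIt specification (RFC 8493), not on mailbagit or bagit-python code. The sketch reads every manifest-<algorithm>.txt file in the bag and recomputes each listed digest. Each manifest line is a checksum followed by a path under data/. Unlike bagit.py, the sketch ignores tag manifests, bagit.txt, bag-info.txt, percent-encoded paths, and payload files missing from the manifests. The function name is hypothetical.

```python
import hashlib
from pathlib import Path


def check_payload(bag_dir: str) -> list[str]:
    """Return a list of problems found by recomputing payload manifest digests."""
    bag = Path(bag_dir)
    problems = []
    for manifest in sorted(bag.glob("manifest-*.txt")):
        # The algorithm is part of the file name, e.g. manifest-sha256.txt.
        algorithm = manifest.stem[len("manifest-"):]
        for line in manifest.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            # "<checksum>  <path>": split once, because paths may contain spaces.
            expected, rel_path = line.split(maxsplit=1)
            target = bag / rel_path
            if not target.is_file():
                problems.append(f"missing: {rel_path}")
                continue
            # Hash in 1 MiB chunks so large PST or MBOX files are not read at once.
            digest = hashlib.new(algorithm)
            with target.open("rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                problems.append(f"{algorithm} mismatch: {rel_path}")
    return problems
```

An empty list means that every payload file listed in the manifests still matches its recorded checksum.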

Development setup

git clone git@github.com:UAlbanyArchives/mailbagit.git
cd mailbagit
git switch develop
pip install -e .

Development with docker

  • This runs the dev Docker image with the code installed in editable mode. You can then make code changes and run them directly with mailbagit.

  • Assumes you have a directory with email data in ./sampleData. You can change this directory name in line 7 of docker-compose-dev.yml.

docker pull ualbanyarchives/mailbagit:dev
git clone git@github.com:UAlbanyArchives/mailbagit.git
cd mailbagit
git switch develop
docker compose -f docker-compose-dev.yml run mailbagit
mailbagit -v

License

MIT

Kudos

This project was made possible by funding from the University of Illinois's Email Archives: Building Capacity and Community Project.

We owe a lot to the hard work that goes towards developing and maintaining the libraries mailbagit uses to parse email formats and make bags. We'd like to thank these awesome projects, without which mailbagit wouldn't be possible:

We'd also like to thank the RATOM project, whose documentation was super helpful in guiding us through some roadblocks.

Download files

Source Distribution

mailbagit-0.7.3.tar.gz (47.9 kB)

Built Distribution

mailbagit-0.7.3-py3-none-any.whl (60.8 kB)

File details

Details for the file mailbagit-0.7.3.tar.gz.

File metadata

  • Download URL: mailbagit-0.7.3.tar.gz
  • Size: 47.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for mailbagit-0.7.3.tar.gz

  • SHA256: 340669f0e306974e9c340dce73a3115b05bb95c65843d516da97e379a3e4e740
  • MD5: ce8af2ab62a134e93dde71b467d6fdcb
  • BLAKE2b-256: dd0cedf72765a58ee79cfca4165e9d39cce581c79f23748330bb47d0409a993f


File details

Details for the file mailbagit-0.7.3-py3-none-any.whl.

File metadata

  • Download URL: mailbagit-0.7.3-py3-none-any.whl
  • Size: 60.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for mailbagit-0.7.3-py3-none-any.whl

  • SHA256: 9b2b6d9e8bd431024f4ff939738b78d0f5808eaac84e4c2c9f7794bd506f0a25
  • MD5: 4a81522088932d8684672bae2eb5682c
  • BLAKE2b-256: 0ba5f5728f57322e2b6916bc573608ddc556a8a18277cf406cdef1de3091dbb4

