
|License: GPL v3| |PyPI3| |PyPI1| |PyPI2|

msg-extractor
=============

Extracts emails and attachments saved in Microsoft Outlook’s .msg files

The Python package extract_msg automates the extraction of key email
data (from, to, cc, date, subject, body) and the email’s attachments.

- `Changelog <CHANGELOG.md>`__

Usage
-----

**To use it as a command-line script**:

::

    python extract_msg example.msg

This will produce a new folder named according to the date, time and
subject of the message (for example “2013-07-24_0915 Example”). The
email itself can be found inside the new folder along with the
attachments.
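
The naming scheme described above can be sketched in a few lines. This is only an illustration inferred from the single example name “2013-07-24_0915 Example”; the helper ``output_folder_name`` is hypothetical and is not part of extract_msg, and the real tool may format or sanitize names differently.

```python
from datetime import datetime


def output_folder_name(sent: datetime, subject: str) -> str:
    """Build a name like '2013-07-24_0915 Example' from a date and a subject.

    Hypothetical helper: the format is an assumption based on the example
    in this README, not taken from the extract_msg source.
    """
    # Date as YYYY-MM-DD, an underscore, time as HHMM, a space, then the subject.
    return "{} {}".format(sent.strftime("%Y-%m-%d_%H%M"), subject)


print(output_folder_name(datetime(2013, 7, 24, 9, 15), "Example"))
# prints: 2013-07-24_0915 Example
```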

The script uses Philippe Lagadec’s Python module that reads Microsoft
OLE2 files (also called Structured Storage, Compound File Binary Format
or Compound Document File Format). This is the underlying format of
Outlook’s .msg files. This library currently supports up to Python 2.7
and 3.4.

The script was built using Peter Fiskerstrand’s documentation of the
.msg format. Redemption’s discussion of the different property types
used within Extended MAPI was also useful. For future reference, I note
that Microsoft have opened up their documentation of the file format.


#########REWRITE COMMAND LINE USAGE#############
Currently, the README is in the process of being redone. For now, please
refer to the usage information provided by the program's help dialog:

::

    usage: extract_msg [-h] [--use-content-id] [--dev] [--validate] [--json]
                       [--file-logging] [--verbose] [--log LOG]
                       [--config CONFIG_PATH] [--out OUT_PATH] [--use-filename]
                       msg [msg ...]

    extract_msg: Extracts emails and attachments saved in Microsoft Outlook's .msg
    files. https://github.com/mattgwwalker/msg-extractor

    positional arguments:
      msg                   An msg file to be parsed

    optional arguments:
      -h, --help            show this help message and exit
      --use-content-id, --cid
                            Save attachments by their Content ID, if they have
                            one. Useful when working with the HTML body.
      --dev                 Changes to use developer mode. Automatically enables
                            the --verbose flag. Takes precedence over the
                            --validate flag.
      --validate            Turns on file validation mode. Turns off regular file
                            output.
      --json                Changes to write output files as json.
      --file-logging        Enables file logging. Implies --verbose
      --verbose             Turns on console logging.
      --log LOG             Set the path to write the file log to.
      --config CONFIG_PATH  Set the path to load the logging config from.
      --out OUT_PATH        Set the folder to use for the program output.
                            (Default: Current directory)
      --use-filename        Sets whether the name of each output is based on the
                            msg filename.
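
When calling the tool from another program, it can help to assemble the argument list in one place. The sketch below uses only flags listed in the help text above; the function ``build_command`` and its parameter names are hypothetical, and it assumes an ``extract_msg`` executable is available on the PATH.

```python
def build_command(msg_paths, out_dir=None, as_json=False, use_filename=False):
    """Assemble an argument list for the extract_msg command line.

    Hypothetical helper: only flags shown in the help output are used.
    """
    command = ["extract_msg"]  # assumed to be on the PATH
    if as_json:
        command.append("--json")
    if use_filename:
        command.append("--use-filename")
    if out_dir is not None:
        command.extend(["--out", out_dir])
    # The msg paths are positional and come last, as in the usage line.
    command.extend(msg_paths)
    return command


print(build_command(["example.msg"], out_dir="output", as_json=True))
# prints: ['extract_msg', '--json', '--out', 'output', 'example.msg']
```

The resulting list could then be handed to ``subprocess.run`` on a machine where the package is installed.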

**To use this in your own script**, start by using:

::

    import extract_msg

From there, initialize an instance of the Message class:

::

    msg = extract_msg.Message("path/to/msg/file.msg")

Alternatively, if you wish to pass the raw bytes of a msg file to the
``extract_msg.Message`` constructor instead of a file path:

::

    msg_raw = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00 ... \x00\x00\x00'
    msg = extract_msg.Message(msg_raw)
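
Before handing raw bytes to the library, it may be worth checking that they at least begin with the OLE2 signature, which is the eight-byte prefix shown in the example above. The following is a minimal sketch; ``looks_like_ole2`` is a hypothetical helper and not part of extract_msg, and passing the check does not guarantee that the data is a valid .msg file.

```python
# The eight-byte header that every OLE2 (Compound File Binary Format) file starts with.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the data begins with the OLE2 signature.

    Hypothetical helper: a cheap sanity check only, not a full validation.
    """
    return data.startswith(OLE2_SIGNATURE)


print(looks_like_ole2(OLE2_SIGNATURE + b"\x00" * 8))
print(looks_like_ole2(b"From: someone@example.com"))
```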

If you want to override the default attachment class and use one of your
own, simply change the code to:

::

    msg = extract_msg.Message("path/to/msg/file.msg", attachmentClass=CustomAttachmentClass)

where ``CustomAttachmentClass`` is your custom class.
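
Once a ``Message`` instance exists, the key fields mentioned at the top of this README can be collected into a dictionary. The sketch below assumes the attributes are named ``sender``, ``to``, ``cc``, ``date`` and ``subject``; treat those names as assumptions and check them against the installed version. A stand-in object is used so the example runs without a real .msg file, and ``summarize`` is a hypothetical helper.

```python
from types import SimpleNamespace


def summarize(msg):
    """Return the key header fields of a message-like object as a dict.

    The attribute names are assumptions about what Message exposes.
    """
    fields = ("sender", "to", "cc", "date", "subject")
    # getattr with a default keeps the sketch tolerant of missing attributes.
    return {name: getattr(msg, name, None) for name in fields}


# Stand-in object; in real use this would be an extract_msg.Message instance.
fake_msg = SimpleNamespace(
    sender="a@example.com",
    to="b@example.com",
    cc=None,
    date="Wed, 24 Jul 2013 09:15:00",
    subject="Example",
)
print(summarize(fake_msg)["subject"])
# prints: Example
```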

#TODO: Finish this section

If you have any questions feel free to contact me, Matthew Walker, at
mattgwwalker at gmail.com. NOTE: Due to time constraints, The Elemental
of Creation has been added as a contributor to help manage the project.
As such, it may be helpful to send emails to arceusthe@gmail.com as
well.

If you have issues, it would be best to get help for them by opening a
new GitHub issue.

Error Reporting
---------------

Should you encounter an error that has not already been reported, please
do the following when reporting it:

* Make sure you are using the latest version of extract_msg.
* State your Python version.
* Include the code, if any, that you used.
* Include a copy of the traceback.
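
The version details asked for above can be gathered with the standard library. This is a minimal sketch, assuming Python 3.8 or later for ``importlib.metadata`` and assuming the distribution name ``extract-msg`` used in the pip command in this README; ``environment_report`` is a hypothetical helper.

```python
import platform
from importlib import metadata


def environment_report(package="extract-msg"):
    """Return a one-line summary of the Python and package versions."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        package_version = "not installed"
    return "Python {}; {} {}".format(
        platform.python_version(), package, package_version
    )


print(environment_report())
```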

Installation
------------

You can install using pip:

- PyPI

.. code:: bash

    pip install extract-msg

- GitHub

.. code:: sh

    pip install git+https://github.com/mattgwwalker/msg-extractor

or you can include this in your list of Python dependencies with:

.. code:: python

    # setup.py

    setup(
        ...
        dependency_links=['https://github.com/mattgwwalker/msg-extractor/zipball/master'],
    )

Todo
----

Here is a list of things that are currently on our todo list:

* Tests (i.e. unittest)
* Finish writing a usage guide
* Improve the intelligence of the saving functions
* Provide a way to save attachments and messages into a custom location under a custom name
* Implement better property handling that will convert each type into a Python equivalent if possible
* Implement handling of named properties
* Improve README
* Create a wiki for advanced usage information

Credits
-------

`Matthew Walker`_ - Original developer and owner

`Ken Peterson (The Elemental of Creation)`_ - Principal programmer, manager, and msg file "expert"

`JP Bourget`_ - Senior programmer, readability and organization expert, secondary manager

`Philippe Lagadec`_ - Python OleFile module developer

Joel Kaufman - First implementations of the json and filename flags

`Dean Malmgren`_ - First implementation of the setup.py script

.. |License: GPL v3| image:: https://img.shields.io/badge/License-GPLv3-blue.svg
   :target: LICENSE.txt
.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.23.0-blue.svg
   :target: https://pypi.org/project/extract-msg/0.23.0/
.. |PyPI1| image:: https://img.shields.io/badge/python-2.7+-brightgreen.svg
   :target: https://www.python.org/downloads/release/python-2715/
.. |PyPI2| image:: https://img.shields.io/badge/python-3.6+-brightgreen.svg
   :target: https://www.python.org/downloads/release/python-367/
.. _Matthew Walker: https://github.com/mattgwwalker
.. _Ken Peterson (The Elemental of Creation): https://github.com/TheElementalOfCreation
.. _JP Bourget: https://github.com/punkrokk
.. _Philippe Lagadec: https://github.com/decalage2
.. _Dean Malmgren: https://github.com/deanmalmgren

