Skip to main content

Python package for extracting metadata, text, html and attachements from email messages.

Project description

Welcome
=======

`emaildata <https://pypi.python.org/pypi/emaildata/>`__ is a python
package for extracting content from email messages. It is a fork of the
`emailcontent <https://pypi.python.org/pypi/emailcontent/>`__ package
but adds from features.

emaildata features
------------------

`emaildata <https://pypi.python.org/pypi/emaildata/>`__ extracts this
types of contents from emails:

- Extract metadata.
- Extract text (plain text and html).
- Extract attachments.

Extracting metadata
-------------------

This feature was included from the *metadata* module of the
`emailcontent <https://pypi.python.org/pypi/emailcontent/>`__. This
module was copied module, few methods of is *MetaData* class were
removed, and the module was made more pylint friendly.

To extract metadata from an email message headers you create an instance
of the *MetaData* class passing a message to the constructor. You can
retrieve the metadata with the method *to\_dict*. This can also be done
using the method *set\_message*::

import email
from emaildata.metadata import MetaData

message = email.message_from_file(open('message.eml'))
extractor = MetaData(message)
data = extractor.to_dict()
print data.keys()

message2 = email.message_from_file(open('message2.eml'))
extractor.set_message(message2)
data2 = extractor.to_dict()

Extracting text
---------------

The class `Text` in the `text` module have static methods for extracting
text and html from messages::

import email
from emaildata.text import Text

message = email.message_from_file(open('message.eml'))
text = Text.text(message)
html = Text.html(message)

Extracting attachments
-----------------------

The method :class:`Attachment.extract` returns an iterator with the decoded
contents of the attachments of a message::

import email
from emaildata.attachment import Attachment

message = email.message_from_file(open('message.eml'))
for content, filename, mimetype, message in Attachment.extract(message):
print filename
with open(filename, 'w') as stream:
stream.write(content)
# If message is not None then it is an instance of email.message.Message
if message:
print "The file {0} is a message with attachments.".format(filename)

By default this method only iterates by the attachments with a filename. To retrieve all
attachments you have to pass `False` as the second parameter (`only_with_filename`)::

import email
import mimetypes
import uuid
from emaildata.attachment import Attachment

message = email.message_from_file(open('message.eml'))
for content, filename, mimetype, message in Attachment.extract(message, False):
if not filename:
filename = str(uuid.uuid1()) + (mimetypes.guess_extension(mimetype) or '.txt')
print filename
with open(filename, 'w') as stream:
stream.write(content)
# If message is not None then it is an instance of email.message.Message
if message:
print "The file {0} is a message with attachments.".format(filename)

Changelog
---------

Version 0.2 (2015-05-3)
~~~~~~~~~~~~~~~~~~~~~~~~

- Implemented class for extracting attachments from messages.

Version 0.2 (2015-05-3)
~~~~~~~~~~~~~~~~~~~~~~~~

- Implemented class for extracting plain text and html from messages.

Version 0.1 (2015-03-15)
~~~~~~~~~~~~~~~~~~~~~~~~

- Initial version.
- Support for metadata extraction.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

emaildata-0.3.0.tar.gz (7.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page