
A streaming MIME parser

Project description

mrmime: a fast, memory-efficient streaming MIME parser

The API isn't stable, and the library as a whole isn't really stable yet either. Many features are still missing, so you shouldn't use it unless you're willing to actively help develop it.

Why?

Python's built-in email package is a memory hog, very rigid, and not particularly fast. I parse a lot of email at work, and I only need a few things:

  • I want to control storage. I don't need large objects that represent the entire parsed message; I need specific fields.
  • I want to control how I read MIME parts. I don't want them handed to me as massive strings.
  • I don't want to load the entire file into memory.
  • No serialization, only parsing.
  • I want it to be fast.
  • I want it to be intuitive.
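To illustrate the "specific fields, not the whole file" goal, here is a minimal sketch in plain Python. It does not use mrmime, and read_header is a hypothetical helper. It pulls one header out of a message by reading line by line and stopping at the blank line that ends the header block, so the body is never consumed:

```python
import io
from typing import BinaryIO, Optional


def read_header(fp: BinaryIO, name: bytes) -> Optional[bytes]:
    """Return the value of the first header called `name`, or None if absent.

    Reads line by line and stops at the blank line that ends the header
    block, so the body is not consumed (beyond the file object's own
    read buffer).
    """
    wanted = name.lower()
    found: Optional[bytes] = None
    for raw in fp:
        line = raw.rstrip(b"\r\n")
        if not line:
            break  # blank line: end of headers, the body starts after it
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line; simplified unfolding joins with one space
            if found is not None:
                found += b" " + line.strip()
            continue
        if found is not None:
            break  # a new header started, so the one we wanted is complete
        key, sep, value = line.partition(b":")
        if sep and key.strip().lower() == wanted:
            found = value.strip()
    return found


# Usage with an in-memory message; a file from open(path, "rb") works the same way.
message = io.BytesIO(
    b"From: a@example.com\r\n"
    b"Subject: Hello\r\n"
    b" world\r\n"
    b"\r\n"
    b"a very long body...\r\n"
)
print(read_header(message, b"Subject"))  # b'Hello world'
```

Header unfolding is simplified here (continuation lines are joined with a single space), which is usually good enough when you only need a handful of fields.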

Examples

Simple example showing how to use it:

from mrmime import BodyLineEvent, HeaderEvent, parse_file


with open("tests/data/simple.eml") as f:
    for event in parse_file(f):
        if isinstance(event, HeaderEvent):
            print("header", event.key, event.value)
        elif isinstance(event, BodyLineEvent):
            print("line from the body", event.line)
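The example above consumes a flat stream of events. The toy generator below sketches how such a stream can be produced from raw lines. HeaderEvent and BodyLineEvent here are stand-ins that only mirror the attribute names used above (key, value, line); they are not mrmime's real classes, and parse_lines is a hypothetical name:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Union


@dataclass
class HeaderEvent:  # toy stand-in, not mrmime's class
    key: bytes
    value: bytes


@dataclass
class BodyLineEvent:  # toy stand-in, not mrmime's class
    line: bytes


def parse_lines(lines: Iterable[bytes]) -> Iterator[Union[HeaderEvent, BodyLineEvent]]:
    """Turn raw message lines into a flat stream of events, one line at a time."""
    in_headers = True
    pending: Optional[HeaderEvent] = None  # held back in case it is folded
    for raw in lines:
        line = raw.rstrip(b"\r\n")
        if not in_headers:
            yield BodyLineEvent(line)
            continue
        if pending is not None and line[:1] in (b" ", b"\t"):
            pending.value += b" " + line.strip()  # simplified header unfolding
            continue
        if pending is not None:
            yield pending  # the previous header is complete
            pending = None
        if not line:
            in_headers = False  # blank line separates headers from body
            continue
        key, _, value = line.partition(b":")
        pending = HeaderEvent(key.strip(), value.strip())
    if pending is not None:
        yield pending  # message that ends without a body


for event in parse_lines([b"Subject: Hi\r\n", b"\r\n", b"Hello!\r\n"]):
    if isinstance(event, HeaderEvent):
        print("header", event.key, event.value)
    elif isinstance(event, BodyLineEvent):
        print("line from the body", event.line)
```

Because it only needs an iterable of byte lines, a file opened in binary mode can be passed in directly, and memory use stays bounded by the longest header.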

How to get the entire body in a single event:

from mrmime import HeaderEvent, BodyStreamer, body_streamer, parse_file

with open("tests/data/simple.eml") as f:
    for event in body_streamer(parse_file(f)):
        if isinstance(event, HeaderEvent):
            print("header", event.key, event.value)
        elif isinstance(event, BodyStreamer):
            print("body", event.read())
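body_streamer presumably works by wrapping the lower-level event stream. The following is a hypothetical sketch of that wrapping idea, not mrmime's implementation: other events pass through unchanged, and each run of body-line events is merged into a single object with a read() method. collect_body and CollectedBody are invented names, and the toy BodyLineEvent only mirrors the line attribute used earlier:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class BodyLineEvent:  # toy stand-in for a per-line body event
    line: bytes


class CollectedBody:
    """Toy 'whole body' event: holds the buffered lines and exposes read()."""

    def __init__(self, lines: List[bytes]) -> None:
        self._lines = lines

    def read(self) -> bytes:
        # Lines are rejoined with CRLF; the final line ending is not restored
        return b"\r\n".join(self._lines)


def collect_body(events: Iterable[object]) -> Iterator[object]:
    """Pass other events through unchanged; merge each run of body lines into one."""
    buffered: List[bytes] = []
    for event in events:
        if isinstance(event, BodyLineEvent):
            buffered.append(event.line)
            continue
        if buffered:
            yield CollectedBody(buffered)
            buffered = []
        yield event
    if buffered:  # body lines at the very end of the stream
        yield CollectedBody(buffered)


events = ["some header event", BodyLineEvent(b"line one"), BodyLineEvent(b"line two")]
for event in collect_body(events):
    if isinstance(event, CollectedBody):
        print("body", event.read())
```

This version buffers the whole body before yielding it, which is simple but gives up the memory savings for very large bodies; a lazier design would let read() pull lines from the underlying iterator on demand.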

How to handle multipart messages:

from mrmime import BodyLineEvent, HeaderEvent, ParserState, ParserStateEvent, multipart, parse_file

with open("tests/data/simple.eml") as f:
    for event in multipart(parse_file(f)):
        if isinstance(event, ParserStateEvent) and event.state is ParserState.Boundary:
            print("new boundary started")
        elif isinstance(event, HeaderEvent):
            print("header", event.key, event.value)
        elif isinstance(event, BodyLineEvent):
            # BodyLineEvent carries a single line, as in the first example
            print("line from the body", event.line)
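Multipart handling hinges on recognizing boundary delimiter lines. As defined in RFC 2046, each part begins after a line consisting of "--" followed by the boundary string, and the multipart body ends at that same line with a further "--" appended. A minimal sketch of that rule follows; split_multipart is a hypothetical helper that ignores nested multiparts and does not parse each part's own headers:

```python
from typing import Iterable, Iterator, Tuple


def split_multipart(lines: Iterable[bytes], boundary: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield ("boundary", b"") at each part delimiter and ("line", data) for content.

    Lines before the first delimiter (the preamble) and after the closing
    delimiter (the epilogue) are skipped.
    """
    delimiter = b"--" + boundary
    close = delimiter + b"--"
    inside = False
    for raw in lines:
        line = raw.rstrip(b"\r\n")
        stripped = line.rstrip(b" \t")  # RFC 2046 allows trailing whitespace on delimiter lines
        if stripped == close:
            return  # closing delimiter: ignore the epilogue
        if stripped == delimiter:
            inside = True
            yield ("boundary", b"")
            continue
        if inside:
            yield ("line", line)


body = [b"preamble\r\n", b"--XYZ\r\n", b"\r\n", b"first\r\n", b"--XYZ--\r\n"]
for kind, data in split_multipart(body, b"XYZ"):
    print(kind, data)
```

In a real message the boundary string comes from the boundary parameter of the outer Content-Type header.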

How to handle messages from something other than a file:

from mrmime import BodyStreamer, HeaderEvent, Parser

parser = Parser()

for chunk in get_data_from_source():  # placeholder for your own source of chunks, e.g. an async library
    for event in parser.feed(chunk):
        if isinstance(event, HeaderEvent):
            print("header", event.key, event.value)
        elif isinstance(event, BodyStreamer):
            print("body", event.read())
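A feed()-style parser receives chunks that can end anywhere, even in the middle of a line, so it has to hold on to the unfinished tail of each chunk until the next one completes it. A minimal sketch of that buffering step, assuming a line-oriented parser downstream (LineBuffer is a hypothetical name, not part of mrmime):

```python
from typing import List


class LineBuffer:
    """Turn arbitrary byte chunks into complete lines, keeping partial lines."""

    def __init__(self) -> None:
        self._partial = b""

    def feed(self, chunk: bytes) -> List[bytes]:
        data = self._partial + chunk
        lines = data.split(b"\n")
        self._partial = lines.pop()  # last piece has no newline yet (may be b"")
        return [line + b"\n" for line in lines]

    def close(self) -> List[bytes]:
        """Flush whatever is left once the source is exhausted."""
        rest, self._partial = self._partial, b""
        return [rest] if rest else []


buf = LineBuffer()
for chunk in [b"Subject: Hel", b"lo\r\n\r\nbo", b"dy\r\n"]:
    for line in buf.feed(chunk):
        print(line)
```

Each returned line keeps its line ending, so it can be handed straight to a line-oriented parser. Concatenating onto the partial line is quadratic for a single enormous line with no newline; a bytearray-based buffer would avoid that.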

TODO

  • Think about recursive parsing: what if I want to parse messages nested inside messages? What if I want to decide that dynamically rather than up front?
  • MimePart should decode the data it contains, with an option to skip decoding.
  • Think more about the state transitions; they're messy.
  • We return bytes for everything at the moment, and we shouldn't. A good option would be to have the Header object do the decoding, so that it happens lazily.
  • Can we use memoryviews at all for the headers?
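The lazy-decoding idea from the TODO list could look roughly like the sketch below. The header keeps the raw bytes it was parsed from and decodes them only when value is first accessed, using the standard library's email.header.decode_header and make_header to handle RFC 2047 encoded words. The Header class here is hypothetical and is not mrmime's current implementation; functools.cached_property needs Python 3.8 or later:

```python
from email.header import decode_header, make_header
from functools import cached_property


class Header:
    """Hypothetical lazy header: keeps raw bytes, decodes only on first access."""

    def __init__(self, key: bytes, raw_value: bytes) -> None:
        self.key = key
        self.raw_value = raw_value  # exactly what the parser read

    @cached_property
    def value(self) -> str:
        # Assumes RFC 2047-style ASCII header text; raw 8-bit bytes become U+FFFD.
        text = self.raw_value.decode("ascii", errors="replace")
        # Decoded once, then cached on the instance by cached_property
        return str(make_header(decode_header(text)))


h = Header(b"Subject", b"=?utf-8?q?caf=C3=A9?=")
print(h.value)  # café
```

Callers that only need the raw bytes never pay for decoding, which fits the goal of keeping parsing cheap.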

Download files


Source Distribution

mrmime-0.0.1.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

mrmime-0.0.1-py3-none-any.whl (6.2 kB)

Uploaded Python 3

File details

Details for the file mrmime-0.0.1.tar.gz.

File metadata

  • Download URL: mrmime-0.0.1.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.7 CPython/3.9.6 Linux/5.12.9-1-ARCH

File hashes

Hashes for mrmime-0.0.1.tar.gz
Algorithm    Hash digest
SHA256       9d1dd7424e65237ca0fd3158048812e282db4b7619e6b1d4f6ac054e7d261a99
MD5          8c517fc5e10d2bc457c9f62bceeedbe2
BLAKE2b-256  a551df1a05b7caae776acb99c3c50ca8303931fe9892823e2e7e0debf215af7a
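To check a downloaded file against the SHA256 digest listed above, a small sketch using Python's hashlib can read the file in chunks and compare hex digests. file_sha256 is a hypothetical helper name:

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hex SHA-256 of a file, read in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Expected SHA256 for mrmime-0.0.1.tar.gz, copied from the table above.
EXPECTED = "9d1dd7424e65237ca0fd3158048812e282db4b7619e6b1d4f6ac054e7d261a99"

# For an intact download: file_sha256("mrmime-0.0.1.tar.gz") == EXPECTED
```

pip can also enforce this on its own through hash-checking mode, using a requirements line such as mrmime==0.0.1 --hash=sha256:<digest>.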

File details

Details for the file mrmime-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: mrmime-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 6.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.7 CPython/3.9.6 Linux/5.12.9-1-ARCH

File hashes

Hashes for mrmime-0.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       6d9adc387a4e9305aee60a77eacf9c3e79e9d50a164b093e2495842d3ef67c90
MD5          11037302143e933d51859586149a4d4b
BLAKE2b-256  6ccb4f6549f900be7d56f52b28903415891d3e4c81ef156f73544d9085f002ad
