A streaming MIME parser
mrmime, a fast and memory-efficient streaming MIME parser
This isn't API stable or even really stable at all yet. A lot of features are missing. You shouldn't use this unless you're willing to actively help me with it.
Why?
Python's standard library email package is a memory hog, very rigid, and not particularly fast. I parse a lot of email at work and I only need a couple of things:
- I want to control storage. I don't need large objects that represent the entire parsed message, I need specific fields.
- I want to control how I read MIME parts. I don't want massive strings.
- I don't want to load the entire file into memory.
- No serialization, only parsing.
- I want it to be fast.
- I want it to be intuitive.
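The goals above point towards an event-driven design: read one line at a time, emit small events, and let the caller decide what to keep. Here is a minimal stdlib-only sketch of that idea. It is not mrmime's implementation; `iter_events` and the `(kind, key, value)` tuple shape are hypothetical names invented for illustration.

```python
import io
from typing import BinaryIO, Iterator, Tuple

Event = Tuple[str, bytes, bytes]  # hypothetical shape: (kind, key, value)


def iter_events(stream: BinaryIO) -> Iterator[Event]:
    """Yield ("header", key, value) and ("body", b"", line) one line at a time."""
    in_headers = True
    key, value = None, b""
    for raw in stream:  # binary file objects iterate lazily, line by line
        line = raw.rstrip(b"\r\n")
        if not in_headers:
            yield ("body", b"", line)
            continue
        if key is not None and line[:1] in (b" ", b"\t"):
            # Folded continuation line: append it to the pending header.
            value += b" " + line.strip()
            continue
        if key is not None:
            yield ("header", key, value)
            key = None
        if not line:
            in_headers = False  # a blank line separates headers from body
            continue
        name, _, rest = line.partition(b":")
        key, value = name.strip(), rest.strip()
    if key is not None:  # message ended while still in the headers
        yield ("header", key, value)


# The caller controls storage: keep only the Subject, ignore everything else.
message = io.BytesIO(
    b"Subject: hello\r\n world\r\nTo: a@example.com\r\n\r\nline one\r\nline two\r\n"
)
subject = next(
    value
    for kind, key, value in iter_events(message)
    if kind == "header" and key.lower() == b"subject"
)
```

Because the generator stops as soon as the caller stops iterating, the rest of the message is never read in this example.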
Examples
Simple example showing how to use it:
from mrmime import BodyLineEvent, HeaderEvent, parse_file

with open("tests/data/simple.eml") as f:
    for event in parse_file(f):
        if isinstance(event, HeaderEvent):
            print("header", event.key, event.value)
        elif isinstance(event, BodyLineEvent):
            print("line from the body", event.line)
How to get the entire body in a single event:
from mrmime import HeaderEvent, BodyStreamer, body_streamer, parse_file

with open("tests/data/simple.eml") as f:
    for event in body_streamer(parse_file(f)):
        if isinstance(event, HeaderEvent):
            print("header", event.key, event.value)
        elif isinstance(event, BodyStreamer):
            print("body", event.read())
How to handle multipart messages:
# ParserState is assumed to be importable from the top-level package.
from mrmime import (
    BodyLineEvent,
    HeaderEvent,
    ParserState,
    ParserStateEvent,
    multipart,
    parse_file,
)

with open("tests/data/simple.eml") as f:
    for event in multipart(parse_file(f)):
        if isinstance(event, ParserStateEvent) and event.state is ParserState.Boundary:
            print("new boundary started")
        elif isinstance(event, HeaderEvent):
            print("header", event.key, event.value)
        elif isinstance(event, BodyLineEvent):
            # BodyLineEvent carries a single line, as in the first example.
            print("body", event.line)
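For context on what a "boundary" is here: RFC 2046 delimits the parts of a multipart body with a line made of `--` followed by the boundary string, and marks the end with the same line plus a trailing `--`. A rough sketch of that check follows. `classify_line` is a hypothetical helper written for illustration, not part of mrmime, and it assumes the boundary string has already been taken from the Content-Type header.

```python
def classify_line(line: bytes, boundary: bytes) -> str:
    """Label one body line as "boundary", "end" or "data"."""
    # RFC 2046 permits trailing whitespace padding after a delimiter.
    stripped = line.rstrip(b" \t\r\n")
    delimiter = b"--" + boundary
    if stripped == delimiter:
        return "boundary"  # a new part starts after this line
    if stripped == delimiter + b"--":
        return "end"  # closing delimiter: no more parts follow
    return "data"


labels = [
    classify_line(line, b"frontier")
    for line in (b"--frontier\r\n", b"hello\r\n", b"--frontier--\r\n")
]
```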
How to handle messages from something other than a file:
from mrmime import BodyStreamer, HeaderEvent, Parser

parser = Parser()

# get_data_from_source() is a placeholder for your own chunk source,
# e.g. an async library or something.
for chunk in get_data_from_source():
    for event in parser.feed(chunk):
        if isinstance(event, HeaderEvent):
            print("header", event.key, event.value)
        elif isinstance(event, BodyStreamer):
            print("body", event.read())
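Chunks from a socket or queue rarely end on a line boundary, so a feed-style parser has to hold on to the unfinished tail of each chunk until the rest arrives. The sketch below shows one way to do that. `LineBuffer` is a hypothetical helper for illustration and says nothing about how mrmime's `Parser` works internally.

```python
from typing import List


class LineBuffer:
    """Split arbitrary byte chunks into complete lines."""

    def __init__(self) -> None:
        self._tail = b""  # bytes received after the last newline

    def feed(self, chunk: bytes) -> List[bytes]:
        # Everything before the final "\n" is complete; the rest is kept
        # until a later chunk finishes the line.
        *lines, self._tail = (self._tail + chunk).split(b"\n")
        return [line.rstrip(b"\r") for line in lines]

    def close(self) -> List[bytes]:
        # Flush a last line that had no trailing newline.
        tail, self._tail = self._tail, b""
        return [tail] if tail else []


buf = LineBuffer()
lines: List[bytes] = []
for chunk in (b"Subject: he", b"llo\r\nTo: a@ex", b"ample.com\r", b"\n\r\nbody"):
    lines += buf.feed(chunk)
lines += buf.close()
```

Memory use is bounded by the longest single line plus one chunk, which fits the goal of never holding the whole message.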
TODO
- Think about recursive parsing: what if I want to parse messages nested inside messages? What if I want to decide dynamically rather than up front?
- MimePart should decode the data inside it, with an option to skip decoding.
- Think more about the state transitions; they're messy.
- We return bytes for everything at the moment, and we shouldn't. The Header object could do the decoding itself so that it's lazy.
- Can we use memoryviews at all for the headers?
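The lazy decoding item above could look something like the following sketch. `LazyHeader` is a hypothetical class, not mrmime's Header object, and it leans on the standard library's `email.header` helpers for RFC 2047 decoding purely for illustration.

```python
from email.header import decode_header, make_header
from functools import cached_property


class LazyHeader:
    """Keep the raw bytes and decode to str only on first access."""

    def __init__(self, key: bytes, raw_value: bytes) -> None:
        self.key = key
        self.raw_value = raw_value

    @cached_property
    def value(self) -> str:
        # RFC 2047 encoded words are plain ASCII on the wire; stray 8-bit
        # bytes become replacement characters instead of raising.
        text = self.raw_value.decode("ascii", errors="replace")
        return str(make_header(decode_header(text)))


header = LazyHeader(b"Subject", b"=?utf-8?b?Y2Fmw6k=?=")
decoded_before = "value" in vars(header)  # nothing decoded yet
subject = header.value                    # decoding happens here, once
decoded_after = "value" in vars(header)   # result is now cached
```

Callers that only look at `key` or `raw_value` never pay for decoding.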