A streaming MIME parser
Project description
mrmime, a fast and memory-efficient streaming MIME parser
This isn't API-stable, or even really stable at all, yet. A lot of features are missing. You shouldn't use this unless you're willing to actively help me with it.
Why?
The standard library's email package is a memory hog, very rigid, and not particularly fast. I parse a lot of email at work and I only need a couple of things:
- I want to control storage. I don't need large objects that represent the entire parsed message; I need specific fields.
- I want to control how I read MIME parts. I don't want massive strings.
- I don't want to load the entire file into memory.
- No serialization, only parsing.
- I want it to be fast.
- I want it to be intuitive.
Examples
Simple example showing how to use it:
    from mrmime import BodyLineEvent, HeaderEvent, parse_file

    with open("tests/data/simple.eml") as f:
        for event in parse_file(f):
            if isinstance(event, HeaderEvent):
                print("header", event.key, event.value)
            elif isinstance(event, BodyLineEvent):
                print("line from the body", event.line)
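Because the parser hands out events one at a time, you can keep only the fields you care about and stop reading early. The following is a minimal sketch of that pattern, not mrmime's implementation: `HeaderEvent` and `BodyLineEvent` here are stand-in dataclasses that mirror only the attributes used in the example above (`key`, `value`, `line`), the values are assumed to be bytes, and `collect_headers` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


# Stand-ins for the mrmime event classes, so the sketch runs on its own.
@dataclass
class HeaderEvent:
    key: bytes
    value: bytes


@dataclass
class BodyLineEvent:
    line: bytes


def collect_headers(events: Iterable[object], wanted: Set[bytes]) -> Dict[bytes, bytes]:
    """Return the first value of each wanted header (names compared case-insensitively)."""
    wanted_lower = {name.lower() for name in wanted}
    found: Dict[bytes, bytes] = {}
    for event in events:
        if isinstance(event, HeaderEvent):
            key = event.key.lower()
            if key in wanted_lower and key not in found:
                found[key] = event.value
                if len(found) == len(wanted_lower):
                    break  # everything we need is here; stop consuming events
        elif isinstance(event, BodyLineEvent):
            break  # assumes a flat message: headers end where the body starts
    return found
```

With the real parser, the `events` argument would be `parse_file(f)`. Assuming it yields events lazily, breaking out of the loop means the rest of the file is never read.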
How to get the entire body in a single event:
    from mrmime import HeaderEvent, BodyStreamer, body_streamer, parse_file

    with open("tests/data/simple.eml") as f:
        for event in body_streamer(parse_file(f)):
            if isinstance(event, HeaderEvent):
                print("header", event.key, event.value)
            elif isinstance(event, BodyStreamer):
                print("body", event.read())
How to handle multipart messages:
    # ParserState is assumed to be importable from the package root.
    from mrmime import ParserState, ParserStateEvent, HeaderEvent, BodyLineEvent, multipart, parse_file

    with open("tests/data/simple.eml") as f:
        for event in multipart(parse_file(f)):
            if isinstance(event, ParserStateEvent) and event.state is ParserState.Boundary:
                print("new boundary started")
            elif isinstance(event, HeaderEvent):
                print("header", event.key, event.value)
            elif isinstance(event, BodyLineEvent):
                # BodyLineEvent carries a single line, as in the first example.
                print("body", event.line)
How to handle messages from something other than a file:
    from mrmime import BodyStreamer, HeaderEvent, Parser

    parser = Parser()
    for chunk in get_data_from_source():  # e.g. an async library or something
        for event in parser.feed(chunk):
            if isinstance(event, HeaderEvent):
                print("header", event.key, event.value)
            elif isinstance(event, BodyStreamer):
                print("body", event.read())
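`get_data_from_source` above is a placeholder. One possible shape for it, assuming the source is a binary file-like object (a socket file, a pipe, an upload stream), is a generator that yields fixed-size chunks. The name `iter_chunks` and the 64 KiB default are illustrative choices, not part of mrmime.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(source: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            return
        yield chunk


# Example: split an in-memory message into chunks of at most 16 bytes.
message = io.BytesIO(b"Subject: hello\r\n\r\nbody line\r\n")
chunks = list(iter_chunks(message, chunk_size=16))
```

Each chunk can then be passed to `parser.feed(chunk)` as in the example above, so only one chunk is held in memory at a time.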
TODO
- Think about recursive parsing, e.g. what if I want to parse messages inside messages? What if I want to decide that dynamically rather than up front?
- MimePart should decode the data inside it, with an option to turn that off.
- Think more about the state transitions; they're messy.
- We return bytes for everything at the moment, and we shouldn't. The Header object could do the decoding lazily, which seems like a good idea.
- Can we use memoryviews at all for the headers?
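The lazy-decoding idea from the list above could look roughly like this. It is a sketch, not mrmime's Header object: `LazyHeader` is a hypothetical class that keeps the raw bytes and decodes RFC 2047 encoded words with the standard library's `email.header` only when the text is first requested.

```python
from email.header import decode_header, make_header
from functools import cached_property


class LazyHeader:
    """Hypothetical header wrapper that decodes its value on first access."""

    def __init__(self, key: bytes, raw_value: bytes) -> None:
        self.key = key
        self.raw_value = raw_value  # kept untouched for callers who want bytes

    @cached_property
    def value(self) -> str:
        # Header values should be ASCII on the wire; stray 8-bit bytes are
        # replaced here, which is a policy choice made for this sketch.
        text = self.raw_value.decode("ascii", errors="replace")
        return str(make_header(decode_header(text)))


subject = LazyHeader(b"Subject", b"=?utf-8?q?caf=C3=A9?=")
decoded = subject.value  # decoding happens here, then the result is cached
```

Headers that are never inspected cost nothing beyond holding their bytes, which fits the goal of controlling storage.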
Download files
Source Distribution
Built Distribution
File details
Details for the file mrmime-0.0.1.tar.gz.
File metadata
- Download URL: mrmime-0.0.1.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.7 CPython/3.9.6 Linux/5.12.9-1-ARCH
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d1dd7424e65237ca0fd3158048812e282db4b7619e6b1d4f6ac054e7d261a99 |
| MD5 | 8c517fc5e10d2bc457c9f62bceeedbe2 |
| BLAKE2b-256 | a551df1a05b7caae776acb99c3c50ca8303931fe9892823e2e7e0debf215af7a |
File details
Details for the file mrmime-0.0.1-py3-none-any.whl.
File metadata
- Download URL: mrmime-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.7 CPython/3.9.6 Linux/5.12.9-1-ARCH
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d9adc387a4e9305aee60a77eacf9c3e79e9d50a164b093e2495842d3ef67c90 |
| MD5 | 11037302143e933d51859586149a4d4b |
| BLAKE2b-256 | 6ccb4f6549f900be7d56f52b28903415891d3e4c81ef156f73544d9085f002ad |