Library to extract message quotations and signatures.
If you ever tried to parse message quotations or signatures you know that absence of any formatting standards in this area could make this task a nightmare. Hopefully this library will make your life much easier.
pip install claw
Here’s how you initialize the library and extract a reply from a text message:
import claw from claw import quotations claw.init() text = """Reply -----Original Message----- Quote""" reply = quotations.extract_from(text, 'text/plain') reply = quotations.extract_from_plain(text) # reply == "Reply"
To extract a reply from html:
html = """Reply <blockquote> <div> On 11-Apr-2011, at 6:54 PM, Bob <firstname.lastname@example.org> wrote: </div> <div> Quote </div> </blockquote>""" reply = quotations.extract_from(html, 'text/html') reply = quotations.extract_from_html(html) # reply == "<html><body><p>Reply</p></body></html>"
Often the best way is the easiest one. Here’s how you can extract signature from email message without any machine learning fancy stuff:
from claw.signature import extract_signature message = """Wow. Awesome! -- Bob Smith""" text, signature = extract_signature(message) # text == "Wow. Awesome!" # signature == "--\nBob Smith"
Quick and works like a charm 90% of the time. For other 10% you can use the power of machine learning algorithms. See the original talon implementation.
virtualenv venv source venv/bin/activate make install make test
Release new version:
Bump the version in setup.py and update CHANGELOG.md, and then: