Skip to main content

Mailgun library to extract message quotations and signatures.

Project description

Mailgun library to extract message quotations and signatures.

If you ever tried to parse message quotations or signatures you know that absense of any formatting standards in this area could make this task a nightmare. Hopefully this library will make your life much easier. The name of the project is inspired by TALON - multipurpose robot designed to perform missions ranging from reconnaissance to combat and operate in a number of hostile environments. That’s what a good quotations and signature parser should be like :smile:

Usage

Here’s how you initialize the library and extract a reply from a text message:

import talon
from talon import quotations

talon.init()

text =  """Reply

-----Original Message-----

Quote"""

reply = quotations.extract_from(text, 'text/plain')
reply = quotations.extract_from_plain(text)
# reply == "Reply"

To extract a reply from html:

html = """Reply
<blockquote>

  <div>
    On 11-Apr-2011, at 6:54 PM, Bob &lt;bob@example.com&gt; wrote:
  </div>

  <div>
    Quote
  </div>

</blockquote>"""

reply = quotations.extract_from(html, 'text/html')
reply = quotations.extract_from_html(html)
# reply == "<html><body><p>Reply</p></body></html>"

Often the best way is the easiest one. Here’s how you can extract signature from email message without any machine learning fancy stuff:

from talon.signature.bruteforce import extract_signature


message = """Wow. Awesome!
--
Bob Smith"""

text, signature = extract_signature(message)
# text == "Wow. Awesome!"
# signature == "--\nBob Smith"

Quick and works like a charm 90% of the time. For other 10% you can use the power of machine learning algorithms:

from talon import signature


message = """Thanks Sasha, I can't go any higher and is why I limited it to the
homepage.

John Doe
via mobile"""

text, signature = signature.extract(message, sender='john.doe@example.com')
# text == "Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage."
# signature == "John Doe\nvia mobile"

For machine learning talon currently uses PyML library to build SVM classifiers. The core of machine learning algorithm lays in talon.signature.learning package. It defines a set of features to apply to a message (featurespace.py), how data sets are built (dataset.py), classifier’s interface (classifier.py).

The data used for training is taken from our personal email conversations and from ENRON dataset. As a result of applying our set of features to the dataset we provide files classifier and train.data that don’t have any personal information but could be used to load trained classifier. Those files should be regenerated every time the feature/data set is changed.

Project details


Release history Release notifications

History Node

1.4.4

History Node

1.4.3

History Node

1.4.2

History Node

1.4.1

History Node

1.4.0

History Node

1.3.7

History Node

1.3.6

History Node

1.3.5

History Node

1.3.4

History Node

1.3.3

History Node

1.3.2

History Node

1.3.1

History Node

1.3.0

History Node

1.2.16

History Node

1.2.15

History Node

1.2.14

History Node

1.2.12

History Node

1.2.11

History Node

1.2.10

History Node

1.2.9

History Node

1.2.8

History Node

1.2.7

History Node

1.2.6

History Node

1.2.5

History Node

1.2.4

History Node

1.2.3

History Node

1.2.2

History Node

1.2.1

History Node

1.2.0

History Node

1.1.0

History Node

1.0.9

History Node

1.0.8

History Node

1.0.7

History Node

1.0.6

History Node

1.0.5

History Node

1.0.4

History Node

1.0.3

This version
History Node

1.0.2

History Node

1.0.1

History Node

1.0

History Node

0.0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
talon-1.0.2.tar.gz (54.2 kB) Copy SHA256 hash SHA256 Source None Jul 24, 2014

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging CloudAMQP CloudAMQP RabbitMQ AWS AWS Cloud computing Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page