A simple framework for automating mail classification tasks
ClassMail
A Python library for mail classification, optimized for French-language mails in the insurance domain. Classmail was created to automate the mail classification workflow for quick experiments. It was developed during my internship at Covéa.
Classmail provides:
- Data visualisation: quick data analysis, based on matplotlib and seaborn.
- Mail preprocessing (cleaning): optimised for insurance purposes, with prebuilt regular expressions (in French). The configuration file holding these expressions can be adapted for other languages or purposes.
- Deep learning model creation (for classification): a simple interface to build PyTorch models quickly, based on the Flair NLP library.
- Model analysis and explanation: a simple interface with prebuilt seaborn graphs and a model explainer based on LIME.
Quick Start
Requirements and Installation
The project is based on Python 3.7+. If you do not have Python 3.7 or newer, install it first. Then, in your favorite virtual environment, simply run:
pip install classmail
Example Usage
The snippets below walk through a typical workflow: data analysis, cleaning, model training, and model evaluation. They assume the mails are already loaded in a pandas DataFrame named `data`, with the header, body and label columns referenced in each call.
Data analysis
```python
import classmail.data_visualisation.data_visualisation as dv

# show the class balancing graph
dv.plot_class_balancing(data, col_text='header_body', col_label="COMPETENCE", title="Mail categories")
# show the most frequent bigrams
dv.plot_word_frequencies(data['message'], ngram=2, words_nb=20)
# plot a word cloud with the most frequent terms in the body
dv.plot_wordcloud(data['body'])
```
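For intuition about the bigram plot, the sketch below counts the most frequent n-grams in a list of texts, which is roughly what a call like `plot_word_frequencies(..., ngram=2, words_nb=20)` has to compute before drawing anything. This is an assumption for illustration, not Classmail's code: the helper `top_ngrams` is hypothetical, and it tokenizes by lowercasing and splitting on whitespace, which is cruder than a real tokenizer.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ngrams(texts: Iterable[str], ngram: int = 2, words_nb: int = 20) -> List[Tuple[str, int]]:
    """Return the `words_nb` most frequent n-grams across `texts` (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        # slide a window of size `ngram` over the tokens
        for i in range(len(tokens) - ngram + 1):
            counts[" ".join(tokens[i:i + ngram])] += 1
    return counts.most_common(words_nb)


# sample French mails (e.g. "I wish to cancel my contract")
messages = [
    "Bonjour je souhaite résilier mon contrat",
    "Je souhaite résilier mon contrat auto",
    "Bonjour merci pour votre retour",
]
print(top_ngrams(messages, ngram=2, words_nb=3))
```

Ties are returned in first-seen order, so the plot would show the bigrams shared by the two cancellation requests first.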
Cleaning
```python
from classmail.nlp.cleaning import clean_mail

# create a new column in data ("clean_text") with the preprocessed header and body
data = clean_mail(data, "body", "header")
```
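The cleaning step can be pictured as a sequence of regex substitutions applied to the concatenated header and body. The sketch below is hypothetical: the function `clean_text_sketch` and its patterns (e-mail addresses, greeting lines, sign-offs, digits, punctuation) are illustrative assumptions, not the prebuilt French configuration that Classmail ships.

```python
import re

# Hypothetical patterns for illustration only; Classmail ships its own
# prebuilt (French) regular expressions in a configuration file.
PATTERNS = [
    # e-mail addresses
    (re.compile(r"\S+@\S+"), " "),
    # whole greeting lines such as "Bonjour," or "Madame, Monsieur,"
    (re.compile(r"^(bonjour|madame|monsieur)\b[^\n]*", re.IGNORECASE | re.MULTILINE), " "),
    # sign-off and everything after it (the signature block)
    (re.compile(r"\b(cordialement|bien à vous)\b.*", re.IGNORECASE | re.DOTALL), " "),
    # digits (policy numbers, dates, phone numbers)
    (re.compile(r"\d+"), " "),
    # remaining punctuation
    (re.compile(r"[^\w\s]"), " "),
]


def clean_text_sketch(header: str, body: str) -> str:
    """Concatenate header and body, apply the patterns in order, then normalise spacing and case."""
    text = f"{header}\n{body}"
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return " ".join(text.lower().split())


header = "Résiliation contrat 12345"
body = (
    "Bonjour,\n"
    "Je souhaite résilier mon contrat 12345.\n"
    "Merci de me contacter à jean.dupont@example.com\n"
    "Cordialement,\n"
    "Jean Dupont"
)
print(clean_text_sketch(header, body))
# -> résiliation contrat je souhaite résilier mon contrat merci de me contacter à
```

On a DataFrame, the same function could fill a column, e.g. `data["clean_text"] = [clean_text_sketch(h, b) for h, b in zip(data["header"], data["body"])]`. Pattern order matters: e-mail addresses are removed before punctuation stripping would split them into fragments.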
Model creation and training
```python
from classmail.classification.trainer import Trainer

trainer = Trainer()
# generate train / test / val CSV files
# (assumes `data` already has the "clean_text" column created by clean_mail)
trainer.prepare_data(data, col_text="clean_text", col_label="COMPETENCE",
                     train_size=0.7, val_size=0.15, test_size=0.15)
# train a model with default parameters
trainer.train_model(model_name="default_model")
```
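`prepare_data` takes `train_size`, `val_size` and `test_size` fractions. A minimal sketch of such a three-way split, assuming a plain shuffled (non-stratified) split with a fixed seed, is shown below; `split_three_ways` is a hypothetical helper, and unlike `prepare_data` it does not write any CSV file.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_three_ways(
    items: Sequence[T],
    train_size: float = 0.7,
    val_size: float = 0.15,
    test_size: float = 0.15,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `items` and cut them into train / validation / test lists (hypothetical helper)."""
    # test_size is only validated here; the test set receives whatever remains
    if abs(train_size + val_size + test_size - 1.0) > 1e-9:
        raise ValueError("train_size + val_size + test_size must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_train = round(len(shuffled) * train_size)
    n_val = round(len(shuffled) * val_size)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, so no row is lost
    return train, val, test


# 100 hypothetical (text, label) rows
rows = [(f"mail {i}", "class 1" if i % 2 else "class 2") for i in range(100)]
train, val, test = split_three_ways(rows)
print(len(train), len(val), len(test))  # 70 15 15
```

With imbalanced classes, a stratified split (splitting each class separately) would keep class proportions similar across the three sets; this sketch does not do that.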
Model predictions, evaluation and explanation
```python
from classmail.classification.model import Model

# load our model, saved in the "ressources" folder
# (adjust the path to the folder where your trained model was saved)
model = Model("ressources\\model_default")
# predictions (X_test: mail texts, y_test: their true labels)
predictions = model.get_predictions(X_test)
# confusion matrix
model.plot_confusion_matrix(pred_labels=predictions, true_labels=y_test)
# explain one example, at index 110
model.visualize_one_ex(X_test, y_test, index=110, num_features=6)
# compute the most discriminant words in each category
sorted_contributions = model.get_statistical_explanation(X_test, ["class 1", "class 2", "class 3"], sample_size=15)
# plot them for the first class
model.plot_discriminant_words(sorted_contributions, "class 1", nb_words=15)
```
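A confusion matrix simply counts (true label, predicted label) pairs. The sketch below computes one in pure Python, with rows for true classes and columns for predicted classes (the same convention as scikit-learn's `confusion_matrix`). The helper name and the sample labels are hypothetical, and this is not necessarily how `plot_confusion_matrix` builds its graph.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(true_labels: Sequence[str], pred_labels: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes, both in the order of `classes`."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")
    counts = Counter(zip(true_labels, pred_labels))  # (true, predicted) -> count
    return [[counts[(t, p)] for p in classes] for t in classes]


y_true = ["class 1", "class 1", "class 2", "class 3", "class 3", "class 3"]
y_pred = ["class 1", "class 2", "class 2", "class 3", "class 1", "class 3"]
matrix = confusion_matrix(y_true, y_pred, ["class 1", "class 2", "class 3"])
for row in matrix:
    print(row)
# [1, 1, 0]
# [0, 1, 0]
# [1, 0, 2]

# accuracy is the diagonal (correct predictions) over the total
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(y_true)
```

Off-diagonal cells show which categories get confused with each other, which is usually the starting point for inspecting individual examples with the LIME-based explainer.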
Tutorial
Here is a more complete usage example for the mail classification task. The data cannot be shared for legal and privacy reasons.
Download files
File details
Details for the file classmail-0.1.tar.gz.
File metadata
- Download URL: classmail-0.1.tar.gz
- Upload date:
- Size: 23.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a377fa900d7698f9b846a95c97bbfd866489490000b96187f4a853c244c5d594 |
| MD5 | 6710ab21ee6917213d6da06757d2de33 |
| BLAKE2b-256 | 884d2f6f1b6daa242f6e75e10964c6a9d30ddd4a8c850dfada4814d8e081e1cb |
File details
Details for the file classmail-0.1-py3-none-any.whl.
File metadata
- Download URL: classmail-0.1-py3-none-any.whl
- Upload date:
- Size: 30.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c3ad5c3a57d0b6f8413bf04a6e80da02cee717f08312d89ea44ac16558833a1 |
| MD5 | 5a8bad55d950db11f9d703ae1224d690 |
| BLAKE2b-256 | 4c710aeb5d43441a6b670cf881c0b6c2a9a72a5a361c110276b8e1008045644a |