
A simple framework for automating mail classification tasks

Project description

ClassMail


Classmail is a Python library for mail classification, optimized for French-language mails in the insurance field. It was created to automate the mail classification workflow in quick experiments, and was developed during my internship at Covéa.

Classmail provides:

  • Data visualisation: For quick data analysis, based on matplotlib and seaborn

  • Mail preprocessing (cleaning): Optimised for insurance purposes, with prebuilt regular expressions (in French). The regular expressions live in a configuration file that can be adapted for other languages or purposes.

  • Deep learning model creation (for classification): Simple interface to build PyTorch models quickly, based on the Flair NLP library.

  • Model analysis and explainer: Simple interface with prebuilt seaborn graphs and a model explainer based on Lime.

Quick Start

Requirements and Installation

The project is based on Python 3.7+. If you do not have Python 3.7 or later, install it first. Then, in your favorite virtual environment, simply do:

pip install classmail
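To confirm that the environment meets the requirements above, a small check along these lines may help. It is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and is not part of Classmail.

```python
import importlib.util
import sys


def check_environment(package="classmail", min_version=(3, 7)):
    """Return (python_ok, package_installed) for the current interpreter."""
    # sys.version_info compares element-wise against a plain tuple
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level package cannot be found
    package_installed = importlib.util.find_spec(package) is not None
    return python_ok, package_installed


if __name__ == "__main__":
    python_ok, installed = check_environment()
    print(f"Python >= 3.7: {python_ok}, classmail installed: {installed}")
```

If the second value is `False`, the `pip install` step probably ran in a different virtual environment than the one currently active.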

Example Usage

The snippets below walk through a typical mail classification workflow: data analysis, cleaning, model training, and evaluation. They assume that `data` holds your mails, with text columns such as `header` and `body` and a label column (`COMPETENCE` here).

  • Data analysis

    import classmail.data_visualisation.data_visualisation as dv
    
    # show the class balancing graph
    dv.plot_class_balancing(data, col_text='header_body', col_label="COMPETENCE", title="Catégories des mails")
    # show the most frequent bigrams
    dv.plot_word_frequencies(data['message'], ngram=2, words_nb=20)
    # plot a wordcloud with the most frequent terms in the body
    dv.plot_wordcloud(data['body'])
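To make it clearer what these plots summarise, here is a minimal standard-library sketch of the underlying quantities: the share of each class and the most frequent word n-grams. The functions `class_balance` and `top_ngrams` are hypothetical helpers written for illustration; they are not Classmail's implementation, and the whitespace tokenisation is an assumption.

```python
from collections import Counter


def class_balance(labels):
    """Return (label, share) pairs, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n / total) for label, n in counts.most_common()]


def top_ngrams(texts, ngram=2, words_nb=20):
    """Count word n-grams over lower-cased, whitespace-tokenised texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # zipping shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(ngram))))
    return [(" ".join(gram), n) for gram, n in counts.most_common(words_nb)]


if __name__ == "__main__":
    mails = ["Bonjour madame bonjour madame", "bonjour madame"]
    print(class_balance(["auto", "auto", "auto", "habitation"]))
    print(top_ngrams(mails, ngram=2, words_nb=1))
```

A strongly skewed class balance at this stage is worth noting before training, since it affects how the evaluation results should be read.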
    
  • Cleaning

    from classmail.nlp.cleaning import clean_mail
    
    # create a new column in data ("clean_text") with the preprocessed header and body
    data = clean_mail(data, "body", "header")
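As an illustration of what regex-based mail cleaning can look like, the sketch below merges a header and a body, lower-cases them, and replaces URLs, e-mail addresses, and French-style phone numbers with placeholder tokens. The patterns and the `clean_text` function are assumptions made for this example; Classmail ships its own prebuilt configuration, which is likely more thorough.

```python
import re

# Illustrative patterns only; these are NOT Classmail's prebuilt expressions.
PATTERNS = [
    (re.compile(r"https?://\S+|www\.\S+"), " url "),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), " email "),
    # five groups of two digits, e.g. 01 23 45 67 89
    (re.compile(r"\b(?:\d{2}[ .]?){4}\d{2}\b"), " tel "),
    # replace any remaining punctuation with a space
    (re.compile(r"[^\w\s]"), " "),
]


def clean_text(header, body):
    """Merge header and body, normalise case, and apply each pattern in order."""
    text = f"{header} {body}".lower()
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    # collapse runs of whitespace left behind by the substitutions
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_text(
        "Sinistre AUTO",
        "Bonjour, contactez-moi au 01 23 45 67 89 ou jean.dupont@example.com !",
    ))
```

The order of the patterns matters: URLs and e-mail addresses are replaced before punctuation is stripped, otherwise they would be broken into fragments first.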
    
  • Model creation and training

    from classmail.classification.trainer import Trainer
    
    trainer = Trainer()
    # generate train / test / val csv files from the cleaned data
    # (the "clean_text" column is created by clean_mail in the cleaning step above)
    trainer.prepare_data(data, col_text="clean_text", col_label="COMPETENCE", train_size=0.7, val_size=0.15, test_size=0.15)

    # train a model with default parameters
    trainer.train_model(model_name="default_model")
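The 0.7 / 0.15 / 0.15 proportions passed to `prepare_data` describe a three-way split of the dataset. How Classmail performs the split internally is not documented here, so the following is only a sketch of one simple way to do it: a seeded shuffle of row indices, without stratification. The name `split_indices` and the `seed` parameter are hypothetical.

```python
import random


def split_indices(n, train_size=0.7, val_size=0.15, test_size=0.15, seed=42):
    """Shuffle the indices 0..n-1 and cut them into train / val / test lists."""
    if abs(train_size + val_size + test_size - 1.0) > 1e-9:
        raise ValueError("train_size, val_size and test_size must sum to 1")
    indices = list(range(n))
    # a dedicated Random instance keeps the split reproducible
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_size)
    n_val = round(n * val_size)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(100)
    print(len(train), len(val), len(test))
```

With imbalanced classes, a stratified split (keeping the class proportions in each subset) is usually preferable to the plain shuffle shown here.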
    
  • Model predictions, evaluation and explanation

    from classmail.classification.model import Model
    
    # load our model, saved in the "ressources" folder
    model = Model("ressources\\model_default")
    # predictions
    predictions = model.get_predictions(X_test)
    # confusion matrix
    model.plot_confusion_matrix(pred_labels=predictions, true_labels=y_test)
    # explain one example, at index 110
    model.visualize_one_ex(X_test, y_test, index=110, num_features=6)
    # compute the most discriminant words in each category
    sorted_contributions = model.get_statistical_explanation(X_test, ["class 1", "class 2", "class 3"], sample_size=15)
    # plot them for the first class
    model.plot_discriminant_words(sorted_contributions, "class 1", nb_words=15)
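For readers who want to inspect the numbers behind the confusion matrix plot, the sketch below computes the matrix and a per-class recall from two lists of labels, using only the standard library. The helpers `confusion_matrix` and `per_class_recall` are hypothetical and independent of Classmail's `Model` class.

```python
from collections import Counter


def confusion_matrix(true_labels, pred_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(true_labels, pred_labels))
    # a Counter returns 0 for (true, predicted) pairs that never occurred
    return [[pairs[(t, p)] for p in classes] for t in classes]


def per_class_recall(matrix, classes):
    """Share of each true class that was predicted correctly."""
    recalls = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        recalls[name] = matrix[i][i] / row_total if row_total else 0.0
    return recalls


if __name__ == "__main__":
    classes = ["class 1", "class 2"]
    y_true = ["class 1", "class 1", "class 2", "class 2", "class 2"]
    y_pred = ["class 1", "class 2", "class 2", "class 2", "class 1"]
    matrix = confusion_matrix(y_true, y_pred, classes)
    print(matrix)
    print(per_class_recall(matrix, classes))
```

Reading the matrix row by row shows which categories the model tends to confuse, which is a natural starting point before looking at the word-level explanations.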
    

Tutorial

Here is a more complete usage example for the mail classification task. The data cannot be provided, for legal and privacy reasons.


This version

0.1

Download files


Source Distribution

classmail-0.1.tar.gz (23.0 kB view details)

Uploaded Source

Built Distribution

classmail-0.1-py3-none-any.whl (30.5 kB view details)

Uploaded Python 3

File details

Details for the file classmail-0.1.tar.gz.

File metadata

  • Download URL: classmail-0.1.tar.gz
  • Upload date:
  • Size: 23.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for classmail-0.1.tar.gz

  Algorithm     Hash digest
  SHA256        a377fa900d7698f9b846a95c97bbfd866489490000b96187f4a853c244c5d594
  MD5           6710ab21ee6917213d6da06757d2de33
  BLAKE2b-256   884d2f6f1b6daa242f6e75e10964c6a9d30ddd4a8c850dfada4814d8e081e1cb


File details

Details for the file classmail-0.1-py3-none-any.whl.

File metadata

  • Download URL: classmail-0.1-py3-none-any.whl
  • Upload date:
  • Size: 30.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for classmail-0.1-py3-none-any.whl

  Algorithm     Hash digest
  SHA256        1c3ad5c3a57d0b6f8413bf04a6e80da02cee717f08312d89ea44ac16558833a1
  MD5           5a8bad55d950db11f9d703ae1224d690
  BLAKE2b-256   4c710aeb5d43441a6b670cf881c0b6c2a9a72a5a361c110276b8e1008045644a

