Classification of Job Offer Responses

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English

Project description

Sentiment Classifier

Classification of job offer response emails

This project classifies job offer response emails as 'positive' or 'negative', according to whether the response expresses interest in the job offer. The dataset consists of job offer response emails annotated with these two labels; a 'positive' label marks interest in the offer.

Install

The sentiment classifier is available on PyPI, so you can install it with:

pip install job-offer-classifier

For an editable install, clone the GitHub repository and cd to the cloned repo directory, then run:

pip install -e job_offer_classifier

How to use

Run the Pipeline

To load and run the data science pipeline, first import the Pipeline class:

from job_offer_classifier.pipeline_classifier import Pipeline

Instantiate the class Pipeline and call the pipeline method. This method loads the dataset, and trains and evaluates the model. The source file is the annotated dataset of payloads.

pl = Pipeline(src_file='../data/interim/payloads.csv', random_state=931696214)
pl.pipeline()

The random_state parameter is the seed pandas uses when splitting the dataframe. Fixing it makes the results deterministic; the value used here was chosen from the results of the k fold validation.
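Why a fixed seed gives a repeatable split can be illustrated with a minimal sketch. It uses only the standard library rather than pandas, and the helper split_rows and its test_fraction parameter are hypothetical names, not part of job_offer_classifier:

```python
import random

def split_rows(rows, test_fraction=0.2, seed=None):
    """Shuffle a copy of rows with a seeded generator, then split it in two.

    Hypothetical helper for illustration; the package itself splits a
    pandas dataframe.
    """
    rng = random.Random(seed)      # private generator, global state untouched
    shuffled = list(rows)          # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

rows = list(range(10))
train_a, test_a = split_rows(rows, seed=931696214)
train_b, test_b = split_rows(rows, seed=931696214)
print(train_a == train_b and test_a == test_b)  # True: same seed, same split
```

A different seed will generally yield a different partition, which is why the reported scores depend on the seed chosen.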

Predict Job Offer Sentiments

To make a prediction, use the sentiment method:

pl.sentiment(''' Thank you for offering me the position of Merchandiser with Thomas Ltd.
I am thankful to accept this job offer and look ahead to starting my career with your company
on June 27, 2000.''')

'positive'

You can also take an example from the test set, which is stored in the dfs attribute. This attribute is a dictionary of pandas dataframes.

example = pl.dfs['test'].sample(random_state=1213702178).payload.iloc[0]
print(example.strip())

thank you for offering me the position of financial analyst at Lozano-Carlson.
i was delighted to meet
you and learn more about the company.
although i verbally agreed to accept the position, i have given it a lot of thought and decided to turn
down the post.
i believe it is in my, and your company’s, best interests.
ultimately, i elected to take on a
position at a firm where i believe my skills and experience are a better fit. i truly apologise for any
inconvenience i have caused.
i was impressed with Lozano-Carlson during the interview, and continue to be at this time.
wishing you
all the best in the future and hope to still see you in attendance at the snow terrace financial conference
in june.

pl.sentiment(example)

'negative'

Performance

We use two tools to assess the performance of the model:

- Confusion matrix
- K fold validation

Confusion matrix

To plot the confusion matrix, the Pipeline has the method plot_confusion_matrix.

pl.plot_confusion_matrix('train')

[Figure: confusion matrix for the train set]

pl.plot_confusion_matrix('test')

[Figure: confusion matrix for the test set]

The percentage of cases that are negative but predicted positive (the false positive rate) tends to be greater than the percentage of cases that are positive but predicted negative (the false negative rate). This is consistent with the fact that the dataset has more positive than negative cases, so the model tends to see more positives.
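The two error rates can be computed directly from the four cells of a confusion matrix. The following is a minimal sketch, independent of the package: the function names and the toy labels are illustrative only and are not taken from the dataset.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="positive"):
    """Count true/false positives and negatives for a binary task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

def error_rates(counts):
    """Return (false positive rate, false negative rate)."""
    negatives = counts["fp"] + counts["tn"]   # all truly negative cases
    positives = counts["tp"] + counts["fn"]   # all truly positive cases
    fpr = counts["fp"] / negatives if negatives else 0.0
    fnr = counts["fn"] / positives if positives else 0.0
    return fpr, fnr

# Toy labels for illustration: four positive cases, two negative cases.
y_true = ["positive"] * 4 + ["negative"] * 2
y_pred = ["positive", "positive", "positive", "negative", "positive", "negative"]

counts = confusion_counts(y_true, y_pred)
fpr, fnr = error_rates(counts)
print(fpr, fnr)  # 0.5 0.25
```

Each rate is normalised by the size of its own true class, so with few negative cases a single misclassified negative moves the false positive rate considerably.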

K fold validation

To assess the performance of the model with k fold validation, import the class KFoldPipe:

from job_offer_classifier.validations import KFoldPipe

Run the k_fold_validation method:

kfp = KFoldPipe(src_file='../data/interim/payloads.csv', n_splits=4)
kfp.k_fold_validation()
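For readers unfamiliar with the technique, the sketch below shows how k fold validation partitions a dataset into train and test index sets. It is a generic illustration, not the internals of KFoldPipe, which are not shown here and may, for example, shuffle the rows first; k_fold_indices is a hypothetical name.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for each of n_splits folds.

    Folds are contiguous and unshuffled; when n_samples is not divisible
    by n_splits, the first folds receive one extra sample each.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, n_splits=4))
for train, test in folds:
    print(len(train), len(test))
```

Every sample appears in exactly one test fold, so each fold's model is evaluated on data it was not trained on, and the per-fold scores can then be averaged.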

The averaged scores are stored in the averages attribute:

kfp.averages['train']

{'accuracy': 0.9880952388048172,
 'accuracy_baseline': 0.7985348105430603,
 'auc': 0.9955066740512848,
 'auc_precision_recall': 0.9986858516931534,
 'average_loss': 0.05668126232922077,
 'label/mean': 0.7985348105430603,
 'loss': 0.08459942694753408,
 'precision': 0.9875305742025375,
 'prediction/mean': 0.7992496639490128,
 'recall': 0.997706413269043,
 'global_step': 5000.0,
 'f1_score': 0.9925863572491515}

kfp.averages['test']

{'accuracy': 0.9555555433034897,
 'accuracy_baseline': 0.800000011920929,
 'auc': 0.9736689478158951,
 'auc_precision_recall': 0.9902697503566742,
 'average_loss': 0.14979842118918896,
 'label/mean': 0.800000011920929,
 'loss': 0.14979842118918896,
 'precision': 0.9690233767032623,
 'prediction/mean': 0.7958925664424896,
 'recall': 0.9756944328546524,
 'global_step': 5000.0,
 'f1_score': 0.9722424484561404}
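As a reading aid for these numbers, the sketch below applies the standard F1 definition (the harmonic mean of precision and recall) to the averaged test scores. How the package computes f1_score is an assumption here: the small gap between the two values is consistent with F1 being computed per fold and then averaged, but that is not confirmed by this page.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Averaged test scores copied from kfp.averages['test'] above.
test_precision = 0.9690233767032623
test_recall = 0.9756944328546524
reported_f1 = 0.9722424484561404

f1_of_averages = f1_score(test_precision, test_recall)
# The F1 of the averaged precision and recall is close to, but not
# identical with, the reported averaged f1_score.
print(abs(f1_of_averages - reported_f1) < 1e-3)  # True
```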

The seed that produced the best F1 score is stored in best_seed:

kfp.best_seed

2425132390

Documentation

For more on the training parameters and on how to store and load trained models, please refer to the pipeline docs. The validation method is described in the validations docs.

Release history

0.0.8

May 19, 2020

0.0.7

May 15, 2020

0.0.6

May 14, 2020

0.0.5

Apr 28, 2020

0.0.4

Apr 28, 2020

This version

0.0.3

Apr 27, 2020

0.0.2

Apr 24, 2020

0.0.1

Apr 24, 2020

Download files

Source Distribution

job_offer_classifier-0.0.3.tar.gz (14.6 kB)

Uploaded Apr 27, 2020 Source

Built Distribution

job_offer_classifier-0.0.3-py3-none-any.whl (14.9 kB)

Uploaded Apr 27, 2020 Python 3

Hashes for job_offer_classifier-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`3b6f34ce540e806f0f17cb2ce0b14904b9d4956e3d8347453ac017a74a6ad7e6`
MD5	`5e39d185bf3beddaffcd5061dec7b725`
BLAKE2b-256	`b02e9f97dc7a4b293c5c1fd8f9585de565f37c9ee03330a2e469aa5732b8a5a4`

Hashes for job_offer_classifier-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ad9a209ad3d85bfcb0581aae4679d2cffaa84ca3b0544e5a2d2ba4b167114578`
MD5	`7ac610e6fa820f0e70402039e8f9eca5`
BLAKE2b-256	`86ec233ba460aabdef895ed4d75e5a44535136f7f7312b1bfe81d6990de6d305`