Skip to main content

Classification of Job Offer Responses

Project description

Sentiment Classifier

Classify job candidate emails

Sentiment classifier of emails from job candidates based on whether an email response expresses an interesting candidate for the job position.

Install

The sentiment classifier can be found on PyPI so you can just run:

pip install job-offer-classifier

For an editable install, clone the GitHub repository and cd to the cloned repo directory, then run:

pip install -e job_offer_classifier

How to use

Run the Pipeline

First load and run the data science pipeline by importing the module:

from job_offer_classifier.pipeline_classifier import Pipeline

Instantiate the class Pipeline and call the pipeline method. This method loads the dataset, and trains and evaluates the model. The source file is the dataset of payloads annotated with 'positive' and 'negative' labels

pl = Pipeline(src_file = '../data/interim/payloads.csv',random_state=931696214)
pl.pipeline()

The parameter random_state is the pandas seed used in the dataframe split. This parameter is necessary to present deterministic results and has been chosen from the results of the k fold validation.

Predict Job Offer Sentiments

To make a prediction, use the sentiment method

pl.sentiment(''' Thank you for offering me the position of Merchandiser with Thomas Ltd.
I am thankful to accept this job offer and look ahead to starting my career with your company
on June 27, 2000.''')
'positive'

One can take an example from the test set, contained in the dfs attribute. This attribute is a dictionary of pandas dataframes.

example = pl.dfs['test'].sample(random_state=1213702178).payload.iloc[0]
print(example.strip())
thank you for offering me the position of financial analyst at Lozano-Carlson.
i was delighted to meet
you and learn more about the company.
although i verbally agreed to accept the position, i have given it a lot of thought and decided to turn
down the post.
i believe it is in my, and your company’s, best interests.
ultimately, i elected to take on a
position at a firm where i believe my skills and experience are a better fit. i truly apologise for any
inconvenience i have caused.
i was impressed with Lozano-Carlson during the interview, and continue to be at this time.
wishing you
all the best in the future and hope to still see you in attendance at the snow terrace financial conference
in june.
pl.sentiment(example)
'negative'

Performance

We use two tools to assesss the performance of the model:

  • Confusion Matrix
  • K fold Validation

Confusion matrix

To plot the confusion matrix, the Pipeline has the method plot_confusion_matrix.

pl.plot_confusion_matrix('train')

png

pl.plot_confusion_matrix('test')

png

K fold validation

To assess the performance of the model via the k fold validation method, import the class KFoldPipe

from job_offer_classifier.validations import KFoldPipe

Run the k_fold_validation method

kfp = KFoldPipe(src_file='../data/interim/payloads.csv',n_splits=4)
kfp.k_fold_validation()

The averaged scores are stored in averages

kfp.averages['train']
{'accuracy': 0.9954212456941605,
 'accuracy_baseline': 0.7985348105430603,
 'auc': 0.9987489432096481,
 'auc_precision_recall': 0.9996496587991714,
 'average_loss': 0.02481173211708665,
 'label/mean': 0.7985348105430603,
 'loss': 0.03453406784683466,
 'precision': 0.9954595416784286,
 'prediction/mean': 0.7989358454942703,
 'recall': 0.9988532066345215,
 'global_step': 12500.0,
 'f1_score': 0.9971447710408015}
kfp.averages['test']
{'accuracy': 0.980555534362793,
 'accuracy_baseline': 0.800000011920929,
 'auc': 0.995563268661499,
 'auc_precision_recall': 0.9989252239465714,
 'average_loss': 0.060208675917238,
 'label/mean': 0.800000011920929,
 'loss': 0.060208675917238,
 'precision': 0.986666664481163,
 'prediction/mean': 0.8020820915699005,
 'recall': 0.9895833283662796,
 'global_step': 12500.0,
 'f1_score': 0.9880000766313914}

The seed of the best F1 score is stored in best_seed

kfp.best_seed
427851256

Multiclass classifier

The library supports multiple classes in labels. The following instruction uploads the multiclass classifier

from job_offer_classifier.multiclass import Multiclass

The sibatel_web_intekglobal_payloads.csv file contains three type of sentiments: 'positive', 'negative' and 'neutral'. Instantiate the Multiclass by specifying the number of classes

mc = Multiclass(
    src_file='../data/raw/sibatel_web_intekglobal_payloads.csv',
    random_state=931696214,
    n_classes=3
)
mc.pipeline()
mc.plot_confusion_matrix('train')

png

mc.plot_confusion_matrix('test')

png

Documentation

To further inquire on the training parameters and how to store and load the trained models, please refer to the pipeline docs and multiclass docs. The validation method can be found in the validations docs

References

https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

job_offer_classifier-0.0.8.tar.gz (17.0 kB view hashes)

Uploaded Source

Built Distribution

job_offer_classifier-0.0.8-py3-none-any.whl (17.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page