Classification of Job Offer Responses

Sentiment Classifier

Classify job candidate emails

A sentiment classifier for emails from job candidates. It labels each email response according to whether the candidate expresses interest in the job position.

Install

The sentiment classifier is available on PyPI, so you can install it with:

pip install job-offer-classifier

For an editable install, clone the GitHub repository and cd to the cloned repo directory, then run:

pip install -e job_offer_classifier
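To confirm that either install worked, one option is to ask Python for the installed version. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is made up for this example and is not part of the package.

```python
from importlib import metadata


def installed_version(dist_name="job-offer-classifier"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version is not None else "job-offer-classifier is not installed")
```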

How to use

Run the Pipeline

First, import the Pipeline class from the pipeline_classifier module:

from job_offer_classifier.pipeline_classifier import Pipeline

Instantiate the Pipeline class and call its pipeline method. This method loads the dataset, then trains and evaluates the model. The source file is the dataset of payloads, annotated with 'positive' and 'negative' labels.

pl = Pipeline(src_file='../data/interim/payloads.csv', random_state=931696214)
pl.pipeline()

The random_state parameter is the seed that pandas uses when splitting the dataframe. Fixing it makes the results deterministic. The value shown here was chosen from the results of the k-fold validation.
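The effect of a fixed seed on a split can be illustrated without the library. The sketch below uses the standard library's random.Random as a stand-in for the pandas-based split, so it is not the package's implementation; split_rows and the 20% test fraction are assumptions made for this example.

```python
import random


def split_rows(rows, test_fraction=0.2, random_state=None):
    """Shuffle a copy of rows with a seeded generator and cut it into train/test."""
    rng = random.Random(random_state)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train_a, test_a = split_rows(rows, random_state=931696214)
train_b, test_b = split_rows(rows, random_state=931696214)
print(train_a == train_b and test_a == test_b)  # same seed, same split
```

Calling split_rows without a seed gives a different partition on each run, which is why a fixed random_state is needed for reproducible scores.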

Predict Job Offer Sentiments

To make a prediction, use the sentiment method:

pl.sentiment(''' Thank you for offering me the position of Merchandiser with Thomas Ltd.
I am thankful to accept this job offer and look ahead to starting my career with your company
on June 27, 2000.''')
'positive'

You can also take an example from the test set, which is stored in the dfs attribute. This attribute is a dictionary of pandas dataframes.

example = pl.dfs['test'].sample(random_state=1213702178).payload.iloc[0]
print(example.strip())
thank you for offering me the position of financial analyst at Lozano-Carlson.
i was delighted to meet
you and learn more about the company.
although i verbally agreed to accept the position, i have given it a lot of thought and decided to turn
down the post.
i believe it is in my, and your company’s, best interests.
ultimately, i elected to take on a
position at a firm where i believe my skills and experience are a better fit. i truly apologise for any
inconvenience i have caused.
i was impressed with Lozano-Carlson during the interview, and continue to be at this time.
wishing you
all the best in the future and hope to still see you in attendance at the snow terrace financial conference
in june.
pl.sentiment(example)
'negative'
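To classify several emails at once, a small helper can apply the classifier to each one and count the labels. This is only a sketch: tally_sentiments and stub_classify are hypothetical names, and the stub is a keyword rule standing in for pl.sentiment so that the example runs without a trained model.

```python
from collections import Counter


def tally_sentiments(emails, classify):
    """Apply classify to each email and count how often each label occurs."""
    return Counter(classify(email) for email in emails)


def stub_classify(text):
    # Placeholder for pl.sentiment: a keyword rule for demonstration only,
    # NOT how the trained model decides.
    return "negative" if "turn down" in text.lower() else "positive"


emails = [
    "I am thankful to accept this job offer.",
    "I have decided to turn down the post.",
]
counts = tally_sentiments(emails, stub_classify)
print(dict(counts))
```

With a trained pipeline, and assuming pl.sentiment takes a string and returns a label as in the calls above, the call would be tally_sentiments(emails, pl.sentiment).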

Performance

We use two tools to assess the performance of the model:

  • Confusion matrix
  • K-fold validation

Confusion matrix

To plot the confusion matrix, call the Pipeline method plot_confusion_matrix with the name of the split:

pl.plot_confusion_matrix('train')

[Figure: confusion matrix for the training set]

pl.plot_confusion_matrix('test')

[Figure: confusion matrix for the test set]
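For readers who want the numbers behind such a plot, a confusion matrix can be counted directly from true and predicted labels. The following is a minimal sketch in plain Python, not the code behind plot_confusion_matrix; the labels in the example are made up.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j] = number of items with true labels[i] predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up labels, for illustration only.
y_true = ["positive", "positive", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
matrix = confusion_matrix(y_true, y_pred, labels=["negative", "positive"])
for label, row in zip(["negative", "positive"], matrix):
    print(label, row)
```

Rows are true classes and columns are predicted classes, so the diagonal holds the correctly classified emails.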

K fold validation

To assess the performance of the model with k-fold validation, import the KFoldPipe class:

from job_offer_classifier.validations import KFoldPipe

Then run the k_fold_validation method:

kfp = KFoldPipe(src_file='../data/interim/payloads.csv',n_splits=4)
kfp.k_fold_validation()
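In k-fold validation the dataset is partitioned into n_splits folds, and each fold serves once as the test set while the remaining folds form the training set. The sketch below shows one way to build such a partition over row indices. It is an illustration only, not KFoldPipe's implementation, and k_fold_indices is a hypothetical helper.

```python
import random


def k_fold_indices(n_rows, n_splits, seed=None):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_splits-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_rows=10, n_splits=4, seed=0))
for train, test in splits:
    print(len(train), len(test))
```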

The averaged scores are stored in the averages attribute:

kfp.averages['train']
{'accuracy': 0.9954212456941605,
 'accuracy_baseline': 0.7985348105430603,
 'auc': 0.9987489432096481,
 'auc_precision_recall': 0.9996496587991714,
 'average_loss': 0.02481173211708665,
 'label/mean': 0.7985348105430603,
 'loss': 0.03453406784683466,
 'precision': 0.9954595416784286,
 'prediction/mean': 0.7989358454942703,
 'recall': 0.9988532066345215,
 'global_step': 12500.0,
 'f1_score': 0.9971447710408015}
kfp.averages['test']
{'accuracy': 0.980555534362793,
 'accuracy_baseline': 0.800000011920929,
 'auc': 0.995563268661499,
 'auc_precision_recall': 0.9989252239465714,
 'average_loss': 0.060208675917238,
 'label/mean': 0.800000011920929,
 'loss': 0.060208675917238,
 'precision': 0.986666664481163,
 'prediction/mean': 0.8020820915699005,
 'recall': 0.9895833283662796,
 'global_step': 12500.0,
 'f1_score': 0.9880000766313914}
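The F1 score is the harmonic mean of precision and recall. Computing it from the averaged test precision and recall above gives about 0.9881, slightly different from the reported f1_score of about 0.9880. That would be consistent with F1 being computed per fold and then averaged, but this is an inference and not something the package documents here. The sketch below shows that approach with made-up fold values; f1_score and average_metrics are hypothetical helpers.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_metrics(fold_metrics):
    """Average each metric over a list of per-fold dictionaries with identical keys."""
    keys = fold_metrics[0].keys()
    return {k: sum(m[k] for m in fold_metrics) / len(fold_metrics) for k in keys}


# Made-up per-fold values, for illustration only.
folds = [
    {"precision": 1.0, "recall": 0.5},
    {"precision": 0.5, "recall": 1.0},
]
for m in folds:
    m["f1_score"] = f1_score(m["precision"], m["recall"])

averages = average_metrics(folds)
print(averages)
print(f1_score(averages["precision"], averages["recall"]))
```

In this toy case the average of the per-fold F1 scores and the F1 of the averaged precision and recall differ, which is why the order of the two operations matters when comparing numbers.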

The seed that produced the best F1 score is stored in best_seed:

kfp.best_seed
427851256
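Picking such a seed amounts to taking the run with the highest F1 score. The sketch below assumes each run is identified by its seed and has a single F1 value; the seeds and scores are made up, and best_seed_by_f1 is a hypothetical helper, not the library's code.

```python
def best_seed_by_f1(f1_by_seed):
    """Return the seed whose run achieved the highest F1 score."""
    return max(f1_by_seed, key=f1_by_seed.get)


# Made-up seeds and scores, for illustration only.
f1_by_seed = {1: 0.95, 2: 0.99, 3: 0.97}
print(best_seed_by_f1(f1_by_seed))
```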

Multiclass classifier

The library also supports labels with more than two classes. The following statement imports the multiclass classifier:

from job_offer_classifier.multiclass import Multiclass

The sibatel_web_intekglobal_payloads.csv file contains three types of sentiment: 'positive', 'negative' and 'neutral'. Instantiate Multiclass, specifying the number of classes:

mc = Multiclass(
    src_file='../data/raw/sibatel_web_intekglobal_payloads.csv',
    random_state=931696214,
    n_classes=3
)
mc.pipeline()
mc.plot_confusion_matrix('train')

[Figure: multiclass confusion matrix for the training set]

mc.plot_confusion_matrix('test')

[Figure: multiclass confusion matrix for the test set]
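With three classes, a single accuracy figure hides which sentiment is confused with which, so it can help to read per-class precision and recall off the confusion matrix. The following is a minimal sketch with made-up counts; per_class_scores is a hypothetical helper and not part of Multiclass.

```python
def per_class_scores(matrix, labels):
    """Compute precision and recall per class from a square confusion matrix.

    matrix[i][j] counts items whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positives = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum
        actual_k = sum(matrix[k])                    # row sum
        scores[label] = {
            "precision": true_positives / predicted_k if predicted_k else 0.0,
            "recall": true_positives / actual_k if actual_k else 0.0,
        }
    return scores


labels = ["negative", "neutral", "positive"]
# Made-up counts, for illustration only.
matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
scores = per_class_scores(matrix, labels)
for label in labels:
    print(label, scores[label])
```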

Documentation

For more on the training parameters, and on how to store and load trained models, refer to the pipeline docs and the multiclass docs. The validation method is described in the validations docs.

References

https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub
