A simple package for training on and classifying resumes.
Resume Classification
Objective
The aim of this project is to train a machine learning model on a set of resumes from specific domains and use it to predict the domain of unseen resumes.
Currently, the model is trained using logistic regression on these four domains:
- Java
- Cloud
- Big Data
- Machine Learning
Resumes are read using the tika package, which supports many file formats, including the following popular ones:
- doc
- docx
- pdf
NOTE: To know more about tika visit the following link: https://pypi.org/project/tika/
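The reading step can be sketched as follows. This is a minimal illustration, not the package's own code: `is_supported` and `extract_text` are hypothetical helper names, and the sketch assumes tika's `parser.from_file` is used for extraction. tika needs a Java runtime and starts a local server on first use.

```python
from pathlib import Path

# Extensions named in this README; tika itself supports many more formats.
SUPPORTED_EXTENSIONS = {".doc", ".docx", ".pdf"}


def is_supported(path):
    """Return True if the file extension is one of the formats listed above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def extract_text(path):
    """Extract plain text from a resume with tika (hypothetical helper)."""
    # Imported lazily so the module loads even where tika is not installed.
    from tika import parser

    parsed = parser.from_file(str(path))
    # "content" can be None when tika finds no extractable text.
    return (parsed.get("content") or "").strip()
```

Filtering by extension first avoids sending unrelated files to the tika server.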
Installation
pip install resume_classification
Dependencies
- numpy==1.17.3
- pandas==0.25.1
- tika==1.24
- nltk==3.4.5
Python version
`Python 3.7.4`
Project Guidelines
Train
To train on a new set of resumes, the project must follow the defined folder structure given below:
Then run the following commands:
from resume_classification import train
train("path/to/resumes-folder")  # replace with the path to your resumes folder
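The expected folder structure is not reproduced in this text. The sketch below assumes one common layout, with one subfolder per domain (for example `Java/`, `Cloud/`) holding that domain's resumes. This layout is an assumption, and `collect_labeled_resumes` is a hypothetical helper for checking your data before training, not part of the package.

```python
from pathlib import Path


def collect_labeled_resumes(root):
    """Return (file path, label) pairs, using each subfolder name as the label.

    Assumes a layout of root/<domain>/<resume file>; this is a guess at the
    structure the package expects, not a documented requirement.
    """
    pairs = []
    for domain_dir in sorted(Path(root).iterdir()):
        if not domain_dir.is_dir():
            continue  # skip stray files at the top level
        for resume in sorted(domain_dir.iterdir()):
            if resume.is_file():
                pairs.append((resume, domain_dir.name))
    return pairs
```

Printing the returned pairs before calling `train` is a quick way to confirm that every resume sits under the domain you intended.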
Output
The output will consist of the following metrics:
- Model Accuracy
- F1 Score
- Confusion Matrix
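For readers unfamiliar with these metrics, here is a minimal pure-Python sketch of how they can be computed from true and predicted labels. It is illustrative only: the function names are hypothetical, and the macro averaging used for the F1 score is an assumption, since the README does not say how the package averages across domains.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (assumed averaging)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With four domains, the confusion matrix is 4 x 4, and its off-diagonal cells show which domains the model confuses with each other.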
Predict
Use the following command:
from resume_classification import predict
predict("path/to/resumes")  # replace with the path to the folder of unseen resumes
NOTE: For the predict module, the path must point to a single folder containing all the unseen resumes.
Output
The output will be a dataframe consisting of `file name` and `predicted domain`.