A modified resume parser, built on the pyresparser library, for extracting information from resumes
pyresparser
A modified pyresparser resume parser that extracts years of experience more accurately
Built with ❤︎ by Ahmed ElKodsh
Features
- Extract name
- Extract email
- Extract mobile numbers
- Extract skills
- Extract total experience (more accurately)
- Extract college name
- Extract degree
- Extract designation
- Extract company names
Installation
- You can install this package using pip:
pip install indegreeparser
- For NLP operations we use spaCy and nltk. Install their data using the commands below:
# spaCy
python -m spacy download en_core_web_sm
# nltk
python -m nltk.downloader words
python -m nltk.downloader stopwords
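As a quick sanity check after installation, the sketch below reports which of the required packages cannot be found in the current environment. The helper name missing_packages is hypothetical and not part of this package; it relies on the spaCy model en_core_web_sm being installed as an importable package, and it does not check the nltk corpora (words, stopwords).

```python
import importlib.util


def missing_packages(names=("spacy", "nltk", "en_core_web_sm")):
    """Return the names from `names` that are not importable here.

    Hypothetical helper for illustration; it only looks up top-level
    package names and does not import or load anything.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages were found.")
```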
Documentation
Official documentation is available at: https://www.omkarpathak.in/pyresparser/
Supported File Formats
- PDF and DOCX files are supported on all operating systems
- To extract DOC files, install textract for your OS (Linux, macOS)
- Note: You only need to install textract (and nothing else) for DOC files to be parsed
Usage
- Import it in your Python project
from indegreeparser import ResumeParser
data = ResumeParser('/path/to/resume/file').get_extracted_data()
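To parse several resumes from Python, one option is a small wrapper around the call above. This is only a sketch: parse_resume and parse_directory are hypothetical helpers, not part of the package, and the suffix filter assumes only PDF and DOCX files should be picked up.

```python
from pathlib import Path


def parse_resume(path):
    """Parse one resume using the call shown in the Usage section."""
    # Imported lazily so the helpers can be loaded without the package.
    from indegreeparser import ResumeParser
    return ResumeParser(str(path)).get_extracted_data()


def parse_directory(directory, parse=parse_resume, suffixes=(".pdf", ".docx")):
    """Return {file name: parsed data} for matching files in `directory`.

    `parse` can be swapped out, for example with a stub during testing.
    """
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            results[path.name] = parse(path)
    return results
```

Calling parse_directory('/path/to/resumes') would then return one entry per PDF or DOCX file found directly in that folder.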
CLI
You can also run the resume extractor from the command line using the provided CLI:
usage: pyresparser [-h] [-f FILE] [-d DIRECTORY] [-r REMOTEFILE]
                   [-re CUSTOM_REGEX] [-sf SKILLSFILE] [-e EXPORT_FORMAT]

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  resume file to be extracted
  -d DIRECTORY, --directory DIRECTORY
                        directory containing all the resumes to be extracted
  -r REMOTEFILE, --remotefile REMOTEFILE
                        remote path for resume file to be extracted
  -re CUSTOM_REGEX, --custom-regex CUSTOM_REGEX
                        custom regex for parsing mobile numbers
  -sf SKILLSFILE, --skillsfile SKILLSFILE
                        custom skills CSV file against which skills are
                        searched for
  -e EXPORT_FORMAT, --export-format EXPORT_FORMAT
                        the information export format (json)
Notes:
- If you are running the app on Windows, you can only extract .docx and .pdf files
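If you want to drive the CLI from a script, the sketch below assembles an argument list from the flags listed in the usage text. It assumes the executable is installed under the name pyresparser, as the usage text shows; build_command and run_cli are hypothetical helpers, not part of the package.

```python
import subprocess


def build_command(file=None, directory=None, export_format=None,
                  executable="pyresparser"):
    """Build the CLI argument list from the -f, -d and -e flags."""
    cmd = [executable]
    if file is not None:
        cmd += ["-f", str(file)]
    if directory is not None:
        cmd += ["-d", str(directory)]
    if export_format is not None:
        cmd += ["-e", export_format]
    return cmd


def run_cli(**options):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(**options), check=True)
```

For example, run_cli(directory='/path/to/resumes', export_format='json') would invoke the extractor on a whole directory, provided the command is on your PATH.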
Result
The module returns a list of dictionary objects with results such as the following:
[
    {
        'college_name': ['Marathwada Mitra Mandal’s College of Engineering'],
        'company_names': None,
        'degree': ['B.E. IN COMPUTER ENGINEERING'],
        'designation': ['Manager',
                        'TECHNICAL CONTENT WRITER',
                        'DATA ENGINEER'],
        'email': 'omkarpathak27@gmail.com',
        'mobile_number': '8087996634',
        'name': 'Omkar Pathak',
        'no_of_pages': 3,
        'skills': ['Operating systems',
                   'Linux',
                   'Github',
                   'Testing',
                   'Content',
                   'Automation',
                   'Python',
                   'Css',
                   'Website',
                   'Django',
                   'Opencv',
                   'Programming',
                   'C',
                   ...],
        'total_experience': 1.83
    }
]
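Some fields can be None (company_names in the sample above), so code that consumes the result should not assume every value is a list. A minimal sketch of one way to handle this, using a hypothetical summarize helper that is not part of the package:

```python
def summarize(record):
    """Reduce one parsed-resume dictionary to a few headline values."""
    # Fall back to an empty list when a field is missing or None.
    skills = record.get("skills") or []
    companies = record.get("company_names") or []
    return {
        "name": record.get("name"),
        "email": record.get("email"),
        "skill_count": len(skills),
        "company_count": len(companies),
        "years_of_experience": record.get("total_experience"),
    }


if __name__ == "__main__":
    # Abbreviated record using values from the sample result above.
    sample = {
        "name": "Omkar Pathak",
        "email": "omkarpathak27@gmail.com",
        "company_names": None,
        "skills": ["Linux", "Python", "Django"],
        "total_experience": 1.83,
    }
    print(summarize(sample))
```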