Fast extraction of job titles from strings
find_job_titles
Find Job Titles in Strings
- Free software: MIT license
- Python versions: 2.7, 3.4+
Features
- Find any of 77k job titles in a given string
- Text processing is very fast thanks to Aho-Corasick multi-pattern matching, via the “acora” library or “pyahocorasick” (the default since 0.6.0)
- Dictionary generation takes about 20 seconds upfront
Quickstart
Instantiate “Finder” and start extracting job titles:
>>> from find_job_titles import Finder
>>> finder = Finder()
>>> finder.findall('I am the Senior Vice President')
[('Senior Vice President', 9), ('Vice President', 16), ('President', 21)]
All possible matches are returned, including overlapping ones. Each match carries the character offset at which it was found in the input string.
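To make the overlapping behaviour concrete, here is a minimal pure-Python sketch that reproduces the result shape shown above. It is not the package's implementation: the package relies on Aho-Corasick automata, while this sketch uses a naive `str.find` loop. The names `find_overlapping` and `TITLES` are hypothetical and exist only for illustration.

```python
def find_overlapping(text, titles):
    """Return a (title, start) pair for every occurrence of every title."""
    matches = []
    for title in titles:
        start = text.find(title)
        while start != -1:
            matches.append((title, start))
            # Advance by one character only, so overlapping hits are kept.
            start = text.find(title, start + 1)
    # Order by position; at the same position, report longer titles first.
    return sorted(matches, key=lambda m: (m[1], -len(m[0])))


# Hypothetical three-entry dictionary standing in for the real title list.
TITLES = ["Senior Vice President", "Vice President", "President"]

result = find_overlapping("I am the Senior Vice President", TITLES)
print(result)
```

The naive loop rescans the text once per title, which is why a real dictionary of tens of thousands of titles calls for an automaton that scans the text a single time.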
Alternatively, use “finditer” for lazy consumption of matches:
>>> finder.finditer('I am the Senior Vice President')
<generator object ...>
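Lazy consumption matters when only the first few matches are needed, because the rest of the text is never scanned. The sketch below illustrates the idea with a hypothetical generator, `iter_title_matches`, written in plain Python; it is an assumption-laden stand-in for the package's generator, not its actual code.

```python
def iter_title_matches(text, titles):
    """Lazily yield (title, start) pairs while scanning left to right."""
    for start in range(len(text)):
        for title in titles:
            if text.startswith(title, start):
                yield (title, start)


# Hypothetical three-entry dictionary standing in for the real title list.
TITLES = ["Senior Vice President", "Vice President", "President"]

matches = iter_title_matches("I am the Senior Vice President", TITLES)

# Pull a single match; scanning pauses right after it is found.
first = next(matches)

# Resume the same generator to collect whatever is left.
rest = list(matches)
```

A generator returned by “finditer” can presumably be consumed the same way, with `next()` or a `for` loop that breaks early.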
Credits
This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.
History
0.7.0 (2017-08-22)
- fixed tox tests for py27 re: different unicode treatment by acora and pyahocorasick
- only testing default Finder using pyahocorasick now.
0.6.0 (2017-08-22)
- rewrote and fixed longest match code
- added pyahocorasick implementation and made default
- added params to enable/disable longest matches
0.5.0 (2017-08-22)
0.4.0 (2017-08-21)
- updated title list with marketing execs
- set non-dev version
0.3.0-dev (2017-08-18)
- updated title list (- surnames, - blacklist, + added_roles)
0.2.0-dev (2017-08-18)
- proper tracking of code with releases
0.1.0 (unreleased)
- First release on PyPI.