Fast extraction of job titles from strings
find_job_titles
Find Job Titles in Strings
- Free software: MIT license
- Python versions: 2.7, 3.4+
Features
- Find any of 77k job titles in a given string
- Text processing is extremely fast using the “acora” library (pyahocorasick is the default backend since 0.6.0)
- Dictionary generation takes about 20 seconds upfront
Quickstart
Instantiate “Finder” and start extracting job titles:
>>> from find_job_titles import Finder
>>> finder = Finder()
>>> finder.findall('I am the Senior Vice President')
[('Senior Vice President', 9), ('Vice President', 16), ('President', 21)]
All possible matches are returned, including overlapping ones. Each match carries the character offset at which it starts in the input string.
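Because overlapping matches come back together, callers often want only the longest title at each location. The following is a minimal sketch of such a filter, assuming matches are (title, start) tuples as in the output above. The helper name longest_matches is hypothetical and is not part of the package; the package's own longest-match option (see History, 0.6.0) may behave differently.

```python
def longest_matches(matches):
    """Drop every match whose span lies entirely inside another match.

    `matches` is a list of (title, start) tuples, as shown in the
    Quickstart output. This helper is illustrative only.
    """
    # Sort by start offset; for equal starts, longer titles come first.
    ordered = sorted(matches, key=lambda m: (m[1], -len(m[0])))
    kept = []
    furthest_end = -1  # end offset of the furthest-reaching kept match
    for title, start in ordered:
        end = start + len(title)
        # A match ending at or before furthest_end is contained in an
        # earlier (and at least as long) match, so it is skipped.
        if end > furthest_end:
            kept.append((title, start))
            furthest_end = end
    return kept


matches = [('Senior Vice President', 9), ('Vice President', 16), ('President', 21)]
print(longest_matches(matches))  # [('Senior Vice President', 9)]
```

Titles that only partially overlap (neither contains the other) are both kept by this sketch.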
Alternatively, use “finditer” for lazy consumption of matches:
>>> finder.finditer('I am the Senior Vice President')
<generator object ...>
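Lazy consumption pays off when only the first few matches are needed, since the rest of the string is never scanned. The sketch below illustrates the pattern with a stand-in generator, naive_finditer, which is a hypothetical helper written for this example. It is not the package's implementation and is far slower than an Aho-Corasick search; it only mimics the (title, start) shape shown above.

```python
from itertools import islice


def naive_finditer(text, titles):
    """Lazily yield (title, start) for every occurrence of every title.

    Stand-in for a generator-based finder; illustrative only.
    """
    for start in range(len(text)):
        for title in titles:
            if text.startswith(title, start):
                yield (title, start)


titles = ['Senior Vice President', 'Vice President', 'President']
text = 'I am the Senior Vice President'

# Take only the first match; the generator stops scanning right there.
first = next(naive_finditer(text, titles), None)
print(first)  # ('Senior Vice President', 9)

# Take at most two matches without materialising the full list.
first_two = list(islice(naive_finditer(text, titles), 2))
print(first_two)  # [('Senior Vice President', 9), ('Vice President', 16)]
```

The same next(...) and islice(...) calls apply to any generator, so they should work unchanged on the object returned by “finditer”.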
Credits
This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.
History
0.7.0 (2017-08-22)
- fixed tox tests for py27 re: different unicode treatment by acora and pyahocorasick
- only testing default Finder using pyahocorasick now.
0.6.0 (2017-08-22)
- rewrote and fixed longest match code
- added pyahocorasick implementation and made default
- added params to enable/disable longest matches
0.5.0 (2017-08-22)
0.4.0 (2017-08-21)
- updated title list with marketing execs
- set non-dev version
0.3.0-dev (2017-08-18)
- updated title list (- surnames, - blacklist, + added_roles)
0.2.0-dev (2017-08-18)
- proper tracking of code with releases
0.1.0 (unreleased)
- First release on PyPI.