Fast extraction of job titles from strings
Project description
find_job_titles
Find Job Titles in Strings
Free software: MIT license
Python versions: 2.7, 3.4+
Features
Find any of 77k job titles in a given string
Text processing is extremely fast thanks to the “acora” library (since 0.6.0, pyahocorasick is the default implementation)
Dictionary generation takes about 20 seconds upfront
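Since the dictionary is generated upfront, it usually pays to build one finder and reuse it for every string. The following is a minimal sketch, assuming Python 3 and that construction is the expensive step. `get_finder` and `SlowFinder` are hypothetical names that are not part of this package; `SlowFinder` stands in for the real class so that the snippet runs without the package installed.

```python
import functools


class SlowFinder(object):
    """Hypothetical stand-in for an object that is expensive to build."""

    instances_built = 0

    def __init__(self):
        # Pretend this is the slow, one-off dictionary generation step.
        SlowFinder.instances_built += 1


@functools.lru_cache(maxsize=None)
def get_finder(factory):
    """Build the finder on first use, then return the cached instance."""
    return factory()


first = get_finder(SlowFinder)
second = get_finder(SlowFinder)  # served from the cache, nothing is rebuilt
```

With the package installed, the same helper could presumably be called with the real `Finder` class in place of the stand-in.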
Quickstart
Instantiate “Finder” and start extracting job titles:
>>> from find_job_titles import Finder
>>> finder = Finder()
>>> finder.findall('I am the Senior Vice President')
[('Senior Vice President', 9), ('Vice President', 16), ('President', 21)]
All possible, overlapping matches are returned. Matches contain positional information of where the match was found.
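The positions make it straightforward to post-process overlapping matches. The sketch below assumes that the second element of each match is the 0-based start offset, which is consistent with the example output above. `to_spans` and `drop_contained` are hypothetical helpers and are not the package's own longest-match code.

```python
def to_spans(matches):
    """Turn (title, start) pairs into (title, start, end), end exclusive."""
    return [(title, start, start + len(title)) for title, start in matches]


def drop_contained(matches):
    """Keep only matches that do not lie inside another, longer match."""
    spans = to_spans(matches)
    kept = []
    for title, start, end in spans:
        contained = any(
            o_start <= start and end <= o_end
            and (o_end - o_start) > (end - start)
            for _, o_start, o_end in spans
        )
        if not contained:
            kept.append((title, start))
    return kept


text = 'I am the Senior Vice President'
matches = [('Senior Vice President', 9), ('Vice President', 16), ('President', 21)]

spans = to_spans(matches)          # adds an exclusive end offset to each match
longest = drop_contained(matches)  # [('Senior Vice President', 9)]
```

Slicing the original string with a span, as in `text[start:end]`, gives back the matched title.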
Alternatively use “finditer” for lazy consumption of matches:
>>> finder.finditer('I am the Senior Vice President')
<generator object ...>
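Lazy consumption is useful when only the first few matches are needed. The sketch below shows the consumption pattern with `next` and `itertools.islice`. `fake_finditer` is a hypothetical, naive stand-in generator so that the snippet runs without the package installed; it is not how the library searches.

```python
import itertools


def fake_finditer(text, titles):
    """Hypothetical stand-in: lazily yields (title, start) pairs."""
    for start in range(len(text)):
        for title in titles:
            if text.startswith(title, start):
                yield (title, start)


text = 'I am the Senior Vice President'
titles = ['Senior Vice President', 'Vice President', 'President']

matches = fake_finditer(text, titles)
first = next(matches, None)                    # stops at the first match
next_two = list(itertools.islice(matches, 2))  # pulls at most two more
```

Passing a default to `next` avoids a `StopIteration` when a string contains no job titles.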
Credits
This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.
History
0.7.0 (2017-08-22)
fixed tox tests for py27 re: different unicode treatment by acora and pyahocorasick
only testing default Finder using pyahocorasick now.
0.6.0 (2017-08-22)
rewrote and fixed longest match code
added pyahocorasick implementation and made default
added params to enable/disable longest matches
0.5.0 (2017-08-22)
0.4.0 (2017-08-21)
updated title list with marketing execs
set non-dev version
0.3.0-dev (2017-08-18)
updated title list (- surnames, - blacklist, + added_roles)
0.2.0-dev (2017-08-18)
proper tracking of code with releases
0.1.0 (unreleased)
First release on PyPI.
File details
Details for the file find_job_titles-0.7.0.tar.gz.
File metadata
- Download URL: find_job_titles-0.7.0.tar.gz
- Upload date:
- Size: 396.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 88763ef7e1f47ced03bda7e61c4cf778ef4f39cd71d4b59b226d7b49bf7e7aad
MD5 | 84cfb2f037de12a858a00cb6004fd717
BLAKE2b-256 | dd79961b1af12d2d57cdc2d2d4bb0206dcdb1fbce9032e18d7a0b530afa72efb
File details
Details for the file find_job_titles-0.7.0-py2.py3-none-any.whl.
File metadata
- Download URL: find_job_titles-0.7.0-py2.py3-none-any.whl
- Upload date:
- Size: 383.1 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4ad27d617834cc0c1630d3e5cf09b0df62c5913f69ddc5a682797d7a331e7c40
MD5 | 2bb9fea9a1415f0f616fc0096e5f3156
BLAKE2b-256 | e3439f8294dabf906f3cc5277a0914a4dcc7fb6d506c3e8c317e469c11dbeea7