A Python package to semantically link two lists of texts.
🖇️ NLP Link
NLP Link finds the word (or words) in a reference list that is most similar to an input word. For example, if you want to find which word in the reference list ['cats', 'dogs', 'rats', 'birds'] is most similar to 'puppies', nlp-link will return 'dogs'.
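The idea of linking by similarity can be sketched in a few lines. This is only an illustration and not nlp-link's implementation: the three-dimensional vectors are hand-written toy values standing in for real text embeddings, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math

# Toy, hand-written vectors standing in for real text embeddings.
# Assumption: these values are illustrative only, not produced by nlp-link.
reference_vectors = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.1, 0.9, 0.0],
    "rats": [0.6, 0.0, 0.5],
    "birds": [0.0, 0.1, 0.9],
}
query_vector = [0.2, 0.8, 0.1]  # toy vector for 'puppies'


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, references):
    """Return the (word, score) pair with the highest cosine similarity."""
    scores = {word: cosine_similarity(query, vec) for word, vec in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


best_word, best_score = most_similar(query_vector, reference_vectors)
print(best_word)
```

With these toy vectors the best match for the 'puppies' vector is 'dogs'.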
🗺️ SOC Mapper
The package also applies this linking methodology to find the Standard Occupation Classification (SOC) code most similar to an input job title. More on this here.
🔨 Usage
Install the package using pip:
pip install nlp-link
Basic usage
Match two lists in Python:
from nlp_link.linker import NLPLinker
nlp_link = NLPLinker()
# list inputs
comparison_data = ['cats', 'dogs', 'rats', 'birds']
input_data = ['owls', 'feline', 'doggies', 'dogs', 'chair']
nlp_link.load(comparison_data)
matches = nlp_link.link_dataset(input_data)
# Top match output
print(matches)
Which outputs:
   input_id input_text  link_id link_text  similarity
0         0       owls        3     birds    0.613577
1         1     feline        0      cats    0.669633
2         2    doggies        1      dogs    0.757443
3         3       dogs        1      dogs    1.000000
4         4      chair        0      cats    0.331178
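Every input receives a top match, even when nothing in the reference list is close: 'chair' is linked to 'cats' with a similarity of about 0.33. One way to handle this is to discard links below a cut-off. The sketch below copies the rows printed above into plain dictionaries so that it runs without the package; the 0.5 threshold and the `filter_links` helper are assumptions for illustration, not part of nlp-link.

```python
# Rows copied from the output above, as plain dicts so the sketch is standalone.
matches = [
    {"input_text": "owls", "link_text": "birds", "similarity": 0.613577},
    {"input_text": "feline", "link_text": "cats", "similarity": 0.669633},
    {"input_text": "doggies", "link_text": "dogs", "similarity": 0.757443},
    {"input_text": "dogs", "link_text": "dogs", "similarity": 1.000000},
    {"input_text": "chair", "link_text": "cats", "similarity": 0.331178},
]


def filter_links(rows, min_similarity=0.5):
    """Keep only links whose similarity reaches the (hypothetical) threshold."""
    return [row for row in rows if row["similarity"] >= min_similarity]


confident = filter_links(matches)
for row in confident:
    print(f"{row['input_text']} -> {row['link_text']} ({row['similarity']:.2f})")
```

If `matches` is a pandas DataFrame, as the printed output suggests, the equivalent would be a boolean mask such as `matches[matches['similarity'] >= 0.5]`. A suitable threshold depends on the data and is worth checking against a sample of links by hand.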
SOC Mapping
Match a list of job titles to SOC codes:
from nlp_link.soc_mapper.soc_map import SOCMapper
soc_mapper = SOCMapper()
soc_mapper.load()
job_titles = ["data scientist", "Assistant nurse", "Senior financial consultant - London"]
soc_mapper.get_soc(job_titles, return_soc_name=True)
Which will output:
[((('2433/04', 'Statistical data scientists'), ('2433', 'Actuaries, economists and statisticians'), '2425'), 'Data scientist'), ((('6131/99', 'Nursing auxiliaries and assistants n.e.c.'), ('6131', 'Nursing auxiliaries and assistants'), '6141'), 'Assistant nurse'), ((('2422/02', 'Financial advisers and planners'), ('2422', 'Finance and investment analysts and advisers'), '3534'), 'Financial consultant')]
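The nested tuples can be awkward to work with directly. The sketch below flattens the output shown above into one dictionary per job title. The field names are an assumption read off the printed structure (two (code, name) pairs, one further bare code, and the matched title); check the package's documentation for their exact meaning. `flatten_soc_result` is a hypothetical helper, not part of nlp-link.

```python
# Output copied from the example above.
soc_results = [
    ((("2433/04", "Statistical data scientists"),
      ("2433", "Actuaries, economists and statisticians"),
      "2425"), "Data scientist"),
    ((("6131/99", "Nursing auxiliaries and assistants n.e.c."),
      ("6131", "Nursing auxiliaries and assistants"),
      "6141"), "Assistant nurse"),
    ((("2422/02", "Financial advisers and planners"),
      ("2422", "Finance and investment analysts and advisers"),
      "3534"), "Financial consultant"),
]


def flatten_soc_result(result):
    """Unpack one nested result tuple into a flat dict (field names assumed)."""
    (detailed, group, other_code), matched_title = result
    detailed_code, detailed_name = detailed
    group_code, group_name = group
    return {
        "detailed_code": detailed_code,
        "detailed_name": detailed_name,
        "group_code": group_code,
        "group_name": group_name,
        "other_code": other_code,
        "matched_title": matched_title,
    }


rows = [flatten_soc_result(result) for result in soc_results]
for row in rows:
    print(row["matched_title"], row["detailed_code"], row["group_code"])
```

A list of flat dictionaries like this can be passed straight to a table library or written out as CSV.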
Contributing
The instructions here are for those contributing to the repo.
Set-up
In setting up this project we ran:
conda create --name nlp-link pip python=3.9
conda activate nlp-link
pip install poetry
pip install pre-commit black
pre-commit install
poetry init
poetry install
Tests
To run tests:
poetry run pytest tests/
Documentation
Docs for this repo are automatically published to the gh-pages branch via GitHub Actions after a PR is merged into main. We use Material for MkDocs for these. Nothing needs to be done to update them.
However, if you are editing the docs you can test them out locally by running:
cd docs
mkdocs serve