Extract higher level clusters from keywords

These details have not been verified by PyPI

Project links

Homepage

Project description

Simple Keyword Clusterer

A simple machine learning package to cluster keywords into higher-level groups.

Example:
"Senior Frontend Engineer" --> "Frontend Engineer"
"Junior Backend developer" --> "Backend developer"

Installation

pip install simple_keyword_clusterer

Usage

# import the package
from simple_keyword_clusterer import Clusterer

# read your keywords into a list (one keyword per line)
with open("../my_keywords.txt", "r") as f:
    data = f.read().splitlines()

# instantiate object
clusterer = Clusterer()

# apply clustering
df = clusterer.extract(data)

print(df)

Performance

The algorithm will find the optimal number of clusters automatically based on the best Silhouette Score.
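To illustrate what selecting the number of clusters by Silhouette Score can look like, here is a minimal sketch using scikit-learn. It is not the package's internal implementation: the TF-IDF features, the KMeans model, the function name best_k_by_silhouette and the k_min/k_max range are all assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical sample data, used only for this illustration.
SAMPLE_KEYWORDS = [
    "senior frontend engineer",
    "junior frontend engineer",
    "frontend engineer",
    "senior backend developer",
    "junior backend developer",
    "backend developer",
    "data scientist",
    "senior data scientist",
]


def best_k_by_silhouette(keywords, k_min=2, k_max=6, seed=0):
    """Try each k in [k_min, k_max] and return (best_k, scores).

    `scores` maps every k that was tried to its silhouette score.
    """
    if len(keywords) < 3:
        raise ValueError("need at least 3 keywords to compare cluster counts")

    # Assumption: keywords are represented as dense TF-IDF vectors.
    X = TfidfVectorizer().fit_transform(keywords).toarray()

    # The silhouette score is only defined for 2 <= k <= n_samples - 1.
    k_max = min(k_max, len(keywords) - 1)

    scores = {}
    for k in range(k_min, k_max + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        if len(set(labels)) < 2:
            continue  # degenerate clustering, score undefined
        scores[k] = silhouette_score(X, labels)

    if not scores:
        raise ValueError("no valid clustering found in the requested range")

    # A higher silhouette score means tighter, better separated clusters.
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Calling best_k_by_silhouette(SAMPLE_KEYWORDS) returns the cluster count with the highest score together with the score of every candidate, which makes it easy to see how close the alternatives were.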

You can also specify the number of clusters yourself:

# instantiate object
clusterer = Clusterer(n_clusters=4)

# apply clustering
df = clusterer.extract(data)

For best results, reduce the variance of the data by keeping all keywords within the same semantic context.
For example, a file of job-title keywords should stay coherent and should not contain unrelated terms such as gardening keywords.

If the items are clearly separable, the algorithm should still be able to produce usable output.

Customization

You can customize the clustering mechanism through the following files:

blacklist.txt
to_normalize.txt

If the clustering identifies unwanted groups, you can blacklist certain words by appending them to the blacklist.txt file.

The to_normalize.txt file contains tuples, each of which defines a replacement to apply to the keywords. For instance:

("back end", "backend"), ("front end", "frontend"), ("sr", "Senior"), ("jr", "junior")

Add your own tuples to use this functionality.
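The effect of the two customization files can be sketched in plain Python. This is only an illustration of the idea, not the package's own code: the preprocess function, the in-code NORMALIZE and BLACKLIST values, and the whole-word matching rule are assumptions, and the replacements are lowercased here for consistency.

```python
import re

# Replacement pairs in the spirit of to_normalize.txt (lowercased for this sketch).
NORMALIZE = [
    ("back end", "backend"),
    ("front end", "frontend"),
    ("sr", "senior"),
    ("jr", "junior"),
]

# Hypothetical blacklist entries; the real blacklist.txt contents are not shown here.
BLACKLIST = {"remote", "urgent"}


def preprocess(keyword, normalize=NORMALIZE, blacklist=BLACKLIST):
    """Lowercase a keyword, apply the replacement pairs, then drop blacklisted words."""
    text = keyword.lower()
    for old, new in normalize:
        # Match whole words only, so "sr" does not rewrite part of a longer word.
        text = re.sub(rf"\b{re.escape(old)}\b", new, text)
    words = [word for word in text.split() if word not in blacklist]
    return " ".join(words)


print(preprocess("Sr Back End Developer remote"))
print(preprocess("Jr Front End Engineer"))
```

Under these assumptions, the two calls print "senior backend developer" and "junior frontend engineer", so differently spelled variants of a keyword collapse to the same form before clustering.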

Dependencies

Scikit-learn
Pandas
Matplotlib
Seaborn
Numpy
NLTK
Tqdm

Make sure to download the NLTK English stopwords by running the following in Python:

import nltk
nltk.download("stopwords")

Contact

To get in touch, send me an email. You can find my contact information on my website.

Project details

Release history

1.3 (Aug 26, 2021)
1.2 (Aug 26, 2021)
1.1 (Aug 26, 2021)
1.0 (Aug 26, 2021)
0.25 (Aug 26, 2021), this version
0.24 (Aug 26, 2021)
0.23 (Aug 26, 2021)
0.21 (Aug 26, 2021)
0.19 (Aug 26, 2021)
0.18 (Aug 26, 2021)
0.17 (Aug 26, 2021)
0.16 (Aug 26, 2021)
0.15 (Aug 26, 2021)
0.14 (Aug 26, 2021)
0.13 (Aug 26, 2021)
0.12 (Aug 26, 2021)
0.11 (Aug 26, 2021)
0.9 (Aug 26, 2021)
0.8 (Aug 26, 2021)

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

simple_keyword_clusterer-0.25-py3-none-any.whl (6.7 kB)

Uploaded Aug 26, 2021 Python 3

File details

Details for the file simple_keyword_clusterer-0.25-py3-none-any.whl.

File metadata

Download URL: simple_keyword_clusterer-0.25-py3-none-any.whl
Upload date: Aug 26, 2021
Size: 6.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for simple_keyword_clusterer-0.25-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9ca302c5b3760807819c68ad4ada04edd48bcae297167f59326395096726a888`
MD5	`64593cfe913819c864a2d012ead27d4d`
BLAKE2b-256	`326bd79bee0b784a76e2006932b3311d2bdb5e74aa173ea9c97503a9707de4f6`
