Clustering using different non-parametric models in combination with word embeddings
Project description
Clustering using different non-parametric models with the power of BERT embeddings
Usage
# Import the CRP clustering function
from embed_clustering.latent_component import crp_algorithm
import pandas as pd

# Read the data you want to cluster
df = pd.read_csv('sample.csv')
corpus = df[column]  # set `column` to the name of the column you want to cluster

# Run the algorithm: use compute='cuda' if you have a GPU, otherwise compute='cpu'.
# Set cleaning=False if you do not want the text cleaned before clustering.
df['cluster'] = crp_algorithm(corpus, compute='cuda', cleaning=True)
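The name `crp_algorithm` suggests a Chinese restaurant process (CRP), a non-parametric scheme in which the number of clusters is not fixed in advance. The package internals are not shown here, so the following is only a rough sketch of the general idea under stated assumptions, not the implementation used by `embed_clustering`. The function names `cosine` and `crp_cluster`, the `alpha` parameter, and the similarity-weighted assignment rule are all hypothetical choices made for illustration, and the toy 2-D vectors stand in for real sentence embeddings.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def crp_cluster(vectors, alpha=1.0, seed=0):
    """Assign each vector to a cluster with a similarity-weighted CRP.

    Hypothetical sketch, not the embed_clustering implementation.
    """
    rng = random.Random(seed)
    sums = []    # per-cluster running sum of member vectors
    counts = []  # per-cluster member counts
    labels = []
    for vec in vectors:
        # An existing cluster k gets weight count_k * max(similarity, 0);
        # opening a new cluster gets weight alpha.
        weights = [n * max(cosine(vec, s), 0.0) for s, n in zip(sums, counts)]
        weights.append(alpha)
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sums):
            # Start a new cluster seeded with this vector.
            sums.append(list(vec))
            counts.append(1)
        else:
            # Join cluster k and update its running sum.
            sums[k] = [a + b for a, b in zip(sums[k], vec)]
            counts[k] += 1
        labels.append(k)
    return labels


# Toy stand-ins for embeddings: two vectors near each axis.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = crp_cluster(toy_vectors, alpha=0.5, seed=0)
```

Because cosine similarity ignores vector length, comparing against the running sum of a cluster is equivalent to comparing against its mean. A larger `alpha` makes new clusters more likely, so the number of clusters grows with the data rather than being chosen up front.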
License