Clustering using different non-parametric models combined with word embeddings

Project description

Clustering using different non-parametric models with the power of BERT embeddings

Usage

# import the CRP algorithm
from embed_clustering.latent_component import crp_algorithm

# read the data you want to cluster
import pandas as pd
df = pd.read_csv('sample.csv')

# select the text column to cluster
# ('text' is a placeholder; replace it with your own column name)
corpus = df['text']

# apply the algorithm and store the cluster assigned to each row
# compute='cuda' runs on a GPU; use compute='cpu' if you do not have one
# cleaning=True cleans the text before clustering; pass cleaning=False to skip it
df['cluster'] = crp_algorithm(corpus, compute='cuda', cleaning=True)
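
The name crp_algorithm presumably refers to the Chinese restaurant process (CRP), a non-parametric prior in which the number of clusters is not fixed in advance. As a rough illustration of that idea only, the sketch below samples cluster labels from a plain CRP prior using the standard library. It ignores the embeddings entirely and is not the package's implementation; the function name crp_assignments and the parameters alpha and seed are hypothetical, chosen for this example.

```python
import random


def crp_assignments(n_items, alpha=1.0, seed=None):
    """Sample cluster labels for n_items from a Chinese restaurant process prior.

    Hypothetical helper for illustration; not part of embed_clustering.
    alpha controls how readily new clusters are opened.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items already in cluster k
    for _ in range(n_items):
        if not counts:
            k = 0  # the first item always opens the first cluster
        else:
            # An existing cluster k is chosen with weight counts[k];
            # a brand-new cluster is chosen with weight alpha.
            weights = counts + [alpha]
            k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    print(crp_assignments(20, alpha=1.5, seed=0))
```

Larger values of alpha tend to produce more clusters, while alpha=0 keeps every item in the first cluster. A clustering method built on this prior would additionally weigh each choice by how well the item's embedding fits the candidate cluster.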

License

MIT

Download files

Source Distribution

embed_clustering-0.0.3.tar.gz (10.8 kB)

Built Distribution

embed_clustering-0.0.3-py3-none-any.whl (5.2 kB, Python 3)
