A small package to do Sentence Clustering with BERT (SCBert)

Project description

Sentence Clustering with BERT (SCB)

Sentence Clustering with BERT (SCB) is a project that aims to use state-of-the-art BERT models to compute vectors for sentences. A few tools are also implemented to explore those vectors and to see how sentences relate to each other in the latent space.
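To make "related in the latent space" concrete: one common way to compare two sentence vectors is cosine similarity. The sketch below is only an illustration and is not taken from SCBert; the function name and the toy vectors are made up for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "sentence vectors" (hypothetical values; real BERT
# vectors have hundreds of dimensions).
review_a = [0.9, 0.1, 0.0]
review_b = [0.8, 0.2, 0.1]
review_c = [0.0, 0.1, 0.9]

print(cosine_similarity(review_a, review_b))  # close to 1: similar direction
print(cosine_similarity(review_a, review_c))  # close to 0: nearly orthogonal
```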

Demonstration

  • Create vectors from raw data:
# How to transform raw French texts into vectors using a BERT model.
from SCBert.SCBert import Vectorizer

# `data` is your own collection of raw French texts (it is not defined in this snippet).
vectorizer = Vectorizer("flaubert")
text_vectors = vectorizer.vectorize(data)
  • Explore the embedded space:
# How to explore the relations in your data.
from SCBert.SCBert import EmbeddingExplorer

ee = EmbeddingExplorer(data, text_vectors)
labels = ee.cluster(k=3)       # Cluster with k-means
ee.extract_keywords()          # Extract keywords using the RAKE algorithm; results are then accessible via ee.keywords
ee.explore(color=labels)       # Plot a PCA projection of the embedded vectors, colored by label
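For readers unfamiliar with the clustering step, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) on toy vectors. It is not SCBert's implementation, which is not shown in this description. The function name `kmeans_labels`, the "first k vectors" initialisation, and the toy data are assumptions chosen to keep the example short and deterministic.

```python
import math


def kmeans_labels(vectors, k, max_iter=100):
    """Assign each vector to one of k clusters with Lloyd's algorithm."""
    # Simplistic, deterministic initialisation: the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # No change: the algorithm has converged.
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Two well-separated toy groups standing in for sentence vectors.
toy_vectors = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
print(kmeans_labels(toy_vectors, k=2))  # [0, 0, 0, 1, 1, 1]
```

In practice, a library implementation with a more robust initialisation is usually preferable to the "first k vectors" shortcut used here.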

Built-in example

There is a built-in example in the example folder. It comes with its own data, the CLS-fr dataset, which is composed of Amazon reviews from different sources (DVD, CD, Livres).

Installation

You can either download the zip file or install the PyPI package with the following command:

> pip install SCBert

If you encounter problems during the installation, the cause may be multi-rake's dependency on cld2-cffi. I will try to address this later on. To work around it, run the following commands:

> export CFLAGS="-Wno-narrowing"
> pip install cld2-cffi
> pip install multi-rake
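After running the commands above, one quick way to confirm that the packages are visible to your interpreter is to ask Python whether it can locate them. This is a minimal sketch: `missing_modules` is a hypothetical helper, not part of SCBert, and the import names `cld2` and `multi_rake` are assumptions about what the cld2-cffi and multi-rake distributions install.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names: cld2-cffi -> cld2, multi-rake -> multi_rake.
missing = missing_modules(["cld2", "multi_rake", "SCBert"])
if missing:
    print("Not found yet:", ", ".join(missing))
else:
    print("All three modules were found.")
```

Note that `find_spec` only checks that a module can be located; it does not import it, so a broken compiled extension would still need an actual import to surface.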

Download files

Source Distribution

SCBert-0.2.tar.gz (1.6 MB view details)

Uploaded Source

Built Distribution

SCBert-0.2-py3-none-any.whl (1.6 MB view details)

Uploaded Python 3

File details

Details for the file SCBert-0.2.tar.gz.

File metadata

  • Download URL: SCBert-0.2.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.7

File hashes

Hashes for SCBert-0.2.tar.gz
Algorithm Hash digest
SHA256 894c6a4336c8333bee8059633d003307f31002ee6086604ad0dc448e06898241
MD5 20d494a610d7e67dda74da3e768c96cf
BLAKE2b-256 f623f15b3534646e90348a1dea8ff0fcca9e4276dc1ff745bedd4a8d4b51a567

File details

Details for the file SCBert-0.2-py3-none-any.whl.

File metadata

  • Download URL: SCBert-0.2-py3-none-any.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.7

File hashes

Hashes for SCBert-0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 3b12c4aacddf54db6e8f5f2e2b4111095ba8902c2942e3114c5bfcb540404a30
MD5 dc65b8fa468e39728ef9ca1d34c252cd
BLAKE2b-256 21f45f7e6241a319805c11ff3d961c1be2b8ad20b3e65170e4daa6f8be712847
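
The SHA256 digests listed for the two files can be checked locally after a manual download. The sketch below assumes the files keep their published names; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical and only the digests themselves come from this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details on this page.
EXPECTED_SHA256 = {
    "SCBert-0.2.tar.gz":
        "894c6a4336c8333bee8059633d003307f31002ee6086604ad0dc448e06898241",
    "SCBert-0.2-py3-none-any.whl":
        "3b12c4aacddf54db6e8f5f2e2b4111095ba8902c2942e3114c5bfcb540404a30",
}


def sha256_of_file(path, chunk_size=1 << 16):
    """Hash a file in chunks so large archives are never fully loaded in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """True only if the file name is known and its SHA256 matches the listed digest."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of_file(path) == expected
```

For example, `matches_published_hash("SCBert-0.2.tar.gz")` should return `True` for an intact download in the current directory.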
