Low-code text clustering for the Tibetan language

Text Clustering

This repository provides tools to embed and cluster Tibetan texts, label the resulting clusters, and visualize the labeled clusters.
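This page does not describe the embedding model or clustering algorithm bocluster actually uses. To make the embed, cluster, and label steps concrete, here is a minimal, standard-library-only sketch that is not bocluster's method: it splits Tibetan text into syllables on the tsheg (་), represents each text as a normalized bag-of-syllables vector as a stand-in for a real embedding model, groups the vectors with plain k-means, and names each cluster by its most frequent syllables. The functions `syllables`, `embed`, `kmeans`, and `label_clusters` are hypothetical illustrations.

```python
import math
from collections import Counter

TSHEG = "\u0f0b"  # ་ Tibetan intersyllabic mark, separates syllables
SHAD = "\u0f0d"   # ། Tibetan shad, marks the end of a clause


def syllables(text):
    """Split Tibetan text into syllables on the tsheg, ignoring shad and whitespace."""
    chunks = text.replace(SHAD, " ").split()
    return [s for chunk in chunks for s in chunk.split(TSHEG) if s]


def embed(texts):
    """Toy 'embedding' (hypothetical): an L2-normalized bag-of-syllables vector per text."""
    vocab = sorted({s for t in texts for s in syllables(t)})
    index = {s: i for i, s in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for s, n in Counter(syllables(t)).items():
            v[index[s]] = float(n)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors seed the centroids, so results are deterministic."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: each vector joins the cluster with the nearest centroid
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # update step: move each non-empty cluster's centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def label_clusters(texts, labels, top_n=2):
    """Name each cluster by joining its top_n most frequent syllables with a tsheg."""
    names = {}
    for c in sorted(set(labels)):
        counts = Counter(s for t, lab in zip(texts, labels) if lab == c for s in syllables(t))
        names[c] = TSHEG.join(s for s, _ in counts.most_common(top_n))
    return names


# two homages to the lama and two aspirations for all sentient beings
EXAMPLE_TEXTS = [
    "བླ་མ་ལ་ཕྱག་འཚལ་ལོ།",
    "སེམས་ཅན་ཐམས་ཅད་བདེ་བ་དང་ལྡན་པར་ཤོག།",
    "བླ་མ་ལ་གསོལ་བ་འདེབས།",
    "སེམས་ཅན་ཐམས་ཅད་སྡུག་བསྔལ་དང་བྲལ་བར་ཤོག།",
]

if __name__ == "__main__":
    labels = kmeans(embed(EXAMPLE_TEXTS), k=2)
    print(labels, label_clusters(EXAMPLE_TEXTS, labels))
```

Bag-of-syllables vectors only capture exact vocabulary overlap, which is why a library like this would rely on a learned embedding model instead; the toy version just keeps every step visible.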

Install

Install the library to get started:

```bash
pip install --upgrade bocluster
```
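To confirm which version of bocluster ended up installed, you can query the installed package metadata with Python's standard `importlib.metadata` module. This is a small sketch, not part of bocluster; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("bocluster")
    print(f"bocluster {v}" if v else "bocluster is not installed")
```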

Usage

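The example below runs the whole pipeline: it loads a Tibetan dataset, fits a classifier on the first 1,000 texts, optionally assigns outliers to clusters, and visualizes the result.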

```python
from datasets import load_dataset
from bocluster.cluster import BoClusterClassifier

# load a Tibetan-language text dataset; the 'bo' column holds the Tibetan texts
ds = load_dataset('billingsmoore/LotsawaHouse-bo-en', split='train')

# initialize a BoClusterClassifier object
bcc = BoClusterClassifier()

# fit the classifier on the first 1,000 Tibetan texts
bcc.fit(ds['bo'][:1000])

# optional: treat every data point as a member of a cluster, so that no points remain outliers
bcc.classify_outliers()

# show a visualization of the results
bcc.show()
```
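The `classify_outliers()` step is described only as treating every data point as a cluster member. One common way to achieve that is to move each outlier to the cluster whose centroid is nearest. The sketch below is an assumption for illustration, not a description of bocluster's internals: it assumes outliers carry the label -1 (the convention used by HDBSCAN) and that every text already has an embedding vector, and the function `assign_outliers` is hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign_outliers(vectors, labels, outlier_label=-1):
    """Return a copy of labels in which every outlier joins the cluster with the nearest centroid."""
    clusters = sorted({lab for lab in labels if lab != outlier_label})
    if not clusters:
        raise ValueError("there are no clusters to assign outliers to")
    centroids = {
        c: centroid([v for v, lab in zip(vectors, labels) if lab == c]) for c in clusters
    }
    new_labels = []
    for v, lab in zip(vectors, labels):
        if lab == outlier_label:
            # Euclidean distance from this outlier to each cluster centroid
            lab = min(clusters, key=lambda c: math.dist(v, centroids[c]))
        new_labels.append(lab)
    return new_labels


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [1.0, 0.0], [9.0, 10.0]]
    labels = [0, 0, 1, 1, -1, -1]
    print(assign_outliers(vectors, labels))  # [0, 0, 1, 1, 0, 1]
```

Points that already belong to a cluster keep their label; only the -1 entries change.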

Download files

Source Distribution

bocluster-0.1.0.tar.gz (355.0 kB)

Uploaded Source

Built Distribution

bocluster-0.1.0-py3-none-any.whl (355.6 kB)

Uploaded Python 3

File details

Details for the file bocluster-0.1.0.tar.gz.

File metadata

  • Download URL: bocluster-0.1.0.tar.gz
  • Upload date:
  • Size: 355.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for bocluster-0.1.0.tar.gz
Algorithm Hash digest
SHA256 37bb3b4e8a271b09100b086fcd8ddc5c00696a16c66c445c650ad6b90e2fbfd7
MD5 1708e980a10f83da65505bdab966139c
BLAKE2b-256 f153a9dad21c6794a6c96414e9cfc2a9a28bc340a20612e3c45f7cf7777df15b

File details

Details for the file bocluster-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: bocluster-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 355.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for bocluster-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b71a3faca93d2534156158b4b07ea269e5932aac52dda4d08e8f2bcfb5ef4a90
MD5 5ac4eacdb094c70268d99e57331e7833
BLAKE2b-256 d6531d1be857043e0f1f46c62067cdb97bc8fec0cf97e4b939cdc425f605ee6c

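To check a downloaded file against the SHA256 digests listed above, a short script using only Python's `hashlib` is enough. This is a minimal sketch; the script name `verify_download.py` and the helpers `sha256_of` and `verify` are hypothetical. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
import os
import sys

# SHA256 digests published above for bocluster 0.1.0
EXPECTED_SHA256 = {
    "bocluster-0.1.0.tar.gz": "37bb3b4e8a271b09100b086fcd8ddc5c00696a16c66c445c650ad6b90e2fbfd7",
    "bocluster-0.1.0-py3-none-any.whl": "b71a3faca93d2534156158b4b07ea269e5932aac52dda4d08e8f2bcfb5ef4a90",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest matches expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # usage: python verify_download.py bocluster-0.1.0.tar.gz bocluster-0.1.0-py3-none-any.whl
    for path in sys.argv[1:]:
        expected = EXPECTED_SHA256.get(os.path.basename(path))
        if expected is None:
            print(f"{path}: no published digest to compare against")
        else:
            print(f"{path}: {'OK' if verify(path, expected) else 'MISMATCH'}")
```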
