
Slimmer version of BERTopic for transforming new data with an existing, trained model.


Lightopic

This package addresses the specific use case of deploying a BERTopic model that you've trained, and now want to use for transforming new data, e.g. via an API.

This came up because I wanted to deploy such a model behind an API while keeping the deployment small and fast. BERTopic is a broad package, which brings a heavy set of dependencies with it (e.g. torch and a number of CUDA libraries). So I wrote this package to do the transform step only, with a virtual environment that's about 95% smaller than one with the full BERTopic package.
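If you want to check the size difference on your own setup, a minimal sketch like the one below could compare two virtual environment directories. The function names and the paths in the usage comment are hypothetical, and the figure you get will depend on your platform and package versions.

```python
from pathlib import Path


def dir_size_bytes(root: str) -> int:
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def percent_smaller(full_bytes: int, slim_bytes: int) -> float:
    """How much smaller the slim size is, as a percentage of the full size."""
    if full_bytes <= 0:
        raise ValueError("full_bytes must be positive")
    return 100.0 * (full_bytes - slim_bytes) / full_bytes


# Hypothetical usage; both paths are placeholders for your own environments:
# full = dir_size_bytes(".venv-bertopic")
# slim = dir_size_bytes(".venv-lightopic")
# print(f"{percent_smaller(full, slim):.1f}% smaller")
```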

The main prerequisite is that you have trained a BERTopic model separately and serialised it in a format that's compatible with lightopic. The lightopic package also offers a way to do that, described below. From that point you can instantiate a Lightopic object and use its transform method on new data.

Training and serialising your LightBERTopic model

This is a necessary step: you can't instantiate a Lightopic object without first having trained and serialised your model. To make this part easier the LightBERTopic class is available. It is a child class of bertopic.BERTopic with one added method, save_lightopic.

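from sklearn.datasets import fetch_20newsgroups

from lightopic.lightbertopic import LightBERTopic

# Example corpus: 20 newsgroups posts with headers, footers and quotes stripped
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# Train as with a regular BERTopic model, then save in the lightopic format
topic_model = LightBERTopic()
topics, probs = topic_model.fit_transform(docs)
topic_model.save_lightopic("model_directory")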

NB. for this to work you must have bertopic installed, which you can do with pip install lightopic[bertopic].
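Because the training step depends on an optional extra, it can help to fail early with a clear message when BERTopic is missing. The sketch below uses only the standard library; the helper names has_module and require_bertopic are hypothetical and not part of lightopic.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_bertopic() -> None:
    """Raise a clear error if the optional training dependency is absent."""
    if not has_module("bertopic"):
        raise ImportError(
            "Training needs BERTopic; install it with: "
            "pip install 'lightopic[bertopic]'"
        )
```

Calling require_bertopic() at the top of a training script would surface the missing extra before any data is loaded.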

NOTE: this package is still under development, so the required serialisation format may (and probably will) change!
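Since the format may change between releases, one way to guard a deployment is to record which lightopic version wrote a model and compare it at load time. This is only a sketch under stated assumptions: the stamp file, its name and both helpers are hypothetical, nothing here is provided by lightopic, and the stamp is written next to the model directory rather than inside it so that the saved model itself is left untouched.

```python
import json
from importlib import metadata
from pathlib import Path


def installed_version(package: str) -> str:
    """Version of an installed distribution, or 'unknown' if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "unknown"


def _stamp_path(model_dir: str) -> Path:
    # Hypothetical convention: a sibling file named <model_dir>.version.json
    path = Path(model_dir)
    return path.parent / (path.name + ".version.json")


def write_stamp(model_dir: str, package: str = "lightopic") -> str:
    """Record the installed package version alongside the model directory."""
    version = installed_version(package)
    _stamp_path(model_dir).write_text(json.dumps({package: version}))
    return version


def stamp_matches(model_dir: str, package: str = "lightopic") -> bool:
    """True if a stamp exists and agrees with the currently installed version."""
    stamp = _stamp_path(model_dir)
    if not stamp.exists():
        return False
    return json.loads(stamp.read_text()).get(package) == installed_version(package)
```

Pinning the exact lightopic version in both the training and the serving environment is the simpler safeguard; the stamp only makes a mismatch visible.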

Using a Lightopic model

Now the serialised model is ready to use.

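from lightopic import Lightopic

# Load the model saved earlier with save_lightopic
topic_model = Lightopic()
topic_model.load("model_directory")

# `embeddings` is not defined in this snippet: supply the embeddings of your new documents
topic_model.transform(embeddings)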

This transform step does not rely on BERTopic at all, so it can use the smaller installation you get from pip install lightopic.


