Slimmer version of BERTopic for transforming new data with an existing, trained model.
Lightopic
This package addresses one specific use case: deploying a BERTopic model that you have already trained and now want to use to transform new data, e.g. via an API.
This came up for me because I wanted to deploy such a model behind an API, and wanted to make the deployment smaller and faster. The BERTopic package is broad, which brings with it a load of dependencies (e.g. torch, a bunch of CUDA libraries). So I wrote this as a way to do the transform step only, with a virtual environment that's about 95% smaller than one with the actual BERTopic package.
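If you want to check the size difference for your own setup, one rough approach is to add up the file sizes in each virtual environment and compare them. The sketch below is not part of lightopic and uses only the standard library; the two directory names in the usage note are hypothetical.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so files living outside the environment are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def percent_smaller(small, large):
    """Return how much smaller `small` is than `large`, as a percentage."""
    if large <= 0:
        raise ValueError("large must be positive")
    return 100.0 * (large - small) / large
```

For example, with one environment built from pip install lightopic in .venv-light and another with BERTopic in .venv-full (both names are placeholders), percent_smaller(directory_size_bytes(".venv-light"), directory_size_bytes(".venv-full")) gives the reduction as a percentage.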
The main prerequisite is that you need to have trained a BERTopic model separately and serialised it in a way that's compatible with lightopic. The lightopic package also offers you a way to do that; guidance on how is below. From that point you can instantiate a Lightopic object and use its transform method on new data.
Training and serialising your LightBERTopic model
This is a necessary step: you can't instantiate a Lightopic object without first having trained and serialised your model. To make this part easier the LightBERTopic class is available: this is a child class of bertopic.BERTopic with one method added, save_lightopic.
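from sklearn.datasets import fetch_20newsgroups

from lightopic.lightbertopic import LightBERTopic

# Example corpus: the 20 newsgroups dataset with headers, footers and quotes removed
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# Train as you would with a regular BERTopic model
topic_model = LightBERTopic()
topics, probs = topic_model.fit_transform(docs)
# Serialise the trained model in the format that Lightopic can load
topic_model.save_lightopic("model_directory")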
NB. for this to work you must have bertopic installed, which you can do with pip install lightopic[bertopic].
NOTE: this package is still under development, so this required format may (and probably will) change!
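Since the training step needs the bertopic extra while the transform step does not, it can be handy to check which of the two installations an environment actually has. The following is a minimal sketch using only the standard library; the helper names is_installed and available_features are made up for illustration and are not part of lightopic.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


def available_features():
    """Report which steps this environment can run (hypothetical helper)."""
    return {
        # Lightopic.transform only needs the lightopic package itself.
        "transform": is_installed("lightopic"),
        # LightBERTopic subclasses bertopic.BERTopic, so training needs bertopic too.
        "training": is_installed("lightopic") and is_installed("bertopic"),
    }
```

In a deployment environment created with the plain install you would expect "training" to be False, which is the intended outcome rather than an error.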
Using a Lightopic model
Now the serialised model is ready to use.
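from lightopic import Lightopic

# Load the model serialised earlier with save_lightopic
topic_model = Lightopic()
topic_model.load("model_directory")
# `embeddings` is not defined here: it stands for the embeddings of your new documents, computed beforehand
topic_model.transform(embeddings)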
This transform step does not rely on BERTopic at all, so it can use the smaller installation you get from pip install lightopic.
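When the model sits behind an API, you would usually want to load it once and reuse it across requests rather than calling load every time. Below is a minimal sketch of that pattern using functools.lru_cache. StubModel is a hypothetical stand-in so the example runs without lightopic installed; it only mimics the load and transform calls shown above, and its return value is a placeholder, not the real output format.

```python
from functools import lru_cache


class StubModel:
    """Hypothetical stand-in for a Lightopic object (not the real class)."""

    def __init__(self):
        self.directory = None

    def load(self, directory):
        self.directory = directory

    def transform(self, embeddings):
        # Placeholder output: one dummy label per embedding.
        return [-1] * len(embeddings)


def make_model_getter(model_factory, directory):
    """Build a zero-argument function that loads the model on first call only."""

    @lru_cache(maxsize=1)
    def get_model():
        model = model_factory()
        model.load(directory)
        return model

    return get_model


# Swap StubModel for Lightopic in a real deployment (assumption: same load/transform calls).
get_model = make_model_getter(StubModel, "model_directory")
```

Each request handler can then call get_model().transform(embeddings); the directory is read on the first call and the same object is returned afterwards.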