Package for topic modeling using BERTopic, including templates for fitting models and making predictions.
Sinapsis BERTopic
Package for topic modeling using BERTopic.
🐍 Installation • 🚀 Features • 📙 Documentation • 🔍 License
Sinapsis BERTopic integrates BERTopic models into the Sinapsis framework for topic clustering.
🐍 Installation
Install using your package manager of choice. We encourage the use of uv.
Example with uv:
uv pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech
or with raw pip:
pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech
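After either command finishes, a quick way to confirm that the package landed in the active environment is to query its metadata from Python. The sketch below uses only the standard library; the helper name `installed_version` is made up for this illustration and is not part of Sinapsis.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "sinapsis-bertopic" is the distribution name used in the install commands above.
found = installed_version("sinapsis-bertopic")
print(f"sinapsis-bertopic: {found or 'not installed'}")
```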
🚀 Features
Templates Supported
This package includes the following templates:
- BERTopicFitModel: A template class for fitting BERTopic models and saving them to disk.
- BERTopicFitModelFromDataFrame: A template class for fitting BERTopic models using data from a DataFrame and saving the model to disk.
- BERTopicPredict: Template for topic prediction using BERTopic models.
[!TIP] Use CLI command sinapsis info --all-template-names to show a list with all the available Template names installed with Sinapsis BERTopic.
[!TIP] Use CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.
For example, for BERTopicPredict use sinapsis info --example-template-config BERTopicPredict to produce an example config like:
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicPredict
  class_name: BERTopicPredict
  template_input: InputTemplate
  attributes:
    root_dir: /root/.cache/sinapsis
    export_visualization_to_image: true
    image_export_params:
      format: png
      width: null
      height: null
      scale: null
      validate_figure: true
    sentence_model_name: sentence-transformers/all-MiniLM-L6-v2
    model_path: '`replace_me:<class ''str''>`'
    visualize_predictions: true
    visualize_topics: true
    historical_data_path: null
    prediction_viz_path: prediction_viz.html
    visualize_topics_params:
      topics: null
      top_n_topics: null
      use_ctfidf: false
      custom_labels: false
      title: <b>Intertopic Distance Map</b>
      figure_width: 650
      figure_height: 650
    save_topic_visualization_path: topics_visualization.html
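Generated configs mark required attributes with a `replace_me:` placeholder (here, `model_path`), and these need real values before the agent can run. As a rough aid, the sketch below scans config text for leftover placeholders using only the standard library. The function name and the regular expression are illustrative assumptions based on the placeholder format shown above, not a Sinapsis API.

```python
import re
from typing import List, Tuple

# Matches lines such as:  model_path: '`replace_me:<class ''str''>`'
PLACEHOLDER = re.compile(r"^\s*([\w.-]+):\s*'?`replace_me:")


def find_placeholders(config_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, key) for every attribute still holding a placeholder."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = PLACEHOLDER.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


sample = """\
attributes:
  sentence_model_name: sentence-transformers/all-MiniLM-L6-v2
  model_path: '`replace_me:<class ''str''>`'
  visualize_topics: true
"""
print(find_placeholders(sample))  # [(3, 'model_path')]
```

To check a config file on disk, pass its contents to `find_placeholders` and fill in each reported key before running the agent.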
📚 Usage example
Below is an example YAML configuration for fitting a BERTopic model.
Config
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    root_dir: /root/.cache/sinapsis
    export_visualization_to_image: true
    image_export_params:
      format: png
      width: null
      height: null
      scale: null
      validate_figure: true
    sentence_model_name: sentence-transformers/all-MiniLM-L6-v2
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range:
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    save_documents_visualization_path: documents_visualization.html
    save_model_path: '`replace_me:<class ''str''>`'
    save_training_data: false
    save_training_data_path: training_data.pkl
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn:
      - null
      - null
      - null
    visualize_documents_params:
      topics: null
      sample: null
      hide_annotations: false
      hide_document_hover: false
      custom_labels: false
      title: <b>Documents and Topics</b>
      width: 1200
      height: 750
This configuration defines an agent and a sequence of templates that fit a BERTopic model on incoming data.
To run the config, use the CLI:
sinapsis run name_of_config.yml
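For orientation, the attribute groups in the fit config (`umap_model_params`, `hdbscan_model_params`, `bertopic_model_params`, `bertopic_save_model_params`) share their names with arguments of the underlying UMAP, HDBSCAN, and BERTopic libraries. The sketch below shows roughly how a subset of those values would be used with plain BERTopic. It is an assumption about the correspondence, not the implementation of the BERTopicFitModel template, and `fit_and_save` is a hypothetical helper name.

```python
from typing import List


def fit_and_save(docs: List[str], save_path: str) -> List[int]:
    """Fit a BERTopic model on `docs`, save it, and return one topic id per document."""
    # Imported lazily so this function can be defined without the heavy dependencies.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    model_name = "sentence-transformers/all-MiniLM-L6-v2"  # sentence_model_name
    embedding_model = SentenceTransformer(model_name)

    # Values copied from umap_model_params and hdbscan_model_params in the config.
    umap_model = UMAP(n_neighbors=15, n_components=2, min_dist=0.1, metric="euclidean")
    hdbscan_model = HDBSCAN(
        min_cluster_size=5,
        metric="euclidean",
        cluster_selection_method="eom",
    )

    # Values copied from bertopic_model_params in the config.
    topic_model = BERTopic(
        embedding_model=embedding_model,
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        language="english",
        top_n_words=10,
        n_gram_range=(1, 1),
        calculate_probabilities=False,
    )
    topics, _probabilities = topic_model.fit_transform(docs)

    # Values copied from bertopic_save_model_params in the config.
    topic_model.save(
        save_path,
        serialization="safetensors",
        save_ctfidf=True,
        save_embedding_model=model_name,
    )
    return list(topics)
```

Calling `fit_and_save(documents, "my_model_dir")` needs `bertopic` and its dependencies installed and a corpus large enough to cluster; a handful of documents is generally not enough for UMAP and HDBSCAN to produce meaningful topics.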
📙 Documentation
Documentation for this and other sinapsis packages is available on the sinapsis website.
Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.