Sinapsis BERTopic
Package for BERTopic
🐍 Installation • 🚀 Features • 📙 Documentation • 🔍 License
Sinapsis BERTopic integrates BERTopic models into the Sinapsis framework for topic clustering.
🐍 Installation
Install using your package manager of choice. We encourage the use of uv.
This project is private. Make sure you have authorized credentials before proceeding.
Recommended Method (using .netrc):
To avoid baking credentials into URLs, store them in your ~/.netrc file so that the installer can read them for the private index.
Example with uv:
uv pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech
or with raw pip:
pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech
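The .netrc entry itself is not shown above. In the standard netrc format an entry takes the form `machine <host> login <username> password <password>`. As a minimal sketch, the snippet below uses Python's standard-library netrc module to check whether such an entry exists before installing. The host name is taken from the --extra-index-url in the commands above, and the helper name has_index_credentials is hypothetical, not part of Sinapsis.

```python
import netrc
from pathlib import Path

# Assumption: the host is the one used in the --extra-index-url above.
INDEX_HOST = "pypi.sinapsis.tech"


def has_index_credentials(netrc_path: Path, host: str = INDEX_HOST) -> bool:
    """Return True if netrc_path holds a login and password for host."""
    try:
        entry = netrc.netrc(str(netrc_path)).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        # A missing or malformed file means no usable credentials.
        return False
    if entry is None:
        return False
    login, _account, password = entry
    return bool(login) and bool(password)
```

Calling has_index_credentials(Path.home() / ".netrc") before running the install command gives an early hint when credentials are missing. Note that authenticators also returns a `default` entry if the file defines one.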
🚀 Features
Templates Supported
This package includes the following Templates:
- BERTopicFitModel: Template for fitting BERTopic models and saving them to disk.
- BERTopicPredict: Template for topic prediction using BERTopic models.
- BERTopicVisualizeDocuments: Template for generating and exporting interactive document visualizations. It extends BERTopicBase to encode documents with sentence transformers, fit a BERTopic model, and plot the documents in a reduced-dimensional space. The visualizations can be saved as HTML files and optionally exported as image arrays.
- BERTopicVisualizeTopics: Template for BERTopic topic visualization. It extends BERTopicPredict to generate interactive, plotly-based visualizations of the relationships and characteristics of the topics discovered by a BERTopic model, and saves them as HTML files.
[!TIP] Use the CLI command sinapsis info --all-template-names to show a list of all the available Template names installed with Sinapsis BERTopic.
[!TIP] Use the CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.
For example, for BERTopicFitModel use sinapsis info --example-template-config BERTopicFitModel to produce an example config like:
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range: !!python/tuple
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn: !!python/tuple
      - null
      - null
      - null
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
      save_embedding_model: sentence-transformers/all-MiniLM-L6-v2
    root_dir: /root/.cache/sinapsis
    save_path: '`replace_me:<class ''str''>`'
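The generated config marks values that have no default with `replace_me` placeholders (save_path, kwargs and cluster_selection_epsilon_max in the example above). As a minimal sketch, the standard-library helper below lists the keys that still hold a placeholder so they can be filled in before the config is run. The function name find_placeholders is hypothetical, and the scan is a simple line-based match rather than a full YAML parse.

```python
import re

# Matches lines such as: save_path: '`replace_me:<class ''str''>`'
PLACEHOLDER = re.compile(r"^\s*(?:-\s*)?([\w.-]+):\s*.*replace_me:")


def find_placeholders(config_text: str) -> list[tuple[int, str]]:
    """Return (line_number, key) for each line still holding a placeholder."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = PLACEHOLDER.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

Passing the text of a config file to find_placeholders returns an empty list once every placeholder has been replaced with a real value.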
📚 Usage example
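Below is an example YAML configuration for an agent that fits and saves a BERTopic model with BERTopicFitModel. Replace the replace_me placeholders with real values before running it.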
Config
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range: !!python/tuple
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn: !!python/tuple
      - null
      - null
      - null
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
      save_embedding_model: sentence-transformers/all-MiniLM-L6-v2
    root_dir: /root/.cache/sinapsis
    save_path: '`replace_me:<class ''str''>`'
To run the config, use the CLI:
sinapsis run name_of_config.yml
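For orientation, the parameter groups in the config above have the same names as arguments of the underlying libraries: bertopic_model_params resembles the BERTopic constructor, umap_model_params and hdbscan_model_params resemble the UMAP and HDBSCAN constructors, and bertopic_save_model_params resembles BERTopic.save. The sketch below shows a plain-library equivalent with a few of those values. It is an assumption about how the groups relate, not the template's actual implementation, and the function name fit_and_save is hypothetical.

```python
def fit_and_save(docs: list[str], save_dir: str) -> None:
    """Fit a BERTopic model on docs and save it under save_dir."""
    # Imported lazily because these are heavy third-party packages.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    # Values mirror umap_model_params in the config above.
    umap_model = UMAP(
        n_neighbors=15, n_components=2, metric="euclidean", min_dist=0.1
    )
    # Values mirror hdbscan_model_params in the config above.
    hdbscan_model = HDBSCAN(
        min_cluster_size=5, metric="euclidean", cluster_selection_method="eom"
    )
    # Values mirror bertopic_model_params in the config above.
    topic_model = BERTopic(
        language="english",
        top_n_words=10,
        n_gram_range=(1, 1),
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
    )
    topic_model.fit_transform(docs)
    # Values mirror bertopic_save_model_params in the config above.
    topic_model.save(
        save_dir,
        serialization="safetensors",
        save_ctfidf=True,
        save_embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    )
```

The corpus passed as docs needs to be comfortably larger than n_neighbors, since UMAP builds a neighbor graph over the document embeddings.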
📙 Documentation
Documentation for this and other Sinapsis packages is available on the Sinapsis website.
Tutorials for different projects within Sinapsis are available on the Sinapsis tutorials page.