Sinapsis BERTopic

Package for BERTopic

🐍 Installation • 🚀 Features • 📙 Documentation • 🔍 License

Sinapsis BERTopic integrates BERTopic models into the Sinapsis framework for topic clustering.

🐍 Installation

Install using your package manager of choice. We encourage the use of uv.

This project is private. Make sure you have authorized credentials before proceeding.

Recommended Method (using .netrc):

To avoid baking credentials into URLs, configure your ~/.netrc file with your credentials:
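As a rough sketch, a ~/.netrc entry is a machine / login / password triple. The Python snippet below builds such an entry and reads it back with the standard-library netrc module. The host is assumed from the index URL used in the install commands; the login and password are placeholders, and the helper names are hypothetical.

```python
import netrc
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Assumption: the host matches the --extra-index-url used in the install commands.
INDEX_HOST = "pypi.sinapsis.tech"


def netrc_entry(host: str, login: str, password: str) -> str:
    """Return one .netrc entry in the standard machine/login/password form."""
    return f"machine {host}\nlogin {login}\npassword {password}\n"


def read_credentials(path: str, host: str) -> Optional[Tuple[str, str]]:
    """Look up (login, password) for a host in the given netrc file."""
    auth = netrc.netrc(path).authenticators(host)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


# Demonstrate on a throwaway file instead of touching the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "netrc"
    # Placeholder credentials; substitute the ones issued to you.
    demo_file.write_text(netrc_entry(INDEX_HOST, "your-username", "your-token"))
    demo_file.chmod(0o600)  # a real ~/.netrc should also be readable by its owner only
    demo_credentials = read_credentials(str(demo_file), INDEX_HOST)
```

With an entry like this in place, the install commands need no credentials in the URL.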

Example with uv:

  uv pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech

🚀 Features

Templates Supported

This package includes the following templates:

  • BERTopicFitModel: A template class for fitting BERTopic models and saving them to disk.

  • BERTopicPredict: Template for topic prediction using BERTopic models.

  • BERTopicVisualizeDocuments: BERTopic-based document visualization template for generating and exporting interactive topic model visualizations. This template extends BERTopicBase to provide functionality for encoding documents using sentence transformers, fitting a BERTopic model, and producing interactive visualizations of documents in a reduced dimensional space. The visualizations can be saved as HTML files and optionally exported as image arrays.

  • BERTopicVisualizeTopics: Template for BERTopic topic visualization.

    This template extends BERTopicPredict to generate and save interactive visualizations of topics discovered by a BERTopic model. It produces plotly-based visual representations of topic relationships and characteristics, and persists them as HTML files.
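To make the visualization flow described above more concrete, here is a minimal sketch written against the public BERTopic and sentence-transformers APIs. It is not the templates' actual implementation: the function names, the output file naming, and the choice of embedding model (borrowed from the example config) are assumptions for illustration.

```python
from pathlib import Path
from typing import Sequence


def html_output_path(root_dir: str, name: str) -> Path:
    """Build an .html file path under root_dir (hypothetical helper)."""
    file_name = name if name.endswith(".html") else f"{name}.html"
    return Path(root_dir) / file_name


def visualize_documents_to_html(docs: Sequence[str], root_dir: str, name: str = "documents") -> Path:
    """Encode documents, fit BERTopic, and save an interactive document plot."""
    # Imported lazily so the path helper works without these heavy packages.
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    documents = list(docs)

    # Encode once and reuse the embeddings for both fitting and plotting.
    encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embeddings = encoder.encode(documents, show_progress_bar=False)

    topic_model = BERTopic(embedding_model=encoder)
    topic_model.fit_transform(documents, embeddings)

    # visualize_documents returns a plotly Figure, which can be written as HTML.
    figure = topic_model.visualize_documents(documents, embeddings=embeddings)
    out_path = html_output_path(root_dir, name)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    figure.write_html(str(out_path))
    return out_path
```

A topic-level plot follows the same pattern: BERTopic's visualize_topics() also returns a plotly figure that can be saved with write_html.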

[!TIP] Use CLI command sinapsis info --all-template-names to show a list with all the available Template names installed with Sinapsis BERTopic.

[!TIP] Use CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.

For example, for BERTopicFitModel use sinapsis info --example-template-config BERTopicFitModel to produce an example config like:

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range: !!python/tuple
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn: !!python/tuple
      - null
      - null
      - null
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
      save_embedding_model: sentence-transformers/all-MiniLM-L6-v2
    root_dir: /root/.cache/sinapsis
    save_path: '`replace_me:<class ''str''>`'
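The generated config leaves some values as replace_me placeholders (for example save_path). Presumably these have to be filled in before the agent can run; that is an assumption, not documented behavior. The sketch below walks an attributes dictionary and lists the fields that still hold a placeholder. The helper name and the dotted-path format are hypothetical.

```python
from typing import Any, List

# Generated example configs mark unfilled values with this prefix.
PLACEHOLDER_PREFIX = "`replace_me:"


def find_placeholders(node: Any, path: str = "") -> List[str]:
    """Return dotted paths of all values that still hold a replace_me placeholder."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            found.extend(find_placeholders(value, child))
    elif isinstance(node, (list, tuple)):
        for index, value in enumerate(node):
            found.extend(find_placeholders(value, f"{path}[{index}]"))
    elif isinstance(node, str) and node.startswith(PLACEHOLDER_PREFIX):
        found.append(path)
    return found


# A trimmed copy of the attributes from the example config, as a plain dict.
example_attributes = {
    "hdbscan_model_params": {
        "min_cluster_size": 5,
        "cluster_selection_epsilon_max": "`replace_me:<class 'float'>`",
        "kwargs": "`replace_me:dict[str, typing.Any]`",
    },
    "root_dir": "/root/.cache/sinapsis",
    "save_path": "`replace_me:<class 'str'>`",
}

unfilled = find_placeholders(example_attributes)
```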

📚 Usage example

Below is an example YAML configuration for an agent that fits a BERTopic model:

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range: !!python/tuple
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn: !!python/tuple
      - null
      - null
      - null
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
      save_embedding_model: sentence-transformers/all-MiniLM-L6-v2
    root_dir: /root/.cache/sinapsis
    save_path: '`replace_me:<class ''str''>`'

This configuration defines an **agent** and a sequence of **templates** that fit a BERTopic model on incoming data.

To run the config, use the CLI:

sinapsis run name_of_config.yml
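For orientation, the parameter groups in the config resemble the constructor arguments of BERTopic, UMAP, and HDBSCAN and the arguments of BERTopic's save method. The sketch below shows what fitting and saving might look like in plain Python with a subset of those values. It uses the public APIs of those libraries and is an assumption about what the template does, not its actual implementation; fit_and_save is a hypothetical name.

```python
from typing import Any, Dict, List, Sequence

# Values copied from the example config; only a subset of each group is shown.
BERTOPIC_PARAMS: Dict[str, Any] = {
    "language": "english",
    "top_n_words": 10,
    "n_gram_range": (1, 1),
    "min_topic_size": 10,
}
UMAP_PARAMS: Dict[str, Any] = {
    "n_neighbors": 15,
    "n_components": 2,
    "metric": "euclidean",
    "min_dist": 0.1,
}
HDBSCAN_PARAMS: Dict[str, Any] = {
    "min_cluster_size": 5,
    "metric": "euclidean",
    "cluster_selection_method": "eom",
}
SAVE_PARAMS: Dict[str, Any] = {
    "serialization": "safetensors",
    "save_ctfidf": True,
    "save_embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
}


def fit_and_save(docs: Sequence[str], save_path: str) -> List[int]:
    """Fit a BERTopic model on docs and save it to save_path."""
    # Imported lazily so the parameter dictionaries can be inspected
    # without the heavy dependencies installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    topic_model = BERTopic(
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
        **BERTOPIC_PARAMS,
    )
    # fit_transform returns one topic id per document plus probabilities.
    topics, _probabilities = topic_model.fit_transform(list(docs))
    topic_model.save(save_path, **SAVE_PARAMS)
    return list(topics)
```

Note that BERTopic needs a reasonably large corpus to form topics; a handful of documents is usually not enough for UMAP and HDBSCAN to produce clusters.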

📙 Documentation

Documentation for this and other Sinapsis packages is available on the Sinapsis website.

Tutorials for different projects within Sinapsis are available on the Sinapsis tutorials page.
