



Sinapsis BERTopic

Package for topic modeling using BERTopic.

🐍 Installation 🚀 Features 📙 Documentation 🔍 License

Sinapsis BERTopic integrates BERTopic models into the Sinapsis framework for topic modeling and document clustering.

🐍 Installation

Install using your package manager of choice. We encourage the use of uv.

Example with uv:

  uv pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech
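To confirm the installation from Python, the standard-library importlib.metadata module can report the installed version. This is a minimal sketch; it assumes only that the distribution is named sinapsis-bertopic, as in the install commands above, and installed_version is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "sinapsis-bertopic") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Example: print(installed_version() or "sinapsis-bertopic is not installed")
```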

🚀 Features

Templates Supported

This package includes the following templates:

  • BERTopicFitModel: A template class for fitting BERTopic models and saving them to disk.
  • BERTopicPredict: Template for topic prediction using BERTopic models.

[!TIP] Use the CLI command sinapsis info --all-template-names to list all the available Template names installed with Sinapsis BERTopic.

[!TIP] Use the CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.
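Example configs generated this way can contain placeholder values of the form replace_me:<type> (see model_path in the example below), which must be filled in before the agent can run. The stdlib-only sketch below lists the lines that still hold a placeholder; it simply scans the text for the replace_me: marker and does not otherwise validate the YAML. find_placeholders and check_config_file are illustrative helper names.

```python
from pathlib import Path
from typing import List, Tuple

# Marker used by generated example configs, e.g.
#   model_path: '`replace_me:<class ''str''>`'
PLACEHOLDER_MARKER = "replace_me:"


def find_placeholders(config_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for every line still holding a placeholder."""
    return [
        (number, line.strip())
        for number, line in enumerate(config_text.splitlines(), start=1)
        if PLACEHOLDER_MARKER in line
    ]


def check_config_file(path: str) -> List[Tuple[int, str]]:
    """Read a YAML config from disk and report unfilled placeholders."""
    return find_placeholders(Path(path).read_text(encoding="utf-8"))


# Example: for line_no, text in check_config_file("my_config.yml"): print(line_no, text)
```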

For example, for BERTopicPredict, run sinapsis info --example-template-config BERTopicPredict to produce an example config like:

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicPredict
  class_name: BERTopicPredict
  template_input: InputTemplate
  attributes:
    root_dir: /root/.cache/sinapsis
    export_visualization_to_image: true
    image_export_params:
      format: png
      width: null
      height: null
      scale: null
      validate_figure: true
    sentence_model_name: sentence-transformers/all-MiniLM-L6-v2
    model_path: '`replace_me:<class ''str''>`'
    visualize_predictions: true
    visualize_topics: true
    historical_data_path: null
    prediction_viz_path: prediction_viz.html
    visualize_topics_params:
      topics: null
      top_n_topics: null
      use_ctfidf: false
      custom_labels: false
      title: <b>Intertopic Distance Map</b>
      figure_width: 650
      figure_height: 650
    save_topic_visualization_path: topics_visualization.html
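For orientation, the core of a prediction step with the BERTopic library looks roughly like the sketch below: load a saved model, assign topics to new documents with transform, and write the intertopic distance map to HTML. This is not the BERTopicPredict implementation; predict_topics and save_topic_map are hypothetical helpers, and mapping figure_width/figure_height onto the width/height arguments of BERTopic's visualize_topics is an assumption.

```python
from typing import Any, Dict, List, Optional, Sequence


def predict_topics(topic_model: Any, docs: Sequence[str]) -> List[int]:
    """Assign each document to a topic with a fitted BERTopic model."""
    # BERTopic.transform returns (topics, probabilities).
    topics, _probs = topic_model.transform(list(docs))
    return list(topics)


def save_topic_map(topic_model: Any, html_path: str,
                   viz_params: Optional[Dict[str, Any]] = None) -> None:
    """Write the intertopic distance map to an HTML file.

    Assumption: the config keys figure_width/figure_height correspond to the
    width/height arguments of BERTopic.visualize_topics.
    """
    params = dict(viz_params or {})
    if "figure_width" in params:
        params["width"] = params.pop("figure_width")
    if "figure_height" in params:
        params["height"] = params.pop("figure_height")
    fig = topic_model.visualize_topics(**params)  # returns a plotly Figure
    fig.write_html(html_path)


# Usage with the real library (requires the bertopic package):
# from bertopic import BERTopic
# model = BERTopic.load("path/to/model",
#                       embedding_model="sentence-transformers/all-MiniLM-L6-v2")
# print(predict_topics(model, ["a new document to classify"]))
# save_topic_map(model, "topics_visualization.html",
#                {"figure_width": 650, "figure_height": 650})
```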

📚 Usage example

Below is an example YAML configuration for fitting a BERTopic model.

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    root_dir: /root/.cache/sinapsis
    export_visualization_to_image: true
    image_export_params:
      format: png
      width: null
      height: null
      scale: null
      validate_figure: true
    sentence_model_name: sentence-transformers/all-MiniLM-L6-v2
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range:
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    save_documents_visualization_path: documents_visualization.html
    save_model_path: '`replace_me:<class ''str''>`'
    save_training_data: false
    save_training_data_path: training_data.pkl
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn:
      - null
      - null
      - null
    visualize_documents_params:
      topics: null
      sample: null
      hide_annotations: false
      hide_document_hover: false
      custom_labels: false
      title: <b>Documents and Topics</b>
      width: 1200
      height: 750

This configuration defines an agent and a sequence of templates that fits a BERTopic model on the incoming data and saves it to disk.
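The sections of the fit config appear to mirror the BERTopic library: umap_model_params and hdbscan_model_params configure the dimensionality-reduction and clustering sub-models, and bertopic_save_model_params matches arguments of BERTopic.save. A minimal sketch of that flow, assuming the YAML sections are passed straight through as keyword arguments (BERTopicFitModel may do this differently); bertopic_kwargs, build_topic_model and fit_and_save are hypothetical helpers. Placeholder entries such as cluster_selection_epsilon_max and kwargs would need real values, or removal, before being passed to HDBSCAN.

```python
from typing import Any, Dict, List, Sequence


def bertopic_kwargs(model_params: Dict[str, Any]) -> Dict[str, Any]:
    """Turn bertopic_model_params loaded from YAML into BERTopic constructor kwargs."""
    kwargs = dict(model_params)
    # YAML yields a list, while scikit-learn's CountVectorizer expects a tuple.
    if kwargs.get("n_gram_range") is not None:
        kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


def build_topic_model(cfg: Dict[str, Any]) -> Any:
    """Assemble a BERTopic model from the config sections (hypothetical helper)."""
    # Imported lazily so the other helpers work without the heavy dependencies.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    return BERTopic(
        embedding_model=cfg["sentence_model_name"],  # BERTopic accepts a model name string
        umap_model=UMAP(**cfg["umap_model_params"]),
        hdbscan_model=HDBSCAN(**cfg["hdbscan_model_params"]),
        **bertopic_kwargs(cfg["bertopic_model_params"]),
    )


def fit_and_save(topic_model: Any, docs: Sequence[str], save_model_path: str,
                 sentence_model_name: str, serialization: str = "safetensors",
                 save_ctfidf: bool = True) -> List[int]:
    """Fit the model on the documents, then save it as a model directory."""
    topics, _probs = topic_model.fit_transform(list(docs))
    topic_model.save(
        save_model_path,
        serialization=serialization,
        save_ctfidf=save_ctfidf,
        # Store a pointer to the embedding model instead of its weights.
        save_embedding_model=sentence_model_name,
    )
    return list(topics)
```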

To run the config, use the CLI:

sinapsis run name_of_config.yml
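When the agent has to be launched from a Python script (for example, inside a batch job), the same CLI call can be wrapped with subprocess. A sketch that relies only on the sinapsis run command shown above; sinapsis_run_command and run_agent are illustrative names.

```python
import shutil
import subprocess
from typing import List


def sinapsis_run_command(config_path: str) -> List[str]:
    """Build the argument list for `sinapsis run <config>`."""
    return ["sinapsis", "run", config_path]


def run_agent(config_path: str) -> int:
    """Run the agent through the CLI and return its exit code."""
    if shutil.which("sinapsis") is None:
        raise FileNotFoundError("the 'sinapsis' CLI is not on PATH")
    return subprocess.run(sinapsis_run_command(config_path), check=False).returncode


# Example: exit_code = run_agent("name_of_config.yml")
```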

📙 Documentation

Documentation for this and other Sinapsis packages is available on the Sinapsis website.

Tutorials for different projects within Sinapsis are available on the Sinapsis tutorials page.



Download files


Source Distribution

sinapsis_bertopic-0.1.1.tar.gz (19.2 kB)

Built Distribution


sinapsis_bertopic-0.1.1-py3-none-any.whl (20.7 kB, Python 3)

File details

Details for the file sinapsis_bertopic-0.1.1.tar.gz.

File metadata

  • Download URL: sinapsis_bertopic-0.1.1.tar.gz
  • Size: 19.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.16

File hashes

Hashes for sinapsis_bertopic-0.1.1.tar.gz
Algorithm Hash digest
SHA256 05bda3eb6a4b4f405e20a94166cca98ebe52da9e85c9196677e94939026479cf
MD5 fe848b1c4d96052871ca9e70cbfd4a40
BLAKE2b-256 8764372e2ceb00a1cb625c3304ed74f50c81084c22d4b31f5c3473d79fa7f403


File details

Details for the file sinapsis_bertopic-0.1.1-py3-none-any.whl.


File hashes

Hashes for sinapsis_bertopic-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 06104a1d889fd3fccfc874fd1a5597fd311f97e2c7fcd91493182de537c2b63d
MD5 6bea32a95837c1920b85da17ff037481
BLAKE2b-256 209a620b7ae3f647dc63029906665cb4156f0be572bcc745de0b2f9375be1fbe
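The SHA256 digests listed above can be used to check a downloaded file before installing it. A small stdlib sketch; sha256_of and matches_published are illustrative helper names, not part of the package.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with a published SHA256 value (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example, with the source distribution downloaded to the current directory:
# matches_published("sinapsis_bertopic-0.1.1.tar.gz",
#                   "05bda3eb6a4b4f405e20a94166cca98ebe52da9e85c9196677e94939026479cf")
```

pip can also enforce hashes itself through --require-hashes when installing from a requirements file that pins each file's hash.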

