Package for topic modeling using BERTopic, including templates for fitting models and making predictions.

Sinapsis BERTopic

Package for topic modeling using BERTopic.

🐍 Installation • 🚀 Features • 📙 Documentation • 🔍 License

Sinapsis BERTopic integrates BERTopic models into the Sinapsis framework for topic clustering.

🐍 Installation

Install using your package manager of choice. We encourage the use of uv.

Example with uv:

  uv pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-bertopic --extra-index-url https://pypi.sinapsis.tech
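
After installing, a quick way to confirm that the distribution is visible to the current interpreter is to query its metadata. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Sinapsis.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "sinapsis-bertopic") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("sinapsis-bertopic is not installed in this environment")
    else:
        print(f"sinapsis-bertopic {found} is installed")
```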

🚀 Features

Templates Supported

This package includes the following templates:

  • BERTopicFitModel: A template class for fitting BERTopic models and saving them to disk.
  • BERTopicFitModelFromDataFrame: A template class for fitting BERTopic models using data from a DataFrame and saving the model to disk.
  • BERTopicPredict: Template for topic prediction using BERTopic models.

[!TIP] Use CLI command sinapsis info --all-template-names to show a list with all the available Template names installed with Sinapsis BERTopic.

[!TIP] Use CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.

For example, for BERTopicPredict use sinapsis info --example-template-config BERTopicPredict to produce an example config like:

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicPredict
  class_name: BERTopicPredict
  template_input: InputTemplate
  attributes:
    root_dir: /root/.cache/sinapsis
    export_visualization_to_image: true
    image_export_params:
      format: png
      width: null
      height: null
      scale: null
      validate_figure: true
    sentence_model_name: sentence-transformers/all-MiniLM-L6-v2
    model_path: '`replace_me:<class ''str''>`'
    visualize_predictions: true
    visualize_topics: true
    historical_data_path: null
    prediction_viz_path: prediction_viz.html
    visualize_topics_params:
      topics: null
      top_n_topics: null
      use_ctfidf: false
      custom_labels: false
      title: <b>Intertopic Distance Map</b>
      figure_width: 650
      figure_height: 650
    save_topic_visualization_path: topics_visualization.html
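
The generated config marks required attributes with a replace_me placeholder (model_path in the example above), and these need real values before the agent is run. The following is a minimal sketch, assuming the placeholder always contains the literal text `replace_me:` on the same line as its key; the helper `find_placeholders` is hypothetical and not part of Sinapsis.

```python
import re

# Matches lines such as:  model_path: '`replace_me:<class ''str''>`'
PLACEHOLDER = re.compile(r"^\s*(?P<key>[\w-]+)\s*:\s*.*replace_me:")


def find_placeholders(config_text: str) -> list[tuple[int, str]]:
    """Return (line number, key) for every attribute still holding a placeholder."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        match = PLACEHOLDER.match(line)
        if match:
            hits.append((lineno, match.group("key")))
    return hits


# A shortened copy of the BERTopicPredict attributes shown above.
sample = """\
attributes:
  root_dir: /root/.cache/sinapsis
  model_path: '`replace_me:<class ''str''>`'
  visualize_topics: true
"""

for lineno, key in find_placeholders(sample):
    print(f"line {lineno}: '{key}' still needs a value")
```

Running a check like this on a config file before starting the agent gives an early, readable error instead of a failure inside the template.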

📚 Usage example

Below is an example YAML configuration for fitting a BERTopic model.

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: BERTopicFitModel
  class_name: BERTopicFitModel
  template_input: InputTemplate
  attributes:
    root_dir: /root/.cache/sinapsis
    export_visualization_to_image: true
    image_export_params:
      format: png
      width: null
      height: null
      scale: null
      validate_figure: true
    sentence_model_name: sentence-transformers/all-MiniLM-L6-v2
    bertopic_model_params:
      language: english
      top_n_words: 10
      n_gram_range:
      - 1
      - 1
      min_topic_size: 10
      nr_topics: null
      low_memory: false
      calculate_probabilities: false
      seed_topic_list: null
      zeroshot_topic_list: null
      zeroshot_min_similarity: 0.7
    bertopic_save_model_params:
      serialization: safetensors
      save_ctfidf: true
    hdbscan_model_params:
      min_cluster_size: 5
      min_samples: null
      cluster_selection_epsilon: 0.0
      cluster_selection_persistence: 0.0
      max_cluster_size: 0
      metric: euclidean
      alpha: 1.0
      p: null
      algorithm: best
      leaf_size: 40
      approx_min_span_tree: true
      gen_min_span_tree: false
      core_dist_n_jobs: 4
      cluster_selection_method: eom
      allow_single_cluster: false
      prediction_data: false
      branch_detection_data: false
      match_reference_implementation: false
      cluster_selection_epsilon_max: '`replace_me:<class ''float''>`'
      kwargs: '`replace_me:dict[str, typing.Any]`'
    save_documents_visualization_path: documents_visualization.html
    save_model_path: '`replace_me:<class ''str''>`'
    save_training_data: false
    save_training_data_path: training_data.pkl
    umap_model_params:
      n_neighbors: 15
      n_components: 2
      metric: euclidean
      metric_kwds: null
      output_metric: euclidean
      output_metric_kwds: null
      n_epochs: null
      learning_rate: 1.0
      init: spectral
      min_dist: 0.1
      spread: 1.0
      low_memory: true
      n_jobs: -1
      set_op_mix_ratio: 1.0
      local_connectivity: 1.0
      repulsion_strength: 1.0
      negative_sample_rate: 5
      transform_queue_size: 4.0
      a: null
      b: null
      random_state: null
      angular_rp_forest: false
      target_n_neighbors: -1
      target_metric: categorical
      target_metric_kwds: null
      target_weight: 0.5
      transform_seed: 42
      transform_mode: embedding
      force_approximation_algorithm: false
      verbose: false
      tqdm_kwds: null
      unique: false
      densmap: false
      dens_lambda: 2.0
      dens_frac: 0.3
      dens_var_shift: 0.1
      output_dens: false
      disconnection_distance: null
      precomputed_knn:
      - null
      - null
      - null
    visualize_documents_params:
      topics: null
      sample: null
      hide_annotations: false
      hide_document_hover: false
      custom_labels: false
      title: <b>Documents and Topics</b>
      width: 1200
      height: 750

This configuration defines an agent and a sequence of templates that fit a BERTopic model on incoming data and save it to disk.
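
To make the parameter groups in the config easier to read, the sketch below shows how a subset of them could be passed to the underlying libraries directly. It is an illustration only, assuming the attribute groups map onto the `BERTopic`, `UMAP`, and `HDBSCAN` constructors and onto `BERTopic.save`; it is not the template's actual implementation, and the function name `fit_and_save` is hypothetical. Calling the function requires `bertopic`, `umap-learn`, `hdbscan`, and `sentence-transformers` to be installed.

```python
# Values copied from the example config above; only a subset is shown.
BERTOPIC_PARAMS = {
    "language": "english",
    "top_n_words": 10,
    "n_gram_range": (1, 1),  # YAML yields a list; converted to a tuple here
    "min_topic_size": 10,
    "nr_topics": None,
    "calculate_probabilities": False,
}
UMAP_PARAMS = {
    "n_neighbors": 15,
    "n_components": 2,
    "metric": "euclidean",
    "min_dist": 0.1,
}
HDBSCAN_PARAMS = {
    "min_cluster_size": 5,
    "metric": "euclidean",
    "cluster_selection_method": "eom",
}
SAVE_PARAMS = {"serialization": "safetensors", "save_ctfidf": True}


def fit_and_save(docs: list[str], save_model_path: str):
    """Fit a BERTopic model on docs and save it (assumed mapping, see above)."""
    # Third-party imports are deferred so this module loads without them.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    topic_model = BERTopic(
        embedding_model=SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2"),
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
        **BERTOPIC_PARAMS,
    )
    # fit_transform returns one topic id per document (-1 marks outliers).
    topics, _ = topic_model.fit_transform(docs)
    topic_model.save(save_model_path, **SAVE_PARAMS)
    return topic_model, topics
```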

To run the config, use the CLI:

sinapsis run name_of_config.yml
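
The same command can be launched from a Python script, for example as part of a larger pipeline. This is a minimal sketch that only wraps the CLI call shown above with the standard library; the helper names `build_run_command` and `run_config` are hypothetical.

```python
import shutil
import subprocess


def build_run_command(config_path: str) -> list[str]:
    """Build the argument list for `sinapsis run <config_path>`."""
    return ["sinapsis", "run", config_path]


def run_config(config_path: str) -> int:
    """Run the agent config through the CLI and return the process exit code."""
    if shutil.which("sinapsis") is None:
        raise FileNotFoundError("The 'sinapsis' CLI was not found on PATH")
    completed = subprocess.run(build_run_command(config_path), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when the config path contains spaces.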

📙 Documentation

Documentation for this and other Sinapsis packages is available on the Sinapsis website.

Tutorials for different projects within Sinapsis are available on the Sinapsis tutorials page.
