A multi-agent LLM topic modeling library.

MALTopic: Multi-Agent LLM Topic Modeling Library

MALTopic is a library for topic modeling with a multi-agent approach. It uses large language models (LLMs) to analyze survey responses, combining structured data with unstructured free text.

MALTopic was published as a research paper at the 2025 World AI IoT Congress.

Features

  • Multi-Agent Framework: Decomposes topic modeling into specialized tasks executed by individual LLM agents.
  • Data Enrichment: Enhances textual responses using structured and categorical survey data.
  • Latent Theme Extraction: Extracts meaningful topics from enriched responses.
  • Topic Deduplication: Refines and consolidates identified topics for better interpretability.
  • Automatic Batching: Handles large datasets by automatically splitting data into manageable batches when token limits are exceeded.
  • Intelligent Error Handling: Detects token limit errors and seamlessly switches to batching mode without user intervention.

Installation

To install the MALTopic library, you can use pip:

pip install maltopic

Usage

To use the MALTopic library, initialize the main class with your API key and model name. OpenAI is the only supported LLM provider at present; Google Gemini and Llama are not supported yet.

import pandas as pd

from maltopic import MALTopic

# Load the survey responses. Assumption: df is a pandas DataFrame;
# "survey.csv" is a placeholder path, replace it with your own data.
df = pd.read_csv("survey.csv")

# Initialize the MALTopic class
client = MALTopic(
    api_key="your_api_key",
    default_model_name="gpt-4.1-nano",
    llm_type="openai",
)

# Enrich each free-text response with context from the structured columns
enriched_df = client.enrich_free_text_with_structured_data(
    survey_context="context about survey, why, how of it...",
    free_text_column="column_1",
    structured_data_columns=["column_2", "column_3"],
    df=df,
    examples=["free text response, category 1 -> free text response with additional context", "..."],  # optional
)

# Extract topics from the enriched responses
topics = client.generate_topics(
    topic_mining_context="context about what kind of topics you want to mine",
    df=enriched_df,
    enriched_column="column_1" + "_enriched",  # MALTopic adds _enriched as the suffix.
)

print(topics)

Automatic Batching for Large Datasets

MALTopic v1.1.0 introduces automatic batching to handle large datasets that may exceed LLM token limits. The feature runs in the background.

How It Works

  1. Automatic Detection: When generate_topics encounters a token limit error, it automatically detects this and switches to batching mode.

  2. Smart Splitting: The library uses tiktoken (OpenAI's tokenizer library) to split your data into batches sized by actual token counts.

  3. Batch Processing: Each batch is processed independently, with progress tracking to keep you informed.

  4. Topic Consolidation: Topics from all batches are automatically merged and deduplicated to provide a clean, comprehensive result.
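The splitting step can be illustrated with a minimal sketch. This is not MALTopic's internal code: `make_token_counter` and `split_into_batches` are hypothetical helper names, the `cl100k_base` encoding is an assumed choice, and the greedy packing strategy is one plausible way to size batches by token count. When tiktoken is unavailable, the counter falls back to a rough word count.

```python
from typing import Callable, List


def make_token_counter(encoding_name: str = "cl100k_base") -> Callable[[str], int]:
    """Return a tiktoken-backed counter, or a rough word-count fallback."""
    try:
        import tiktoken  # optional dependency

        encoding = tiktoken.get_encoding(encoding_name)
        return lambda text: len(encoding.encode(text))
    except Exception:
        # tiktoken missing or its encoding data unavailable:
        # approximate tokens by whitespace-separated words.
        return lambda text: len(text.split())


def split_into_batches(
    texts: List[str], max_tokens: int, count_tokens: Callable[[str], int]
) -> List[List[str]]:
    """Greedily pack texts into batches whose token totals stay within max_tokens.

    A single text larger than max_tokens still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_tokens = 0
    for text in texts:
        n = count_tokens(text)
        # Close the current batch if adding this text would exceed the budget.
        if current and current_tokens + n > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += n
    if current:
        batches.append(current)
    return batches


def word_count(text: str) -> int:
    """Deterministic stand-in counter used for the demo below."""
    return len(text.split())


responses = ["a b c", "d e", "f g h i", "j"]
batches = split_into_batches(responses, max_tokens=5, count_tokens=word_count)
print(batches)  # [['a b c', 'd e'], ['f g h i', 'j']]
```

In real use, `make_token_counter()` would replace `word_count` so that batch sizes follow the model's tokenizer rather than word counts.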

Key Benefits

  • No Code Changes Required: Existing code works without modification; batching happens automatically when needed.
  • Optimal Performance: Uses actual token counting for precise batch sizing, maximizing efficiency.
  • Robust Fallback: Works even when tiktoken is not installed by falling back to simple batch splitting.
  • Progress Visibility: Shows batch processing progress so you know what's happening.
  • Quality Preservation: Maintains topic quality through intelligent consolidation and deduplication.
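The "no code changes" behavior amounts to a try-then-fall-back control flow. The sketch below shows that pattern only; it is not the library's implementation. `TokenLimitError`, `generate_with_fallback`, and `fake_generate` are hypothetical names, and the fixed-size slicing stands in for the simple batch splitting mentioned above.

```python
from typing import Callable, List


class TokenLimitError(Exception):
    """Hypothetical stand-in for a provider's token-limit error."""


def generate_with_fallback(
    texts: List[str],
    generate: Callable[[List[str]], List[str]],
    batch_size: int = 2,
) -> List[str]:
    """Try one call over all texts; on a token-limit error, retry in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    try:
        return generate(texts)
    except TokenLimitError:
        print("Token limit exceeded, splitting into batches...")
        topics: List[str] = []
        # Simple fixed-size splitting; an error inside a batch propagates.
        for start in range(0, len(texts), batch_size):
            topics.extend(generate(texts[start:start + batch_size]))
        return topics


def fake_generate(texts: List[str]) -> List[str]:
    """Fake LLM call that rejects inputs of more than two texts."""
    if len(texts) > 2:
        raise TokenLimitError("too many tokens")
    return [f"topic for: {t}" for t in texts]


result = generate_with_fallback(["r1", "r2", "r3"], fake_generate, batch_size=2)
print(result)  # ['topic for: r1', 'topic for: r2', 'topic for: r3']
```

Because the fallback lives inside the wrapper, the caller's code is the same whether or not batching was needed.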

Example Output

When batching is triggered, you'll see output like:

Token limit exceeded, splitting into batches...
Processing 3 batches...
Processing batches: 100%|██████████| 3/3 [00:45<00:00, 15.2s/it]
Batch 1/3: Generated 12 topics
Batch 2/3: Generated 8 topics
Batch 3/3: Generated 10 topics
Consolidated 30 topics into 25 unique topics

This feature makes MALTopic suitable for processing large-scale survey datasets without worrying about token limitations.
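The consolidation reported in the last line of the example output can be approximated with plain string matching. This is a simplified stand-in, not how MALTopic merges topics: the topic format (dicts with a `name` key) and the `consolidate_topics` helper are assumptions made for illustration, and matching on normalized names will miss topics that are worded differently but mean the same thing.

```python
from typing import Dict, List


def consolidate_topics(batches_of_topics: List[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge per-batch topic lists, keeping the first topic seen for each name.

    Names are compared case-insensitively with whitespace collapsed.
    """
    seen: Dict[str, Dict[str, str]] = {}
    for topics in batches_of_topics:
        for topic in topics:
            key = " ".join(topic["name"].lower().split())
            if key not in seen:
                seen[key] = topic
    return list(seen.values())


# Hypothetical topics from two batches; "pricing " duplicates "Pricing".
batch_1 = [{"name": "Pricing"}, {"name": "App Stability"}]
batch_2 = [{"name": "pricing "}, {"name": "Customer Support"}]

unique_topics = consolidate_topics([batch_1, batch_2])
total = len(batch_1) + len(batch_2)
print(f"Consolidated {total} topics into {len(unique_topics)} unique topics")
# Consolidated 4 topics into 3 unique topics
```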

Agents

  • Enrichment Agent: Enhances free-text responses using structured data.
  • Topic Modeling Agent: Extracts latent themes from enriched responses.
  • Deduplication Agent: Refines and consolidates the extracted topics. (not supported yet)

Changelog

v1.1.0 (June 2025)

  • NEW: Automatic batching for large datasets that exceed LLM token limits
  • NEW: Intelligent token counting using tiktoken for optimal batch sizing
  • NEW: Automatic error detection and seamless fallback to batching mode
  • NEW: Topic consolidation and deduplication across batches
  • NEW: Progress tracking for batch processing operations
  • IMPROVED: Enhanced error handling and user feedback
  • IMPROVED: Graceful degradation when tiktoken is not available

v1.0.0 (May 2025)

  • Multi-agent framework for topic modeling
  • Data enrichment capabilities
  • Basic topic extraction functionality

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Citation

If you use MALTopic in your research, please cite:

@software{Sharma2025maltopic,
  author = {Sharma, Yash},
  title = {MALTopic: A library for topic modeling},
  year = {2025},
  url = {https://github.com/yash91sharma/MALTopic-py}
}
