
A helper package to work with LLMs


llm_helpers Package

Overview

The llm_helpers package provides utilities for interacting with large language models (LLMs), including generating categories from text via recursive calls to services such as Azure OpenAI. It simplifies sending requests to and interpreting responses from these models when the input is too large to fit in the target model's context window.

Installation

To install llm_helpers, use pip:

pip install llm_helpers

Or install directly from the GitHub repository:

pip install git+https://github.com/sonnylaskar/llm_helpers.git

Usage

To use the llm_helpers package in your project, simply import it and call the available functions. The primary function, generate_categories, generates category tags for a given text input using a specified language model. Currently, only OpenAI and Azure OpenAI are supported.

Example

Recursive Category Generation from Text

When generating categories from a large corpus of text, the text often exceeds the target model's context limit. An effective strategy is to split the text into smaller chunks that the model can process: first, chunk the text to fit within the model's context and generate categories for each chunk; then, combine those categories and run another round of category generation on them. A single pass may not be enough if the combined output still exceeds the context limit, in which case further chunking and category generation are required. The generate_categories function handles this process recursively, enabling streamlined category generation from extensive text data.
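
The following sketch illustrates the strategy described above. It is not the package's internal implementation; split_into_chunks and call_model are hypothetical helpers standing in for the real chunking logic and the underlying model call.

def split_into_chunks(text, max_token_size):
    # Crude illustration: approximate tokens with words and cap each chunk.
    words = text.split()
    words_per_chunk = max(1, int(max_token_size * 0.75))  # rough tokens-to-words ratio
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def recursive_categories(text, max_token_size, call_model):
    # call_model(text) is assumed to return the generated categories as a string.
    chunks = split_into_chunks(text, max_token_size)
    if len(chunks) <= 1:
        # Base case: the text fits within the model's context in a single call.
        return call_model(text)
    # Categorize each chunk, then recurse on the combined category output.
    partial_results = [call_model(chunk) for chunk in chunks]
    return recursive_categories("\n".join(partial_results), max_token_size, call_model)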

Parameters:

  • txt: The input text for category generation.
  • llm: The language model provider to use, either 'azure' or 'openai'.
  • endpoint: If using 'azure', the Azure OpenAI endpoint.
  • key: The authentication key for the language model API.
  • api_version: The API version of the chosen model.
  • deployment_name: If using 'azure', the name of the model deployment (see the example below).

Optional Parameters (with defaults):

  • max_tokens=200: The maximum number of tokens to generate.
  • temperature=0.0: Controls the randomness in the output generation.
  • frequency_penalty=0.0: Adjusts the likelihood of repeating information.
  • presence_penalty=0.0: Influences the introduction of new concepts.
  • max_token_size=4092: The maximum token capacity of the target language model; this determines how the input text is chunked.
  • system_prompt="Generate the top categories into which the below text can be grouped, just print the categories, do not add any examples, put them to Others category if they don't fit in any category:": Customizable prompt that guides the model in generating relevant categories.

Note:

  • The system_prompt serves as a guideline for the model to ensure the categories generated align with the specified criteria.
  • Adjust the max_token_size according to the maximum token capacity of the target language model to optimize the chunking process.
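
To choose a sensible max_token_size, it can help to know roughly how many tokens your input occupies. One way to count tokens is the tiktoken library; tiktoken is not a dependency of llm_helpers and is shown here only as an illustration.

import tiktoken

# cl100k_base is the encoding used by many recent OpenAI models.
encoding = tiktoken.get_encoding("cl100k_base")

with open('sample_text.txt', 'r') as file:
    txt = file.read()

token_count = len(encoding.encode(txt))
print(f"Input is roughly {token_count} tokens; generate_categories will chunk it to fit max_token_size.")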

The following script demonstrates how to use the llm_helpers package to generate categories from text stored in a file named sample_text.txt:

import llm_helpers

# Open the file in read mode
with open('sample_text.txt', 'r') as file:
    # Read the entire contents of the file into a string
    txt = file.read()

# Update the <> below with the correct values
categories = llm_helpers.generate_categories(
    txt,
    llm='azure',
    endpoint="<azure_endpoint>",
    key="<azure_key>",
    api_version="<api_version>",
    deployment_name="<deployment_name>",
    max_tokens=200,
    temperature=0.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    max_token_size=4092,
    system_prompt="Generate the top categories into which the below text can be grouped, just print the categories, do not add any examples, put them to Others category if they don't fit in any category: "
)
print(categories)

Replace the placeholders (<>) with your actual Azure endpoint, key, API version, and deployment name to run the script.
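
The package also supports plain OpenAI (llm='openai'). The parameter list above marks endpoint as Azure-specific, and deployment_name only appears in the Azure example, so the call below is a hedged sketch that assumes both can be omitted in OpenAI mode; check the package source if in doubt.

# Hypothetical OpenAI variant -- which arguments are required for llm='openai'
# is assumed from the parameter descriptions above, not confirmed.
categories = llm_helpers.generate_categories(
    txt,
    llm='openai',
    key="<openai_api_key>",
    api_version="<api_version>",
    max_tokens=200,
    temperature=0.0
)
print(categories)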

License

This project is licensed under the Apache License, Version 2.0. For more details, see the LICENSE file in the root directory of this project.

Contributing

We welcome contributions to the llm_helpers package! If you'd like to contribute, please follow these steps:

  1. Fork the repository on GitHub.
  2. Make your changes in your forked repository.
  3. Submit a Pull Request back to the main repository.

We encourage you to discuss any substantial changes through a GitHub issue before you start working on your contribution. This allows us to provide feedback, suggest any necessary adjustments, and help you determine the best approach.



Download files

Download the file for your platform.

Source Distribution

llm_helpers-0.1.1.tar.gz (8.3 kB)


Built Distribution


llm_helpers-0.1.1-py3-none-any.whl (8.8 kB)


File details

Details for the file llm_helpers-0.1.1.tar.gz.

File metadata

  • Download URL: llm_helpers-0.1.1.tar.gz
  • Upload date:
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.12

File hashes

Hashes for llm_helpers-0.1.1.tar.gz:

  • SHA256: dbdb6171571611540c6f990d25d00d7851cb1c8bfc4a99374bd8d0e7347b04af
  • MD5: f4eda4d9cfb808f4250411b4990ecf3a
  • BLAKE2b-256: 2ddd4807f1f8809df6b4c8f430eefb82ea19674bd17c479d28a5be056550219f


File details

Details for the file llm_helpers-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: llm_helpers-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 8.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.12

File hashes

Hashes for llm_helpers-0.1.1-py3-none-any.whl:

  • SHA256: 281a35d9752e87528934b358d47371a2e506f1c12374b3907f25696ef22b1931
  • MD5: 7ede2f4c68066d8c74e9caf4b5962098
  • BLAKE2b-256: 68d436635789a09723eb793dbd842ed22710278325bab59ef03cada76a39e1fd

