Function calling Logit Sampler

Function Sampler

Function Sampler is a library for enforcing structured generation on language models. Libraries such as LangChain or LlamaIndex rely on prompts and hope that the model follows them closely enough to produce parseable output. Function Sampler instead makes it probabilistically impossible for the language model to output an invalid function call.

By combining logit sampling with a finite state machine (FSM), Function Sampler guides the language model to generate function calls that adhere to a predefined schema. This removes the need for error-prone parsing of free-form output and ensures that the generated function calls are always valid.
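
To illustrate the general idea of FSM-guided logit masking, here is a minimal, library-independent sketch. It is not the function-sampler implementation: the toy vocabulary `VOCAB`, the hand-written `FSM` table, and the helper names are all assumptions made for this example. At each step, every token the FSM does not allow gets a logit of negative infinity, so its probability after softmax is exactly zero.

```python
import math

# Toy vocabulary and a hand-written FSM; both are illustrative assumptions,
# not data structures taken from function-sampler.
VOCAB = ["{", "}", '"unit"', ":", '"celsius"', '"fahrenheit"', "hello"]

# state -> {allowed token: next state}
FSM = {
    "start": {"{": "key"},
    "key": {'"unit"': "colon"},
    "colon": {":": "value"},
    "value": {'"celsius"': "close", '"fahrenheit"': "close"},
    "close": {"}": "done"},
}


def mask_logits(logits, state):
    """Set the logit of every token the FSM disallows in `state` to -inf."""
    allowed = FSM[state]
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]


def softmax(logits):
    """Convert logits to probabilities; -inf logits become exactly 0.0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(logits_per_step):
    """Pick the most likely allowed token at each step and advance the FSM."""
    state, out = "start", []
    for logits in logits_per_step:
        if state == "done":
            break
        masked = mask_logits(logits, state)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        out.append(VOCAB[best])
        state = FSM[state][VOCAB[best]]
    return "".join(out), state
```

Even if the raw logits strongly favor an off-schema token such as "hello" at every step, the masked decode can only follow transitions that exist in the FSM, so the result is always a string the FSM accepts.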

Features

  • Enforces the schema of function calls on the language model using Logit sampling
  • Activates sampling based on a specified delimiter token or string in the configuration
  • Supports top_p, top_k, temperature, and repetition_penalty sampling for function call values
  • Utilizes a Finite State Machine (FSM) to guide the sampling process
  • Provides a flexible configuration system using Pydantic models or keyword arguments
  • Includes a demo notebook showcasing various usage examples

Installation

To install the function-sampler library, use the following command:

pip install function-sampler

Usage

Here's a basic example of how to use the function-sampler library:

from function_sampler import ToolCallSampler
from transformers import AutoTokenizer, AutoModelForCausalLM

# Initialize the tokenizer and model.
# On a small GPU or with limited VRAM, load the model (not the tokenizer) in 4-bit.
# This requires the bitsandbytes package:
# model = AutoModelForCausalLM.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")
model = AutoModelForCausalLM.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

# Define the functions
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the users location.",
                },
            },
            "required": ["location", "format"],
        },
    }
]

# Configure the sampler
config = {
    "open_func_token": "<function>",
    "close_func_token": "</function>",
    "end_on_function_call": True,
    "temperature": 0.7,
    "top_p": 0.9,
}

# Create an instance of ToolCallSampler
sampler = ToolCallSampler(tokenizer, functions, config=config)


# Use the model for generation.
# The prompt only needs to explain how to call the function if the model
# was not explicitly trained for function calling.
input_text = "What is the weather today in Paris? Respond with the word '<function>' to call the weather API."
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=200, logits_processor=[sampler])
generated_text = tokenizer.decode(output[0])
print(generated_text)

In this example, we create an instance of the ToolCallSampler with the specified functions and configuration. We then pass the sampler to model.generate through its logits_processor argument, so that the sampler is applied at every step of the generation process.

Finally, we use the model to generate text from the input prompt, which instructs the model to emit the opening function token. Once that token appears, the sampler constrains the tokens that follow, so the generated text contains a valid function call adhering to the predefined schema.
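
Once generation finishes, the call still has to be pulled out of the decoded text before it can be dispatched. The sketch below shows one way to do that with the standard library only. It assumes the text between the delimiters is a JSON object; the exact payload shape produced by function-sampler is not documented here, so the "name" and "arguments" keys in the sample string are hypothetical, as is the helper name `extract_function_calls`.

```python
import json


def extract_function_calls(text, open_token="<function>", close_token="</function>"):
    """Return the JSON payloads found between the delimiters in `text`.

    For each closing delimiter, the nearest preceding opening delimiter is
    used, so a stray mention of the opening token earlier in the text
    (for example inside the prompt) is skipped.
    """
    calls = []
    pos = 0
    while True:
        end = text.find(close_token, pos)
        if end == -1:
            break  # no further complete call, e.g. generation hit max_length
        start = text.rfind(open_token, pos, end)
        if start != -1:
            calls.append(json.loads(text[start + len(open_token):end]))
        pos = end + len(close_token)
    return calls


# Hypothetical decoded output: the prompt itself mentions '<function>',
# followed by the generated call. The payload shape is an assumption.
sample = (
    "Respond with the word '<function>' to call the weather API. "
    '<function>{"name": "get_current_weather", '
    '"arguments": {"location": "Paris", "format": "celsius"}}</function>'
)
calls = extract_function_calls(sample)
```

Because tokenizer.decode(output[0]) returns the prompt together with the generated tokens, and the prompt in the usage example mentions the opening token, a simple first-match search would start in the wrong place. Matching backwards from the closing delimiter avoids that.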

For more detailed usage and examples, please refer to the demo notebook provided with the library.

Configuration

The function-sampler library offers a flexible configuration system. You can customize the behavior of the sampler by providing a configuration dictionary, a ToolCallSamplerConfig instance, or keyword arguments when initializing the ToolCallSampler class.

The available configuration options include:

  • open_func_token: The opening delimiter token for a function call (default: "<function>")
  • close_func_token: The closing delimiter token for a function call (default: "</function>")
  • end_on_function_call: Whether to end the generation when a function call is encountered (default: False)
  • json_tokens: A custom token map for JSON tokens (default: built from the provided tokenizer)
  • temperature: The temperature value for sampling (default: None)
  • top_p: The top_p value for sampling (default: None)
  • top_k: The top_k value for sampling (default: None)
  • repetition_penalty: The repetition penalty value for sampling (default: None)
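
For readers unfamiliar with the sampling options above, the following simplified sketch shows what temperature, top_k, and top_p typically do to a token distribution. It is a generic illustration in pure Python, not function-sampler's code: the function name `filter_distribution` is hypothetical, repetition_penalty is left out, and real implementations may differ in the order of the filters and in when they renormalize.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return token probabilities after temperature, top-k, and top-p filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k keeps only the k most likely tokens.
    if top_k is not None:
        order = order[:top_k]

    # top_p keeps the smallest prefix whose cumulative probability reaches p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # Zero out everything that was filtered and renormalize the rest.
    mass = sum(probs[i] for i in order)
    keep = set(order)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


example_logits = [2.0, 1.0, 0.0, -1.0]
top2 = filter_distribution(example_logits, top_k=2)
nucleus = filter_distribution(example_logits, top_p=0.5)
```

Here `top2` keeps only the two most likely tokens, while `nucleus` keeps only the first token, because that token alone already carries more than half of the probability mass.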

Contributing

Contributions to the function-sampler library are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

License

This project is licensed under the Apache License 2.0.
