Function calling Logit Sampler

Function Sampler

Function Sampler is a library for enforcing structured generation on language models. Libraries such as LangChain or LlamaIndex rely on prompting and hope that the model follows the prompt closely enough to produce parseable output. Function Sampler instead makes it probabilistically impossible for the language model to output an invalid function call.

Function Sampler combines logit sampling with a Finite State Machine (FSM) to guide the language model toward function calls that adhere to a predefined schema. Because the constraint is applied during generation, the output does not need to be validated or repaired afterwards: the generated function calls are always valid.
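The idea of FSM-guided logit masking can be sketched with a toy example. This is only an illustration under stated assumptions, not the library's implementation: the vocabulary `VOCAB`, the state table `FSM`, and the helper names are all hypothetical. At each step, every token the current FSM state does not allow has its logit set to negative infinity, so its probability after softmax is exactly zero.

```python
import math

# Toy vocabulary: each "token" is a short string (hypothetical, for illustration).
VOCAB = ['{', '"unit"', ':', '"celsius"', '"fahrenheit"', '}', 'hello']

# Toy FSM: state -> {allowed token: next state}. State 5 is the accepting state.
FSM = {
    0: {'{': 1},
    1: {'"unit"': 2},
    2: {':': 3},
    3: {'"celsius"': 4, '"fahrenheit"': 4},
    4: {'}': 5},
}


def mask_logits(logits, state):
    """Set the logit of every token the FSM forbids in `state` to -inf."""
    allowed = FSM[state]
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]


def constrained_greedy_decode(logits_per_step):
    """Greedy decoding under the FSM; returns the list of chosen tokens."""
    state, out = 0, []
    for logits in logits_per_step:
        if state not in FSM:  # accepting state reached, nothing left to emit
            break
        masked = mask_logits(logits, state)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        token = VOCAB[best]
        out.append(token)
        state = FSM[state][token]
    return out


# The unconstrained "model" prefers the invalid token 'hello' at every step.
raw = [[0.1, 0.2, 0.0, 0.5, 0.4, 0.3, 9.0]] * 5
print("".join(constrained_greedy_decode(raw)))  # prints {"unit":"celsius"}
```

Even though `hello` has by far the highest raw logit, it can never be selected, because the mask removes it before a token is chosen.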

Features

  • Enforces the schema of function calls on the language model using Logit sampling
  • Activates sampling based on a specified delimiter token or string in the configuration
  • Supports top_p, top_k, temperature, and repetition_penalty sampling for function call values
  • Utilizes a Finite State Machine (FSM) to guide the sampling process
  • Provides a flexible configuration system using Pydantic models or keyword arguments
  • Includes a demo notebook showcasing various usage examples
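For readers unfamiliar with the sampling options listed above, the following is a minimal pure-Python sketch of how temperature, top_k, and top_p filtering are commonly defined. It is not the library's code; the function name `filter_logits` and the order in which the filters are applied are assumptions made for illustration.

```python
import math


def filter_logits(logits, temperature=1.0, top_k=None, top_p=None):
    """Return probabilities after temperature scaling, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]

    # Softmax, shifted by the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ordered from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # top_p: keep the smallest prefix whose renormalised mass reaches top_p.
    if top_p is not None:
        mass = sum(probs[i] for i in order)
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / mass
            if cumulative >= top_p:
                break
        order = kept

    # Zero out every discarded token and renormalise the survivors.
    kept_mass = sum(probs[i] for i in order)
    survivors = set(order)
    return [probs[i] / kept_mass if i in survivors else 0.0 for i in range(len(probs))]


print(filter_logits([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9))
```

A lower temperature sharpens the distribution before the cut-offs are applied, so fewer tokens tend to survive the top_p filter.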

Installation

Before installing the function-sampler library, make sure the Rust toolchain is installed on your system. Follow the instructions for your platform below, then install the library from source.

Install Rust

Windows
  1. Download and run the Rust installer from rustup.rs.

  2. Follow the prompts to install Rust. This also installs cargo, Rust's package manager and build system.

  3. If the installer did not add Rust to your system PATH, add it manually. Rust is usually installed under %USERPROFILE%\.cargo\bin.

  4. Open a new command prompt and verify the installation by running:

    rustc --version
    
  5. If Rust is installed correctly, you should see the version number, commit hash, and commit date.

macOS
  1. Install Rust by running the following command in your terminal:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
  2. Follow the instructions on the screen to complete the installation.

  3. Rust installs its binaries in ~/.cargo/bin. If the installer did not add this directory to your PATH, add it yourself. The command below is for zsh, the default shell on recent macOS versions; if you use bash, write to ~/.bash_profile instead:

    echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.zshrc
    
  4. Restart your terminal and verify the installation by running:

    rustc --version
    
  5. If Rust is installed correctly, you should see the version number, commit hash, and commit date.

Linux
  1. Use the following command in your terminal to install Rust:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
  2. Follow the on-screen instructions to complete the installation process.

  3. After completing the installation, source the Rust environment script:

    source $HOME/.cargo/env
    
  4. Verify the installation by running:

    rustc --version
    
  5. If Rust is installed correctly, you should see the version number, commit hash, and commit date.
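On any of the platforms above, the same check can be scripted from Python before building the library. This is a small sketch using only the standard library; the helper names `find_rust_tools` and `tool_version` are made up for this example.

```python
import shutil
import subprocess


def find_rust_tools(tools=("rustc", "cargo")):
    """Map each tool name to its location on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


def tool_version(path):
    """Return the first line of `<tool> --version`, or None if the call fails."""
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    output = result.stdout.strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    for tool, path in find_rust_tools().items():
        if path is None:
            print(f"{tool}: not found on PATH")
        else:
            print(f"{tool}: {tool_version(path)} ({path})")
```

If either tool is reported as missing, revisit the PATH step for your platform before installing from source.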

Install function-sampler from Source

Note: Until I can get the CI for PyPI sorted out, installing from source is necessary.

git clone https://github.com/unaidedelf8777/function-sampler.git
cd function-sampler
python setup.py install

Usage

Here's a basic example of how to use the function-sampler library:

from function_sampler import ToolCallSampler
from transformers import AutoTokenizer, AutoModelForCausalLM

# Initialize the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")
# On a small GPU or with low VRAM, load the model (not the tokenizer) in 4-bit instead;
# this requires the bitsandbytes package:
# model = AutoModelForCausalLM.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B", load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

# Define the functions
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location.",
                },
            },
            "required": ["location", "format"],
        },
    }
]

# Configure the sampler
config = {
    "open_func_token": "<function>",
    "close_func_token": "</function>",
    "end_on_function_call": True,
    "temperature": 0.7,
    "top_p": 0.9,
}

# Create an instance of ToolCallSampler
sampler = ToolCallSampler(tokenizer, functions, config=config)


# Use the model for generation.
# The prompt only needs to explain how to call the function if the model was not explicitly trained for it.
input_text = "What is the weather today in Paris? Respond with the word '<function>' to call the weather API."
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=200, logits_processor=[sampler])
generated_text = tokenizer.decode(output[0])
print(generated_text)
# Illustrative output shape (actual values depend on the model):
# <function> {"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}} </function><|im_end|>

In this example, we create an instance of ToolCallSampler with the specified functions and configuration. We then pass the sampler to model.generate through the logits_processor argument, so that it is applied at every step of the generation process.

Finally, we use the model to generate text from the input prompt, which tells the model to emit the opening function token. Once that token appears, the sampler takes over, and the generated text contains a valid function call adhering to the predefined schema.
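Once the text has been generated, the call still has to be pulled out of the surrounding string before it can be dispatched. The helper below is a hypothetical sketch, not part of the library. It assumes the default delimiters, and it searches for the last delimiter pair because the decoded output of a decoder-only model includes the prompt, which in this example also mentions the opening token.

```python
import json


def extract_function_call(text, open_token="<function>", close_token="</function>"):
    """Return the JSON payload inside the last delimiter pair, or None if absent."""
    end = text.rfind(close_token)
    if end == -1:
        return None
    start = text.rfind(open_token, 0, end)
    if start == -1:
        return None
    payload = text[start + len(open_token):end]
    return json.loads(payload)


# Made-up sample text: a prompt that mentions the token, followed by a call.
sample = (
    "Respond with the word '<function>' to call the weather API. "
    '<function> {"name": "get_current_weather", '
    '"arguments": {"location": "Paris", "format": "celsius"}} </function><|im_end|>'
)
call = extract_function_call(sample)
print(call["name"], call["arguments"])
```

Because the sampler guarantees that the payload matches the schema, `json.loads` is the only parsing step in this sketch; no repair or retry logic is included.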

For more detailed usage and examples, please refer to the demo notebook provided with the library.

Configuration

The function-sampler library offers a flexible configuration system. You can customize the behavior of the sampler by providing a configuration dictionary, a ToolCallSamplerConfig instance, or keyword arguments when initializing the ToolCallSampler class.

The available configuration options include:

  • open_func_token: The opening delimiter token for a function call (default: "<function>")
  • close_func_token: The closing delimiter token for a function call (default: "</function>")
  • end_on_function_call: Whether to end the generation when a function call is encountered (default: False)
  • json_tokens: A custom token map for JSON tokens (default: built from the provided tokenizer)
  • temperature: The temperature value for sampling (default: None)
  • top_p: The top_p value for sampling (default: None)
  • top_k: The top_k value for sampling (default: None)
  • repetition_penalty: The repetition penalty value for sampling (default: None)
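To illustrate how a configuration dictionary and keyword arguments might be combined, here is a self-contained sketch. `SamplerOptions` is a hypothetical stand-in that mirrors the documented option names and defaults; it is not the library's ToolCallSamplerConfig. It omits json_tokens because that map is built from the tokenizer, and the rule that keyword arguments override dictionary entries is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SamplerOptions:
    """Hypothetical stand-in mirroring the documented options and defaults."""
    open_func_token: str = "<function>"
    close_func_token: str = "</function>"
    end_on_function_call: bool = False
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    top_k: Optional[int] = None
    repetition_penalty: Optional[float] = None


def build_options(config=None, **kwargs):
    """Merge a config dict with keyword overrides, rejecting unknown keys."""
    merged = {**(config or {}), **kwargs}  # assumed precedence: kwargs win
    known = {f.name for f in fields(SamplerOptions)}
    unknown = set(merged) - known
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return SamplerOptions(**merged)


opts = build_options({"temperature": 0.7, "top_p": 0.9}, end_on_function_call=True)
print(opts)
```

Rejecting unknown keys early catches typos such as `top-p` before they silently fall back to a default.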

Contributing

Contributions to the function-sampler library are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.

License

This project is licensed under the Apache License 2.0.
