Faster, lazy backend for the `Outlines` library

Faster-Outlines

Supercharge your structured text generation with faster-outlines, a high-performance Rust backend for the Outlines library.

Overview

faster_outlines is designed to significantly boost the performance of regex-guided text generation, particularly for LLM inference servers. It's an ideal solution for scenarios where regex patterns for guiding LLM generation are not known in advance.

Key features:

  • 🚀 Seamless one-line integration with existing Outlines projects
  • 🚀 All the features you already love about outlines
  • ⚡ Asynchronous FSM compilation for immediate start of LLM inference
  • 🏎️ Substantial performance improvements, especially for complex regex patterns (like JSON)
  • 🔄 Continuous updates to improve speed!

Upcoming (in no particular order):

  • 🍴 vLLM fork using faster_outlines
  • 🤝 Official integration with vLLM's main repo (hopefully)
  • Redis as a caching backend, for large inference setups
  • 🦀 Rust API (started, but unfinished)

Why faster_outlines?

  1. Optimized for LLM Inference Servers: Ideal for scenarios where regex patterns are dynamic and not known beforehand.

  2. Asynchronous Processing: Unlike the standard Outlines library, faster_outlines allows you to start LLM inference immediately, without waiting for the entire FSM to compile.

  3. Significant Performance Boost: Especially noticeable with complex regex patterns and large state spaces.

  4. Seamless Integration: Works with your existing Outlines code with minimal changes (outlines v0.0.46, soon all versions).
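The asynchronous-processing idea in point 2 can be illustrated with a small, self-contained sketch. This is not faster_outlines' implementation (which lives in Rust); `LazyIndex`, its methods, and the fake per-state "work" are hypothetical names made up for illustration. The sketch only shows the general pattern: a background thread fills in the index state by state, and a caller blocks only until the state it needs is ready rather than until the whole index is compiled.

```python
import threading
import time


class LazyIndex:
    """Toy state -> allowed-tokens index filled in by a background thread."""

    def __init__(self, num_states):
        self._num_states = num_states
        self._rows = {}
        self._cond = threading.Condition()
        # Compilation starts immediately and runs alongside the caller.
        self._worker = threading.Thread(target=self._compile, daemon=True)
        self._worker.start()

    def _compile(self):
        # Stand-in for the expensive per-state work; states are compiled in order.
        for state in range(self._num_states):
            time.sleep(0.01)  # simulate work
            row = {f"tok{state}"}  # hypothetical allowed-token set
            with self._cond:
                self._rows[state] = row
                self._cond.notify_all()

    def allowed_tokens(self, state):
        if not 0 <= state < self._num_states:
            raise IndexError(f"unknown state {state}")
        # Block only until the requested state is ready, not the whole index.
        with self._cond:
            self._cond.wait_for(lambda: state in self._rows)
            return self._rows[state]

    def wait_until_complete(self):
        self._worker.join()


index = LazyIndex(num_states=20)
first = index.allowed_tokens(0)  # returns once state 0 has been compiled
index.wait_until_complete()      # the rest finishes in the background
last = index.allowed_tokens(19)
```

In a real server the first sampling step would play the role of `allowed_tokens(0)`: generation can begin while later states are still being compiled.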

Installation

[!WARNING] faster_outlines currently supports only Linux-based operating systems. You can try compiling on other systems such as Windows, but you're better off using WSL2. On a non-Linux system you will need to build from source, so make sure you have Rust installed.

pip install faster_outlines

Quick Start

One line patching with outlines (v0.0.46)

Integrating faster_outlines into your project is as simple as adding one line of code:

import outlines
from faster_outlines import patch

patch(outlines)

# Now use outlines as you normally would
# Your code here...

You can also pass save_to_sys_modules=True to the patch function, in which case all normal outlines imports will use the patched module.

from faster_outlines import patch
import outlines

patch(outlines, save_to_sys_modules=True)

from outlines.fsm.fsm import RegexFSM  # Import as usual.

A more lengthy but full example:

import outlines
from faster_outlines import patch

patch(outlines)

schema = '''{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}'''

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2", device="cuda:0")
print("Model loaded.")
generator = outlines.generate.json(model, schema)
character = generator("Give me a character description")
print(character)

An example using faster_outlines directly with Hugging Face transformers, without patching outlines:

from faster_outlines.fsm import RegexGuide, TokenVocabulary
from faster_outlines.sampling import BaseLogitsProcessor
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model onto the GPU so it matches the device of the inputs below.
model = AutoModelForCausalLM.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")

# Vocabulary the guide uses to map regex states to allowed tokens.
vocab = TokenVocabulary(
    tokenizer.get_vocab(),
    tokenizer.eos_token_id,
    set(tokenizer.all_special_tokens)
)

# Regex for an email address
regex = r"""[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"""

guide = RegexGuide(regex, vocab)

m = """<|im_start|>user\nWrite me a funny email address.\n<|im_end|>\n<|im_start|>assistant\n"""

inputs = tokenizer.encode(m, return_tensors="pt")

logits_processor = BaseLogitsProcessor(guide)

print(
    model.generate(
        inputs.to("cuda"),
        max_new_tokens=100,
        logits_processor=[logits_processor],  # transformers expects the singular keyword
        do_sample=True
    )
)

Performance Comparison

Performance Graph

The regex index compilation time shown for faster-outlines is the time taken to fully compile the index, not the time until the index is usable for sampling. The index is normally usable for sampling no more than 1 ms after interegular has finished compiling the regex to an FSM.

The raw benchmark results are stored as JSON in bench/benchmark_results.json, and the graph is generated with bench/makePrettyGraph.js.
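To make concrete what "compiling the index" involves, here is a toy sketch of the token-level index described in the Willard and Louf paper cited below: for every FSM state, walk every vocabulary token through the FSM and record the tokens that stay on a valid path, together with the state they lead to. The hand-written DFA (for the pattern a+b), the five-token vocabulary, and the function names are all made up for illustration; this is neither the faster_outlines API nor its Rust implementation.

```python
# Character-level DFA for the pattern a+b: state -> {char: next_state}.
DFA = {
    0: {"a": 1},          # start: need at least one 'a'
    1: {"a": 1, "b": 2},  # seen one or more 'a'
    2: {},                # accepting state, no further input allowed
}

# Toy vocabulary: token string -> token id.
VOCAB = {"a": 0, "aa": 1, "ab": 2, "b": 3, "ba": 4}


def walk(dfa, state, token):
    """Feed a token's characters through the DFA; return the end state or None."""
    for ch in token:
        state = dfa.get(state, {}).get(ch)
        if state is None:
            return None
    return state


def build_index(dfa, vocab):
    """Map each state to {token_id: next_state} for every token allowed there."""
    index = {}
    for state in dfa:
        row = {}
        for token, token_id in vocab.items():
            next_state = walk(dfa, state, token)
            if next_state is not None:
                row[token_id] = next_state
        index[state] = row
    return index


def mask_logits(logits, allowed_ids):
    """Set the logit of every disallowed token to -inf."""
    return [x if i in allowed_ids else float("-inf") for i, x in enumerate(logits)]


index = build_index(DFA, VOCAB)
masked = mask_logits([0.0] * len(VOCAB), index[0])
```

This naive construction visits every (state, token) pair, so its cost grows with the number of FSM states times the vocabulary size, which is why patterns with large state spaces are the expensive case.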

Caching and Env vars

faster-outlines caches all generated FSMs in a Rust-based LRU Cache. The cache can be controlled using the following environment variables:

Variable                         Default   Description
FASTER_OUTLINES_CACHE_SIZE       50        Maximum number of FSMs to cache
FASTER_OUTLINES_DISABLE_CACHE    false     Disable caching ("true"/"1"/"yes")
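As a rough sketch, the variables can be set from Python before the library is loaded. Two things here are assumptions rather than documented behavior: that the variables are read when faster_outlines is imported or first used (so they are set beforehand), and that the truthy values are matched case-insensitively. The `cache_settings` helper is hypothetical and simply mirrors the table above so the defaults and accepted values are easy to see; it is not part of faster_outlines.

```python
import os

# Assumption: set these before importing faster_outlines so they take effect.
os.environ["FASTER_OUTLINES_CACHE_SIZE"] = "200"
os.environ["FASTER_OUTLINES_DISABLE_CACHE"] = "false"


def cache_settings(environ=os.environ):
    """Hypothetical helper mirroring the table above; not part of faster_outlines."""
    size = int(environ.get("FASTER_OUTLINES_CACHE_SIZE", "50"))
    raw = environ.get("FASTER_OUTLINES_DISABLE_CACHE", "false")
    # Case-insensitive matching is an assumption of this sketch.
    disabled = raw.strip().lower() in {"true", "1", "yes"}
    return size, disabled


size, disabled = cache_settings()
```

The same variables can equally be exported in the shell or set in a service's environment configuration before the process starts.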

Docs

Most of the Rust code is thoroughly documented in terms of data structures and methodology. The Rust docs and the Python binding code, as well as the .pyi file for the compiled portion of the lib, should be sufficient for most users. If you have any questions that the comments and code don't answer, feel free to open an issue.

Contributing & Support

Contributions welcomed!

If you would like to support further development and more speed improvements for faster_outlines, please consider supporting us on GitHub Sponsors, or make a donation using the Buy-Me-A-Coffee link below!

Issues

If you have an issue with the lib, please open a GitHub issue describing how to reproduce it, and we will work on fixing it.

Acknowledgments

  • This project builds upon the excellent work of the Outlines library.

Copyright

This work is dual-licensed under Apache-2.0 and MIT. Find more info in the LICENSE file.

Citations:

@article{willard2023efficient,
  title={Efficient Guided Generation for LLMs},
  author={Willard, Brandon T and Louf, R{\'e}mi},
  journal={arXiv preprint arXiv:2307.09702},
  year={2023}
}
