
Faster, lazy backend for the `Outlines` library

Faster-Outlines

Supercharge your structured text generation with faster-outlines, a high-performance Rust backend for the Outlines library.

Overview

faster_outlines is designed to significantly boost the performance of regex-guided text generation, particularly for LLM inference servers. It's an ideal solution for scenarios where regex patterns for guiding LLM generation are not known in advance.

Key features:

  • 🚀 Seamless one-line integration with existing Outlines projects
  • 🚀 All the features you already love about Outlines
  • ⚡ Asynchronous FSM compilation for immediate start of LLM inference
  • 🏎️ Substantial performance improvements, especially for complex regex patterns (like JSON)
  • 🔄 Continuous updates to improve speed!

Upcoming (in no particular order):

  • 🍴 vLLM fork using faster_outlines
  • 🤝 Official integration with vLLM's main repo (hopefully)
  • Redis as a caching backend for large inference setups
  • 🦀 Rust API (started, but unfinished)

Why faster_outlines?

  1. Optimized for LLM Inference Servers: Ideal for scenarios where regex patterns are dynamic and not known beforehand.

  2. Asynchronous Processing: Unlike the standard Outlines library, faster_outlines allows you to start LLM inference immediately, without waiting for the entire FSM to compile.

  3. Significant Performance Boost: Especially noticeable with complex regex patterns and large state spaces.

  4. Seamless Integration: Works with your existing Outlines code with minimal changes (currently Outlines v0.0.46; support for other versions is planned).

Installation

Warning: faster_outlines currently supports only Linux-based operating systems. You can try compiling on other systems such as Windows, but you are better off using WSL2. On a non-Linux system you will need to build from source, so make sure you have Rust installed.

```shell
pip install faster_outlines
```

Quick Start

One-line patching with Outlines (v0.0.46)

Integrating faster_outlines into your project is as simple as adding one line of code:

```python
import outlines
from faster_outlines import patch

patch(outlines)

# Now use outlines as you normally would
# Your code here...
```

You can also pass save_to_sys_modules=True to the patch function, in which case all regular outlines imports will resolve to the patched module:

```python
import outlines
from faster_outlines import patch

patch(outlines, save_to_sys_modules=True)

from outlines.fsm.fsm import RegexFSM  # Import as usual; this now resolves to the patched module.
```
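The general Python mechanism a flag like save_to_sys_modules relies on can be sketched as follows. This is a minimal illustration with hypothetical stand-ins (`demo_pkg`, `SlowFSM`, `FastFSM`, and this `patch` function), not faster_outlines' actual implementation: an import such as `from pkg.sub import X` looks up `pkg.sub` in `sys.modules` first, so replacing that entry makes later imports receive the patched module.

```python
import sys
import types

# Hypothetical stand-ins: a package with an FSM submodule (similar in shape to outlines.fsm.fsm).
class SlowFSM:
    backend = "python"

class FastFSM:
    backend = "rust"

pkg = types.ModuleType("demo_pkg")
sub = types.ModuleType("demo_pkg.fsm")
sub.RegexFSM = SlowFSM
pkg.fsm = sub
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.fsm"] = sub


def patch(module, save_to_sys_modules=False):
    """Hypothetical patch: swap RegexFSM for a faster class."""
    patched = types.ModuleType(module.fsm.__name__)
    patched.__dict__.update(module.fsm.__dict__)
    patched.RegexFSM = FastFSM
    # Attribute access (module.fsm.RegexFSM) now sees the patched class.
    module.fsm = patched
    if save_to_sys_modules:
        # Import statements consult sys.modules first, so
        # `from demo_pkg.fsm import RegexFSM` now returns FastFSM.
        sys.modules[patched.__name__] = patched
    return module


patch(pkg, save_to_sys_modules=True)

from demo_pkg.fsm import RegexFSM
print(RegexFSM.backend)  # rust
```

Without the flag, only attribute access through the package object sees the new class; a fresh `from demo_pkg.fsm import ...` would still get the old entry in `sys.modules`.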

A longer, complete example:

```python
import outlines
from faster_outlines import patch

# Patch Outlines before building any generators.
patch(outlines)

schema = '''{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}'''

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2", device="cuda:0")
print("Model loaded.")
generator = outlines.generate.json(model, schema)
character = generator("Give me a character description")
print(character)
```
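Using faster_outlines directly with a Hugging Face transformers model, without patching Outlines:

```python
from faster_outlines.fsm import RegexGuide, TokenVocabulary
from faster_outlines.sampling import BaseLogitsProcessor
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList

model_name = "NousResearch/Hermes-2-Pro-Llama-3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Vocabulary for the guide: token -> id map, EOS token id, and special tokens.
vocab = TokenVocabulary(
    tokenizer.get_vocab(),
    tokenizer.eos_token_id,
    set(tokenizer.all_special_tokens)
)

# Regex for an email address
regex = r"""[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"""

guide = RegexGuide(regex, vocab)

m = """<|im_start|>user\nWrite me a funny email address.\n<|im_end|>\n<|im_start|>assistant\n"""

# Keep the inputs on the same device as the model.
inputs = tokenizer.encode(m, return_tensors="pt").to("cuda")

logits_processor = BaseLogitsProcessor(guide)

output = model.generate(
    inputs,
    max_new_tokens=100,
    # transformers expects `logits_processor` (singular) as a LogitsProcessorList
    logits_processor=LogitsProcessorList([logits_processor]),
    do_sample=True
)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
```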
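Conceptually, a regex-guided logits processor removes every token the FSM does not allow from the current state before a token is chosen, the approach described by Willard & Louf (2023). The pure-Python sketch below shows that idea with greedy selection; `mask_logits`, `guided_greedy`, the hand-written index for the regex `ab+`, and the toy vocabulary are hypothetical and do not reflect the internals of `BaseLogitsProcessor`.

```python
import math


def mask_logits(logits, allowed_ids):
    """Return a copy of `logits` with every disallowed token set to -inf."""
    allowed = set(allowed_ids)
    return [score if tid in allowed else -math.inf for tid, score in enumerate(logits)]


def guided_greedy(step_logits, index, vocab, state=0):
    """Greedy decoding where each step may only pick tokens the FSM allows."""
    pieces = []
    for logits in step_logits:
        masked = mask_logits(logits, index[state])
        tid = max(range(len(masked)), key=masked.__getitem__)
        pieces.append(vocab[tid])
        state = index[state][tid]  # advance the FSM with the chosen token
    return "".join(pieces)


# Hypothetical precomputed index for the regex `ab+`: state -> {token_id: next_state}
index = {0: {0: 1, 1: 2}, 1: {2: 2, 3: 2}, 2: {2: 2, 3: 2}}
vocab = ["a", "ab", "b", "bb", "c"]

# The raw logits strongly prefer "c" (id 4), which the regex forbids.
step_logits = [[0.1, 0.2, 0.9, 0.3, 5.0], [0.0, 0.0, 0.1, 0.5, 9.0]]
print(guided_greedy(step_logits, index, vocab))  # abbb
```

With sampling instead of greedy selection, the same mask is applied and the softmax simply assigns zero probability to the masked tokens.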

Performance Comparison

[Figure: Performance Graph]

In the benchmark, faster-outlines' regex index compilation time is the time taken to fully compile the index, not the time until the index becomes usable for sampling. The time until the index is usable for sampling is normally at most 1 ms longer than the time interegular takes to compile the regex to an FSM.
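The difference between "usable for sampling" and "fully compiled" can be illustrated with a minimal, hypothetical Python sketch; it is not the Rust implementation. A background thread indexes FSM states breadth-first and publishes each state's allowed tokens as soon as they are computed, so a caller waits only for the state it needs. The `LazyIndex` class, the dict-based transition table, and the toy vocabulary are assumptions for illustration, and final-state/EOS handling is omitted.

```python
import threading
from collections import deque


class LazyIndex:
    """Hypothetical sketch of a token index that is usable before it is complete."""

    def __init__(self, transitions, start, vocab):
        self._transitions = transitions  # {state: {char: next_state}}
        self._vocab = vocab              # {token_id: token_string}
        self._start = start
        self._ready = {}                 # state -> {token_id: next_state}
        self._done = False
        self._cond = threading.Condition()
        threading.Thread(target=self._build, daemon=True).start()

    def _walk(self, state, token):
        # Feed the token through the FSM one character at a time.
        for ch in token:
            state = self._transitions.get(state, {}).get(ch)
            if state is None:
                return None  # no transition: token cannot follow this state
        return state

    def _build(self):
        # Breadth-first over reachable states, publishing each row immediately.
        queue, seen = deque([self._start]), {self._start}
        while queue:
            state = queue.popleft()
            row = {}
            for tid, tok in self._vocab.items():
                nxt = self._walk(state, tok)
                if nxt is not None:
                    row[tid] = nxt
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
            with self._cond:
                self._ready[state] = row
                self._cond.notify_all()
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def allowed_tokens(self, state):
        # Block only until this state is indexed (or indexing has finished).
        with self._cond:
            self._cond.wait_for(lambda: state in self._ready or self._done)
            return self._ready.get(state, {})


# Toy FSM for the regex `ab+`: 0 -a-> 1 -b-> 2 -b-> 2
transitions = {0: {"a": 1}, 1: {"b": 2}, 2: {"b": 2}}
vocab = {0: "a", 1: "ab", 2: "b", 3: "bb", 4: "c"}

index = LazyIndex(transitions, start=0, vocab=vocab)
print(index.allowed_tokens(0))  # {0: 1, 1: 2}
print(index.allowed_tokens(2))  # {2: 2, 3: 2}
```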

The raw benchmark results are stored as JSON in bench/benchmark_results.json, and the graph is generated with bench/makePrettyGraph.js.

Caching and Env vars

faster-outlines caches all generated FSMs in a Rust-based LRU cache. The cache can be controlled using the following environment variables:

| Variable | Default | Description |
|---|---|---|
| FASTER_OUTLINES_CACHE_SIZE | 50 | Maximum number of FSMs to cache |
| FASTER_OUTLINES_DISABLE_CACHE | false | Disable caching ("true"/"1"/"yes") |
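The cache semantics in the table can be sketched in Python. This is an illustration only, under the assumptions that entries are keyed by the regex string and that the variables are read once when the cache is created; the real cache is implemented in Rust, and its key and the point at which it reads the variables may differ. `cache_settings`, `FSMCache`, and `get_or_compile` are hypothetical names.

```python
import os
from collections import OrderedDict


def cache_settings():
    """Read the two documented environment variables (illustrative parsing)."""
    size = int(os.environ.get("FASTER_OUTLINES_CACHE_SIZE", "50"))
    disabled = os.environ.get("FASTER_OUTLINES_DISABLE_CACHE", "false") in ("true", "1", "yes")
    return size, disabled


class FSMCache:
    """Hypothetical LRU cache keyed by regex string."""

    def __init__(self, capacity, disabled=False):
        self.capacity = capacity
        self.disabled = disabled
        self._items = OrderedDict()  # least recently used entry first

    def get_or_compile(self, regex, compile_fn):
        if self.disabled:
            return compile_fn(regex)  # caching disabled: always compile
        if regex in self._items:
            self._items.move_to_end(regex)  # mark as most recently used
            return self._items[regex]
        fsm = compile_fn(regex)
        self._items[regex] = fsm
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return fsm


size, disabled = cache_settings()
print(size, disabled)  # 50 False when neither variable is set

compiled = []

def fake_compile(regex):
    compiled.append(regex)
    return f"fsm<{regex}>"

cache = FSMCache(capacity=2)
for pattern in ["a+", "b+", "a+", "c+"]:
    cache.get_or_compile(pattern, fake_compile)
print(compiled)            # ['a+', 'b+', 'c+']
print(list(cache._items))  # ['a+', 'c+']
```

The second lookup of "a+" is a hit that refreshes its position, so adding "c+" evicts "b+" rather than "a+".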

Docs

Most of the Rust code is thoroughly documented, covering both data structures and methodology. The Rust docs, the Python binding code, and the .pyi stub file for the compiled portion of the library should be sufficient for most users. If you have questions that the comments and code don't answer, feel free to open an issue.

Contributing & Support

Contributions are welcome!

If you would like to support further development and speed improvements for faster_outlines, please consider sponsoring us on GitHub Sponsors or making a donation via Buy Me a Coffee!

Issues

If you run into a problem with the library, please open a GitHub issue describing how to reproduce it, and we will work on fixing it.

Acknowledgments

  • This project builds upon the excellent work of the Outlines library.

Copyright

This work is dual-licensed under Apache-2.0 and MIT. Find more information in the LICENSE file.

Citations:

@article{willard2023efficient,
  title={Efficient Guided Generation for LLMs},
  author={Willard, Brandon T and Louf, R{\'e}mi},
  journal={arXiv preprint arXiv:2307.09702},
  year={2023}
}


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

| File | Size | Interpreter | Platform |
|---|---|---|---|
| faster_outlines-2024.11.10-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl | 836.6 kB | PyPy | manylinux: glibc 2.28+ x86-64 |
| faster_outlines-2024.11.10-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl | 767.9 kB | PyPy | manylinux: glibc 2.28+ ARM64 |
| faster_outlines-2024.11.10-cp313-cp313-manylinux_2_28_x86_64.whl | 837.0 kB | CPython 3.13 | manylinux: glibc 2.28+ x86-64 |
| faster_outlines-2024.11.10-cp313-cp313-manylinux_2_28_aarch64.whl | 768.1 kB | CPython 3.13 | manylinux: glibc 2.28+ ARM64 |
| faster_outlines-2024.11.10-cp312-cp312-manylinux_2_28_x86_64.whl | 837.0 kB | CPython 3.12 | manylinux: glibc 2.28+ x86-64 |
| faster_outlines-2024.11.10-cp312-cp312-manylinux_2_28_aarch64.whl | 768.1 kB | CPython 3.12 | manylinux: glibc 2.28+ ARM64 |
| faster_outlines-2024.11.10-cp311-cp311-manylinux_2_28_x86_64.whl | 836.6 kB | CPython 3.11 | manylinux: glibc 2.28+ x86-64 |
| faster_outlines-2024.11.10-cp311-cp311-manylinux_2_28_aarch64.whl | 767.9 kB | CPython 3.11 | manylinux: glibc 2.28+ ARM64 |
| faster_outlines-2024.11.10-cp310-cp310-manylinux_2_28_x86_64.whl | 837.0 kB | CPython 3.10 | manylinux: glibc 2.28+ x86-64 |
| faster_outlines-2024.11.10-cp310-cp310-manylinux_2_28_aarch64.whl | 768.3 kB | CPython 3.10 | manylinux: glibc 2.28+ ARM64 |

File hashes

| File | SHA256 | MD5 | BLAKE2b-256 |
|---|---|---|---|
| faster_outlines-2024.11.10-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl | 5780d2b58009c9e033f4928786c595f7de9955a142027fb0e49d349ab628a8eb | e20b9f5f217301256f854e2d06f24fb4 | b02ba2823fad0f3087667efcf40cc3ee0b7ad602edc917ac5f22dbe28ccb9647 |
| faster_outlines-2024.11.10-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl | bf36ab958e73db790fced2774de8479094de0531ce0575f6ec221e34e778128f | 2fe5b31553d51c733e5915fcec4bde52 | 761e8f6b3ef183e4bbe7559c375bbc7f2b96c5c90af0bdb9fabf10a3dcfbeeb0 |
| faster_outlines-2024.11.10-cp313-cp313-manylinux_2_28_x86_64.whl | 311d2840d731855be8a235bd44633f7b8931472fbd7b86031fd5d745a7e5c6ff | 8c133669bfd2021ac7cc84fc92a3af21 | 3bcfa2cbd60cdc0f39d453459b6c256dda59aa1d5c14e9581f4e27581d7ca631 |
| faster_outlines-2024.11.10-cp313-cp313-manylinux_2_28_aarch64.whl | 7a48f8df4a63891c45f8db4f6833e147ffd188fd98f5c96b23cb3d9c519cd762 | f6c3dd2aa944c86b7d6a12c77f331c09 | 3bd964063c27a07bcbec545db3beee39d5efc82f9154316f194558c55c007263 |
| faster_outlines-2024.11.10-cp312-cp312-manylinux_2_28_x86_64.whl | e2f532c63000fb5a37d452c2c7e799bc6690c012d8cc0f28fa712d6f4194b038 | 23a3f3319a90dc47a5eb11b28ec49dd1 | 37d139c49172cc2a675cd23e1d56698002949545a8a19a445f78c924b0f41470 |
| faster_outlines-2024.11.10-cp312-cp312-manylinux_2_28_aarch64.whl | 8aaca4cb8b3e2039e9a49274bd145cc56a74ff76c3b753c689bd789fc59ccc11 | 57294bdf983aca1bb36c9a456519349f | aca1983e8c2f08ee40288e1fef69b5bea9d94758b83a82c22c3be9fb6c51683d |
| faster_outlines-2024.11.10-cp311-cp311-manylinux_2_28_x86_64.whl | 52a8440222b78fdbd26da4cfad620fa1e9f4d540f5216633872cb3740e00f6ae | 24661875a4b4bf11322a9475b4213dff | 9e98e62d67f6cee570067a3206531689a77ea3bac2306393e42117dd9230f027 |
| faster_outlines-2024.11.10-cp311-cp311-manylinux_2_28_aarch64.whl | 53189701eb08089c98fc46cdda1373821cb6813f8e811afab5200073df24b12c | a1803aec3c026b2ddec667fdcb5f537f | 4adad64c5680efa00f2ef1f8f5d812950fbf4bfd6d1715ba5bfb92aab9e3dd67 |
| faster_outlines-2024.11.10-cp310-cp310-manylinux_2_28_x86_64.whl | caf460b3a581578b227022108cd3463ce89c17936280de959f8cf09cea4a5d09 | 70c5610d232f8e31e4d427e90fba5b29 | e40413256213258948f2ce6efb166d43f4fee3157f24a5db6586cc083ead3f8b |
| faster_outlines-2024.11.10-cp310-cp310-manylinux_2_28_aarch64.whl | ea9fae36225caa4678cb4d1d0c0e7140c2d373dd45671ab856c5acaa7244c298 | 37a7efdb2b1cc5d1b1af32b311d54690 | f66a90f45e6f835f768b2cf992f9ac88e061b56b112465be77b0ca0df5cca263 |
