Faster backend for the `Outlines` library.
Project description
A performance-focused Rust backend for the Outlines library.
Overview
faster_outlines is designed to significantly boost the performance of regex-guided text generation, particularly for LLM inference servers. It's an ideal solution for scenarios where regex patterns for guiding LLM generation are not known in advance.
Key features:
- 🚀 Seamless one-line integration with existing Outlines projects
- 🚀 All the features you already love about Outlines
- ⚡ Asynchronous FSM compilation for immediate start of LLM inference
- 🏎️ Substantial performance improvements, especially for complex regex patterns (such as those generated from JSON schemas)
- 🔄 Continuous updates to improve speed!
Upcoming:
- 🍴 vLLM fork using faster_outlines
- 🤝 Official integration with vLLM's main repo (hopefully)
- Redis as a caching backend, for large inference setups
Why faster_outlines?
- Optimized for LLM Inference Servers: Ideal for scenarios where regex patterns are dynamic and not known beforehand.
- Asynchronous Processing: Unlike the standard Outlines library, faster_outlines allows you to start LLM inference immediately, without waiting for the entire FSM to compile.
- Significant Performance Boost: Especially noticeable with complex regex patterns and large state spaces.
- Seamless Integration: Works with your existing Outlines code with minimal changes.
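The asynchronous point above can be illustrated with a small, self-contained sketch. This is not the faster_outlines implementation (which lives in Rust); `LazyStateTable` is a hypothetical name, and the "compilation" is a stand-in that fills a table one state at a time. The idea shown is only that a background thread fills in states while a caller waits for the single state it needs rather than the whole table.

```python
import threading


class LazyStateTable:
    """Toy state table filled by a background thread (hypothetical)."""

    def __init__(self, num_states):
        self._table = {}
        self._cond = threading.Condition()
        self._thread = threading.Thread(
            target=self._compile, args=(num_states,), daemon=True
        )
        self._thread.start()  # returns immediately; work continues in background

    def _compile(self, num_states):
        for state in range(num_states):
            allowed = [state, state + 1]  # stand-in for real transition data
            with self._cond:
                self._table[state] = allowed
                self._cond.notify_all()

    def allowed_tokens(self, state, timeout=5.0):
        """Block only until `state` is compiled, not the whole table."""
        with self._cond:
            ready = self._cond.wait_for(lambda: state in self._table, timeout)
            if not ready:
                raise TimeoutError(f"state {state} not compiled in time")
            return self._table[state]


table = LazyStateTable(num_states=1000)
first = table.allowed_tokens(0)    # usable as soon as state 0 exists
last = table.allowed_tokens(999)   # waits for the remaining states if needed
```

In this toy version the first lookup can return long before the last state is written, which is the property that lets generation begin while compilation is still running.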
Installation
pip install faster_outlines
Quick Start
Integrating faster_outlines into your project is as simple as adding one line of code:
import outlines
from faster_outlines import patch
patch(outlines)
# Now use outlines as you normally would
# Your code here...
You can also pass save_to_sys_modules=True to the patch function, in which case all regular outlines imports will use the patched module.
from faster_outlines import patch
import outlines
patch(outlines, save_to_sys_modules=True)
from outlines_core.fsm.fsm import RegexFSM  # Import as usual.
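For readers unfamiliar with this style of patching, here is a generic sketch of how replacing a function on a module and registering the module in `sys.modules` affects later imports. Everything in it is hypothetical (`toy_lib`, `compile_pattern`, `patch_module`); it illustrates the general Python mechanism and is only an assumption about how faster_outlines behaves internally.

```python
import importlib
import sys
import types

# Build a toy module standing in for a real library (hypothetical names).
toy = types.ModuleType("toy_lib")
toy.compile_pattern = lambda pattern: f"slow:{pattern}"


def fast_compile(pattern):
    """Stand-in for a faster replacement implementation."""
    return f"fast:{pattern}"


def patch_module(module, save_to_sys_modules=False):
    """Swap the faster function onto the module object."""
    module.compile_pattern = fast_compile
    if save_to_sys_modules:
        # Later imports of this name now resolve to the patched object.
        sys.modules[module.__name__] = module
    return module


patch_module(toy, save_to_sys_modules=True)

# The import system checks sys.modules first, so no file on disk is needed.
toy_lib = importlib.import_module("toy_lib")
result = toy_lib.compile_pattern("[a-z]+")
```

Because the import system consults `sys.modules` before searching the filesystem, code that imports the module after patching sees the replacement without any changes of its own.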
Example
import outlines
from faster_outlines import patch
patch(outlines)
schema = '''{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}'''
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2", device="cuda:0", model_kwargs={"load_in_8bit": True})
print("Model loaded.")
generator = outlines.generate.json(model, schema)
character = generator("Give me a character description")
print(character)
Performance Comparison
By default, faster_outlines picks the thread count automatically. If you would like to control the number of threads manually, you can do so via an environment variable:
export FASTER_OUTLINES_NUM_THREADS=<num-threads>
Please note that setting the number of threads higher than the number of cores / logical threads on your machine will degrade performance, not improve it.
If you would like to test performance at different thread counts on your machine, you can use the script at tests/test_fsm_comp_time.py. Run it first with the automatic thread count (or whatever you are currently using), then again with the thread count you are considering, and compare the results.
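As a small aid for choosing a value, the sketch below clamps a requested thread count to the number of logical CPUs before exporting it. `pick_thread_count` is a hypothetical helper, not part of faster_outlines, and it is an assumption that the variable is read when faster_outlines is imported, so the sketch sets it before any such import.

```python
import os


def pick_thread_count(requested, available=None):
    """Clamp a requested thread count to the logical CPUs available."""
    if available is None:
        available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))


threads = pick_thread_count(requested=64)

# Assumption: the variable is read at import time, so set it first.
os.environ["FASTER_OUTLINES_NUM_THREADS"] = str(threads)
```

On a machine with fewer than 64 logical CPUs this exports the CPU count instead of 64, which avoids the oversubscription described above.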
Compatibility
faster_outlines is designed to be fully compatible with the Outlines API. Currently, however, full support can only be guaranteed for version 0.0.46 (latest as of 7/13/24).
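One way to guard against a version mismatch is to check the installed Outlines version before patching. The sketch below uses only the standard library; the helper names are hypothetical, and the exact-match rule is an assumption based on the single version named above.

```python
from importlib import metadata

SUPPORTED_OUTLINES = "0.0.46"  # version named in the compatibility note


def outlines_is_supported(installed, supported=SUPPORTED_OUTLINES):
    """Return True when the installed version string matches exactly."""
    return installed == supported


def installed_outlines_version():
    """Return the installed outlines version, or None if it is absent."""
    try:
        return metadata.version("outlines")
    except metadata.PackageNotFoundError:
        return None


version = installed_outlines_version()
if version is None:
    print("outlines is not installed")
elif not outlines_is_supported(version):
    print(f"outlines {version} found; only {SUPPORTED_OUTLINES} is fully supported")
```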
Contributing & Support
We welcome contributions!
If you would like to support further development and speed improvements for faster_outlines, please consider supporting us on GitHub Sponsors, or make a donation using the Buy-Me-A-Coffee link below!
Issues
If you have an issue with the library, please open a GitHub issue describing how to reproduce it, and we will work on fixing it.
Acknowledgments
- This project builds upon the excellent work of the Outlines library.
Citations:
@article{willard2023efficient,
title={Efficient Guided Generation for LLMs},
author={Willard, Brandon T and Louf, R{\'e}mi},
journal={arXiv preprint arXiv:2307.09702},
year={2023}
}