Probabilistic Generative Model Programming

Development Status
- 5 - Production/Stable
Intended Audience
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

Outlines 〰️

Robust (guided) text generation.

Features

🖍️ Simple and powerful prompting primitives based on the Jinja templating engine
🚄 Guided generation, including multiple choice, type constraints and dynamic stopping
⚡ Fast regex-guided generation
🔥 Fast JSON generation following a JSON schema or a Pydantic model
📝 Grammar-guided generation
🐍 Interleave completions with loops, conditionals, and custom Python functions
💾 Caching of generations

Available models

Transformers
AutoGPTQ
AutoAWQ (requires pip install autoawq)
OpenAI API
Mamba

Outlines 〰 has new releases and features coming every week. Make sure to ⭐ star and 👀 watch this repository, and follow @dottxtai to stay up to date!

⚠️ We're hiring someone to work full-time on Outlines ⚠️

Installation

Outlines is available on PyPI:

pip install outlines

The dependencies needed to use models are not installed by default. You will need to run:

pip install openai to be able to use OpenAI models.
pip install transformers datasets to be able to use Hugging Face transformers models.

Philosophy

Outlines 〰 is a library for neural text generation. You can think of it as a more flexible replacement for the generate method in the transformers library.

Outlines 〰 helps developers guide text generation to build robust interfaces with external systems. It provides generation methods that guarantee that the output will match a regular expression or follow a JSON schema.

Outlines 〰 provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generations, ReAct, meta-prompting, agents, etc.

Outlines 〰 is designed as a library that is meant to be compatible with the broader ecosystem, not to replace it. We use as few abstractions as possible, and generation can be interleaved with control flow, conditionals, custom Python functions and calls to other libraries.

Outlines 〰 is compatible with all models. It only interfaces with models via the next-token logits. It can be used with API-based models as well.
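Working only through next-token logits means that guidance comes down to one step. Before the next token is picked, every token that would violate the constraint is masked out. The sketch below shows that idea with plain Python lists and a made-up four-token vocabulary. Real implementations operate on tensors, and the helper names are illustrative, not Outlines' API.

```python
import math
from typing import List, Sequence, Set


def mask_logits(logits: Sequence[float], allowed: Set[int]) -> List[float]:
    """Set the logit of every token outside `allowed` to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; -inf logits get probability 0."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(logits: Sequence[float], allowed: Set[int]) -> int:
    """Pick the highest-scoring token among the allowed ones."""
    masked = mask_logits(logits, allowed)
    return max(range(len(masked)), key=lambda i: masked[i])


# Hypothetical next-token logits over a 4-token vocabulary.
logits = [2.0, 5.0, 1.0, 3.0]
allowed = {0, 3}  # the tokens the constraint permits at this step

probs = softmax(mask_logits(logits, allowed))
print(greedy_next_token(logits, allowed))  # 3 (unconstrained, it would be 1)
```

The model itself is untouched. Only the distribution it produces is filtered, which is why the same mechanism can sit on top of different model backends.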

Guided generation

The first step towards reliability of systems that include large language models is to ensure that there is a well-defined interface between their output and user-defined code. Outlines provides ways to control the generation of language models to make their output more predictable.

Multiple choices

You can reduce the completion to a choice between multiple possibilities:

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is just awesome!
"""
answer = outlines.generate.choice(model, ["Positive", "Negative"])(prompt)
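Conceptually, a choice between strings is the regex alternation `Positive|Negative`. While decoding, only tokens that keep the text generated so far a prefix of one of the choices can be accepted. This is a minimal sketch under that assumption, with hypothetical helper names and a toy vocabulary. It is not Outlines' implementation.

```python
import re
from typing import List


def choice_regex(choices: List[str]) -> str:
    """Regex alternation matching exactly one of the choices."""
    return "|".join(re.escape(choice) for choice in choices)


def allowed_tokens(generated: str, vocabulary: List[str], choices: List[str]) -> List[str]:
    """Tokens that keep the generated text a prefix of at least one choice."""
    return [
        token
        for token in vocabulary
        if token and any(choice.startswith(generated + token) for choice in choices)
    ]


choices = ["Positive", "Negative"]
vocabulary = ["Pos", "Neg", "itive", "ative", "Maybe", "P"]  # hypothetical tokens

print(choice_regex(choices))                       # Positive|Negative
print(allowed_tokens("", vocabulary, choices))     # ['Pos', 'Neg', 'P']
print(allowed_tokens("Pos", vocabulary, choices))  # ['itive']
```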

Type constraint

You can instruct the model to only return integers or floats:

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

prompt = "1+1="
answer = outlines.generate.format(model, int)(prompt)

prompt = "sqrt(2)="
answer = outlines.generate.format(model, float)(prompt)
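Type constraints are implemented with regex-guided generation, as the next section notes. Asking for an `int` therefore amounts to restricting the output to an integer pattern and casting the result. This is a minimal sketch: `INTEGER`, `FLOAT` and `parse_constrained` are hypothetical names, and the patterns are assumptions rather than the exact regexes Outlines uses.

```python
import re
from typing import Any, Callable

# Assumed patterns for this sketch; the exact regexes Outlines uses may differ.
INTEGER = r"[-+]?\d+"
FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"


def parse_constrained(text: str, pattern: str, cast: Callable[[str], Any]) -> Any:
    """Convert `text` with `cast`, but only if it fully matches `pattern`."""
    if re.fullmatch(pattern, text) is None:
        raise ValueError(f"{text!r} does not match {pattern!r}")
    return cast(text)


print(parse_constrained("2", INTEGER, int))       # 2
print(parse_constrained("1.4142", FLOAT, float))  # 1.4142
print(re.fullmatch(INTEGER, "two"))               # None
```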

Efficient regex-guided generation

Outlines also comes with fast regex-guided generation. In fact, the choice and type-constrained (integer and float) generators above all use regex-guided generation under the hood:

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

prompt = "Is 1+1=2? "
unguided = outlines.generate.text(model, max_tokens=30)(prompt)
guided = outlines.generate.regex(model, r"\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)", max_tokens=30)(
    prompt
)

print(unguided)
# Is 1+1=2?
#
# This is probably the most perplexing question.
# As I said in one of my articles describing how
# I call 2 and 1, there isn't

print(guided)
# Is 1+1=2? Always

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

prompt = "What is the IP address of the Google DNS servers? "
unguided = outlines.generate.text(model, max_tokens=30)(prompt)
guided = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
    max_tokens=30,
)(prompt)

print(unguided)
# What is the IP address of the Google DNS servers?
#
# Passive DNS servers are at DNS servers that are private.
# In other words, both IP servers are private. The database
# does not contain Chelsea Manning

print(guided)
# What is the IP address of the Google DNS servers?
# 2.2.6.1
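A regex constraint guarantees the form of the answer, not its correctness. The guided output above is a well-formed IPv4 address but not one of Google's public DNS resolvers, which are 8.8.8.8 and 8.8.4.4. This short sketch reuses the pattern from the example to make that distinction explicit. `is_ipv4` and `GOOGLE_DNS` are illustrative names.

```python
import re

# The same pattern as in the example above.
IPV4 = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
GOOGLE_DNS = {"8.8.8.8", "8.8.4.4"}


def is_ipv4(text: str) -> bool:
    """True if `text` is entirely a dotted-quad address accepted by IPV4."""
    return re.fullmatch(IPV4, text) is not None


for candidate in ["2.2.6.1", "8.8.8.8", "256.1.1.1", "8.8.8"]:
    print(candidate, is_ipv4(candidate), candidate in GOOGLE_DNS)
```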

Unlike other libraries, regex-guided generation in Outlines is almost as fast as non-guided generation.
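The approach behind this speed is described in the paper cited below (Willard & Louf, 2023). The regex is compiled into a finite-state machine, and each state is mapped ahead of time to the vocabulary tokens that can be read from it. During generation, finding the allowed tokens is then a lookup rather than a scan of the vocabulary.

The sketch below illustrates the indexing step with a hand-written toy DFA for `-?[0-9]+` and a made-up vocabulary. It is not Outlines' code, which compiles arbitrary regexes and works on real tokenizers.

```python
from typing import Dict, List, Optional

# Toy DFA for the regex -?[0-9]+ : an optional minus sign followed by digits.
# States: 0 = start, 1 = after "-", 2 = after at least one digit (accepting).
DIGITS = set("0123456789")
ACCEPTING = {2}


def step(state: int, char: str) -> Optional[int]:
    """Next DFA state after reading `char`, or None if the character is rejected."""
    if char in DIGITS:
        return 2
    if char == "-" and state == 0:
        return 1
    return None


def walk(state: int, token: str) -> Optional[int]:
    """Feed every character of a token through the DFA."""
    for char in token:
        next_state = step(state, char)
        if next_state is None:
            return None
        state = next_state
    return state


def build_index(states: List[int], vocabulary: List[str]) -> Dict[int, Dict[str, int]]:
    """For each state, map every allowed (non-empty) token to the state it leads to."""
    index: Dict[int, Dict[str, int]] = {}
    for state in states:
        index[state] = {}
        for token in vocabulary:
            end = walk(state, token)
            if token and end is not None:
                index[state][token] = end
    return index


# Hypothetical tokenizer vocabulary; the index is built once, before generation.
vocabulary = ["1", "23", "-", "-4", "a", "5b", "007"]
index = build_index([0, 1, 2], vocabulary)

# During generation, the allowed tokens for the current state are a dict lookup.
print(sorted(index[0]))  # ['-', '-4', '007', '1', '23']
print(sorted(index[1]))  # ['007', '1', '23']
```

The cost of walking every token through the machine is paid once per regex and vocabulary. This is the design choice that keeps per-step overhead small.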

Efficient JSON generation following a Pydantic model

Outlines 〰 allows you to guide the generation process so that the output is guaranteed to follow a JSON schema or a Pydantic model:

from enum import Enum
from pydantic import BaseModel, constr

import outlines
import torch


class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"


class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    armor: Armor
    weapon: Weapon
    strength: int


model = outlines.models.transformers("mistralai/Mistral-7B-v0.1", device="cuda")

# Construct guided sequence generator
generator = outlines.generate.json(model, Character, max_tokens=100)

# Draw a sample
rng = torch.Generator(device="cuda")
rng.manual_seed(789001)

sequence = generator("Give me a character description", rng=rng)
print(sequence)
# {
#   "name": "clerame",
#   "age": 7,
#   "armor": "plate",
#   "weapon": "mace",
#   "strength": 4171
# }

sequence = generator("Give me an interesting character description", rng=rng)
print(sequence)
# {
#   "name": "piggyback",
#   "age": 23,
#   "armor": "chainmail",
#   "weapon": "sword",
#   "strength": 0
# }

The method works with union types, optional types, arrays, nested schemas, etc. Some field constraints are not supported yet, but everything else should work.
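Conceptually, JSON generation reduces to regex-guided generation: the schema is turned into a regular expression that only matches conforming documents.

The sketch below builds such a regex by hand for the `Character` model. It is a simplified, hypothetical illustration, not Outlines' converter. It ignores escape sequences, optional fields and nested objects, and the whitespace and integer patterns are assumptions.

```python
import json
import re
from typing import Dict, List

# Simplified building blocks; these are assumptions for the sketch.
WS = r"[ \n]*"                  # optional whitespace between JSON tokens
INTEGER = r"-?(0|[1-9][0-9]*)"  # JSON integer without leading zeros


def string_regex(max_length: int) -> str:
    """JSON string of at most `max_length` characters (escape sequences not handled)."""
    return r'"[^"\\\x00-\x1f]{0,%d}"' % max_length


def enum_regex(values: List[str]) -> str:
    """Exactly one of the allowed values, each JSON-encoded."""
    return "(" + "|".join(re.escape(json.dumps(v)) for v in values) + ")"


def object_regex(fields: Dict[str, str]) -> str:
    """Object whose required fields appear in declaration order."""
    members = [
        WS + re.escape(json.dumps(name)) + WS + ":" + WS + pattern
        for name, pattern in fields.items()
    ]
    return r"\{" + ",".join(members) + WS + r"\}"


CHARACTER = object_regex({
    "name": string_regex(10),
    "age": INTEGER,
    "armor": enum_regex(["leather", "chainmail", "plate"]),
    "weapon": enum_regex(["sword", "axe", "mace", "spear", "bow", "crossbow"]),
    "strength": INTEGER,
})

sample = '{"name": "clerame", "age": 7, "armor": "plate", "weapon": "mace", "strength": 4171}'
print(re.fullmatch(CHARACTER, sample) is not None)  # True
print(json.loads(sample)["weapon"])                 # mace
```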

Efficient JSON generation following a JSON Schema

Sometimes you just want to be able to pass a JSON Schema instead of a Pydantic model. We've got you covered:

import outlines
import torch

schema = '''{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}'''

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1", device="cuda")
generator = outlines.generate.json(model, schema)
sequence = generator("Give me a character description")
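In the schema above, `armor` and `weapon` point to their definitions through `"$ref"`. Anything that converts the schema into a constraint has to follow those references.

Here is a minimal, hypothetical sketch of that resolution step; `resolve_refs` is an illustrative name, not an Outlines function. It assumes local `#/...` pointers only, with no `~0`/`~1` escapes and no recursive definitions.

```python
import json
from typing import Any, Dict


def resolve_refs(node: Any, root: Dict[str, Any]) -> Any:
    """Recursively replace {"$ref": "#/..."} nodes with the definition they point to."""
    if isinstance(node, dict):
        if "$ref" in node:
            target: Any = root
            for part in node["$ref"].lstrip("#/").split("/"):
                target = target[part]
            return resolve_refs(target, root)
        return {key: resolve_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    return node


# A trimmed-down version of the Character schema, for illustration.
schema = json.loads('''{
    "type": "object",
    "properties": {
        "armor": {"$ref": "#/definitions/Armor"},
        "age": {"type": "integer"}
    },
    "definitions": {
        "Armor": {"enum": ["leather", "chainmail", "plate"], "type": "string"}
    }
}''')

resolved = resolve_refs(schema, schema)
print(resolved["properties"]["armor"]["enum"])  # ['leather', 'chainmail', 'plate']
```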

Using context-free grammars to guide generation

Formal grammars rule the world, and Outlines makes them rule LLMs too. You can pass any context-free grammar in the EBNF format and Outlines will generate an output that is valid according to this grammar:

import outlines

arithmetic_grammar = """
    ?start: sum

    ?sum: product
        | sum "+" product   -> add
        | sum "-" product   -> sub

    ?product: atom
        | product "*" atom  -> mul
        | product "/" atom  -> div

    ?atom: NUMBER           -> number
         | "-" atom         -> neg
         | "(" sum ")"

    %import common.NUMBER
    %import common.WS_INLINE

    %ignore WS_INLINE
"""

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1", device="cuda")
generator = outlines.generate.cfg(model, arithmetic_grammar)
sequence = generator("Write a formula that returns 5 using only additions and subtractions.")

# It looks like Mistral is not very good at arithmetic :)

print(sequence)
# 1+3-2-4+5-7+8-6+9-6+4-2+3+5-1+1

This was a very simple grammar; you can use outlines.generate.cfg to generate syntactically valid Python, SQL, and much more. Any kind of structured text, really. All you have to do is search for "X EBNF grammar" on the web and take a look at the Outlines Grammars repository.
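The grammar above is written in Lark's EBNF dialect (`?rule` inlining, `-> alias` names, `%import common.NUMBER`). To double-check a generated formula, a small recursive-descent parser that mirrors the `sum` / `product` / `atom` rules is enough.

This is a hand-written sketch, not how Outlines enforces the grammar during generation. It simplifies `NUMBER` to integers and decimals without exponents, and the left-recursive rules are turned into left-associative loops. It confirms that the sample output is well-formed, and that it evaluates to 11 rather than 5.

```python
import re
from typing import List, Optional, Union

Number = Union[int, float]

# Tokens of the arithmetic grammar; spaces and tabs are skipped like WS_INLINE.
TOKEN = re.compile(r"[ \t]*(\d+(?:\.\d+)?|[-+*/()])")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    text = text.rstrip(" \t")
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser mirroring the sum / product / atom rules."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse_sum(self) -> Number:
        # sum: product (("+" | "-") product)*, evaluated left to right
        value = self.parse_product()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.parse_product()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_product(self) -> Number:
        # product: atom (("*" | "/") atom)*
        value = self.parse_atom()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.parse_atom()
            value = value * rhs if op == "*" else value / rhs
        return value

    def parse_atom(self) -> Number:
        # atom: NUMBER | "-" atom | "(" sum ")"
        token = self.take()
        if token == "-":
            return -self.parse_atom()
        if token == "(":
            value = self.parse_sum()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if NUMBER.fullmatch(token):
            return float(token) if "." in token else int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text: str) -> Number:
    parser = Parser(tokenize(text))
    value = parser.parse_sum()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("1+3-2-4+5-7+8-6+9-6+4-2+3+5-1+1"))  # 11
```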

Open functions

Outlines can infer the structure of the output from the signature of a function. The result is a dictionary, and can be passed directly to the function using the usual dictionary expansion syntax **:

import outlines


def add(a: int, b: int):
    return a + b

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")
generator = outlines.generate.json(model, add)
result = generator("Return two integers named a and b respectively. a is odd and b even.")

print(add(**result))
# 3

A great advantage of passing functions directly to specify the structure is that the structure of the LLM's output will change with the function's definition. No need to change the code in several places!
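Conceptually, the function's parameters become the properties of an object schema, and the generated JSON object is unpacked back into the call. This is a minimal sketch using Python's `inspect` module. `signature_to_schema` and the annotation-to-type mapping are hypothetical and handle only simple scalar annotations; Outlines' own conversion may differ.

```python
import inspect
import json
from typing import Any, Callable, Dict, List

# Assumed mapping from Python annotations to JSON Schema types.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def signature_to_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON-Schema-like object description from a function signature."""
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": JSON_TYPES[param.annotation]}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


def add(a: int, b: int):
    return a + b


schema = signature_to_schema(add)
print(json.dumps(schema))

# A generation constrained by this schema yields a dict that ** can unpack.
result = json.loads('{"a": 1, "b": 2}')  # stand-in for the model's output
print(add(**result))  # 3
```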

Prompting

Writing prompts by concatenating strings in pure Python quickly becomes cumbersome: the prompt-building logic gets entangled with the rest of the program, and the structure of the rendered prompt is obfuscated. Outlines makes it easier to write and manage prompts by encapsulating templates inside "template functions".

These functions make it possible to neatly separate the prompt logic from the general program logic; they can be imported from other modules and libraries.

Template functions require no superfluous abstraction. They use the Jinja2 templating engine to help build complex prompts concisely:

import outlines

examples = [
    ("The food was disgusting", "Negative"),
    ("We had a fantastic night", "Positive"),
    ("Recommended", "Positive"),
    ("The waiter was rude", "Negative")
]

@outlines.prompt
def labelling(to_label, examples):
    """You are a sentiment-labelling assistant.

    {% for example in examples %}
    {{ example[0] }} // {{ example[1] }}
    {% endfor %}
    {{ to_label }} //
    """

model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")
prompt = labelling("Just awesome", examples)
answer = outlines.generate.text(model, max_tokens=100)(prompt)

Tools

We can teach language models to call external functions to get additional information or perform tasks by encoding the functions' descriptions in the prompt. To avoid duplicating information between the function definition and the description passed to the prompt, we define custom Jinja filters that can extract the function's name, description, signature and source:

from typing import Callable, List
import outlines


def google_search(query: str):
    """Google Search"""
    pass


def wikipedia_search(query: str):
    """Wikipedia Search"""
    pass


@outlines.prompt
def my_commands(tools: List[Callable]):
    """AVAILABLE COMMANDS:

    {% for tool in tools %}
    TOOL
    {{ tool | name }}, {{ tool | description }}, args: {{ tool | signature }}
    {{ tool | source }}
    {% endfor %}
    """


prompt = my_commands([google_search, wikipedia_search])
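Each of those filters can be built on Python's `inspect` module. In this sketch, the hypothetical helpers `fn_name`, `fn_description`, `fn_signature` and `fn_source` stand in for the `name`, `description`, `signature` and `source` filters. The template loop is replaced by plain Python to stay dependency-free, and the exact strings Outlines' filters produce may differ.

```python
import inspect
from typing import Callable, List


def google_search(query: str):
    """Google Search"""
    pass


def wikipedia_search(query: str):
    """Wikipedia Search"""
    pass


# Hypothetical stand-ins for the name, description, signature and source filters.
def fn_name(fn: Callable) -> str:
    return fn.__name__


def fn_description(fn: Callable) -> str:
    return inspect.getdoc(fn) or ""


def fn_signature(fn: Callable) -> str:
    return str(inspect.signature(fn))


def fn_source(fn: Callable) -> str:
    # Only works when the function is defined in a file that inspect can read.
    return inspect.getsource(fn)


def render_tools(tools: List[Callable]) -> str:
    """Plain-Python rendering of the TOOL lines (the source line is omitted)."""
    lines = ["AVAILABLE COMMANDS:", ""]
    for tool in tools:
        lines.append("TOOL")
        lines.append(f"{fn_name(tool)}, {fn_description(tool)}, args: {fn_signature(tool)}")
    return "\n".join(lines)


print(render_tools([google_search, wikipedia_search]))
```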

Response models

We can instruct models to return their output in a pre-defined format, often JSON. To avoid duplicating information between the response model's definition and the description passed to the prompt, we define a custom Jinja filter that can extract the expected response's schema:

from pydantic import BaseModel, Field
import outlines


class Joke(BaseModel):
    joke: str = Field(description="The joke")
    explanation: str = Field(
        description="The explanation of why the joke is funny"
    )


@outlines.prompt
def joke_ppt(response_model):
    """Tell a joke and explain why the joke is funny.

    RESPONSE FORMAT:
    {{ response_model | schema }}
    """


joke_ppt(Joke)

# Tell a joke and explain why the joke is funny.
#
# RESPONSE FORMAT:
# {
#   "joke": "The joke",
#   "explanation": "The explanation of why the joke is funny"
# }
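The RESPONSE FORMAT block pairs each field name with its description. This sketch derives that block from a JSON-schema dictionary. The `joke_schema` literal is an assumed shape of what Pydantic generates for `Joke`, and `schema_filter` is a hypothetical name, not Outlines' filter.

```python
import json
from typing import Any, Dict

# Assumed shape of the JSON schema Pydantic generates for the Joke model
# (only the parts this sketch reads matter).
joke_schema: Dict[str, Any] = {
    "title": "Joke",
    "type": "object",
    "properties": {
        "joke": {"title": "Joke", "description": "The joke", "type": "string"},
        "explanation": {
            "title": "Explanation",
            "description": "The explanation of why the joke is funny",
            "type": "string",
        },
    },
    "required": ["joke", "explanation"],
}


def schema_filter(schema: Dict[str, Any]) -> str:
    """Map each field name to its description and pretty-print the result as JSON."""
    fields = {name: prop.get("description", "") for name, prop in schema["properties"].items()}
    return json.dumps(fields, indent=2)


print(schema_filter(joke_schema))
```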

With these prompting primitives, Outlines makes building agents like AutoGPT, BabyAGI, ViperGPT or Transformers Agent easier by removing boilerplate prompting code.

Contributing

What contributions?

We currently only accept bug fixes and documentation contributions. If you have a feature request, please start a new discussion. The issue tracker is only intended for actionable items.

How to contribute?

Run pip install -e .[test] or conda env create -f environment.yml. To build the documentation you will also need to run pip install -r requirements-doc.txt.

Before pushing your code to the repository, please run pre-commit run --all-files and pytest to make sure that the code is formatted correctly and that the tests pass.

Do not hesitate to open a draft PR before your contribution is ready, especially if you have questions and/or need feedback.

Examples

Cite Outlines

@article{willard2023efficient,
  title={Efficient Guided Generation for LLMs},
  author={Willard, Brandon T and Louf, R{\'e}mi},
  journal={arXiv preprint arXiv:2307.09702},
  year={2023}
}

License

Outlines is open-source and licensed under the Apache License 2.0.

Download files

Source Distribution

outlines-0.0.19.tar.gz (1.1 MB)

Uploaded Dec 21, 2023 Source

Built Distribution

outlines-0.0.19-py3-none-any.whl (62.0 kB)

Uploaded Dec 21, 2023 Python 3

Hashes for outlines-0.0.19.tar.gz

Algorithm	Hash digest
SHA256	`d2377d94282e2db27a90bab3d91b231b014c50c29cdac02ae3fc9b25f72bdd59`
MD5	`dea3af92b5598486c6f0a6588d31e77a`
BLAKE2b-256	`b13ce4047784f7d07c72e3c21959c1dcc205d0fad259b63399dd2376c5d46711`

Hashes for outlines-0.0.19-py3-none-any.whl

Algorithm	Hash digest
SHA256	`915d1eef271ae42a1c27fd5a87880865d236980ae080aaec9bf5bbdb5d588104`
MD5	`b65550ff3521743cdfd51b58d692a59d`
BLAKE2b-256	`67e43876aec4bd2d0ea535fb9c8dad426fd2f8744e496fad1970d7315673c023`