Outlines custom provider plugin for LangExtract

LangExtract Outlines Plugin

A LangExtract provider plugin that integrates Outlines for structured text extraction using constrained generation.

Overview

This plugin enables you to use Outlines models with LangExtract for structured information extraction tasks. Outlines provides constrained generation capabilities that ensure model outputs conform to specific schemas, making it ideal for reliable structured extraction.

Installation

We recommend you use uv to install the package.

uv add langextract-outlines

The command above automatically installs langextract and outlines, as they are dependencies of langextract-outlines. It does not, however, install the optional dependencies required to run specific models with Outlines. For instance, to use the Transformers model in Outlines, install the associated optional dependencies:

uv add outlines[transformers]

Quick Start

To use the langextract-outlines plugin, provide an OutlinesProvider instance as the value of the model parameter of the langextract.extract function. Since you are providing a model directly, there is no need to specify a model_id.

The arguments to initialize an OutlinesProvider instance are very similar to those you would use with the outlines.Generator constructor:

  • outlines_model: an instance of an outlines.models.Model, for instance Transformers or MLXLM
  • output_type: a list of Pydantic models used to constrain the generation; see the dedicated Output Type section below
  • backend: the name of the backend Outlines will use to constrain the generation (outlines_core by default)
  • **inference_kwargs: the keyword arguments that will be passed on to the underlying model by Outlines. These correspond to the arguments you would provide when calling a model in Outlines

For instance:

import langextract as lx
import outlines
import transformers
from pydantic import BaseModel, Field
from langextract_outlines import OutlinesProvider


# Define your extraction prompt and examples
prompt = "Extract characters and emotions from the text."
examples = [
    lx.data.ExampleData(
        text="Romeo gazed longingly at Juliet.",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="Romeo",
                attributes={"emotional_state": "longing"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="longingly",
                attributes={"feeling": "desire"}
            )
        ]
    )
]

# Define the associated output_type
class Character(BaseModel):
    emotional_state: str = Field(description="The emotional state of the character")

class Emotion(BaseModel):
    feeling: str = Field(description="The feeling of the emotion")

output_type = [Character, Emotion]

# Create the Outlines model
model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = transformers.AutoModelForCausalLM.from_pretrained(model_id)

# Create the Outlines provider
outlines_provider = OutlinesProvider(
    outlines_model=outlines.from_transformers(model, tokenizer),
    output_type=output_type,
    backend="outlines_core",
    temperature=0.5,
    repetition_penalty=1
)

# Run extraction
result = lx.extract(
    text="Juliet smiled brightly at the stars.",
    prompt_description=prompt,
    examples=examples,
    model=outlines_provider,
)

print(f"Extracted {len(result.extractions)} entities")

Output Type

The output type you provide must be compatible with the examples, as the latter are included in the prompt. If the two do not match, generation quality may be severely degraded.

The output type must be a list of Pydantic models, each of them corresponding to an Extraction type included in your examples. The name of the Pydantic model must be the name of the extraction_class in PascalCase. The fields of the model must correspond to the attributes of the extraction instance.

For instance:

import langextract as lx
from typing import Literal
from pydantic import BaseModel, Field

# Extraction included in the examples
lx.data.Extraction(
    extraction_class="character",
    extraction_text="Romeo",
    attributes={"emotional_state": "longing", "intensity": "medium"}
)

# Possible associated model included in the output_type
class Character(BaseModel):
    emotional_state: str = Field(
        description="The emotional state of the character",
        min_length=1,
        max_length=100,
    )
    intensity: Literal["low", "medium", "high"] = Field(
        description="The intensity of the emotion",
        default="medium",
    )
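Because each output-type model is a plain Pydantic model, you can sanity-check its constraints on their own before handing it to the provider. A minimal standalone sketch (using only Pydantic, independent of the plugin):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Character(BaseModel):
    emotional_state: str = Field(
        description="The emotional state of the character",
        min_length=1,
        max_length=100,
    )
    intensity: Literal["low", "medium", "high"] = Field(
        description="The intensity of the emotion",
        default="medium",
    )


# The default applies when the attribute is omitted
romeo = Character(emotional_state="longing")
print(romeo.intensity)  # medium

# Values outside the Literal set are rejected
try:
    Character(emotional_state="longing", intensity="extreme")
except ValidationError as e:
    print("rejected:", len(e.errors()), "error(s)")
```

Constrained generation guarantees the raw output parses against these models, so the tighter the field constraints, the more the generator is steered toward valid attribute values.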

Inference Arguments

As explained above, all inference arguments, such as temperature or max_new_tokens, must be provided as keyword arguments when initializing the OutlinesProvider. Inference arguments specified through other parts of the LangExtract interface are ignored. Outlines does not standardize inference arguments across models, so make sure the arguments you provide correspond to what the model you chose actually accepts.
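To illustrate the forwarding mechanics (with a hypothetical stand-in class, not the actual provider internals): keyword arguments captured at construction time are replayed on every generation call, which is why they must match what the chosen model accepts.

```python
class ToyProvider:
    """Hypothetical stand-in mimicking how a provider might store
    inference kwargs and pass them through on each call."""

    def __init__(self, model, **inference_kwargs):
        self.model = model
        self.inference_kwargs = inference_kwargs

    def generate(self, prompt: str):
        # Every call forwards the stored kwargs to the underlying model
        return self.model(prompt, **self.inference_kwargs)


def fake_model(prompt, **kwargs):
    # Echoes back what it received so we can inspect the forwarding
    return {"prompt": prompt, "kwargs": kwargs}


provider = ToyProvider(fake_model, temperature=0.5, max_new_tokens=64)
result = provider.generate("Extract characters.")
print(result["kwargs"])  # {'temperature': 0.5, 'max_new_tokens': 64}
```

If the underlying model does not accept one of the stored keyword arguments, the failure surfaces at generation time rather than at construction, so verify argument names against your chosen Outlines model's documentation up front.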

License

Apache-2.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
