# plait

A PyTorch-inspired framework for building, executing, and optimizing LLM inference pipelines.

plait brings the familiar PyTorch programming model to compound AI systems. Define your LLM pipelines as modules with `forward()` methods, trace them into execution DAGs, and run them with automatic concurrency, backpressure, and resource management.
## Why plait?
Most LLM applications are systems, not single calls: they chain multiple LLM invocations, pass structured data between steps, run verifiers, and need consistent handling for retries and rate limits. In plain Python, this complexity leaks everywhere.
plait moves that complexity into a shared runtime:
- **Write** normal module composition in `forward()` - no async boilerplate
- **Trace** it into a DAG - dependencies discovered automatically
- **Execute** with a scheduler - concurrent I/O, rate limiting, and retries handled for you
- **Optimize** with feedback - backward passes propagate feedback to improve prompts
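The trace step uses the same trick as PyTorch-style graph capture: calling a module records a graph node, and any argument that is itself a recorded node becomes an edge, so dependencies fall out of ordinary composition. A minimal standalone sketch of that idea (illustrative names only, not plait's actual internals):

```python
# Minimal sketch of trace-based DAG capture (illustrative; not plait's real API).
# Calling an op records a node; arguments that are themselves nodes become
# edges -- dependencies are discovered automatically from plain composition.

class Node:
    def __init__(self, graph, name, deps):
        self.graph, self.name, self.deps = graph, name, deps
        graph.append(self)

class TracedOp:
    def __init__(self, name):
        self.name = name

    def __call__(self, graph, *args):
        deps = [a for a in args if isinstance(a, Node)]
        return Node(graph, self.name, deps)

graph = []
summarize = TracedOp("summarize")
analyze = TracedOp("analyze")

# "forward()" body: plain composition, no explicit graph-building calls
s1 = summarize(graph, "doc-a")
s2 = summarize(graph, "doc-b")   # independent of s1 -> can run concurrently
out = analyze(graph, s1, s2)     # depends on both summaries

print([(n.name, [d.name for d in n.deps]) for n in graph])
# -> [('summarize', []), ('summarize', []), ('analyze', ['summarize', 'summarize'])]
```

Because the two `summarize` nodes share no edge, a scheduler is free to run them in parallel; `analyze` waits on both.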
## Framework Comparison
| Feature | plait | DSPy | LangGraph | Pydantic AI |
|---|---|---|---|---|
| Automatic parallelism | ✅ | ❌ | ❌ | ❌ |
| Implicit graph definition | ✅ | ✅ | ❌ | ❌ |
| Runtime optimization | ✅ | ❌ | ❌ | ❌ |
| Multi-model pipelines | ✅ | ✅ | ✅ | ✅ |
| Async-first execution | ✅ | ❌ | ✅ | ✅ |
| PyTorch-like API | ✅ | ❌ | ❌ | ❌ |
| Learnable parameters | ✅ | ✅ | ❌ | ❌ |
## Benchmark: Extract-and-Compare Pipeline
Real-world performance on a fan-out workflow (2 parallel extractions + 1 comparison):
| Framework | Time | Memory | Notes |
|---|---|---|---|
| plait | 6.9s | 0.4 MB | Automatic parallel execution |
| Pydantic AI | 8.7s | 17.6 MB | Requires manual asyncio.gather() |
| LangGraph | 10.1s | 26.2 MB | Requires explicit Send() config |
| DSPy | 13.4s | 76.0 MB | Sequential execution only |
plait is up to 2x faster and uses up to 99% less memory than alternatives. See detailed comparisons →
## Features

- PyTorch-like API: `Module` with `forward()` and `backward()` methods
- Automatic DAG capture: Trace-based graph construction from eager-mode code
- Async execution: Maximum throughput with adaptive backpressure and rate limiting
- Resource management: Decouple module definitions from endpoint configuration
- LLM-based optimization: Backward passes that propagate feedback to update prompts
- Execution profiling: Chrome Trace Format export for performance visualization
## Installation

```shell
# Install with uv (recommended)
uv add pyplait

# Or with pip
pip install pyplait
```
Note: The package is published as `pyplait` on PyPI, but you import it as `plait` in Python.
Requirements: Python 3.13+
## Quick Start

Define a pipeline as a module composition:

```python
from plait import Module, LLMInference, Parameter
from plait.resources import OpenAIEndpointConfig, ResourceConfig


class SummarizeAndAnalyze(Module):
    """A two-stage pipeline: summarize, then analyze."""

    def __init__(self):
        super().__init__()
        # Learnable instruction that can be optimized via backward passes
        self.instructions = Parameter(
            value="Be concise and highlight key insights.",
            description="Controls the style of analysis output.",
        )
        self.summarizer = LLMInference(
            alias="fast",
            system_prompt="Summarize the input text concisely.",
        )
        self.analyzer = LLMInference(
            alias="smart",
            system_prompt=self.instructions,
        )

    def forward(self, text: str) -> str:
        summary = self.summarizer(text)
        return self.analyzer(f"Analyze this summary:\n{summary}")


# Configure OpenAI endpoints separately from module definition
resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(
            model="gpt-4o-mini",
            max_concurrent=20,
        ),
        "smart": OpenAIEndpointConfig(
            model="gpt-4o",
            max_concurrent=5,
        ),
    }
)

# Bind resources to the pipeline, then execute
pipeline = SummarizeAndAnalyze().bind(resources=resources)
result = await pipeline("Your input text...")
```
The pipeline is traced into a DAG, and the scheduler runs nodes concurrently wherever dependencies allow: independent branches execute in parallel without manual `asyncio.gather()` calls.
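To see why a traced DAG buys this concurrency for free, here is a tiny self-contained asyncio sketch (not plait code) that repeatedly runs every node whose dependencies are satisfied, one "wave" at a time:

```python
import asyncio

# Toy DAG: two independent "extract" nodes feeding one "compare" node.
# Each entry maps name -> (dependencies, coroutine factory over prior results).
async def fake_llm(text: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for an API call
    return text

DAG = {
    "extract_a": ([], lambda r: fake_llm("a", 0.05)),
    "extract_b": ([], lambda r: fake_llm("b", 0.05)),
    "compare": (["extract_a", "extract_b"],
                lambda r: fake_llm(r["extract_a"] + r["extract_b"], 0.05)),
}

async def run_dag(dag):
    results: dict[str, str] = {}
    pending = dict(dag)
    while pending:
        # Every node whose deps are done forms one wave; run the wave concurrently.
        ready = [n for n, (deps, _) in pending.items() if all(d in results for d in deps)]
        outs = await asyncio.gather(*(pending[n][1](results) for n in ready))
        for n, out in zip(ready, outs):
            results[n] = out
            del pending[n]
    return results

results = asyncio.run(run_dag(DAG))
print(results["compare"])  # -> "ab"
```

The two extractions land in the same wave and overlap in time; the comparison runs only after both finish. A production scheduler (as plait's docs describe) layers backpressure, rate limiting, and retries on top of this same dependency-driven dispatch.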
## Core Concepts

| PyTorch | plait | Purpose |
|---|---|---|
| `nn.Module` | `Module` | Base class for operations |
| `nn.Parameter` | `Parameter` | Learnable values (prompts, instructions) |
| `forward()` | `forward()` | Define computation |
| `backward()` | `backward()` | Propagate feedback |
| `torch.fx.Tracer` | `Tracer` | Capture computation graph |
| `torch.optim.*` | `Optimizer` | Update parameters |
### Module

The base class for all operations. Compose modules by assigning them as attributes:

```python
class DocumentProcessor(Module):
    def __init__(self):
        super().__init__()
        self.extractor = LLMInference(alias="fast", system_prompt="Extract key facts.")
        self.analyzer = MultiPerspectiveAnalysis()  # Another Module
        self.reporter = LLMInference(alias="smart", system_prompt="Write a report.")

    def forward(self, document: str) -> str:
        facts = self.extractor(document)
        analyses = self.analyzer(facts)
        return self.reporter(str(analyses))
```
### LLMInference

The atomic unit for LLM API calls. Uses aliases that are bound to endpoints at execution time:

```python
llm = LLMInference(
    alias="reasoning",  # Bound to endpoint config at runtime
    system_prompt="You are a helpful assistant.",
    temperature=0.7,
    max_tokens=500,
)
```
### Parameter

Learnable values (typically prompts or instructions) that can be optimized:

```python
instructions = Parameter(
    value="Be concise and accurate.",
    description="System instructions for the assistant.",
    requires_grad=True,  # Enable optimization
)
```
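Conceptually, a learnable text parameter accumulates textual feedback during a backward pass, and an optimizer step folds that feedback into a new value. A minimal standalone sketch of that loop (illustrative names, not plait's API; a real optimizer would ask an LLM to rewrite the prompt rather than concatenate):

```python
from dataclasses import dataclass, field

@dataclass
class TextParameter:
    """A prompt-like value plus accumulated textual 'gradients' (feedback)."""
    value: str
    requires_grad: bool = True
    feedback: list[str] = field(default_factory=list)

def backward(param: TextParameter, fb: str) -> None:
    # Backward pass: propagate feedback to the parameter instead of a gradient.
    if param.requires_grad:
        param.feedback.append(fb)

def step(param: TextParameter) -> None:
    # Optimizer step: fold feedback into the value. Here we just append the
    # critique; a real implementation would rewrite the prompt with an LLM.
    if param.feedback:
        param.value += " " + " ".join(param.feedback)
        param.feedback.clear()

p = TextParameter(value="Be concise.")
backward(p, "Also cite sources.")
step(p)
print(p.value)  # -> "Be concise. Also cite sources."
```

Setting `requires_grad=False` in this sketch makes `backward()` a no-op, mirroring how frozen parameters are skipped during optimization.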
### ResourceConfig

Decouple module definitions from infrastructure. The same pipeline can run against different endpoints:

```python
from plait.resources import (
    OpenAIEndpointConfig,
    AnthropicEndpointConfig,
    EndpointConfig,
    ResourceConfig,
)

# Development: use cheaper models
dev_resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(model="gpt-4o-mini", max_concurrent=5),
        "smart": OpenAIEndpointConfig(model="gpt-4o-mini", max_concurrent=5),
    }
)

# Production: use appropriate models with rate limiting
prod_resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(
            model="gpt-4o-mini",
            max_concurrent=50,
            rate_limit=1000.0,  # requests per minute
        ),
        "smart": OpenAIEndpointConfig(
            model="gpt-4o",
            max_concurrent=20,
            rate_limit=500.0,
        ),
    }
)

# Self-hosted models with an OpenAI-compatible API (vLLM, TGI, etc.)
local_resources = ResourceConfig(
    endpoints={
        "fast": EndpointConfig(
            provider_api="vllm",
            model="mistral-7b",
            base_url="http://vllm.internal:8000/v1",
            max_concurrent=50,
        ),
    }
)

# Bind resources to a pipeline
pipeline = MyPipeline().bind(resources=dev_resources)
result = await pipeline("input text")

# Or use ExecutionSettings for shared resources across multiple pipelines
async with ExecutionSettings(resources=prod_resources):
    result1 = await pipeline1("input")
    result2 = await pipeline2("input")
```
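`max_concurrent` and `rate_limit` correspond to two standard throttling primitives: a semaphore bounding in-flight requests and a token-style pacer bounding requests per minute. A self-contained sketch of how an endpoint wrapper might combine them (illustrative only, not plait's implementation):

```python
import asyncio
import time

class Throttle:
    """Bound in-flight requests (semaphore) and request rate (pacing)."""

    def __init__(self, max_concurrent: int, rate_limit_per_min: float):
        self.sem = asyncio.Semaphore(max_concurrent)
        self.interval = 60.0 / rate_limit_per_min  # seconds between request slots
        self.next_slot = time.monotonic()

    async def __aenter__(self):
        await self.sem.acquire()
        now = time.monotonic()
        wait = self.next_slot - now
        self.next_slot = max(self.next_slot, now) + self.interval
        if wait > 0:
            await asyncio.sleep(wait)  # pace requests to the allowed rate
        return self

    async def __aexit__(self, *exc):
        self.sem.release()

async def main():
    # e.g. at most 2 in flight, 6000 requests/minute (100/s -> 10 ms spacing)
    throttle = Throttle(max_concurrent=2, rate_limit_per_min=6000)

    async def call(i: int) -> int:
        async with throttle:
            await asyncio.sleep(0.01)  # stand-in for an API call
            return i

    return await asyncio.gather(*(call(i) for i in range(5)))

out = asyncio.run(main())
print(out)  # -> [0, 1, 2, 3, 4]
```

The semaphore protects the provider from concurrency spikes, while the pacing interval keeps average throughput under the per-minute quota; a retry policy would wrap the call inside the `async with` block.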
## Examples

The `examples/` directory contains focused, runnable examples:

| Example | Description |
|---|---|
| `01_module.py` | `Module`, `Parameter`, and composition |
| `02_llm_pipeline.py` | `LLMInference` and pipeline patterns |
| `03_tracing.py` | DAG capture and visualization |
| `04_execution.py` | `run()`, `bind()`, `ExecutionSettings`, batch |
| `05_optimization.py` | Backward pass and prompt optimization |
Run an example:
```shell
python examples/01_module.py
```
## Documentation

For detailed architecture and design documentation, see the `design_docs/` directory:
- Architecture Overview - System design and component interactions
- Module - Core module system
- Tracing - How DAGs are captured from code
- Execution - Scheduler, state, and error handling
- Resources - Endpoint configuration and rate limiting
- Optimization - Feedback propagation and learning
## Development

### Setup

```shell
# Clone the repository
git clone https://github.com/eric-tramel/plait.git
cd plait

# Install dependencies with uv
uv sync

# Run all checks
make ci
```
### Commands

```shell
make ci         # Run all checks (lint, types, test)
make lint       # Format and lint with ruff
make types      # Type check with ty
make test       # Run all pytest tests
make test-unit  # Run unit tests only
```
See CLAUDE.md for detailed development guidelines.
## License
Apache-2.0 License - see LICENSE for details.