Sheaf
Unified serving layer for non-text foundation models.
vLLM solved inference for text LLMs by defining a standard compute contract and optimizing behind it. The same problem exists for every other class of foundation model — time series, tabular, molecular, geospatial, diffusion, audio — and nobody has solved it. Sheaf is that solution.
Each model type gets a typed request/response contract. Batching, caching, and scheduling are optimized per model type. Ray Serve is the substrate. Feast is a first-class input primitive.
In mathematics, a sheaf tracks locally-defined data that glues consistently across a space. Each model type defines its own local contract; Sheaf ensures they cohere into a unified serving layer.
Install
pip install sheaf-serve # core only
pip install "sheaf-serve[time-series]" # + Chronos2 / TimesFM / Moirai
pip install "sheaf-serve[tabular]" # + TabPFN
pip install "sheaf-serve[molecular]" # + ESM-3 (Python 3.12+)
pip install "sheaf-serve[genomics]" # + Nucleotide Transformer
pip install "sheaf-serve[small-molecule]" # + MolFormer
pip install "sheaf-serve[materials]" # + MACE-MP
pip install "sheaf-serve[audio]" # + Whisper / faster-whisper
pip install "sheaf-serve[audio-generation]" # + MusicGen
pip install "sheaf-serve[tts]" # + Bark
pip install "sheaf-serve[vision]" # + DINOv2 / OpenCLIP / SAM2 / Depth Anything / DETR
pip install "sheaf-serve[earth-observation]" # + Prithvi
pip install "sheaf-serve[weather]" # + GraphCast
pip install "sheaf-serve[feast]" # + Feast feature store integration
pip install "sheaf-serve[modal]" # + Modal serverless deployment
pip install "sheaf-serve[all]" # everything
Quickstart
Direct backend inference:
from sheaf.api.time_series import Frequency, OutputMode, TimeSeriesRequest
from sheaf.backends.chronos import Chronos2Backend

backend = Chronos2Backend(model_id="amazon/chronos-bolt-tiny", device_map="cpu")
backend.load()

req = TimeSeriesRequest(
    model_name="chronos-bolt-tiny",
    history=[312, 298, 275, 260, 255, 263, 285, 320,
             368, 402, 421, 435, 442, 438, 430, 425],
    horizon=12,
    frequency=Frequency.HOURLY,
    output_mode=OutputMode.QUANTILES,
    quantile_levels=[0.1, 0.5, 0.9],
)
response = backend.predict(req)
# response.mean, response.quantiles
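With `output_mode=OutputMode.QUANTILES`, the response carries one value per requested quantile level per horizon step. As a standalone sketch of that shape (plain Python; `sample_paths` and the `quantile` helper are illustrative, not part of the Sheaf API), here is how per-step quantiles relate to an ensemble of forecast sample paths:

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of a pre-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = int(pos), min(int(pos) + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

# Four hypothetical sample paths for a horizon of 3 steps.
sample_paths = [
    [100, 102, 104],
    [ 98, 101, 103],
    [102, 105, 108],
    [ 99, 100, 101],
]

levels = [0.1, 0.5, 0.9]
# quantiles[level] has one value per horizon step.
quantiles = {
    q: [quantile(sorted(step), q) for step in zip(*sample_paths)]
    for q in levels
}
print(quantiles[0.5])  # median forecast per step: [99.5, 101.5, 103.5]
```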
Ray Serve (production, autoscaling):
from sheaf import ModelServer
from sheaf.spec import ModelSpec, ResourceConfig
from sheaf.api.base import ModelType
server = ModelServer(models=[
    ModelSpec(
        name="chronos",
        model_type=ModelType.TIME_SERIES,
        backend="chronos2",
        backend_kwargs={"model_id": "amazon/chronos-bolt-small"},
        resources=ResourceConfig(num_gpus=1),
    ),
])
server.run() # POST /chronos/predict, GET /chronos/health
Feast feature store (resolve features at request time):
# ModelSpec wires Feast — no history needed in the request
spec = ModelSpec(
    name="chronos",
    model_type=ModelType.TIME_SERIES,
    backend="chronos2",
    feast_repo_path="/feast/feature_repo",
)

# Client sends feature_ref instead of raw history
{
  "model_type": "time_series",
  "model_name": "chronos",
  "feature_ref": {
    "feature_view": "asset_prices",
    "feature_name": "close_history_30d",
    "entity_key": "ticker",
    "entity_value": "AAPL"
  },
  "horizon": 7,
  "frequency": "1d"
}
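Server-side, resolution turns the `feature_ref` into the raw history before inference. A minimal standalone sketch of that step (a toy in-memory store and `resolve` function standing in for a Feast online store; not Sheaf's actual resolver):

```python
# Toy stand-in for a Feast online store, keyed the same way as feature_ref.
store = {
    ("asset_prices", "close_history_30d", "ticker", "AAPL"):
        [189.1, 190.4, 188.7, 191.2, 192.0],
}

def resolve(feature_ref):
    """Look up the referenced feature values for the given entity."""
    key = (
        feature_ref["feature_view"],
        feature_ref["feature_name"],
        feature_ref["entity_key"],
        feature_ref["entity_value"],
    )
    return store[key]

request = {
    "model_name": "chronos",
    "feature_ref": {
        "feature_view": "asset_prices",
        "feature_name": "close_history_30d",
        "entity_key": "ticker",
        "entity_value": "AAPL",
    },
    "horizon": 7,
}
history = resolve(request["feature_ref"])
# `history` now plays the role of the raw series the client never had to send.
```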
Modal (serverless, zero-infra):
from sheaf import ModalServer
server = ModalServer(models=[spec], app_name="my-sheaf", gpu="A10G")
app = server.app # modal deploy my_server.py
See examples/ for time series comparison, tabular, audio, vision, and the Feast feature store quickstart.
Supported model types
| Type | Status | Backends |
|---|---|---|
| Time series | ✅ v0.1 | Chronos2, Chronos-Bolt, TimesFM, Moirai |
| Tabular | ✅ v0.1 | TabPFN v2 |
| Audio transcription | ✅ v0.3 | Whisper, faster-whisper |
| Audio generation | ✅ v0.3 | MusicGen |
| Text-to-speech | ✅ v0.3 | Bark |
| Vision embeddings | ✅ v0.3 | OpenCLIP, DINOv2 |
| Segmentation | ✅ v0.3 | SAM2 |
| Depth estimation | ✅ v0.3 | Depth Anything v2 |
| Object detection | ✅ v0.3 | DETR / RT-DETR |
| Protein / molecular | ✅ v0.3 | ESM-3 (Python 3.12+) |
| Genomics | ✅ v0.3 | Nucleotide Transformer |
| Small molecule | ✅ v0.3 | MolFormer-XL |
| Materials science | ✅ v0.3 | MACE-MP-0 |
| Earth observation | ✅ v0.3 | Prithvi (IBM/NASA) |
| Weather forecasting | ✅ v0.3 | GraphCast |
| Cross-modal embeddings | ✅ v0.3 | ImageBind (text, vision, audio, depth, thermal) |
| Feast feature store | ✅ v0.3 | Any Feast online store (SQLite, Redis, DynamoDB, …) |
| Modal serverless | ✅ v0.3 | ModalServer — zero-infra GPU deployment |
| Diffusion / image gen | ✅ v0.4 | FLUX |
| Neural operators | 🔜 | FNO, DeepONet |
| Video understanding | ✅ v0.4 | VideoMAE, TimeSformer |
Roadmap to production
v0.2 — serving layer (complete)
- Ray Serve integration tested end-to-end
- Async predict() handlers
- HTTP API with proper request validation (422 on bad input)
- Health check and readiness probe endpoints
- Batching scheduler (BatchPolicy wired into @serve.batch per deployment)
- Error handling at the service boundary (backend exceptions → structured HTTP 500)
- Model hot-swap without restart (ModelServer.update())
- Container-friendly auth for TabPFN v2 (TABPFN_TOKEN env var)
v0.3 — model types + integrations (complete)
- ESM-3 protein embeddings
- Nucleotide Transformer genomics embeddings
- MolFormer-XL small molecule embeddings
- MACE-MP-0 materials (energy, forces, stress)
- Whisper / faster-whisper audio transcription
- MusicGen audio generation
- Bark text-to-speech
- OpenCLIP image/text embeddings
- DINOv2 image embeddings
- SAM2 segmentation
- Depth Anything v2 depth estimation
- DETR / RT-DETR object detection
- Prithvi earth observation embeddings
- GraphCast weather forecasting
- ImageBind cross-modal embeddings (text, vision, audio, depth, thermal)
- Feast feature store integration (feature_ref in requests, FeastResolver, feast_repo_path on ModelSpec)
- Modal serverless deployment (ModalServer — zero-infra alternative to Ray Serve)
v0.4 — generation + video (complete)
- FLUX diffusion / image generation
- VideoMAE / TimeSformer video understanding
v0.5 — serving quality + new modalities
Serving / infra:
- Streaming responses (token-by-token latency for diffusion and TTS)
- Request caching (cache/ layer — currently a stub)
- bucket_by batching — group requests by horizon/length before @serve.batch
- Prometheus metrics endpoint per deployment
- OpenTelemetry traces through the request path
- Structured logging with request IDs end-to-end
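The planned bucket_by batching groups pending requests by a key such as horizon so each batch handed to @serve.batch is homogeneous. A minimal standalone sketch of the grouping step (plain Python; the names are illustrative, not the eventual API):

```python
from collections import defaultdict

def bucket_by(requests, key):
    """Group pending requests so each batch shares the same key value."""
    buckets = defaultdict(list)
    for req in requests:
        buckets[key(req)].append(req)
    return list(buckets.values())

pending = [
    {"id": 1, "horizon": 12},
    {"id": 2, "horizon": 24},
    {"id": 3, "horizon": 12},
]
batches = bucket_by(pending, key=lambda r: r["horizon"])
# Two batches: horizon-12 requests (ids 1 and 3) and the horizon-24 request.
```

Homogeneous batches matter here because time-series backends typically pad or truncate inputs to a common shape; mixing horizons in one batch wastes compute on padding.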
New model types:
- LiDAR / 3D point cloud (PointNet++, OpenShape)
- Pose estimation (ViTPose, MediaPipe)
- Optical flow (RAFT, UniMatch)
- Multimodal generation — text+image-conditioned (SDXL, CogVideoX)
- Speech synthesis with fine-grained control (StyleTTS2, Kokoro)
Architecture
┌─────────────────────────────────────────┐
│ API Layer │ typed contracts per model type
│ TimeSeriesRequest TabularRequest ... │
├─────────────────────────────────────────┤
│ Scheduling Layer │ model-type-aware batching
│ BatchPolicy RequestQueue │
├─────────────────────────────────────────┤
│ Backend Layer │ pluggable execution + Ray Serve
│ ModelBackend CacheManager Feast │
└─────────────────────────────────────────┘
Adding a new backend takes one class:
from sheaf.backends.base import ModelBackend
from sheaf.registry import register_backend

@register_backend("my-model")
class MyModelBackend(ModelBackend):
    def load(self) -> None:
        self._model = load_my_model()

    def predict(self, request):
        ...

    @property
    def model_type(self):
        return "time_series"
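A plugin registry like register_backend is typically a decorator over a dict keyed by backend name. A minimal standalone sketch of the pattern (illustration only; REGISTRY and get_backend are hypothetical stand-ins, not Sheaf's actual module):

```python
REGISTRY = {}

def register_backend(name):
    """Class decorator that records a backend under the given name."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator

def get_backend(name, **kwargs):
    """Instantiate a registered backend by name."""
    return REGISTRY[name](**kwargs)

@register_backend("my-model")
class MyModelBackend:
    def __init__(self, model_id="default"):
        self.model_id = model_id

backend = get_backend("my-model", model_id="amazon/chronos-bolt-tiny")
print(type(backend).__name__)  # MyModelBackend
```

The dict indirection is what lets ModelSpec refer to backends by string (backend="chronos2") while keeping backend classes in optional extras.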
Contributing
Issues and PRs welcome. See CONTRIBUTING.md for development setup.
License
Apache 2.0
File details
Details for the file sheaf_serve-0.4.0.tar.gz.

File metadata
- Download URL: sheaf_serve-0.4.0.tar.gz
- Size: 683.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d621973cd95345de3221ad75e210dd0502a4dd391751cbdb3e58f372aff9a5a6 |
| MD5 | ebb58261b1462128bccf494f645f7d4c |
| BLAKE2b-256 | 720634601b58b6a2ab67676ed6ae7482cdce592cc79c542b42b443cea67f4d49 |
Provenance
The following attestation bundles were made for sheaf_serve-0.4.0.tar.gz:

Publisher: publish.yml on korbonits/sheaf
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sheaf_serve-0.4.0.tar.gz
- Subject digest: d621973cd95345de3221ad75e210dd0502a4dd391751cbdb3e58f372aff9a5a6
- Sigstore transparency entry: 1322652821
- Permalink: korbonits/sheaf@036e71b37f59cc9bd47e594dfbba257588f13d9a
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/korbonits
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@036e71b37f59cc9bd47e594dfbba257588f13d9a
- Trigger Event: push
File details
Details for the file sheaf_serve-0.4.0-py3-none-any.whl.

File metadata
- Download URL: sheaf_serve-0.4.0-py3-none-any.whl
- Size: 96.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb483eab9818e15943f212cb248e612b56b24cb4953a4259101e3303f24464f2 |
| MD5 | eec00e72bac491066b557df3f3c70d10 |
| BLAKE2b-256 | e2bd56c5e5165d204f322ab53a489180cdded66538984ea730ea2ffee3291c96 |
Provenance
The following attestation bundles were made for sheaf_serve-0.4.0-py3-none-any.whl:

Publisher: publish.yml on korbonits/sheaf
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sheaf_serve-0.4.0-py3-none-any.whl
- Subject digest: bb483eab9818e15943f212cb248e612b56b24cb4953a4259101e3303f24464f2
- Sigstore transparency entry: 1322653007
- Permalink: korbonits/sheaf@036e71b37f59cc9bd47e594dfbba257588f13d9a
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/korbonits
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@036e71b37f59cc9bd47e594dfbba257588f13d9a
- Trigger Event: push