Sheaf

Unified serving layer for non-text foundation models.

vLLM solved inference for text LLMs by defining a standard compute contract and optimizing behind it. The same problem exists for every other class of foundation model (time series, tabular, molecular, geospatial, diffusion, audio), and no comparable standard serving layer exists for them. Sheaf aims to be that layer.

Each model type gets a typed request/response contract. Batching, caching, and scheduling are optimized per model type. Ray Serve is the substrate. Feast is a first-class input primitive.

In mathematics, a sheaf tracks locally-defined data that glues consistently across a space. Each model type defines its own local contract; Sheaf ensures they cohere into a unified serving layer.

Install

pip install sheaf-serve                           # core only
pip install "sheaf-serve[time-series]"            # + Chronos2 / TimesFM / Moirai
pip install "sheaf-serve[tabular]"                # + TabPFN
pip install "sheaf-serve[molecular]"              # + ESM-3  (Python 3.12+)
pip install "sheaf-serve[genomics]"               # + Nucleotide Transformer
pip install "sheaf-serve[small-molecule]"         # + MolFormer
pip install "sheaf-serve[materials]"              # + MACE-MP
pip install "sheaf-serve[audio]"                  # + Whisper / faster-whisper
pip install "sheaf-serve[audio-generation]"       # + MusicGen
pip install "sheaf-serve[tts]"                    # + Bark
pip install "sheaf-serve[vision]"                 # + DINOv2 / OpenCLIP / SAM2 / Depth Anything / DETR
pip install "sheaf-serve[earth-observation]"      # + Prithvi
pip install "sheaf-serve[weather]"                # + GraphCast
pip install "sheaf-serve[feast]"                  # + Feast feature store integration
pip install "sheaf-serve[modal]"                  # + Modal serverless deployment
pip install "sheaf-serve[batch]"                  # + offline batch inference (Ray Data)
pip install "sheaf-serve[all]"                    # everything

Quickstart

Direct backend inference:

from sheaf.api.time_series import Frequency, OutputMode, TimeSeriesRequest
from sheaf.backends.chronos import Chronos2Backend

backend = Chronos2Backend(model_id="amazon/chronos-bolt-tiny", device_map="cpu")
backend.load()

req = TimeSeriesRequest(
    model_name="chronos-bolt-tiny",
    history=[312, 298, 275, 260, 255, 263, 285, 320,
             368, 402, 421, 435, 442, 438, 430, 425],
    horizon=12,
    frequency=Frequency.HOURLY,
    output_mode=OutputMode.QUANTILES,
    quantile_levels=[0.1, 0.5, 0.9],
)

response = backend.predict(req)
# response.mean, response.quantiles

Ray Serve (production, autoscaling):

from sheaf import ModelServer
from sheaf.spec import ModelSpec, ResourceConfig
from sheaf.api.base import ModelType

server = ModelServer(models=[
    ModelSpec(
        name="chronos",
        model_type=ModelType.TIME_SERIES,
        backend="chronos2",
        backend_kwargs={"model_id": "amazon/chronos-bolt-small"},
        resources=ResourceConfig(num_gpus=1),
    ),
])
server.run()  # POST /chronos/predict, GET /chronos/health
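
Once the server is running, any HTTP client can call the predict route. The sketch below builds such a request with the standard library only. It assumes the server listens on Ray Serve's default HTTP port (8000) on localhost and that the JSON body mirrors the request fields shown in this README; `BASE_URL` and `build_predict_request` are illustrative names, not part of Sheaf.

```python
import json
import urllib.request

# Assumption: Ray Serve's default HTTP address; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_predict_request(name: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for the /{name}/predict route (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{name}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names follow the request examples in this README.
payload = {
    "model_type": "time_series",
    "model_name": "chronos",
    "history": [312, 298, 275, 260, 255, 263, 285, 320],
    "horizon": 7,
    "frequency": "1d",
}
request = build_predict_request("chronos", payload)

# Sending it requires a running server:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     forecast = json.load(resp)
```

The request is built separately from the network call so the payload can be inspected or logged before it is sent.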

Feast feature store (resolve features at request time):

from sheaf.api.base import ModelType
from sheaf.spec import ModelSpec

# ModelSpec wires Feast; no history is needed in the request
spec = ModelSpec(
    name="chronos",
    model_type=ModelType.TIME_SERIES,
    backend="chronos2",
    feast_repo_path="/feast/feature_repo",
)

# Client sends feature_ref instead of raw history
{
    "model_type": "time_series",
    "model_name": "chronos",
    "feature_ref": {
        "feature_view": "asset_prices",
        "feature_name": "close_history_30d",
        "entity_key": "ticker",
        "entity_value": "AAPL"
    },
    "horizon": 7,
    "frequency": "1d"
}
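
Conceptually, the server swaps the `feature_ref` for the stored history before the backend sees the request. The sketch below illustrates that step with an in-memory dictionary standing in for a Feast online store. `resolve_feature_ref`, `FAKE_ONLINE_STORE`, and the price values are hypothetical; Sheaf's actual `FeastResolver` may work differently.

```python
# In-memory stand-in for a Feast online store, keyed by
# (feature_view, feature_name, entity_key, entity_value). Placeholder values only.
FAKE_ONLINE_STORE = {
    ("asset_prices", "close_history_30d", "ticker", "AAPL"): [189.1, 190.4, 191.0],
}


def resolve_feature_ref(request: dict, store: dict) -> dict:
    """Return a copy of the request with feature_ref replaced by history."""
    ref = request.get("feature_ref")
    if ref is None:
        return dict(request)  # nothing to resolve; history was sent inline
    key = (
        ref["feature_view"],
        ref["feature_name"],
        ref["entity_key"],
        ref["entity_value"],
    )
    if key not in store:
        raise KeyError(f"no feature value for {key}")
    resolved = {k: v for k, v in request.items() if k != "feature_ref"}
    resolved["history"] = list(store[key])
    return resolved


incoming = {
    "model_type": "time_series",
    "model_name": "chronos",
    "feature_ref": {
        "feature_view": "asset_prices",
        "feature_name": "close_history_30d",
        "entity_key": "ticker",
        "entity_value": "AAPL",
    },
    "horizon": 7,
    "frequency": "1d",
}
resolved = resolve_feature_ref(incoming, FAKE_ONLINE_STORE)
```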

Modal (serverless, zero-infra):

from sheaf import ModalServer

# `spec` is a ModelSpec, for example the one defined in the Feast snippet above
server = ModalServer(models=[spec], app_name="my-sheaf", gpu="A10G")
app = server.app  # deploy with: modal deploy my_server.py

See examples/ for time series comparison, tabular, audio, vision, and the Feast feature store quickstart.

Supported model types

Type	Status	Backends
Time series	✅ v0.1	Chronos2, Chronos-Bolt, TimesFM, Moirai
Tabular	✅ v0.1	TabPFN v2
Audio transcription	✅ v0.3	Whisper, faster-whisper
Audio generation	✅ v0.3	MusicGen
Text-to-speech	✅ v0.3	Bark
Vision embeddings	✅ v0.3	OpenCLIP, DINOv2
Segmentation	✅ v0.3	SAM2
Depth estimation	✅ v0.3	Depth Anything v2
Object detection	✅ v0.3	DETR / RT-DETR
Protein / molecular	✅ v0.3	ESM-3 (Python 3.12+)
Genomics	✅ v0.3	Nucleotide Transformer
Small molecule	✅ v0.3	MolFormer-XL
Materials science	✅ v0.3	MACE-MP-0
Earth observation	✅ v0.3	Prithvi (IBM/NASA)
Weather forecasting	✅ v0.3	GraphCast
Cross-modal embeddings	✅ v0.3	ImageBind (text, vision, audio, depth, thermal)
Feast feature store	✅ v0.3	Any Feast online store (SQLite, Redis, DynamoDB, …)
Modal serverless	✅ v0.3	`ModalServer` — zero-infra GPU deployment
Diffusion / image gen	✅ v0.4	FLUX
Neural operators	🔜 v0.4	FNO, DeepONet
Video understanding	✅ v0.4	VideoMAE, TimeSformer

Roadmap to production

v0.2 — serving layer (complete)

Ray Serve integration tested end-to-end
Async predict() handlers
HTTP API with proper request validation (422 on bad input)
Health check and readiness probe endpoints
Batching scheduler (BatchPolicy wired into @serve.batch per deployment)
Error handling at the service boundary (backend exceptions → structured HTTP 500)
Model hot-swap without restart (ModelServer.update())
Container-friendly auth for TabPFN v2 (TABPFN_TOKEN env var)
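
The batching item above relies on Ray Serve's `@serve.batch`, which collects concurrent calls into a single list-valued call. To show the idea without Ray, here is a minimal asyncio micro-batcher. `MicroBatcher` is a hypothetical class whose two parameters are named after the `max_batch_size` and `batch_wait_timeout_s` options of `@serve.batch`; it is a sketch of the technique, not how Sheaf or Ray Serve implement it.

```python
import asyncio


class MicroBatcher:
    """Collect concurrent submit() calls into batches for one handler call."""

    def __init__(self, handler, max_batch_size=4, batch_wait_timeout_s=0.01):
        self._handler = handler  # takes a list, returns a list of equal length
        self._max_batch_size = max_batch_size
        self._timeout = batch_wait_timeout_s
        self._queue = asyncio.Queue()
        self._worker = None

    async def submit(self, item):
        # Start the background worker lazily, inside the running event loop.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((item, future))
        return await future

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block until work arrives
            deadline = loop.time() + self._timeout
            # Keep filling the batch until it is full or the wait window closes.
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            outputs = self._handler([item for item, _ in batch])
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)


batch_sizes = []


def double_all(items):
    """Toy batched handler: records the batch size and doubles each item."""
    batch_sizes.append(len(items))
    return [2 * x for x in items]


async def demo():
    batcher = MicroBatcher(double_all, max_batch_size=4)
    return await asyncio.gather(*(batcher.submit(i) for i in range(8)))


results = asyncio.run(demo())
```

Each caller gets back only its own result, while the handler sees at most `max_batch_size` items per call. Error propagation from the handler is omitted to keep the sketch short.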

v0.3 — model types + integrations (complete)

ESM-3 protein embeddings
Nucleotide Transformer genomics embeddings
MolFormer-XL small molecule embeddings
MACE-MP-0 materials (energy, forces, stress)
Whisper / faster-whisper audio transcription
MusicGen audio generation
Bark text-to-speech
OpenCLIP image/text embeddings
DINOv2 image embeddings
SAM2 segmentation
Depth Anything v2 depth estimation
DETR / RT-DETR object detection
Prithvi earth observation embeddings
GraphCast weather forecasting
ImageBind cross-modal embeddings (text, vision, audio, depth, thermal)
Feast feature store integration (feature_ref in requests, FeastResolver, feast_repo_path on ModelSpec)
Modal serverless deployment (ModalServer — zero-infra alternative to Ray Serve)

v0.4 — generation + video (complete)

FLUX diffusion / image generation
VideoMAE / TimeSformer video understanding

v0.5 — observability + new modalities

Ops / DX:

PyPI publish (v0.4.0)
Prometheus metrics endpoint per deployment
Structured logging with request IDs end-to-end
OpenTelemetry traces through the request path

Serving / infra:

Streaming responses (POST /{name}/stream → SSE; FLUX emits per-step progress events)
Request caching (CacheConfig on ModelSpec — in-process LRU, optional TTL)
bucket_by batching — group requests by field value before @serve.batch
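
For the request caching item, an in-process LRU with an optional TTL can be sketched in a few lines. The class below is an illustration of the technique under stated assumptions: `LRUCache`, `request_key`, and the injectable `clock` parameter are hypothetical names, and nothing here reflects how `CacheConfig` is actually implemented.

```python
import hashlib
import json
import time
from collections import OrderedDict


def request_key(request: dict) -> str:
    """Stable cache key: hash of the request serialized with sorted keys."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class LRUCache:
    """In-process LRU cache with an optional per-entry time-to-live."""

    def __init__(self, max_entries=128, ttl_s=None, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = OrderedDict()  # key -> (stored_at, value)

    def get(self, key):
        """Return the cached value, or None on a miss or an expired entry."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._ttl_s is not None and self._clock() - stored_at > self._ttl_s:
            del self._entries[key]  # expired
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used


cache = LRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a" so "b" becomes least recently used
cache.put("c", 3)  # exceeds max_entries, so "b" is evicted
```

Hashing the request with sorted keys means two requests that differ only in field order map to the same cache entry.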

New model types:

LiDAR / 3D point cloud (PointNet — pure-PyTorch, no torch-geometric; embed + ModelNet40 classify; install with pip install 'sheaf-serve[lidar]')
Pose estimation (ViTPose — COCO 17-keypoint skeleton, optional person bboxes; install with pip install 'sheaf-serve[pose]')
Optical flow (RAFT — raft_large/raft_small via torchvision; (H, W, 2) float32 flow field; install with pip install 'sheaf-serve[optical-flow]')
Multimodal generation — text+image-conditioned (SDXL img2img + inpainting; install with pip install 'sheaf-serve[multimodal-generation]')
Speech synthesis with fine-grained control (Kokoro — voice + speed per request; install with pip install 'sheaf-serve[kokoro]')

v0.6 — batch inference + async jobs

The goal: cover every shape of production inference, not just synchronous HTTP.

Offline / batch:

BatchRunner — same backend, same typed contract, offline batch mode; Ray Data map_batches substrate, stateless tasks with a worker-local backend cache so load() fires once per worker (not once per batch); install with pip install 'sheaf-serve[batch]'
BatchSpec — mirrors ModelSpec for backend selection; JsonlSource/JsonlSink in v1; new sources/sinks (S3, Parquet, Delta) slot in as additional BatchSource/BatchSink subclasses without changing the runner API
Resumable checkpointing across process restarts (#12)
Actor-pool execution mode for warm loads on expensive backends (FLUX, GraphCast, SDXL) — opt-in via BatchSpec.compute="actors" + num_actors=N; load() runs once per actor at __init__ and persists for the actor's lifetime (#13)
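
The worker-local backend cache described above can be pictured as a module-level dictionary that outlives individual batches. The sketch below is a plain-Python illustration, with no Ray dependency; `EchoBackend`, `get_backend`, and `process_batch` are hypothetical names, and batches are plain lists rather than the formats Ray Data passes to `map_batches`.

```python
_BACKEND_CACHE = {}  # lives for the lifetime of the worker process


class EchoBackend:
    """Stand-in backend that counts how often load() runs."""

    load_calls = 0

    def load(self):
        EchoBackend.load_calls += 1  # a real backend would load weights here

    def predict(self, request):
        return {"echo": request}


def get_backend(name, factory):
    """Return the cached backend for this worker, loading it on first use."""
    backend = _BACKEND_CACHE.get(name)
    if backend is None:
        backend = factory()
        backend.load()
        _BACKEND_CACHE[name] = backend
    return backend


def process_batch(batch):
    """Stateless per-batch function: looks up the backend, then predicts."""
    backend = get_backend("echo", EchoBackend)
    return [backend.predict(request) for request in batch]


batches = [[1, 2], [3], [4, 5, 6]]
outputs = [process_batch(batch) for batch in batches]
```

Three batches are processed, yet `load()` runs only once, which is the property the batch runner item describes.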

Async job queue:

SheafWorker — queue-consumer pattern for long-running inference; v1 ships Redis Streams + consumer groups (horizontal scaling), pluggable JobQueue / ResultStore ABCs for SQS / Kafka follow-ups; install with pip install 'sheaf-serve[worker]'
Job lifecycle: enqueue → processing → result / dead-letter; per-job webhook on completion (best-effort POST)
Priority lanes + per-tenant fair queuing
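
The job lifecycle above (enqueue, processing, then result or dead-letter) can be sketched as a small state machine. This in-memory version stands in for Redis Streams, and the retry limit `max_attempts` is an assumption added for illustration, since the roadmap does not say when a job is dead-lettered. `InMemoryJobQueue` and `JobState` are hypothetical names.

```python
from collections import deque
from enum import Enum


class JobState(Enum):
    ENQUEUED = "enqueued"
    PROCESSING = "processing"
    RESULT = "result"
    DEAD_LETTER = "dead_letter"


class InMemoryJobQueue:
    """Toy queue that tracks each job through its lifecycle."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts  # assumption: retries before dead-letter
        self.states = {}
        self.results = {}
        self.dead_letters = {}
        self.attempts = {}
        self._pending = deque()

    def enqueue(self, job_id, payload):
        self.states[job_id] = JobState.ENQUEUED
        self.attempts[job_id] = 0
        self._pending.append((job_id, payload))

    def run_once(self, handler):
        """Process one pending job; return False if the queue is empty."""
        if not self._pending:
            return False
        job_id, payload = self._pending.popleft()
        self.states[job_id] = JobState.PROCESSING
        self.attempts[job_id] += 1
        try:
            self.results[job_id] = handler(payload)
            self.states[job_id] = JobState.RESULT
        except Exception as exc:
            if self.attempts[job_id] >= self.max_attempts:
                self.dead_letters[job_id] = repr(exc)
                self.states[job_id] = JobState.DEAD_LETTER
            else:
                self.states[job_id] = JobState.ENQUEUED  # retry later
                self._pending.append((job_id, payload))
        return True


def handler(payload):
    """Toy inference function that rejects negative inputs."""
    if payload < 0:
        raise ValueError("negative input")
    return payload * 2


queue = InMemoryJobQueue(max_attempts=3)
queue.enqueue("ok", 21)
queue.enqueue("bad", -1)
while queue.run_once(handler):
    pass
```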

v0.7 — adapter multiplexing + client SDK

Adapter multiplexing:

LoRA / adapter hot-swap per request — one GPU deployment serves many fine-tunes; adapters dict on ModelSpec, adapter_id field in requests
Adapter registry: load on demand, LRU eviction when VRAM is tight
First targets: FLUX (style LoRAs), Whisper (language adapters), ESM-3 (task heads)
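
Load-on-demand with LRU eviction, as described for the adapter registry, might look like the sketch below. It caps the number of resident adapters as a simple stand-in for a VRAM budget. `AdapterRegistry`, `fake_loader`, and the adapter ids are hypothetical, and the roadmap does not specify the real eviction policy in this detail.

```python
from collections import OrderedDict


class AdapterRegistry:
    """Keep at most max_loaded adapters resident, evicting the least recently used."""

    def __init__(self, loader, max_loaded=2):
        self._loader = loader          # callable: adapter_id -> adapter weights
        self._max_loaded = max_loaded  # stand-in for a VRAM budget
        self._loaded = OrderedDict()
        self.evicted = []

    def get(self, adapter_id):
        if adapter_id in self._loaded:
            self._loaded.move_to_end(adapter_id)  # mark as most recently used
            return self._loaded[adapter_id]
        adapter = self._loader(adapter_id)        # load on demand
        self._loaded[adapter_id] = adapter
        while len(self._loaded) > self._max_loaded:
            old_id, _ = self._loaded.popitem(last=False)
            self.evicted.append(old_id)
        return adapter

    def loaded_ids(self):
        return list(self._loaded)


load_log = []


def fake_loader(adapter_id):
    """Pretend to load adapter weights and record the call."""
    load_log.append(adapter_id)
    return f"weights:{adapter_id}"


registry = AdapterRegistry(fake_loader, max_loaded=2)
for requested in ["anime", "sketch", "anime", "oil"]:
    registry.get(requested)
```

The second request for "anime" is served from memory, and loading "oil" evicts "sketch", the adapter that was used least recently.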

Client SDK:

pip install sheaf-client — typed Python client generated from request/response schemas
Async client (httpx-backed); retry + timeout; streams SSE natively
Language-agnostic: publish OpenAPI spec so teams can generate clients in any language
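
A client that streams SSE has to split the response into events. The sketch below is a minimal parser for the standard `text/event-stream` framing (fields `event:` and `data:`, comments starting with `:`, blank line as the event terminator). `parse_sse` is a hypothetical helper, and the event names and payloads in the sample stream are invented for illustration; they are not Sheaf's actual wire format.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)  # multiple data lines are joined with "\n"


# Invented sample stream: two progress-style events around a comment.
stream = [
    "event: progress\n",
    'data: {"step": 1}\n',
    "\n",
    ": keep-alive\n",
    "event: result\n",
    'data: {"done": true}\n',
    "\n",
]
events = list(parse_sse(stream))
```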

Architecture

┌─────────────────────────────────────────┐
│           API Layer                      │  typed contracts per model type
│  TimeSeriesRequest  TabularRequest  ...  │
├─────────────────────────────────────────┤
│         Scheduling Layer                 │  model-type-aware batching
│  BatchPolicy  RequestQueue               │
├─────────────────────────────────────────┤
│          Backend Layer                   │  pluggable execution + Ray Serve
│  ModelBackend  CacheManager  Feast       │
└─────────────────────────────────────────┘

Adding a new backend takes one class:

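from sheaf.backends.base import ModelBackend
from sheaf.registry import register_backend

@register_backend("my-model")  # register the class under the name "my-model"
class MyModelBackend(ModelBackend):
    def load(self) -> None:
        # load_my_model() is a placeholder for your own model-loading code
        self._model = load_my_model()

    def predict(self, request):
        # take the typed request for this model type and return its response
        ...

    @property
    def model_type(self):
        return "time_series"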

Contributing

Issues and PRs welcome. See CONTRIBUTING.md for development setup.

License

Apache 2.0

Release history

Version	Date	Notes
0.11.0	May 28, 2026
0.10.0	May 8, 2026
0.9.0	May 7, 2026
0.8.0	May 7, 2026
0.7.0	Apr 30, 2026	this version
0.6.0	Apr 20, 2026
0.5.1	Apr 19, 2026
0.5.0	Apr 17, 2026
0.4.0	Apr 16, 2026
0.3.0	Apr 16, 2026
0.1.0	Apr 15, 2026	yanked: first release; missing the Python 3.11 floor that all subsequent versions require, which caused silent broken installs on Python 3.10

Download files

Source Distribution

sheaf_serve-0.7.0.tar.gz (804.4 kB)

Uploaded Apr 30, 2026 Source

Built Distribution

sheaf_serve-0.7.0-py3-none-any.whl (145.3 kB)

Uploaded Apr 30, 2026 Python 3

File details

Details for the file sheaf_serve-0.7.0.tar.gz.

File metadata

Download URL: sheaf_serve-0.7.0.tar.gz
Upload date: Apr 30, 2026
Size: 804.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sheaf_serve-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`509903e99d933e205300bd83eced364f8c5891ee8ae7e7454fe41c5aeef8b987`
MD5	`22456101245fd47043f2060b18c43b33`
BLAKE2b-256	`9c6be7e073de80c269a6921d168fb79b1458fce7880d5e753c56c629ceb3c7cd`
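
To check a downloaded file against the SHA256 digest listed above, a short script with the standard library is enough. This is a minimal sketch: `sha256_of_stream` and `verify_file` are hypothetical helper names, and the file path in the commented usage is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest published above for sheaf_serve-0.7.0.tar.gz.
EXPECTED_SHA256 = "509903e99d933e205300bd83eced364f8c5891ee8ae7e7454fe41c5aeef8b987"


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary file object in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected=EXPECTED_SHA256):
    """Return True if the file at path matches the expected SHA256 digest."""
    with open(path, "rb") as fh:
        return sha256_of_stream(fh) == expected.lower()


# Usage, assuming the archive sits in the current directory:
# print(verify_file("sheaf_serve-0.7.0.tar.gz"))
```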

Provenance

The following attestation bundles were made for sheaf_serve-0.7.0.tar.gz:

Publisher: publish.yml on korbonits/sheaf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sheaf_serve-0.7.0.tar.gz
- Subject digest: 509903e99d933e205300bd83eced364f8c5891ee8ae7e7454fe41c5aeef8b987
- Sigstore transparency entry: 1409054548
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: korbonits/sheaf@42ecc52d7c7ae568db1edb9875b10da3e4873f17
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/korbonits
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@42ecc52d7c7ae568db1edb9875b10da3e4873f17
- Trigger Event: push

File details

Details for the file sheaf_serve-0.7.0-py3-none-any.whl.

File metadata

Download URL: sheaf_serve-0.7.0-py3-none-any.whl
Upload date: Apr 30, 2026
Size: 145.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sheaf_serve-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b72a2ad05e33b1e3769fad6517d7f9cd95170a80030ef2ba02a915610e476ac6`
MD5	`ab837a2f94e7c8e5ebec7b1b033c2607`
BLAKE2b-256	`9c6b264cdbf90d2bd81f7236a2ce894bbc5d3de811e02da111292d0b4d0e55d9`

Provenance

The following attestation bundles were made for sheaf_serve-0.7.0-py3-none-any.whl:

Publisher: publish.yml on korbonits/sheaf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sheaf_serve-0.7.0-py3-none-any.whl
- Subject digest: b72a2ad05e33b1e3769fad6517d7f9cd95170a80030ef2ba02a915610e476ac6
- Sigstore transparency entry: 1409054573
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: korbonits/sheaf@42ecc52d7c7ae568db1edb9875b10da3e4873f17
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/korbonits
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@42ecc52d7c7ae568db1edb9875b10da3e4873f17
- Trigger Event: push
