Deploy GGUF models to RunPod or Replicate with one command.

These details have not been verified by PyPI

infera — deploy & chill

infera

Deploy GGUF (llama-cpp-python) models to RunPod or Replicate with one command.

pip install infera-deploy

infera init my-project
cd my-project
cp ~/Downloads/llama.gguf models/
infera deploy runpod        # or: replicate

That's it. No Dockerfile, no Cog config, no GraphQL — infera writes the runtime, builds the image, uploads the model, and registers the serverless endpoint.

Package name on PyPI is infera-deploy; the Python module and CLI are both infera.

What you'll need

Python 3.10+
A .gguf model file (e.g. from TheBloke on Hugging Face)
For RunPod: Docker daemon, RunPod API key, Docker Hub login (docker login)
For Replicate: cog (Linux/macOS or WSL), cog login

What `infera deploy` actually does

Bundles a runtime tailored to the provider (Dockerfile + handler for RunPod, predict.py + cog.yaml for Replicate)
Builds and pushes the container image
(RunPod) Creates a network volume and uploads .gguf files to it — idempotent, skips unchanged models via MD5
Registers / upserts the serverless endpoint
Smoke-tests it and prints the URL

Re-runs are idempotent: same template, same volume, only changed bits get re-shipped.
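The MD5-based skip can be pictured with a small sketch. This is not infera's actual code: `md5_of`, `models_to_upload`, and the `remote_md5` mapping (file name to the digest recorded for the copy already on the volume) are hypothetical names used only for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte .gguf never sits in memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def models_to_upload(models_dir: Path, remote_md5: dict) -> list:
    """Return local .gguf files whose MD5 is missing from, or differs from, remote_md5.

    remote_md5 is an assumed {file name: hex digest} record of what was uploaded before.
    """
    changed = []
    for path in sorted(models_dir.glob("*.gguf")):
        if remote_md5.get(path.name) != md5_of(path):
            changed.append(path)
    return changed
```

With a scheme like this, a re-run over an unchanged models/ directory yields an empty list, so nothing is re-uploaded.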

Calling a deployed endpoint

The job input is OpenAI-ish:

{
  "input": {
    "messages":    [{"role": "user", "content": "Hello"}],
    "model":       "llama",
    "temperature": 0.7,
    "max_tokens":  512
  }
}

model is optional — it's the filename stem (e.g. llama-3.2-1b for llama-3.2-1b.gguf). If omitted, the first model alphabetically is used.
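The selection rule above could look roughly like this. `resolve_model` is a hypothetical helper, not part of infera's API, and the sketch assumes "alphabetically" means plain case-sensitive string ordering of the stems.

```python
from pathlib import Path
from typing import Optional


def resolve_model(models_dir: Path, model: Optional[str] = None) -> Path:
    """Map a request's optional `model` field to a .gguf file in models_dir."""
    # Path.stem drops only the final suffix: llama-3.2-1b.gguf -> llama-3.2-1b
    by_stem = {p.stem: p for p in models_dir.glob("*.gguf")}
    if not by_stem:
        raise FileNotFoundError(f"no .gguf files in {models_dir}")
    if model is None:
        # Assumed ordering: ordinary string comparison on the stem.
        return by_stem[min(by_stem)]
    if model not in by_stem:
        raise KeyError(f"unknown model {model!r}; available: {sorted(by_stem)}")
    return by_stem[model]
```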

For embeddings, set "endpoint": "embeddings" and pass the text to embed as "input": "text" (or a list of strings).

For function calling / structured output: pass tools, response_format, or grammar (GBNF) the same way you would to OpenAI.
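As an illustration, a job input with one tool might be built like this. The tool definition follows the OpenAI function-tool schema; placing `tools` inside `input` next to `messages` is an assumption based on the chat example above, and `get_weather` and `build_tool_job` are made-up names.

```python
import json


def build_tool_job(question: str) -> dict:
    """Build a job payload that offers the model a single OpenAI-style function tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "input": {
            "messages": [{"role": "user", "content": question}],
            "tools": [weather_tool],  # assumed to sit alongside messages
            "temperature": 0.0,
            "max_tokens": 256,
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_job("What's the weather in Cluj?"), indent=2))
```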

RunPod: POST https://api.runpod.ai/v2/<endpoint>/runsync with the header Authorization: Bearer <RUNPOD_KEY>.
Replicate: use the standard Replicate API. There, messages and tools must be passed as JSON-encoded strings (a Cog limitation).
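A minimal standard-library sketch of both call styles follows. `build_runsync_request` and `encode_for_replicate` are hypothetical helper names, and the shape of the response is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

RUNSYNC_URL = "https://api.runpod.ai/v2/{endpoint}/runsync"


def build_runsync_request(endpoint_id: str, api_key: str, job_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a RunPod runsync POST carrying the job input."""
    body = json.dumps({"input": job_input}).encode("utf-8")
    return urllib.request.Request(
        RUNSYNC_URL.format(endpoint=endpoint_id),
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def encode_for_replicate(job_input: dict) -> dict:
    """Copy the input, turning messages and tools into JSON strings for Cog."""
    encoded = dict(job_input)
    for key in ("messages", "tools"):
        if key in encoded:
            encoded[key] = json.dumps(encoded[key])
    return encoded
```

To actually send the RunPod request, pass the returned object to `urllib.request.urlopen` and decode the body with `json.load`.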

Adding a model to a deployed project

cp another.gguf models/
infera deploy runpod

Idempotent — only the new .gguf gets uploaded. Multiple models live side-by-side on the volume; pick one per request via the model field.

Provider configs

The first infera deploy <provider> run writes <provider>.yaml to the project root. Edit it and re-deploy to apply the changes.

# runpod.yaml
gpu:           AMPERE_16,AMPERE_24
gpu_vram_min:  8
workers_min:   0
workers_max:   1
idle_timeout:  5
datacenter:    EU-RO-1
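Before re-deploying, an edited config can be sanity-checked with a few lines of Python. This sketch reads only flat `key: value` lines like the ones above (so it needs no YAML library) and is not how infera itself loads the file. The two checks, and the reading of `gpu` as a comma-separated list, are assumptions.

```python
def parse_flat_config(text: str) -> dict:
    """Parse flat `key: value` lines, ignoring blanks and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        config[key.strip()] = int(value) if value.isdigit() else value
    return config


def check_runpod_config(config: dict) -> list:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    if config.get("workers_min", 0) > config.get("workers_max", 0):
        problems.append("workers_min must not exceed workers_max")
    # Assumption: gpu holds one or more comma-separated GPU type names.
    gpus = [g.strip() for g in str(config.get("gpu", "")).split(",") if g.strip()]
    if not gpus:
        problems.append("gpu must name at least one GPU type")
    return problems
```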

Using the engine locally (advanced)

from infera import Engine

engine = Engine("./models")
print(engine.chat([{"role": "user", "content": "Hello"}]))

Support

If infera saved you an afternoon of Dockerfile yak-shaving, consider buying me a coffee.

License

MIT

Project details

This version

0.1.0

May 5, 2026

Download files

Source Distribution

infera_deploy-0.1.0.tar.gz (18.2 kB view details)

Uploaded May 5, 2026 Source

Built Distribution

infera_deploy-0.1.0-py3-none-any.whl (20.9 kB view details)

Uploaded May 5, 2026 Python 3

File details

Details for the file infera_deploy-0.1.0.tar.gz.

File metadata

Download URL: infera_deploy-0.1.0.tar.gz
Upload date: May 5, 2026
Size: 18.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for infera_deploy-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ff3bd8574c4dcd06c9f7282e2baf2e6809c43a30ce8d41f6a8f05d7f1f193190`
MD5	`5e10fb8a27e5de26ba313fe0f64819ae`
BLAKE2b-256	`64506126de97d08efd017f4214ddab89edc97ff654a5eaebe512d3ee3c4a5f9f`

File details

Details for the file infera_deploy-0.1.0-py3-none-any.whl.

File metadata

Download URL: infera_deploy-0.1.0-py3-none-any.whl
Upload date: May 5, 2026
Size: 20.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for infera_deploy-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`96a2dd827c86566271781eb522f5673ddec67f800956c60a9d9108630af7867f`
MD5	`73c214765f8cc8b2493e133f509515a6`
BLAKE2b-256	`7a14621c4758f46633ec2f421fff40cb37be0df5e7ce69d8ad3ca1d585d10884`
