Skip to main content

Python client for Replicate

Project description

Replicate Python client

This is a Python client for Replicate. It lets you run models from your Python code or Jupyter notebook, and do various other things on Replicate.

Breaking Changes in 1.0.0

The 1.0.0 release contains breaking changes:

  • The replicate.run() method now returns FileOutputs instead of URL strings by default for models that output files. FileOutput implements an iterable interface similar to httpx.Response, making it easier to work with files efficiently.

To revert to the previous behavior, you can opt out of FileOutput by passing use_file_output=False to replicate.run():

output = replicate.run("acmecorp/acme-model", use_file_output=False)

In most cases, updating existing applications to call output.url should resolve any issues. But we recommend using the FileOutput objects directly as we have further improvements planned to this API and this approach is guaranteed to give the fastest results.

[!TIP] 👋 Check out an interactive version of this tutorial on Google Colab.

Open In Colab

Requirements

  • Python 3.8+

Install

pip install replicate

Authenticate

Before running any Python scripts that use the API, you need to set your Replicate API token in your environment.

Grab your token from replicate.com/account and set it as an environment variable:

export REPLICATE_API_TOKEN=<your token>

We recommend not adding the token directly to your source code, because you don't want to put your credentials in source control. If anyone used your API key, their usage would be charged to your account.

Alternative authentication

As of replicate 1.0.7 and cog 0.14.11 it is possible to pass a REPLICATE_API_TOKEN via the context as part of a prediction request.

The Replicate() constructor will now use this context when available. This grants cog models the ability to use the Replicate client libraries, scoped to a user on a per request basis.

Run a model

Create a new Python file and add the following code, replacing the model identifier and input with your own:

>>> import replicate
>>> outputs = replicate.run(
        "black-forest-labs/flux-schnell",
        input={"prompt": "astronaut riding a rocket like a horse"}
    )
[<replicate.helpers.FileOutput object at 0x107179b50>]
>>> for index, output in enumerate(outputs):
        with open(f"output_{index}.webp", "wb") as file:
            file.write(output.read())

replicate.run raises ModelError if the prediction fails. You can access the exception's prediction property to get more information about the failure.

import replicate
from replicate.exceptions import ModelError

try:
  output = replicate.run("stability-ai/stable-diffusion-3", { "prompt": "An astronaut riding a rainbow unicorn" })
except ModelError as e
  if "(some known issue)" in e.prediction.logs:
    pass

  print("Failed prediction: " + e.prediction.id)

[!NOTE] By default the Replicate client will hold the connection open for up to 60 seconds while waiting for the prediction to complete. This is designed to optimize getting the model output back to the client as quickly as possible.

The timeout can be configured by passing wait=x to replicate.run() where x is a timeout in seconds between 1 and 60. To disable the sync mode you can pass wait=False.

AsyncIO support

You can also use the Replicate client asynchronously by prepending async_ to the method name.

Here's an example of how to run several predictions concurrently and wait for them all to complete:

import asyncio
import replicate
 
# https://replicate.com/stability-ai/sdxl
model_version = "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b"
prompts = [
    f"A chariot pulled by a team of {count} rainbow unicorns"
    for count in ["two", "four", "six", "eight"]
]

async with asyncio.TaskGroup() as tg:
    tasks = [
        tg.create_task(replicate.async_run(model_version, input={"prompt": prompt}))
        for prompt in prompts
    ]

results = await asyncio.gather(*tasks)
print(results)

To run a model that takes a file input you can pass either a URL to a publicly accessible file on the Internet or a handle to a file on your local device.

>>> output = replicate.run(
        "andreasjansson/blip-2:f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9",
        input={ "image": open("path/to/mystery.jpg") }
    )

"an astronaut riding a horse"

Run a model and stream its output

Replicate’s API supports server-sent event streams (SSEs) for language models. Use the stream method to consume tokens as they're produced by the model.

import replicate

for event in replicate.stream(
    "meta/meta-llama-3-70b-instruct",
    input={
        "prompt": "Please write a haiku about llamas.",
    },
):
    print(str(event), end="")

[!TIP] Some models, like meta/meta-llama-3-70b-instruct, don't require a version string. You can always refer to the API documentation on the model page for specifics.

You can also stream the output of a prediction you create. This is helpful when you want the ID of the prediction separate from its output.

prediction = replicate.predictions.create(
    model="meta/meta-llama-3-70b-instruct",
    input={"prompt": "Please write a haiku about llamas."},
    stream=True,
)

for event in prediction.stream():
    print(str(event), end="")

For more information, see "Streaming output" in Replicate's docs.

Run a model in the background

You can start a model and run it in the background using async mode:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
    version=version,
    input={"prompt":"Watercolor painting of an underwater submarine"})

>>> prediction
Prediction(...)

>>> prediction.status
'starting'

>>> dict(prediction)
{"id": "...", "status": "starting", ...}

>>> prediction.reload()
>>> prediction.status
'processing'

>>> print(prediction.logs)
iteration: 0, render:loss: -0.6171875
iteration: 10, render:loss: -0.92236328125
iteration: 20, render:loss: -1.197265625
iteration: 30, render:loss: -1.3994140625

>>> prediction.wait()

>>> prediction.status
'succeeded'

>>> prediction.output
<replicate.helpers.FileOutput object at 0x107179b50>

>>> with open("output.png", "wb") as file:
        file.write(prediction.output.read())

Run a model in the background and get a webhook

You can run a model and get a webhook when it completes, instead of waiting for it to finish:

model = replicate.models.get("ai-forever/kandinsky-2.2")
version = model.versions.get("ea1addaab376f4dc227f5368bbd8eff901820fd1cc14ed8cad63b29249e9d463")
prediction = replicate.predictions.create(
    version=version,
    input={"prompt":"Watercolor painting of an underwater submarine"},
    webhook="https://example.com/your-webhook",
    webhook_events_filter=["completed"]
)

For details on receiving webhooks, see replicate.com/docs/webhooks.

Compose models into a pipeline

You can run a model and feed the output into another model:

laionide = replicate.models.get("afiaka87/laionide-v4").versions.get("b21cbe271e65c1718f2999b038c18b45e21e4fba961181fbfae9342fc53b9e05")
swinir = replicate.models.get("jingyunliang/swinir").versions.get("660d922d33153019e8c263a3bba265de882e7f4f70396546b6c9c8f9d47a021a")
image = laionide.predict(prompt="avocado armchair")
upscaled_image = swinir.predict(image=image)

Get output from a running model

Run a model and get its output while it's running:

iterator = replicate.run(
    "pixray/text2image:5c347a4bfa1d4523a58ae614c2194e15f2ae682b57e3797a5bb468920aa70ebf",
    input={"prompts": "san francisco sunset"}
)

for index, image in enumerate(iterator):
    with open(f"file_{index}.png", "wb") as file:
        file.write(image.read())

Cancel a prediction

You can cancel a running prediction:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
        version=version,
        input={"prompt":"Watercolor painting of an underwater submarine"}
    )

>>> prediction.status
'starting'

>>> prediction.cancel()

>>> prediction.reload()
>>> prediction.status
'canceled'

List predictions

You can list all the predictions you've run:

replicate.predictions.list()
# [<Prediction: 8b0ba5ab4d85>, <Prediction: 494900564e8c>]

Lists of predictions are paginated. You can get the next page of predictions by passing the next property as an argument to the list method:

page1 = replicate.predictions.list()

if page1.next:
    page2 = replicate.predictions.list(page1.next)

Load output files

Output files are returned as FileOutput objects:

import replicate
from PIL import Image # pip install pillow

output = replicate.run(
    "stability-ai/stable-diffusion:27b93a2413e7f36cd83da926f3656280b2931564ff050bf9575f1fdf9bcd7478",
    input={"prompt": "wavy colorful abstract patterns, oceans"}
    )

# This has a .read() method that returns the binary data.
with open("my_output.png", "wb") as file:
  file.write(output[0].read())
  
# It also implements the iterator protocol to stream the data.
background = Image.open(output[0])

FileOutput

Is a file-like object returned from the replicate.run() method that makes it easier to work with models that output files. It implements Iterator and AsyncIterator for reading the file data in chunks as well as read() and aread() to read the entire file into memory.

[!NOTE] It is worth noting that at this time read() and aread() do not currently accept a size argument to read up to size bytes.

Lastly, the URL of the underlying data source is available on the url attribute though we recommend you use the object as an iterator or use its read() or aread() methods, as the url property may not always return HTTP URLs in future.

print(output.url) #=> "data:image/png;base64,xyz123..." or "https://delivery.replicate.com/..."

To consume the file directly:

with open('output.bin', 'wb') as file:
    file.write(output.read())

Or for very large files they can be streamed:

with open(file_path, 'wb') as file:
    for chunk in output:
        file.write(chunk)

Each of these methods has an equivalent asyncio API.

async with aiofiles.open(filename, 'w') as file:
    await file.write(await output.aread())

async with aiofiles.open(filename, 'w') as file:
    await for chunk in output:
        await file.write(chunk)

For streaming responses from common frameworks, all support taking Iterator types:

Django

@condition(etag_func=None)
def stream_response(request):
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output =True)
    return HttpResponse(output, content_type='image/webp')

FastAPI

@app.get("/")
async def main():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output =True)
    return StreamingResponse(output)

Flask

@app.route('/stream')
def streamed_response():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output =True)
    return app.response_class(stream_with_context(output))

You can opt out of FileOutput by passing use_file_output=False to the replicate.run() method.

const replicate = replicate.run("acmecorp/acme-model", use_file_output=False);

List models

You can list the models you've created:

replicate.models.list()

Lists of models are paginated. You can get the next page of models by passing the next property as an argument to the list method, or you can use the paginate method to fetch pages automatically.

# Automatic pagination using `replicate.paginate` (recommended)
models = []
for page in replicate.paginate(replicate.models.list):
    models.extend(page.results)
    if len(models) > 100:
        break

# Manual pagination using `next` cursors
page = replicate.models.list()
while page:
    models.extend(page.results)
    if len(models) > 100:
          break
    page = replicate.models.list(page.next) if page.next else None

You can also find collections of featured models on Replicate:

>>> collections = [collection for page in replicate.paginate(replicate.collections.list) for collection in page]
>>> collections[0].slug
"vision-models"
>>> collections[0].description
"Multimodal large language models with vision capabilities like object detection and optical character recognition (OCR)"

>>> replicate.collections.get("text-to-image").models
[<Model: stability-ai/sdxl>, ...]

Create a model

You can create a model for a user or organization with a given name, visibility, and hardware SKU:

import replicate

model = replicate.models.create(
    owner="your-username",
    name="my-model",
    visibility="public",
    hardware="gpu-a40-large"
)

Here's how to list of all the available hardware for running models on Replicate:

>>> [hw.sku for hw in replicate.hardware.list()]
['cpu', 'gpu-t4', 'gpu-a40-small', 'gpu-a40-large']

Fine-tune a model

Use the training API to fine-tune models to make them better at a particular task. To see what language models currently support fine-tuning, check out Replicate's collection of trainable language models.

If you're looking to fine-tune image models, check out Replicate's guide to fine-tuning image models.

Here's how to fine-tune a model on Replicate:

training = replicate.trainings.create(
    model="stability-ai/sdxl",
    version="39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
      "input_images": "https://my-domain/training-images.zip",
      "token_string": "TOK",
      "caption_prefix": "a photo of TOK",
      "max_train_steps": 1000,
      "use_face_detection_instead": False
    },
    # You need to create a model on Replicate that will be the destination for the trained version.
    destination="your-username/model-name"
)

Customize client behavior

The replicate package exports a default shared client. This client is initialized with an API token set by the REPLICATE_API_TOKEN environment variable.

You can create your own client instance to pass a different API token value, add custom headers to requests, or control the behavior of the underlying HTTPX client:

import os
from replicate.client import Client

replicate = Client(
    api_token=os.environ["SOME_OTHER_REPLICATE_API_TOKEN"]
    headers={
        "User-Agent": "my-app/1.0"
    }
)

[!WARNING] Never hardcode authentication credentials like API tokens into your code. Instead, pass them as environment variables when running your program.

Experimental use() interface

The latest versions of replicate >= 1.1.0b1 include a new experimental use() function that is intended to make running a model closer to calling a function rather than an API request.

Some key differences to replicate.run().

  1. You "import" the model using the use() syntax, after that you call the model like a function.
  2. The output type matches the model definition.
  3. Baked in support for streaming for all models.
  4. File outputs will be represented as PathLike objects and downloaded to disk when used*.

[!NOTE] * We've replaced the FileOutput implementation with Path objects. However to avoid unnecessary downloading of files until they are needed we've implemented a PathProxy class that will defer the download until the first time the object is used. If you need the underlying URL of the Path object you can use the get_path_url(path: Path) -> str helper.

Examples

To use a model:

[!IMPORTANT] For now use() MUST be called in the top level module scope. We may relax this in future.

import replicate

flux_dev = replicate.use("black-forest-labs/flux-dev")
outputs = flux_dev(prompt="a cat wearing an amusing hat")

for output in outputs:
    print(output) # Path(/tmp/output.webp)

Models that implement iterators will return the output of the completed run as a list unless explicitly streaming (see Streaming section below). Language models that define x-cog-iterator-display: concatenate will return strings:

claude = replicate.use("anthropic/claude-4-sonnet")

output = claude(prompt="Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.")

print(output) # "Here's a recipe to feed all of California (about 39 million people)! ..."

You can pass the results of one model directly into another:

import replicate

flux_dev = replicate.use("black-forest-labs/flux-dev")
claude = replicate.use("anthropic/claude-4-sonnet")

images = flux_dev(prompt="a cat wearing an amusing hat")

result = claude(prompt="describe this image for me", image=images[0])

print(str(result)) # "This shows an image of a cat wearing a hat ..."

To create an individual prediction that has not yet resolved, use the create() method:

claude = replicate.use("anthropic/claude-4-sonnet")

prediction = claude.create(prompt="Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.")

prediction.logs() # get current logs (WIP)

prediction.output() # get the output

Streaming

Many models, particularly large language models (LLMs), will yield partial results as the model is running. To consume outputs from these models as they run you can pass the streaming argument to use():

claude = replicate.use("anthropic/claude-4-sonnet", streaming=True)

output = claude(prompt="Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.")

for chunk in output:
    print(chunk) # "Here's a recipe ", "to feed all", " of California"

Downloading file outputs

Output files are provided as Python os.PathLike objects. These are supported by most of the Python standard library like open() and Path, as well as third-party libraries like pillow and ffmpeg-python.

The first time the file is accessed it will be downloaded to a temporary directory on disk ready for use.

Here's an example of how to use the pillow package to convert file outputs:

import replicate
from PIL import Image

flux_dev = replicate.use("black-forest-labs/flux-dev")

images = flux_dev(prompt="a cat wearing an amusing hat")
for i, path in enumerate(images):
    with Image.open(path) as img:
        img.save(f"./output_{i}.png", format="PNG")

For libraries that do not support Path or PathLike instances you can use open() as you would with any other file. For example to use requests to upload the file to a different location:

import replicate
import requests

flux_dev = replicate.use("black-forest-labs/flux-dev")

images = flux_dev(prompt="a cat wearing an amusing hat")
for path in images:
    with open(path, "rb") as f:
        r = requests.post("https://api.example.com/upload", files={"file": f})

Accessing outputs as HTTPS URLs

If you do not need to download the output to disk. You can access the underlying URL for a Path object returned from a model call by using the get_path_url() helper.

import replicate
from replicate import get_url_path

flux_dev = replicate.use("black-forest-labs/flux-dev")
outputs = flux_dev(prompt="a cat wearing an amusing hat")

for output in outputs:
    print(get_url_path(output)) # "https://replicate.delivery/xyz"

Async Mode

By default use() will return a function instance with a sync interface. You can pass use_async=True to have it return an AsyncFunction that provides an async interface.

import asyncio
import replicate

async def main():
    flux_dev = replicate.use("black-forest-labs/flux-dev", use_async=True)
    outputs = await flux_dev(prompt="a cat wearing an amusing hat")

    for output in outputs:
        print(Path(output))

asyncio.run(main())

When used in streaming mode then an AsyncIterator will be returned.

import asyncio
import replicate

async def main():
    claude = replicate.use("anthropic/claude-3.5-haiku", streaming=True, use_async=True)
    output = await claude(prompt="say hello")

    # Stream the response as it comes in.
    async for token in output:
        print(token)

    # Wait until model has completed. This will return either a `list` or a `str` depending
    # on whether the model uses AsyncIterator or ConcatenateAsyncIterator. You can check this
    # on the model schema by looking for `x-cog-display: concatenate`.
    print(await output)

asyncio.run(main())

Typing

By default use() knows nothing about the interface of the model. To provide a better developer experience we provide two methods to add type annotations to the function returned by the use() helper.

1. Provide a function signature

The use method accepts a function signature as an additional hint keyword argument. When provided it will use this signature for the model() and model.create() functions.

# Flux takes a required prompt string and optional image and seed.
def hint(*, prompt: str, image: Path | None = None, seed: int | None = None) -> str: ...

flux_dev = use("black-forest-labs/flux-dev", hint=hint)
output1 = flux_dev() # will warn that `prompt` is missing
output2 = flux_dev(prompt="str") # output2 will be typed as `str`

2. Provide a class

The second method requires creating a callable class with a name field. The name will be used as the function reference when passed to use().

class FluxDev:
    name = "black-forest-labs/flux-dev"

    def __call__( self, *, prompt: str, image: Path | None = None, seed: int | None = None ) -> str: ...

flux_dev = use(FluxDev)
output1 = flux_dev() # will warn that `prompt` is missing
output2 = flux_dev(prompt="str") # output2 will be typed as `str`

[!WARNING] Currently the typing system doesn't correctly support the streaming flag for models that return lists or use iterators. We're working on improvements here.

In future we hope to provide tooling to generate and provide these models as packages to make working with them easier. For now you may wish to create your own.

API Reference

The Replicate Python Library provides several key classes and functions for working with models in pipelines:

use() Function

Creates a callable function wrapper for a Replicate model.

def use(
    ref: FunctionRef,
    *,
    streaming: bool = False,
    use_async: bool = False
) -> Function | AsyncFunction

def use(
    ref: str,
    *,
    hint: Callable | None = None,
    streaming: bool = False,
    use_async: bool = False
) -> Function | AsyncFunction

Parameters:

Parameter Type Default Description
ref str | FunctionRef Required Model reference (e.g., "owner/model" or "owner/model:version")
hint Callable | None None Function signature for type hints
streaming bool False Return OutputIterator for streaming results
use_async bool False Return AsyncFunction instead of Function

Returns:

  • Function - Synchronous model wrapper (default)
  • AsyncFunction - Asynchronous model wrapper (when use_async=True)

Function Class

A synchronous wrapper for calling Replicate models.

Methods:

Method Signature Description
__call__() (*args, **inputs) -> Output Execute the model and return final output
create() (*args, **inputs) -> Run Start a prediction and return Run object

Properties:

Property Type Description
openapi_schema dict Model's OpenAPI schema for inputs/outputs
default_example dict | None Default example inputs (not yet implemented)

AsyncFunction Class

An asynchronous wrapper for calling Replicate models.

Methods:

Method Signature Description
__call__() async (*args, **inputs) -> Output Execute the model and return final output
create() async (*args, **inputs) -> AsyncRun Start a prediction and return AsyncRun object

Properties:

Property Type Description
openapi_schema() async () -> dict Model's OpenAPI schema for inputs/outputs
default_example dict | None Default example inputs (not yet implemented)

Run Class

Represents a running prediction with access to output and logs.

Methods:

Method Signature Description
output() () -> Output Get prediction output (blocks until complete)
logs() () -> str | None Get current prediction logs

Behavior:

  • When streaming=True: Returns OutputIterator immediately
  • When streaming=False: Waits for completion and returns final result

AsyncRun Class

Asynchronous version of Run for async model calls.

Methods:

Method Signature Description
output() async () -> Output Get prediction output (awaits completion)
logs() async () -> str | None Get current prediction logs

OutputIterator Class

Iterator wrapper for streaming model outputs.

Methods:

Method Signature Description
__iter__() () -> Iterator[T] Synchronous iteration over output chunks
__aiter__() () -> AsyncIterator[T] Asynchronous iteration over output chunks
__str__() () -> str Convert to string (concatenated or list representation)
__await__() () -> List[T] | str Await all results (string for concatenate, list otherwise)

URLPath Class

A path-like object that downloads files on first access.

Methods:

Method Signature Description
__fspath__() () -> str Get local file path (downloads if needed)
__str__() () -> str String representation of local path

Usage:

  • Compatible with open(), pathlib.Path(), and most file operations
  • Downloads file automatically on first filesystem access
  • Cached locally in temporary directory

get_path_url() Function

Helper function to extract original URLs from URLPath objects.

def get_path_url(path: Any) -> str | None

Parameters:

Parameter Type Description
path Any Path object (typically URLPath)

Returns:

  • str - Original URL if path is a URLPath
  • None - If path is not a URLPath or has no URL

TODO

There are several key things still outstanding:

  1. Support for streaming text when available (rather than polling)
  2. Support for streaming files when available (rather than polling)
  3. Support for cleaning up downloaded files.
  4. Support for streaming logs using OutputIterator.

Development

See CONTRIBUTING.md

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

replicate-1.1.0b3.tar.gz (80.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

replicate-1.1.0b3-py3-none-any.whl (57.9 kB view details)

Uploaded Python 3

File details

Details for the file replicate-1.1.0b3.tar.gz.

File metadata

  • Download URL: replicate-1.1.0b3.tar.gz
  • Upload date:
  • Size: 80.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for replicate-1.1.0b3.tar.gz
Algorithm Hash digest
SHA256 f7c0888741bf1e26b6d5a11c846aa980cfe05bad703a1452edf88b1a6677807e
MD5 19323eb88c3512f2e8b38bf6876c275f
BLAKE2b-256 90d13b0bd16f11dc27a10d46ea8fd691840e8a94ffb55c97adceb115fc0a62a5

See more details on using hashes here.

File details

Details for the file replicate-1.1.0b3-py3-none-any.whl.

File metadata

  • Download URL: replicate-1.1.0b3-py3-none-any.whl
  • Upload date:
  • Size: 57.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for replicate-1.1.0b3-py3-none-any.whl
Algorithm Hash digest
SHA256 9cc1088637450300df39cb60710d1d4f341e199849453dd5b810ab498c819306
MD5 1f2c1e4e942859580f57a38f785f9ca4
BLAKE2b-256 4829c30f7813e76f7d387a0c907cbf6111a0e569b1a3b624bd8d557ccfb421bb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page