Python client for Replicate

Replicate Python client

This is a Python client for Replicate. It lets you run models from your Python code or Jupyter notebook, and do various other things on Replicate.

Breaking Changes in 1.0.0

The 1.0.0 release contains breaking changes:

The replicate.run() method now returns FileOutputs instead of URL strings by default for models that output files. FileOutput implements an iterable interface similar to httpx.Response, making it easier to work with files efficiently.

To revert to the previous behavior, you can opt out of FileOutput by passing use_file_output=False to replicate.run():

output = replicate.run("acmecorp/acme-model", use_file_output=False)

In most cases, updating existing applications to call output.url should resolve any issues. However, we recommend using the FileOutput objects directly: further improvements to this API are planned, and this approach is guaranteed to give the fastest results.

[!TIP] 👋 Check out an interactive version of this tutorial on Google Colab.

Requirements

Python 3.8+

Install

pip install replicate

Authenticate

Before running any Python scripts that use the API, you need to set your Replicate API token in your environment.

Grab your token from replicate.com/account and set it as an environment variable:

export REPLICATE_API_TOKEN=<your token>

We recommend not adding the token directly to your source code, because you don't want to put your credentials in source control. If anyone used your API key, their usage would be charged to your account.
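
As a minimal sketch of failing fast when the token is missing, the check below reads REPLICATE_API_TOKEN from the environment and raises a clear error before any API call is made. The helper name require_replicate_token is hypothetical and is not part of the replicate package.

```python
import os


def require_replicate_token(environ=None):
    """Return the Replicate API token from the environment, or raise.

    Hypothetical helper for illustration; not part of the replicate package.
    """
    environ = os.environ if environ is None else environ
    token = environ.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before running this script"
        )
    return token
```

Calling require_replicate_token() at the top of a script turns a missing token into an immediate, readable error.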

Alternative authentication

As of replicate 1.0.7 and cog 0.14.11 it is possible to pass a REPLICATE_API_TOKEN via the context as part of a prediction request.

The Replicate() constructor will now use this context when available. This grants cog models the ability to use the Replicate client libraries, scoped to a user on a per-request basis.

Run a model

Create a new Python file and add the following code, replacing the model identifier and input with your own:

>>> import replicate
>>> outputs = replicate.run(
        "black-forest-labs/flux-schnell",
        input={"prompt": "astronaut riding a rocket like a horse"}
    )
>>> outputs
[<replicate.helpers.FileOutput object at 0x107179b50>]
>>> for index, output in enumerate(outputs):
        with open(f"output_{index}.webp", "wb") as file:
            file.write(output.read())

replicate.run raises ModelError if the prediction fails. You can access the exception's prediction property to get more information about the failure.

import replicate
from replicate.exceptions import ModelError

try:
    output = replicate.run(
        "stability-ai/stable-diffusion-3",
        input={"prompt": "An astronaut riding a rainbow unicorn"},
    )
except ModelError as e:
    # e.prediction holds the failed prediction, including its logs and ID.
    if "(some known issue)" in e.prediction.logs:
        pass

    print("Failed prediction: " + e.prediction.id)

[!NOTE] By default the Replicate client will hold the connection open for up to 60 seconds while waiting for the prediction to complete. This is designed to optimize getting the model output back to the client as quickly as possible.

The timeout can be configured by passing wait=x to replicate.run() where x is a timeout in seconds between 1 and 60. To disable the sync mode you can pass wait=False.
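
The rule above can be checked before a call is made. The following is a small sketch, assuming only the two cases described here (an integer from 1 to 60, or False); checked_wait is a hypothetical helper, and any other value the library might accept is deliberately rejected.

```python
def checked_wait(wait):
    """Validate a value for the `wait` argument described above.

    Hypothetical helper: accepts False (disable sync mode) or an integer
    number of seconds from 1 to 60, and rejects anything else.
    """
    if wait is False:
        return False
    # bool is a subclass of int, so True must be rejected explicitly.
    if isinstance(wait, bool) or not isinstance(wait, int):
        raise ValueError("wait must be False or an integer number of seconds")
    if not 1 <= wait <= 60:
        raise ValueError("wait must be between 1 and 60 seconds")
    return wait


# Example call (requires network access and an API token):
# output = replicate.run("owner/model", input={...}, wait=checked_wait(30))
```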

AsyncIO support

You can also use the Replicate client asynchronously by prepending async_ to the method name.

Here's an example of how to run several predictions concurrently and wait for them all to complete:

import asyncio
import replicate

# https://replicate.com/stability-ai/sdxl
model_version = "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b"
prompts = [
    f"A chariot pulled by a team of {count} rainbow unicorns"
    for count in ["two", "four", "six", "eight"]
]


async def main():
    # asyncio.TaskGroup requires Python 3.11 or later.
    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(replicate.async_run(model_version, input={"prompt": prompt}))
            for prompt in prompts
        ]

    # Every task has finished once the TaskGroup block exits.
    results = [task.result() for task in tasks]
    print(results)


asyncio.run(main())
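
When many predictions are started at once, it can help to cap how many run concurrently. The following is a minimal sketch of that idea using asyncio.Semaphore. The names gather_limited and fake_run are hypothetical, and fake_run stands in for replicate.async_run so the example runs offline.

```python
import asyncio


async def gather_limited(make_coro, items, limit=2):
    """Run make_coro(item) for every item, at most `limit` at a time.

    Results are returned in the same order as `items`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await make_coro(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_run(prompt):
    # Stand-in for replicate.async_run so the sketch runs offline.
    await asyncio.sleep(0)
    return f"output for {prompt}"


results = asyncio.run(gather_limited(fake_run, ["two", "four", "six"]))
print(results)
```

To use it with Replicate, make_coro could be a function that returns replicate.async_run(model_version, input={"prompt": prompt}) for each prompt.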

To run a model that takes a file input you can pass either a URL to a publicly accessible file on the Internet or a handle to a file on your local device.

>>> output = replicate.run(
        "andreasjansson/blip-2:f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9",
        input={ "image": open("path/to/mystery.jpg", "rb") }
    )
>>> output
"an astronaut riding a horse"

Run a model and stream its output

Replicate’s API supports server-sent event streams (SSEs) for language models. Use the stream method to consume tokens as they're produced by the model.

import replicate

for event in replicate.stream(
    "meta/meta-llama-3-70b-instruct",
    input={
        "prompt": "Please write a haiku about llamas.",
    },
):
    print(str(event), end="")

[!TIP] Some models, like meta/meta-llama-3-70b-instruct, don't require a version string. You can always refer to the API documentation on the model page for specifics.

You can also stream the output of a prediction you create. This is helpful when you want the ID of the prediction separate from its output.

prediction = replicate.predictions.create(
    model="meta/meta-llama-3-70b-instruct",
    input={"prompt": "Please write a haiku about llamas."},
    stream=True,
)

for event in prediction.stream():
    print(str(event), end="")

For more information, see "Streaming output" in Replicate's docs.
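
If the full text is needed after streaming finishes, the events can be accumulated as they arrive. This is a small sketch, assuming each event converts to its text with str() as in the examples above. collect_stream is a hypothetical helper, and the sample list stands in for a real stream.

```python
def collect_stream(events):
    """Join streamed events into one string, using str(event) for each."""
    return "".join(str(event) for event in events)


# Offline stand-in for the events yielded by replicate.stream(...).
sample_events = ["Fluffy ", "llamas ", "graze"]
text = collect_stream(sample_events)
print(text)
```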

Run a model in the background

You can start a model and run it in the background using async mode:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
    version=version,
    input={"prompt":"Watercolor painting of an underwater submarine"})

>>> prediction
Prediction(...)

>>> prediction.status
'starting'

>>> dict(prediction)
{"id": "...", "status": "starting", ...}

>>> prediction.reload()
>>> prediction.status
'processing'

>>> print(prediction.logs)
iteration: 0, render:loss: -0.6171875
iteration: 10, render:loss: -0.92236328125
iteration: 20, render:loss: -1.197265625
iteration: 30, render:loss: -1.3994140625

>>> prediction.wait()

>>> prediction.status
'succeeded'

>>> prediction.output
<replicate.helpers.FileOutput object at 0x107179b50>

>>> with open("output.png", "wb") as file:
        file.write(prediction.output.read())
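
For cases where a custom deadline is wanted instead of prediction.wait(), the reload-and-check steps above can be wrapped in a loop. This is a sketch under assumptions: wait_with_deadline is a hypothetical helper, it relies only on the status attribute and reload() method shown above, and the "failed" status is assumed in addition to the 'succeeded' and 'canceled' statuses that appear in this document.

```python
import time

# "failed" is an assumption; "succeeded" and "canceled" appear in this README.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_with_deadline(prediction, timeout=300.0, interval=1.0, sleep=time.sleep):
    """Reload `prediction` until it finishes or `timeout` seconds pass.

    Returns the final status, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while prediction.status not in TERMINAL_STATUSES:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still {prediction.status!r}")
        sleep(interval)
        prediction.reload()
    return prediction.status
```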

Run a model in the background and get a webhook

You can run a model and get a webhook when it completes, instead of waiting for it to finish:

model = replicate.models.get("ai-forever/kandinsky-2.2")
version = model.versions.get("ea1addaab376f4dc227f5368bbd8eff901820fd1cc14ed8cad63b29249e9d463")
prediction = replicate.predictions.create(
    version=version,
    input={"prompt":"Watercolor painting of an underwater submarine"},
    webhook="https://example.com/your-webhook",
    webhook_events_filter=["completed"]
)

For details on receiving webhooks, see replicate.com/docs/webhooks.
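
On the receiving side, a handler typically starts by parsing the request body. The sketch below assumes the body is a JSON-encoded prediction with id, status, and output fields, mirroring the dict(prediction) shape shown earlier. That field set, the summarize_webhook name, and the sample payload are assumptions for illustration.

```python
import json


def summarize_webhook(body):
    """Parse a webhook request body and return (prediction_id, status, output)."""
    payload = json.loads(body)
    return payload.get("id"), payload.get("status"), payload.get("output")


# Hypothetical payload used only to exercise the parser.
sample_body = b'{"id": "abc123", "status": "succeeded", "output": ["https://example.com/out.png"]}'
summary = summarize_webhook(sample_body)
print(summary)
```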

Compose models into a pipeline

You can run a model and feed the output into another model:

laionide = replicate.models.get("afiaka87/laionide-v4").versions.get("b21cbe271e65c1718f2999b038c18b45e21e4fba961181fbfae9342fc53b9e05")
swinir = replicate.models.get("jingyunliang/swinir").versions.get("660d922d33153019e8c263a3bba265de882e7f4f70396546b6c9c8f9d47a021a")
image = laionide.predict(prompt="avocado armchair")
upscaled_image = swinir.predict(image=image)

Get output from a running model

Run a model and get its output while it's running:

iterator = replicate.run(
    "pixray/text2image:5c347a4bfa1d4523a58ae614c2194e15f2ae682b57e3797a5bb468920aa70ebf",
    input={"prompts": "san francisco sunset"}
)

for index, image in enumerate(iterator):
    with open(f"file_{index}.png", "wb") as file:
        file.write(image.read())

Cancel a prediction

You can cancel a running prediction:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
        version=version,
        input={"prompt":"Watercolor painting of an underwater submarine"}
    )

>>> prediction.status
'starting'

>>> prediction.cancel()

>>> prediction.reload()
>>> prediction.status
'canceled'

List predictions

You can list all the predictions you've run:

replicate.predictions.list()
# [<Prediction: 8b0ba5ab4d85>, <Prediction: 494900564e8c>]

Lists of predictions are paginated. You can get the next page of predictions by passing the next property as an argument to the list method:

page1 = replicate.predictions.list()

if page1.next:
    page2 = replicate.predictions.list(page1.next)
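
The manual cursor handling above generalizes to a loop over every page. This is a minimal sketch, assuming each page exposes results and next attributes as used elsewhere in this document. iter_results is a hypothetical helper.

```python
def iter_results(list_page):
    """Yield every item from a cursor-paginated listing.

    `list_page` is called with no argument for the first page and with the
    previous page's `next` cursor for each later page.
    """
    page = list_page()
    while True:
        yield from page.results
        if not page.next:
            break
        page = list_page(page.next)


# Example call (requires network access and an API token):
# for prediction in iter_results(replicate.predictions.list):
#     print(prediction.id)
```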

Load output files

Output files are returned as FileOutput objects:

import replicate
from PIL import Image # pip install pillow

output = replicate.run(
    "stability-ai/stable-diffusion:27b93a2413e7f36cd83da926f3656280b2931564ff050bf9575f1fdf9bcd7478",
    input={"prompt": "wavy colorful abstract patterns, oceans"}
)

# This has a .read() method that returns the binary data.
with open("my_output.png", "wb") as file:
    file.write(output[0].read())

# It also implements the iterator protocol to stream the data.
background = Image.open(output[0])

FileOutput

FileOutput is a file-like object returned from the replicate.run() method that makes it easier to work with models that output files. It implements Iterator and AsyncIterator for reading the file data in chunks, as well as read() and aread() for reading the entire file into memory.

[!NOTE] read() and aread() do not currently accept a size argument to read up to size bytes.

Lastly, the URL of the underlying data source is available on the url attribute, though we recommend using the object as an iterator or calling its read() or aread() methods, as the url property may not always return HTTP URLs in the future.

print(output.url) #=> "data:image/png;base64,xyz123..." or "https://delivery.replicate.com/..."
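
Because the url value may be either a data URI or an HTTP URL, code that depends on it may want to check which kind it received. The sketch below inspects the URL scheme with the standard library. url_kind is a hypothetical helper, and the two sample strings are the placeholders from the line above.

```python
from urllib.parse import urlsplit


def url_kind(url):
    """Classify an output URL as 'inline' (data URI), 'remote' (HTTP), or 'other'."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "data":
        return "inline"
    if scheme in ("http", "https"):
        return "remote"
    return "other"


print(url_kind("data:image/png;base64,xyz123..."))
print(url_kind("https://delivery.replicate.com/..."))
```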

To consume the file directly:

with open('output.bin', 'wb') as file:
    file.write(output.read())

Or for very large files they can be streamed:

with open(file_path, 'wb') as file:
    for chunk in output:
        file.write(chunk)
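
The streaming pattern can be wrapped in a small function that also reports how much was written. This is a sketch that works with any iterable of bytes chunks. save_chunks is a hypothetical helper, and the sample list stands in for iterating over a FileOutput.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of bytes chunks to `path`; return total bytes written."""
    total = 0
    with open(path, "wb") as file:
        for chunk in chunks:
            total += file.write(chunk)
    return total


# Offline stand-in for iterating over a FileOutput.
sample_chunks = [b"abc", b"defg"]
target = os.path.join(tempfile.mkdtemp(), "output.bin")
written = save_chunks(sample_chunks, target)
print(written)
```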

Each of these methods has an equivalent asyncio API.

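import aiofiles  # pip install aiofiles

# Run these inside an async function. Open the file in binary mode,
# because aread() and the chunks are bytes.
async with aiofiles.open(filename, 'wb') as file:
    await file.write(await output.aread())

async with aiofiles.open(filename, 'wb') as file:
    async for chunk in output:
        await file.write(chunk)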

Common web frameworks all support streaming responses from Iterator types:

Django

@condition(etag_func=None)
def stream_response(request):
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return HttpResponse(output, content_type='image/webp')

FastAPI

@app.get("/")
async def main():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return StreamingResponse(output)

Flask

@app.route('/stream')
def streamed_response():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return app.response_class(stream_with_context(output))

You can opt out of FileOutput by passing use_file_output=False to the replicate.run() method.

output = replicate.run("acmecorp/acme-model", use_file_output=False)

List models

You can list the models you've created:

replicate.models.list()

Lists of models are paginated. You can get the next page of models by passing the next property as an argument to the list method, or you can use the paginate method to fetch pages automatically.

# Automatic pagination using `replicate.paginate` (recommended)
models = []
for page in replicate.paginate(replicate.models.list):
    models.extend(page.results)
    if len(models) > 100:
        break

# Manual pagination using `next` cursors
page = replicate.models.list()
while page:
    models.extend(page.results)
    if len(models) > 100:
        break
    page = replicate.models.list(page.next) if page.next else None

You can also find collections of featured models on Replicate:

>>> collections = [collection for page in replicate.paginate(replicate.collections.list) for collection in page]
>>> collections[0].slug
"vision-models"
>>> collections[0].description
"Multimodal large language models with vision capabilities like object detection and optical character recognition (OCR)"

>>> replicate.collections.get("text-to-image").models
[<Model: stability-ai/sdxl>, ...]

Create a model

You can create a model for a user or organization with a given name, visibility, and hardware SKU:

import replicate

model = replicate.models.create(
    owner="your-username",
    name="my-model",
    visibility="public",
    hardware="gpu-a40-large"
)

Here's how to list all the available hardware for running models on Replicate:

>>> [hw.sku for hw in replicate.hardware.list()]
['cpu', 'gpu-t4', 'gpu-a40-small', 'gpu-a40-large']
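
The hardware list can be used to validate arguments before creating a model. The sketch below is illustrative only: model_create_args is a hypothetical helper, and "private" is assumed to be the other accepted visibility value alongside the "public" value shown above.

```python
# "private" is an assumption; only "public" appears in this README.
VALID_VISIBILITY = {"public", "private"}


def model_create_args(owner, name, visibility, hardware, available_skus):
    """Build keyword arguments for replicate.models.create(), validating first."""
    if visibility not in VALID_VISIBILITY:
        raise ValueError(f"visibility must be one of {sorted(VALID_VISIBILITY)}")
    if hardware not in available_skus:
        raise ValueError(f"unknown hardware SKU: {hardware!r}")
    return {
        "owner": owner,
        "name": name,
        "visibility": visibility,
        "hardware": hardware,
    }


# Example call (requires network access and an API token):
# skus = [hw.sku for hw in replicate.hardware.list()]
# model = replicate.models.create(
#     **model_create_args("your-username", "my-model", "public", "gpu-a40-large", skus)
# )
```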

Fine-tune a model

Use the training API to fine-tune models to make them better at a particular task. To see what language models currently support fine-tuning, check out Replicate's collection of trainable language models.

If you're looking to fine-tune image models, check out Replicate's guide to fine-tuning image models.

Here's how to fine-tune a model on Replicate:

training = replicate.trainings.create(
    model="stability-ai/sdxl",
    version="39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
      "input_images": "https://my-domain/training-images.zip",
      "token_string": "TOK",
      "caption_prefix": "a photo of TOK",
      "max_train_steps": 1000,
      "use_face_detection_instead": False
    },
    # You need to create a model on Replicate that will be the destination for the trained version.
    destination="your-username/model-name"
)
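
Elsewhere in this document a model and version are written as a single "owner/name:version" string, while trainings.create() takes them as separate arguments. The following sketch splits such a string. split_model_ref is a hypothetical helper and is not part of the replicate package.

```python
def split_model_ref(ref):
    """Split 'owner/name:version' into ('owner/name', 'version').

    The version part is None when `ref` has no ':' separator.
    """
    model, _, version = ref.partition(":")
    if model.count("/") != 1 or not all(model.split("/")):
        raise ValueError(f"expected 'owner/name[:version]', got {ref!r}")
    return model, (version or None)


model, version = split_model_ref(
    "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b"
)
print(model)
print(version)
```

The two values can then be passed as model=model and version=version.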

Customize client behavior

The replicate package exports a default shared client. This client is initialized with an API token set by the REPLICATE_API_TOKEN environment variable.

You can create your own client instance to pass a different API token value, add custom headers to requests, or control the behavior of the underlying HTTPX client:

import os
from replicate.client import Client

replicate = Client(
    api_token=os.environ["SOME_OTHER_REPLICATE_API_TOKEN"],
    headers={
        "User-Agent": "my-app/1.0"
    }
)

[!WARNING] Never hardcode authentication credentials like API tokens into your code. Instead, pass them as environment variables when running your program.

Development

See CONTRIBUTING.md

Project details

Release history

2.0.0b4 pre-release

Dec 18, 2025

2.0.0b3 pre-release

Nov 16, 2025

2.0.0b2 pre-release

Oct 24, 2025

2.0.0b1 pre-release

Oct 23, 2025

2.0.0a31 pre-release

Oct 22, 2025

2.0.0a30 pre-release

Oct 15, 2025

2.0.0a29 pre-release

Oct 15, 2025

2.0.0a28 pre-release

Oct 7, 2025

2.0.0a27 pre-release

Sep 29, 2025

2.0.0a26 pre-release

Sep 17, 2025

2.0.0a25 pre-release

Sep 15, 2025

2.0.0a24 pre-release

Sep 12, 2025

2.0.0a23 pre-release

Sep 4, 2025

2.0.0a22 pre-release

Aug 28, 2025

2.0.0a21 pre-release

Aug 27, 2025

2.0.0a20 pre-release

Aug 25, 2025

2.0.0a19 pre-release

Aug 22, 2025

2.0.0a18 pre-release

Aug 20, 2025

2.0.0a17 pre-release

Aug 12, 2025

2.0.0a16 pre-release

Aug 11, 2025

2.0.0a15 pre-release

Aug 6, 2025

2.0.0a14 pre-release

Jul 31, 2025

2.0.0a13 pre-release

Jul 25, 2025

2.0.0a12 pre-release

Jul 23, 2025

2.0.0a11 pre-release

Jul 12, 2025

2.0.0a10 pre-release

Jul 8, 2025

2.0.0a9 pre-release

Jul 2, 2025

2.0.0a8 pre-release

Jun 30, 2025

2.0.0a7 pre-release

Jun 27, 2025

2.0.0a6 pre-release

Jun 26, 2025

2.0.0a5 pre-release

Jun 25, 2025

2.0.0a4 pre-release

Jun 18, 2025

2.0.0a3 pre-release

Jun 17, 2025

2.0.0a2 pre-release

Jun 16, 2025

2.0.0a1 pre-release

Jun 10, 2025

1.1.0b3 pre-release

Aug 26, 2025

1.1.0b2 pre-release

Jun 12, 2025

1.1.0b1 pre-release

Jun 9, 2025

This version

1.0.7

May 27, 2025

1.0.6

Apr 25, 2025

1.0.4

Nov 25, 2024

1.0.3

Oct 28, 2024

1.0.2

Oct 16, 2024

1.0.1

Oct 9, 2024

1.0.0

Oct 9, 2024

1.0.0b3 pre-release

Oct 5, 2024

1.0.0b2 pre-release

Oct 4, 2024

1.0.0b1 pre-release

Oct 4, 2024

0.34.2

Oct 4, 2024

0.34.1

Sep 25, 2024

0.34.0

Sep 25, 2024

0.33.0

Sep 16, 2024

0.32.1

Aug 30, 2024

0.32.0

Aug 22, 2024

0.31.0

Jul 31, 2024

0.30.1

Jul 25, 2024

0.30.0

Jul 25, 2024

0.29.0

Jul 18, 2024

0.28.0

Jul 5, 2024

0.27.0

Jun 28, 2024

0.26.1

Jun 21, 2024

0.26.0

May 14, 2024

0.25.2

Apr 19, 2024

0.25.1

Mar 21, 2024

0.25.0

Mar 19, 2024

0.24.0

Feb 19, 2024

0.23.1

Jan 27, 2024

0.23.0

Jan 23, 2024

0.22.0

Dec 8, 2023

0.21.1

Dec 4, 2023

0.21.0

Nov 27, 2023

0.20.0

Nov 17, 2023

0.19.0

Nov 16, 2023

0.18.1

Nov 9, 2023

0.18.0 yanked

Nov 9, 2023

Reason this release was yanked:

bug

0.17.0

Nov 7, 2023

0.16.0

Nov 6, 2023

0.15.8

Nov 5, 2023

0.15.7

Nov 2, 2023

0.15.6 yanked

Nov 2, 2023

Reason this release was yanked:

bug

0.15.5

Oct 29, 2023

0.15.4

Oct 9, 2023

0.15.3

Oct 5, 2023

0.15.2

Oct 5, 2023

0.15.1

Oct 4, 2023

0.15.0 yanked

Oct 4, 2023

Reason this release was yanked:

bug

0.14.0

Oct 3, 2023

0.13.0

Sep 17, 2023

0.12.0

Sep 11, 2023

0.11.0

Aug 7, 2023

0.10.0

Jul 31, 2023

0.9.0

Jul 19, 2023

0.8.4

Jul 4, 2023

0.8.3

May 23, 2023

0.8.2

May 19, 2023

0.8.1

Apr 12, 2023

0.8.0

Apr 12, 2023

0.7.0

Apr 6, 2023

0.6.1

Mar 28, 2023

0.6.0

Mar 26, 2023

0.5.2

Mar 15, 2023

0.5.1

Mar 3, 2023

0.5.0

Feb 20, 2023

0.4.0

Sep 14, 2022

0.0.1a16 pre-release

Sep 7, 2022

0.0.1a15 pre-release

Jun 14, 2022

0.0.1a14 pre-release

Jun 8, 2022

0.0.1a13 pre-release

Jun 7, 2022

0.0.1a12 pre-release

Jun 4, 2022

0.0.1a11 pre-release

Jun 3, 2022

0.0.1a10 pre-release

May 27, 2022

0.0.1a9 pre-release

May 25, 2022

0.0.1a8 pre-release

May 25, 2022

0.0.1a7 pre-release

May 19, 2022

0.0.1a6 pre-release

May 17, 2022

0.0.1a5 pre-release

May 17, 2022

0.0.1a4 pre-release

May 17, 2022

0.0.1a3 pre-release

May 17, 2022

0.0.1a2 pre-release

May 17, 2022

0.0.1a1 pre-release

May 17, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

replicate-1.0.7.tar.gz (62.2 kB view details)

Uploaded May 27, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

replicate-1.0.7-py3-none-any.whl (48.6 kB view details)

Uploaded May 27, 2025 Python 3

File details

Details for the file replicate-1.0.7.tar.gz.

File metadata

Download URL: replicate-1.0.7.tar.gz
Upload date: May 27, 2025
Size: 62.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for replicate-1.0.7.tar.gz
Algorithm	Hash digest
SHA256	`d88cb2c37ba39fb370c87fc3291601c67aae64bb918a20a85b5ce399c23ee84c`
MD5	`d5cb37881eb66be1f3f55eac78feb05b`
BLAKE2b-256	`4bfdcaf6c59a6b8007366bd52ab5a320bf8d828f3860a60039309cfc0e375ec9`

See more details on using hashes here.

File details

Details for the file replicate-1.0.7-py3-none-any.whl.

File metadata

Download URL: replicate-1.0.7-py3-none-any.whl
Upload date: May 27, 2025
Size: 48.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for replicate-1.0.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`667c50a9eb83be17de6278ff89483102b3b50f49a2c7fbcaa2e2b14df13816f9`
MD5	`69effb10e01b11c3e4d87a9054953e27`
BLAKE2b-256	`c25ab3aa02a11a33de08e7771579154af3193decfb9d923b30b14c17b4e8bbce`

See more details on using hashes here.
