This guide will help you migrate an existing codebase from the v1 Replicate Python SDK to v2.
🍪 Feed this doc to your coding agent to assist with the upgrade process!
If you encounter any issues, please share feedback on the GitHub Discussions page.
- v2 beta release notes: https://github.com/replicate/replicate-python-beta/releases/tag/v2.0.0-beta.1
- v2 beta SDK reference: https://sdks.replicate.com/python
- v2 beta GitHub discussion: #89
- HTTP API reference: https://replicate.com/docs/reference/http
Install the latest pre-release version of the v2 SDK from PyPI using pip:
pip install --pre replicate

You are not required to upgrade to the new 2.x version. If you're already using the 1.x version and want to continue using it, pin the version number in your dependency files.
Here's an example requirements.txt:
replicate>=1.0.0,<2.0.0
Here's an example pyproject.toml:
[project]
dependencies = [
    "replicate>=1.0.0,<2.0.0",
]

- Update client initialization to use Replicate() instead of Client() and bearer_token instead of api_token
- Replace prediction instance methods with client methods (e.g., replicate.predictions.wait(id) instead of prediction.wait())
- Update async code to use the AsyncReplicate client or context-aware module-level functions
- Add keyword arguments to all API calls
- Update exception handling to use new exception types
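To find the places in a codebase that the checklist above applies to, a rough text scan can help. The sketch below is only a heuristic: the helper name find_v1_patterns and the pattern list are hypothetical, derived from the v1 idioms named in this guide, and regex matching will produce false positives (for example, an unrelated .cancel() call on another object).

```python
import re

# Hypothetical pattern list based on the v1 idioms named in the checklist.
# These are heuristics, not an official migration tool.
V1_PATTERNS = {
    "Client(...) constructor": re.compile(r"\bClient\("),
    "api_token argument": re.compile(r"\bapi_token\s*="),
    "instance method call": re.compile(r"\.(wait|cancel|reload|stream)\(\)"),
    "async_* method": re.compile(r"\.async_\w+\("),
}


def find_v1_patterns(source: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for lines that look like v1 usage."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in V1_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

Running it over each source file's text and reviewing the reported lines by hand is usually enough to plan the migration; the scan does not rewrite anything.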
In both the v1 and v2 SDKs, the simplest way to import and use the library is to import the replicate module and use the module-level functions like replicate.run(), without explicitly instantiating a client:
import replicate

output = replicate.run(...)

☝️ This approach expects a REPLICATE_API_TOKEN variable to be present in the environment.
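Because the module-level functions rely on REPLICATE_API_TOKEN, it can be useful to fail early with a clear message when the variable is missing. The following is a minimal sketch; the helper name require_api_token is hypothetical and not part of the SDK.

```python
import os
from typing import Mapping


def require_api_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the token, or raise a clear error before any API call is made."""
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before calling replicate.run()"
        )
    return token
```

Calling require_api_token() once at startup surfaces a missing token immediately, instead of at the first API request.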
For cases where you need to instantiate a client (e.g., for custom configuration or async support), the client class name and parameter names have changed in v2:
Before (v1):

import os
import replicate
from replicate import Client

client = Client(api_token=os.environ["REPLICATE_API_TOKEN"])

After (v2):

import os
import replicate
from replicate import Replicate

client = Replicate(bearer_token=os.environ["REPLICATE_API_TOKEN"])

The api_token parameter is still accepted for backward compatibility, but bearer_token is preferred.
Streaming works differently in v2. Prediction objects no longer have a stream() method and the replicate.stream() method is deprecated.
You should use replicate.use() with streaming=True for streaming output in the v2 SDK.
Before (v1):

# Top-level streaming
for event in replicate.stream(
    "anthropic/claude-4.5-sonnet",
    input={"prompt": "Write a haiku"}
):
    print(str(event), end="")

# Streaming from prediction object
prediction = replicate.predictions.create(..., stream=True)
for event in prediction.stream():
    print(str(event), end="")

After (v2):

# Use replicate.use() with streaming=True
model = replicate.use("anthropic/claude-4.5-sonnet", streaming=True)
for event in model(prompt="Write a haiku"):
    print(str(event), end="")

# Streaming from prediction object is not available
prediction = replicate.predictions.create(...)
# prediction.stream() is not available in v2

Note: replicate.stream() still works in v2 but is deprecated and will be removed in a future version.
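If existing code collected streamed events into a single string, the same can be done on top of the v2 pattern shown in this section. The sketch below takes the use function as a parameter so it can be exercised without network access; the helper name stream_text is hypothetical, and it assumes the callable returned by use(..., streaming=True) yields events that convert to text with str().

```python
from typing import Any, Callable, Iterable


def stream_text(
    use: Callable[..., Callable[..., Iterable[Any]]],
    model_ref: str,
    **inputs: Any,
) -> str:
    """Collect a streamed response into one string.

    `use` is expected to behave like replicate.use in v2: called with
    streaming=True, it returns a callable that yields output events.
    """
    model = use(model_ref, streaming=True)
    chunks = []
    for event in model(**inputs):
        chunks.append(str(event))
    return "".join(chunks)


# Intended usage (assumes the replicate package and an API token are available):
# text = stream_text(replicate.use, "anthropic/claude-4.5-sonnet", prompt="Write a haiku")
```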
Prediction objects in the v2 client no longer have instance methods like wait(), cancel(), and reload(). These have been removed in favor of client methods (e.g., use replicate.predictions.wait(prediction.id) instead of prediction.wait()).
Before (v1):

# Create via model shorthand
prediction = replicate.predictions.create(
    model="owner/model",
    input={"prompt": "..."}
)

After (v2):

# Create with keyword arguments model_owner and model_name
prediction = replicate.models.predictions.create(
    model_owner="owner",
    model_name="model",
    input={"prompt": "..."}
)

Before (v1):

prediction = replicate.predictions.get("prediction_id")

After (v2):

# Note: keyword argument required
prediction = replicate.predictions.get(prediction_id="prediction_id")

Before (v1):

prediction = replicate.predictions.create(...)
prediction.wait()

After (v2):

prediction = replicate.predictions.create(...)
# prediction.wait() is not available
# Use resource method instead
prediction = replicate.predictions.wait(prediction.id)

Before (v1):

prediction = replicate.predictions.get("prediction_id")
prediction.cancel()

After (v2):

prediction = replicate.predictions.get(prediction_id="prediction_id")
# prediction.cancel() is not available
# Use resource method instead
prediction = replicate.predictions.cancel(prediction_id=prediction.id)

Before (v1):

prediction = replicate.predictions.get("prediction_id")
prediction.reload()
print(prediction.status)

After (v2):

prediction = replicate.predictions.get(prediction_id="prediction_id")
# prediction.reload() is not available
# Fetch fresh data instead
prediction = replicate.predictions.get(prediction_id=prediction.id)
print(prediction.status)

Async functionality has been redesigned. Instead of separate async_* methods, v2 uses a dedicated AsyncReplicate client.
Before (v1):

import replicate

# Async methods with async_ prefix
output = await replicate.async_run(...)

async for event in await replicate.async_stream(...):
    print(event)

prediction = await replicate.predictions.async_create(...)
prediction = await replicate.predictions.async_get("id")
await prediction.async_wait()

After (v2):

from replicate import AsyncReplicate

# Use AsyncReplicate client
client = AsyncReplicate()

# Same method names, no async_ prefix
output = await client.run(...)

async for event in client.stream(...):
    print(event)

prediction = await client.predictions.create(...)
prediction = await client.predictions.get(prediction_id="id")
prediction = await client.predictions.wait(prediction.id)

# Or use module-level functions (context-aware)
output = await replicate.run(...)

async for event in replicate.stream(...):
    print(event)

Error handling is more granular in v2, with specific exception types for each HTTP status code.
Before (v1):

from replicate.exceptions import ReplicateError, ModelError

try:
    output = replicate.run(...)
except ModelError as e:
    print(f"Model failed: {e.prediction.error}")
except ReplicateError as e:
    print(f"API error: {e}")

After (v2):

from replicate.exceptions import (
    ModelError,
    NotFoundError,
    AuthenticationError,
    RateLimitError,
    APIStatusError
)

try:
    output = replicate.run(...)
except ModelError as e:
    print(f"Model failed: {e.prediction.error}")
except NotFoundError as e:
    print(f"Not found: {e.message}")
except RateLimitError as e:
    print(f"Rate limited: {e.message}")
except APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")

Available exception types in v2:

- APIError - Base exception for all API errors
- APIConnectionError - Network connection errors
- APITimeoutError - Request timeout errors
- APIStatusError - Base for HTTP status errors
- BadRequestError - 400 errors
- AuthenticationError - 401 errors
- PermissionDeniedError - 403 errors
- NotFoundError - 404 errors
- ConflictError - 409 errors
- UnprocessableEntityError - 422 errors
- RateLimitError - 429 errors
- InternalServerError - 500+ errors
- ModelError - Model execution failures
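When migrating code that used to branch on HTTP status codes, it can help to see the list above as a lookup table. The sketch below only restates that list as data; the helper name exception_name_for_status is hypothetical, and the fallback to APIStatusError for unlisted error codes is an assumption based on its description as the base for HTTP status errors.

```python
# Status codes and class names are copied from the list of v2 exception types.
STATUS_TO_EXCEPTION = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    409: "ConflictError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
}


def exception_name_for_status(status_code: int) -> str:
    """Name the v2 exception type listed for an HTTP error status code."""
    if status_code in STATUS_TO_EXCEPTION:
        return STATUS_TO_EXCEPTION[status_code]
    if status_code >= 500:
        return "InternalServerError"
    # Assumption: any other error status maps to the shared base class.
    return "APIStatusError"
```

For example, a v1 handler that checked for a 429 response would catch RateLimitError in v2.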
Pagination is more streamlined in v2 with auto-pagination support.
Before (v1):

# Manual pagination
page = replicate.predictions.list()
for prediction in page.results:
    print(prediction.id)
if page.next:
    next_page = replicate.predictions.list(cursor=page.next)

# Auto-pagination
for page in replicate.paginate(replicate.predictions.list):
    for prediction in page.results:
        print(prediction.id)

After (v2):

# Auto-pagination: iterate through all pages automatically
for prediction in replicate.predictions.list():
    print(prediction.id)
    # Automatically fetches more pages as needed

# Manual pagination (if needed)
page = replicate.predictions.list()
if page.has_next_page():
    next_page = page.get_next_page()

# Access results from a single page
page = replicate.predictions.list()
for prediction in page.results:
    print(prediction.id)

Model and version access uses keyword arguments throughout, instead of shorthand positional arguments.
The v2 keyword-argument syntax is more verbose, but it is clearer, mirrors Replicate's HTTP API, and is consistent across the SDKs for other programming languages.
Before (v1):

# Get model
model = replicate.models.get("owner/name")

# Get version from model
version = model.versions.get("version_id")

# List versions
versions = model.versions.list()

After (v2):

# Get model (keyword arguments required)
model = replicate.models.get(
    model_owner="owner",
    model_name="name"
)

# Get version (no shorthand via model object)
version = replicate.models.versions.get(
    model_owner="owner",
    model_name="name",
    version_id="version_id"
)

# List versions
versions = replicate.models.versions.list(
    model_owner="owner",
    model_name="name"
)

The model.versions shorthand is not available in v2.
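Codebases that store model references as "owner/name" strings can convert them to the v2 keyword arguments with a small helper. This is a minimal sketch, not part of the SDK: the function name model_kwargs is hypothetical, and the optional ":version" suffix is assumed to follow the "owner/name:version" convention.

```python
def model_kwargs(ref: str) -> dict[str, str]:
    """Split 'owner/name' or 'owner/name:version' into v2-style keyword arguments."""
    name_part, _, version_id = ref.partition(":")
    owner, sep, name = name_part.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name' or 'owner/name:version', got {ref!r}")
    kwargs = {"model_owner": owner, "model_name": name}
    if version_id:
        kwargs["version_id"] = version_id
    return kwargs


# Intended usage (assumes the replicate package and an API token are available):
# model = replicate.models.get(**model_kwargs("owner/name"))
```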
Training objects do not have the .wait() and .cancel() instance methods in v2.
Before (v1):

training = replicate.trainings.create(
    version="version_id",
    input={"train_data": "https://..."},
    destination="owner/model"
)

# Wait and cancel
training.wait()
training.cancel()

After (v2):

import time

training = replicate.trainings.create(
    model_owner="owner",
    model_name="model",
    version_id="version_id",
    input={"train_data": "https://..."},
    destination="owner/new-model"
)

# No instance methods available
# Use client methods instead

# Wait for training (no trainings.wait() available)
# Poll with get() instead
while True:
    training = replicate.trainings.get(training_id=training.id)
    if training.status in ["succeeded", "failed", "canceled"]:
        break
    time.sleep(1)

# Cancel training
training = replicate.trainings.cancel(training_id=training.id)

File upload handling has changed slightly.
Before (v1):

# Upload file
file = replicate.files.create(
    file=open("image.jpg", "rb"),
    filename="image.jpg"
)

# Access URL
url = file.urls["get"]

After (v2):

# Upload file (supports file handle, bytes, or PathLike)
with open("image.jpg", "rb") as f:
    file = replicate.files.create(
        content=f,  # Can pass file handle directly
        filename="image.jpg"
    )

# Or read into memory if needed
with open("image.jpg", "rb") as f:
    file = replicate.files.create(
        content=f.read(),
        filename="image.jpg"
    )

# Access URL (property instead of dict)
url = file.urls.get

File inputs to predictions work the same way in both versions.
Collections API uses keyword arguments in v2.
Before (v1):

collection = replicate.collections.get("collection_slug")
models = collection.models

After (v2):

collection = replicate.collections.get(collection_slug="collection_slug")

# Models accessed via nested resource
models = replicate.collections.models.list(collection_slug="collection_slug")

Webhook validation is compatible between v1 and v2.
Before (v1):

from replicate.webhook import Webhooks

secret = replicate.webhooks.default.secret()
Webhooks.validate(request, secret)

After (v2):

from replicate.resources.webhooks import Webhooks

secret = replicate.webhooks.default.secret()
Webhooks.validate(request, secret)

The validation logic is identical; only the import paths differ.
The experimental use() interface is available in both versions with similar functionality.
Before (v1):

flux = replicate.use("black-forest-labs/flux-schnell")
outputs = flux(prompt="astronaut on a horse")

# Async support
async_flux = replicate.use("black-forest-labs/flux-schnell")
outputs = await async_flux(prompt="astronaut on a horse")

After (v2):

# Same interface
flux = replicate.use("black-forest-labs/flux-schnell")
outputs = flux(prompt="astronaut on a horse")

# Async support (uses use_async parameter)
async_flux = replicate.use("black-forest-labs/flux-schnell", use_async=True)
outputs = await async_flux(prompt="astronaut on a horse")

# List all models (not available in v1)
for model in replicate.models.list():
    print(model.name)

V2 includes comprehensive type hints generated from the OpenAPI spec, providing better IDE autocomplete and type checking.
from replicate import Replicate
from replicate.types import Prediction

client: Replicate = Replicate()
prediction: Prediction = client.predictions.get(prediction_id="...")

# Configure a custom HTTP client (proxy, transport, timeout, retries)
from replicate import Replicate, DefaultHttpxClient
import httpx

client = Replicate(
    http_client=DefaultHttpxClient(
        proxy="http://proxy.example.com",
        transport=httpx.HTTPTransport(local_address="0.0.0.0")
    ),
    timeout=30.0,
    max_retries=3
)

# Access raw HTTP responses
response = client.with_raw_response.predictions.get(prediction_id="...")
print(response.headers)
print(response.http_response.status_code)
prediction = response.parse()

The response object is an APIResponse instance. See the README for full documentation.
# Stream response body
with client.with_streaming_response.predictions.get(
    prediction_id="..."
) as response:
    for chunk in response.iter_bytes():
        process(chunk)

The following features are not available in v2:

- Prediction instance methods: wait(), cancel(), reload(), stream(), output_iterator()
- Training instance methods: cancel(), reload()
- Model instance methods: predict()
- model.versions shorthand (use replicate.models.versions instead)
- Separate async_* methods (use the AsyncReplicate client)
- Positional arguments (all methods that map to HTTP API operations, like models.get and collections.get, now require keyword arguments)
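With the instance methods gone, waiting on a resource that has no client-level wait method (such as a training, per the trainings example in this guide) means polling get() until a terminal status is reached. The sketch below wraps that loop with a timeout. The helper name wait_until_done and its parameters are hypothetical; the terminal statuses are the ones used in the trainings example, and the sleep and clock parameters exist only so the loop can be tested without real delays.

```python
import time
from typing import Any, Callable

# Terminal statuses as used in the trainings polling example in this guide.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_until_done(
    fetch: Callable[[], Any],
    *,
    interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Call fetch() until the returned object's status is terminal, or time out."""
    deadline = clock() + timeout
    while True:
        resource = fetch()
        if resource.status in TERMINAL_STATUSES:
            return resource
        if clock() >= deadline:
            raise TimeoutError(f"still {resource.status!r} after {timeout} seconds")
        sleep(interval)


# Intended usage (assumes the replicate package and an existing training):
# training_id = training.id
# training = wait_until_done(lambda: replicate.trainings.get(training_id=training_id))
```

Passing the fetch step as a callable keeps the helper independent of any one resource type, so the same loop can serve predictions or trainings.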
If you encounter issues during the migration process, share your feedback on the GitHub Discussions page.