Skip to content

Configure concurrency in predictor rather than in cog.yaml #2811

@bfirsh

Description

@bfirsh

It always felt a bit funny to me that concurrency was declared in cog.yaml, when it felt like a thing that was associated with the predictor. It's the async on predict() that marks whether a model can be concurrent, then max concurrency is sort like a default concurrency for that function.

This becomes more clear when we consider train, or other arbitrary functions if that's ever a thing. It's each function that has a concurrency, not the model as a whole.

Anyway, here's what Claude thinks it could look be if it was a decorator...

Proposed API

import cog
from cog import BasePredictor, Input

# Class-based predictor
class Predictor(BasePredictor):
    def setup(self):
        self.model = load_model()

    @cog.concurrent(max=5)
    async def predict(self, prompt: str = Input()) -> str:
        return await self.model.generate(prompt)

# Function-based predictor
@cog.concurrent(max=5)
async def predict(prompt: str = Input()) -> str:
    return await model.generate(prompt)

Design

Decorator behavior

  • @cog.concurrent(max=N) attaches __cog_concurrent_max__ = N metadata to the decorated function.
  • Applying @cog.concurrent(max=N) where N > 1 to a synchronous function raises TypeError at decoration time.
  • @cog.concurrent (bare, without arguments) is also supported and defaults to max=1.

Precedence (highest to lowest)

  1. COG_MAX_CONCURRENCY environment variable at runtime (operator override)
  2. cog.yaml concurrency.max → baked into Dockerfile as COG_MAX_CONCURRENCY (deprecated, see below)
  3. @cog.concurrent(max=N) decorator on the predict function
  4. Default: 1

Since cog.yaml concurrency.max works by setting the env var at build time, cases 1 and 2 are effectively the same mechanism. The decorator becomes the new default path for model authors.

Deprecation of cog.yaml concurrency field

  • When concurrency is present in cog.yaml, emit a deprecation warning during validation suggesting users move to @cog.concurrent.
  • The field continues to work (no breaking change) — it still generates the COG_MAX_CONCURRENCY env var, which takes precedence over the decorator.

Implementation plan

Python SDK (python/cog/)

  • New file python/cog/concurrent.py with the decorator implementation.
  • Export concurrent from python/cog/__init__.py.

Rust coglet (crates/coglet-python/)

  • In predictor.rs: after loading the predictor, read __cog_concurrent_max__ from the predict function (via getattr).
  • In lib.rs: change read_max_concurrency() to accept the decorator value as a fallback, with env var taking priority.
  • Thread the decorator value from PythonPredictor::load() through to OrchestratorConfig.

Go CLI (pkg/config/)

  • In validate.go: add a deprecation warning when concurrency is present in cog.yaml.

Tests

  • Python unit tests for the decorator (sync rejection, metadata attachment).
  • Rust tests for the new resolution logic (env var > decorator > default).
  • New integration test (.txtar) verifying end-to-end: decorator sets concurrency without cog.yaml field.

Docs

  • Document the @cog.concurrent decorator.
  • Update concurrency docs to recommend the decorator as the primary approach.
  • Regenerate docs/llms.txt.

What does NOT change

  • The PermitPool, slot transport, and orchestrator logic in crates/coglet/ — they just receive num_slots from a different source.
  • Async/sync detection logic in predictor.rs.
  • Existing cog.yaml configs with concurrency.max continue to work.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions