Configure concurrency in predictor rather than in cog.yaml

It always felt a bit funny to me that concurrency was declared in cog.yaml, when it felt like a thing that was associated with the predictor. It's the `async` on `predict()` that marks whether a model can be concurrent, so max concurrency is sort of like a default concurrency for that function.

This becomes more clear when we consider `train`, or other arbitrary functions if that's ever a thing. It's each function that has a concurrency, not the model as a whole.

Anyway, here's what Claude thinks it could look like if it was a decorator...

## Proposed API

```python
import cog
from cog import BasePredictor, Input

# Class-based predictor
class Predictor(BasePredictor):
    def setup(self):
        self.model = load_model()

    @cog.concurrent(max=5)
    async def predict(self, prompt: str = Input()) -> str:
        return await self.model.generate(prompt)

# Function-based predictor
@cog.concurrent(max=5)
async def predict(prompt: str = Input()) -> str:
    return await model.generate(prompt)
```

## Design

### Decorator behavior
- `@cog.concurrent(max=N)` attaches `__cog_concurrent_max__ = N` metadata to the decorated function.
- Applying `@cog.concurrent(max=N)` where `N > 1` to a synchronous function raises `TypeError` at decoration time.
- `@cog.concurrent` (bare, without arguments) is also supported and defaults to `max=1`.
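
A minimal sketch of how a decorator with these three behaviors could be written, not the final implementation. Two things here are assumptions that the bullets above don't specify: async generator functions are treated as async (so streaming predictors would not be rejected), and `max` values below 1 raise `ValueError`.

```python
import inspect
from typing import Any, Callable, Optional

ATTR = "__cog_concurrent_max__"


def concurrent(func: Optional[Callable[..., Any]] = None, *, max: int = 1):
    """Attach a max-concurrency marker to a predict function.

    Works both bare (@concurrent) and with arguments (@concurrent(max=5)).
    """
    # Assumption: reject nonsensical limits up front.
    if isinstance(max, bool) or not isinstance(max, int) or max < 1:
        raise ValueError(f"max must be an integer >= 1, got {max!r}")

    def decorate(f: Callable[..., Any]) -> Callable[..., Any]:
        # Assumption: async generators count as async, same as coroutines.
        is_async = inspect.iscoroutinefunction(f) or inspect.isasyncgenfunction(f)
        if max > 1 and not is_async:
            raise TypeError(
                f"@concurrent(max={max}) requires an async function, "
                f"but {f.__qualname__} is synchronous"
            )
        # Metadata only: the function itself is returned unwrapped.
        setattr(f, ATTR, max)
        return f

    # Bare usage passes the function positionally; apply immediately.
    if func is not None:
        return decorate(func)
    return decorate
```

Returning the original function rather than a wrapper keeps the `async` detection on `predict()` unaffected, since nothing sits between the runtime and the user's function.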

### Precedence (highest to lowest)
1. `COG_MAX_CONCURRENCY` environment variable at runtime (operator override)
2. `cog.yaml` `concurrency.max` → baked into Dockerfile as `COG_MAX_CONCURRENCY` (deprecated, see below)
3. `@cog.concurrent(max=N)` decorator on the predict function
4. Default: `1`

Since `cog.yaml` `concurrency.max` works by setting the env var at build time, cases 1 and 2 are effectively the same mechanism. The decorator becomes the new default path for model authors.
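
The resolution order can be sketched as below. The plan puts the real logic in Rust (`read_max_concurrency()`), so this Python version only illustrates the order. The function name `resolve_max_concurrency` is hypothetical, and raising on an invalid env var value is an assumption; the real code may handle that differently.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "COG_MAX_CONCURRENCY"


def resolve_max_concurrency(
    decorator_max: Optional[int],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Pick the effective max concurrency: env var > decorator > default of 1."""
    raw = env.get(ENV_VAR)
    if raw is not None and raw.strip():
        # Covers both the operator override and the value baked in from cog.yaml,
        # since both arrive as the same environment variable.
        value = int(raw)  # assumption: a non-integer value is an error
        if value < 1:
            raise ValueError(f"{ENV_VAR} must be >= 1, got {value}")
        return value
    if decorator_max is not None:
        return decorator_max
    return 1


# The decorator value is used only when the env var is absent.
print(resolve_max_concurrency(5, env={}))                 # 5
print(resolve_max_concurrency(5, env={ENV_VAR: "2"}))     # 2
print(resolve_max_concurrency(None, env={}))              # 1
```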

### Deprecation of `cog.yaml` `concurrency` field
- When `concurrency` is present in `cog.yaml`, emit a deprecation warning during validation suggesting users move to `@cog.concurrent`.
- The field continues to work (no breaking change) — it still generates the `COG_MAX_CONCURRENCY` env var, which takes precedence over the decorator.

## Implementation plan

### Python SDK (`python/cog/`)
- New file `python/cog/concurrent.py` with the decorator implementation.
- Export `concurrent` from `python/cog/__init__.py`.

### Rust coglet (`crates/coglet-python/`)
- In `predictor.rs`: after loading the predictor, read `__cog_concurrent_max__` from the predict function (via `getattr`).
- In `lib.rs`: change `read_max_concurrency()` to accept the decorator value as a fallback, with env var taking priority.
- Thread the decorator value from `PythonPredictor::load()` through to `OrchestratorConfig`.
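
For reference, the `getattr` lookup that `predictor.rs` would perform corresponds roughly to the Python below. This is a sketch in Python rather than the Rust binding code, and `read_decorator_max` is a hypothetical helper name. It relies on the fact that bound methods forward attribute reads to the underlying function, so the same lookup covers class-based and function-based predictors.

```python
import inspect
from typing import Any, Optional

ATTR = "__cog_concurrent_max__"


def read_decorator_max(predictor: Any, method_name: str = "predict") -> Optional[int]:
    """Return the decorator's value, or None if the function was not decorated."""
    if inspect.isfunction(predictor):
        # Function-based predictor: the marker lives on the function itself.
        target = predictor
    else:
        # Class-based predictor: look up the bound method on the instance.
        target = getattr(predictor, method_name, None)
    return getattr(target, ATTR, None)


class DemoPredictor:
    async def predict(self, prompt: str) -> str:
        return prompt


# Stand-in for what the decorator would do at class definition time.
setattr(DemoPredictor.predict, ATTR, 3)


async def demo_predict(prompt: str) -> str:
    return prompt


print(read_decorator_max(DemoPredictor()))  # 3
print(read_decorator_max(demo_predict))     # None (undecorated)
```

One thing to watch: if another decorator wraps `predict` without `functools.wraps`, the attribute would be hidden from this lookup.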

### Go CLI (`pkg/config/`)
- In `validate.go`: add a deprecation warning when `concurrency` is present in `cog.yaml`.

### Tests
- Python unit tests for the decorator (sync rejection, metadata attachment).
- Rust tests for the new resolution logic (env var > decorator > default).
- New integration test (`.txtar`) verifying end-to-end: decorator sets concurrency without `cog.yaml` field.

### Docs
- Document the `@cog.concurrent` decorator.
- Update concurrency docs to recommend the decorator as the primary approach.
- Regenerate `docs/llms.txt`.

## What does NOT change
- The `PermitPool`, slot transport, and orchestrator logic in `crates/coglet/` — they just receive `num_slots` from a different source.
- Async/sync detection logic in `predictor.rs`.
- Existing `cog.yaml` configs with `concurrency.max` continue to work.
