49 changes: 49 additions & 0 deletions docs/guides/go-sdk/index.md
@@ -289,6 +289,55 @@ func createAgentWithBuiltinTools(llm provider.Provider) *agent.Agent {
}
```

## HTTP Middleware / Transport Wrappers

Use `options.WithHTTPTransportWrapper` to inject HTTP middleware into the transport chain of all provider clients built by docker-agent. This is useful for request tracing, injecting custom headers, collecting metrics, or any other cross-cutting concern at the HTTP layer.

```go
import (
	"net/http"

	"github.com/docker/docker-agent/pkg/model/provider/options"
)

type headerTransport struct {
	base http.RoundTripper
}

func (t *headerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	req = req.Clone(req.Context())
	req.Header.Set("X-Request-Source", "my-app")
	return t.base.RoundTrip(req)
}

// Example: add a custom header to every outbound LLM request
wrapper := options.WithHTTPTransportWrapper(

[HIGH] options.WithHTTPTransportWrapper does not exist in the codebase

The documented function options.WithHTTPTransportWrapper has no implementation anywhere in the repository. A repo-wide search of all .go files returns zero results for this symbol, and pkg/model/provider/options/options.go contains no transport-related functions — only WithGateway, WithStructuredOutput, WithGeneratingTitle, WithMaxTokens, WithNoThinking, WithProviders, and WithModelsDevStore.

Documenting a non-existent API will mislead users into writing code that will not compile. This section should be held until the corresponding implementation is merged, or the docs should reference the correct function name if it was renamed.

	func(base http.RoundTripper) http.RoundTripper {
		return &headerTransport{base: base}
	},
)

client, err := openai.NewClient(ctx, &latest.ModelConfig{
	Provider: "openai",
	Model:    "gpt-4o",
}, env, wrapper)
```

The wrapper receives the already-instrumented transport (OpenTelemetry, SSE decompression, Desktop proxy support) as its `base` argument, so wrapping it preserves all built-in behaviour.

**Supported providers:** Anthropic, OpenAI, Gemini (GeminiAPI backend), Bedrock. Works in both direct and gateway/proxy mode.

<div class="callout callout-warning" markdown="1">
<div class="callout-title">Vertex AI not supported
</div>
<p>Vertex AI uses an ADC-managed HTTP client that docker-agent cannot intercept. When a transport wrapper is set, docker-agent falls back to the GeminiAPI backend instead of Vertex AI — a debug message is logged.</p>

</div>

In **gateway mode** the wrapper is called on every LLM request because gateway clients are rebuilt each call for short-lived auth tokens. In **direct mode** it is called once at client construction. Rate-limit responses (HTTP 429) are classified as non-retryable by the runtime and cause the model chain to skip to the next fallback, so wrappers that track per-request outcomes will observe these as failures rather than retried calls.

Returning `nil` from your wrapper function is not allowed; docker-agent logs a warning and keeps the original transport instead.

## Using Different Providers

```go
5 changes: 5 additions & 0 deletions docs/tools/rag/index.md
@@ -183,6 +183,11 @@ $ docker agent run config.yaml --debug --log-file debug.log

Look for log tags: `[RAG Manager]`, `[Chunked-Embeddings Strategy]`, `[BM25 Strategy]`, `[RRF Fusion]`, `[Reranker]`.

**Permanent model errors abort early.** If the embedding model, semantic-LLM model, or reranking model returns a permanent error (HTTP 400, 401, 404, or 429 — invalid config, bad auth, unknown model, or rate limit), docker-agent treats the model configuration as invalid and stops immediately rather than retrying doomed requests:

[MEDIUM] HTTP 429 misclassified as a permanent/fatal configuration error

The paragraph groups HTTP 429 ("Too Many Requests") with 400, 401, and 404 and says that any of them causes docker-agent to treat the model configuration as invalid. However, 429 is a transient rate-limit signal, not evidence of a misconfigured model. RFC 6585 defines 429 as a rate-limiting response that may include a Retry-After header telling the client how long to wait before making a new request.

As written, the current text implies the following behaviour:

  • Indexing: aborted entirely on a 429 — user loses all partially-indexed content with no retry
  • Reranking: reranker permanently disabled for the manager's lifetime on a single 429 — user must restart the agent to restore reranking

This is doubly alarming because the PR description says PR #3091 was motivated by preventing "a flood of doomed requests" from misconfigured models (wrong model name, bad API key). A transient 429 from a correctly-configured model under load is not a "doomed" scenario.

The file history confirms the issue: commit 9448b75e is titled "fix: 429 misclassified as permanent error in RAG docs (#3099)" — suggesting a prior version of this PR already attempted to correct this, but the fix may not have made it into the current diff.

Suggestion: Remove 429 from the permanent-error list and describe it separately as a transient error that triggers backoff/retry (consistent with 5xx treatment), unless the implementation genuinely treats a 429 as a fatal permanent failure (in which case the PR description and the prior fix commit are misleading and the behaviour itself should be reconsidered).
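A minimal sketch of the suggested classification, assuming the status-code split proposed above (400, 401, 404 permanent; 429 and 5xx transient). The function names are hypothetical and this is not taken from docker-agent's implementation, which may classify errors differently.

```go
package main

import "net/http"

// isPermanent reports whether a status code points at a configuration
// problem that retrying cannot fix (bad request, bad auth, unknown model).
func isPermanent(status int) bool {
	switch status {
	case http.StatusBadRequest, http.StatusUnauthorized, http.StatusNotFound:
		return true
	}
	return false
}

// isTransient reports whether a status code is worth retrying later:
// rate limiting (429) and server-side failures (5xx).
func isTransient(status int) bool {
	return status == http.StatusTooManyRequests || (status >= 500 && status <= 599)
}
```

With a split like this, a 429 during indexing or reranking would take the same fall-back-and-retry path as a 5xx instead of aborting the run or disabling the reranker.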


- **Indexing** — the entire indexing run is aborted after the first permanent failure (including 429). The error is surfaced in the logs so you know immediately if a model name or API key is wrong, rather than silently producing incomplete results.
- **Reranking** — a permanent error (including 429) permanently disables the reranker for the lifetime of the manager. Subsequent queries fall back to un-reranked results. Only transient errors (5xx, timeouts) fall back and retry on the next query.

<div class="callout callout-tip" markdown="1">
<div class="callout-title">Examples
</div>