diff --git a/docs/guides/go-sdk/index.md b/docs/guides/go-sdk/index.md
index 2e8f8a93e..9f3afbf14 100644
--- a/docs/guides/go-sdk/index.md
+++ b/docs/guides/go-sdk/index.md
@@ -289,6 +289,55 @@ func createAgentWithBuiltinTools(llm provider.Provider) *agent.Agent {
}
```
+## HTTP Middleware / Transport Wrappers
+
+Use `options.WithHTTPTransportWrapper` to inject HTTP middleware into the transport chain of all provider clients built by docker-agent. This is useful for request tracing, injecting custom headers, collecting metrics, or any other cross-cutting concern at the HTTP layer.
+
+```go
+import (
+ "net/http"
+
+ "github.com/docker/docker-agent/pkg/model/provider/options"
+)
+
+type headerTransport struct {
+ base http.RoundTripper
+}
+
+func (t *headerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
+ req = req.Clone(req.Context())
+ req.Header.Set("X-Request-Source", "my-app")
+ return t.base.RoundTrip(req)
+}
+
+// Example: add a custom header to every outbound LLM request
+wrapper := options.WithHTTPTransportWrapper(
+ func(base http.RoundTripper) http.RoundTripper {
+ return &headerTransport{base: base}
+ },
+)
+
+// openai, latest, ctx, and env are assumed to be imported or defined elsewhere in your program.
+client, err := openai.NewClient(ctx, &latest.ModelConfig{
+ Provider: "openai",
+ Model: "gpt-4o",
+}, env, wrapper)
+```
+
+The wrapper receives the already-instrumented transport (OpenTelemetry, SSE decompression, Desktop proxy support) as its `base` argument, so wrapping it preserves all built-in behaviour.
+
+**Supported providers:** Anthropic, OpenAI, Gemini (GeminiAPI backend), Bedrock. Works in both direct and gateway/proxy mode.
+
+**Vertex AI not supported.** Vertex AI uses an ADC-managed HTTP client that docker-agent cannot intercept. When a transport wrapper is set, docker-agent falls back to the GeminiAPI backend instead of Vertex AI and logs a debug message.
+
+In **gateway mode** the wrapper function is called on every LLM request, because gateway clients are rebuilt on each call to pick up short-lived auth tokens. In **direct mode** it is called once, at client construction. Rate-limit responses (HTTP 429) are classified as non-retryable by the runtime and cause the model chain to skip to the next fallback, so wrappers that track per-request outcomes will observe them as failures rather than as retried calls.
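+
Because the wrapper function may run once per request in gateway mode, any state it needs to accumulate has to live outside the transport it returns. The following is a minimal sketch of that pattern, not docker-agent code: `statusCounts`, `countingTransport`, `fixedStatus`, and `demoCounts` are hypothetical names, and `fixedStatus` stands in for the real base transport so the example runs without network access.

```go
package main

import (
	"io"
	"net/http"
	"strings"
	"sync"
)

// statusCounts is hypothetical shared state, not part of docker-agent. It
// lives outside the wrapper function so tallies survive the wrapper being
// invoked again for each rebuilt gateway client.
type statusCounts struct {
	mu     sync.Mutex
	byCode map[int]int
}

func newStatusCounts() *statusCounts {
	return &statusCounts{byCode: make(map[int]int)}
}

// Wrap has the func(http.RoundTripper) http.RoundTripper shape used by the
// wrapper in the example above.
func (c *statusCounts) Wrap(base http.RoundTripper) http.RoundTripper {
	return &countingTransport{base: base, counts: c}
}

// Snapshot returns a copy of the tallies so callers never touch the live map.
func (c *statusCounts) Snapshot() map[int]int {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[int]int, len(c.byCode))
	for code, n := range c.byCode {
		out[code] = n
	}
	return out
}

// countingTransport records the status code of every response it sees.
type countingTransport struct {
	base   http.RoundTripper
	counts *statusCounts
}

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.base.RoundTrip(req)
	if err != nil {
		return nil, err // transport-level failure: no status code to record
	}
	t.counts.mu.Lock()
	t.counts.byCode[resp.StatusCode]++
	t.counts.mu.Unlock()
	return resp, nil
}

// fixedStatus is a stand-in base transport for the demo; it answers every
// request with the same status code and an empty body.
type fixedStatus int

func (s fixedStatus) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: int(s),
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader("")),
		Request:    req,
	}, nil
}

// demoCounts simulates gateway mode: Wrap is called once per request, yet
// both responses land in the same tally.
func demoCounts() map[int]int {
	counts := newStatusCounts()
	for _, code := range []int{200, 429} {
		rt := counts.Wrap(fixedStatus(code)) // fresh transport per request
		req, err := http.NewRequest(http.MethodGet, "http://llm.invalid/v1", nil)
		if err != nil {
			panic(err)
		}
		resp, err := rt.RoundTrip(req)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
	}
	return counts.Snapshot()
}
```

In a real program, `counts.Wrap` would be the function passed to `options.WithHTTPTransportWrapper`. A 429 then shows up as a single tally under that status code, which matches the "failure, not retried call" behaviour described above.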
+
+Returning `nil` from your wrapper function is not allowed; docker-agent logs a warning and keeps the original transport instead.
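+
A wrapper that is only sometimes active can return `base` unchanged instead of `nil` when it has nothing to add. The sketch below illustrates that idea under assumed names (`tagTransport`, `maybeTag`, and `passesThrough` are hypothetical helpers, not docker-agent APIs).

```go
package main

import "net/http"

// tagTransport is a hypothetical RoundTripper that stamps a header on each
// outbound request before delegating to the wrapped transport.
type tagTransport struct {
	base   http.RoundTripper
	source string
}

func (t *tagTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	req = req.Clone(req.Context()) // RoundTrippers must not mutate the caller's request
	req.Header.Set("X-Request-Source", t.source)
	return t.base.RoundTrip(req)
}

// maybeTag builds a wrapper function. When source is empty the wrapper is a
// pass-through: it hands back base itself rather than nil.
func maybeTag(source string) func(http.RoundTripper) http.RoundTripper {
	return func(base http.RoundTripper) http.RoundTripper {
		if source == "" {
			return base
		}
		return &tagTransport{base: base, source: source}
	}
}

// passesThrough reports whether the wrapper returned the base transport
// untouched for the given source value.
func passesThrough(source string) bool {
	base := http.DefaultTransport
	return maybeTag(source)(base) == base
}
```

Returning `base` keeps the opt-out explicit in your own code instead of relying on the warning-and-fallback path.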
+
## Using Different Providers
```go
diff --git a/docs/tools/rag/index.md b/docs/tools/rag/index.md
index ccc5af3d6..35576f9d4 100644
--- a/docs/tools/rag/index.md
+++ b/docs/tools/rag/index.md
@@ -183,6 +183,11 @@ $ docker agent run config.yaml --debug --log-file debug.log
Look for log tags: `[RAG Manager]`, `[Chunked-Embeddings Strategy]`, `[BM25 Strategy]`, `[RRF Fusion]`, `[Reranker]`.
+**Permanent model errors abort early.** If the embedding model, semantic-LLM model, or reranking model returns a permanent error (HTTP 400, 401, 404, or 429 — invalid config, bad auth, unknown model, or rate limit), docker-agent treats the model configuration as invalid and stops immediately rather than retrying doomed requests:
+
+- **Indexing**: the entire indexing run is aborted after the first permanent failure (including 429). The error is surfaced in the logs, so a wrong model name or API key shows up immediately instead of silently producing incomplete results.
+- **Reranking**: a permanent error (including 429) disables the reranker for the lifetime of the manager, and subsequent queries return un-reranked results. Transient errors (5xx, timeouts) fall back to un-reranked results for the current query only; the reranker is tried again on the next query.
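+
The reranking rule above can be modelled as a one-way latch. This is an illustrative sketch of the documented behaviour only: `classifyStatus`, `rerankerState`, and `replay` are hypothetical names, the real classifier inside docker-agent may inspect more than the status code, and the handling of status codes the text does not mention is an assumption.

```go
package main

// errorClass is a hypothetical label for how a model response is treated.
type errorClass int

const (
	classOK          errorClass = iota
	classTransient              // fall back for this query, try again on the next one
	classPermanent              // treat the model configuration as invalid
	classUnspecified            // status codes the text above does not cover
)

// classifyStatus encodes the rule as documented: 400, 401, 404, and 429 are
// permanent, 5xx is transient.
func classifyStatus(code int) errorClass {
	switch {
	case code >= 200 && code <= 299:
		return classOK
	case code == 400, code == 401, code == 404, code == 429:
		return classPermanent
	case code >= 500 && code <= 599:
		return classTransient
	default:
		return classUnspecified
	}
}

// rerankerState models the "disabled for the lifetime of the manager" latch.
type rerankerState struct {
	disabled bool
}

// observe records the outcome of one rerank attempt and reports whether that
// query's results were reranked.
func (r *rerankerState) observe(code int) bool {
	if r.disabled {
		return false // already latched off: the reranker is skipped entirely
	}
	switch classifyStatus(code) {
	case classOK:
		return true
	case classPermanent:
		r.disabled = true // never cleared again
	}
	// Transient (and, by assumption, unspecified) errors: un-reranked results
	// for this query, another attempt on the next one.
	return false
}

// replay feeds a sequence of status codes through a fresh latch and returns,
// per query, whether the results were reranked.
func replay(codes ...int) []bool {
	r := &rerankerState{}
	out := make([]bool, 0, len(codes))
	for _, c := range codes {
		out = append(out, r.observe(c))
	}
	return out
}
```

For example, `replay(503, 200, 429, 200)` yields un-reranked, reranked, un-reranked, un-reranked: the 503 affects only its own query, while the 429 switches the reranker off for every query after it.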
+