diff --git a/docs/guides/go-sdk/index.md b/docs/guides/go-sdk/index.md
index 2e8f8a93e..9f3afbf14 100644
--- a/docs/guides/go-sdk/index.md
+++ b/docs/guides/go-sdk/index.md
@@ -289,6 +289,55 @@ func createAgentWithBuiltinTools(llm provider.Provider) *agent.Agent {
 }
 ```
 
+## HTTP Middleware / Transport Wrappers
+
+Use `options.WithHTTPTransportWrapper` to inject HTTP middleware into the transport chain of all provider clients built by docker-agent. This is useful for request tracing, injecting custom headers, collecting metrics, or any other cross-cutting concern at the HTTP layer.
+
+```go
+import (
+	"net/http"
+
+	"github.com/docker/docker-agent/pkg/model/provider/options"
+)
+
+type headerTransport struct {
+	base http.RoundTripper
+}
+
+func (t *headerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
+	req = req.Clone(req.Context())
+	req.Header.Set("X-Request-Source", "my-app")
+	return t.base.RoundTrip(req)
+}
+
+// Example: add a custom header to every outbound LLM request
+wrapper := options.WithHTTPTransportWrapper(
+	func(base http.RoundTripper) http.RoundTripper {
+		return &headerTransport{base: base}
+	},
+)
+
+client, err := openai.NewClient(ctx, &latest.ModelConfig{
+	Provider: "openai",
+	Model:    "gpt-4o",
+}, env, wrapper)
+```
+
+The wrapper receives the already-instrumented transport (OpenTelemetry, SSE decompression, Desktop proxy support) as its `base` argument, so wrapping it preserves all built-in behaviour.
+
+**Supported providers:** Anthropic, OpenAI, Gemini (GeminiAPI backend), Bedrock. Works in both direct and gateway/proxy mode.
+
+
Vertex AI not supported +
+

Vertex AI uses an ADC-managed HTTP client that docker-agent cannot intercept. When a transport wrapper is set, docker-agent falls back to the GeminiAPI backend instead of Vertex AI — a debug message is logged.

+ +
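As a rough illustration of the "collecting metrics" use case mentioned above, the following sketch tallies response status codes in a wrapper built only on the standard library. The names `statusCounts`, `countingTransport`, `Wrap`, `Count`, and `demo` are hypothetical and not part of docker-agent. The sketch assumes only that the wrapper is a `func(http.RoundTripper) http.RoundTripper`, as in the example above. The tally is kept outside the transport so that it would survive the transport being wrapped more than once.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// statusCounts is a hypothetical tally of response status codes. It lives
// outside the wrapper so the counts survive transports being rebuilt.
type statusCounts struct {
	mu     sync.Mutex
	counts map[int]int
}

// countingTransport records the status of each response returned by base.
type countingTransport struct {
	base   http.RoundTripper
	shared *statusCounts
}

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.base.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	t.shared.mu.Lock()
	t.shared.counts[resp.StatusCode]++
	t.shared.mu.Unlock()
	return resp, nil
}

// Wrap has the func(http.RoundTripper) http.RoundTripper shape.
func (c *statusCounts) Wrap(base http.RoundTripper) http.RoundTripper {
	return &countingTransport{base: base, shared: c}
}

// Count returns how many responses with the given status were observed.
func (c *statusCounts) Count(status int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[status]
}

// demo sends one request through each of two separately wrapped transports
// against a local test server that always answers 429.
func demo() (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusTooManyRequests)
	}))
	defer srv.Close()

	counts := &statusCounts{counts: make(map[int]int)}
	for i := 0; i < 2; i++ {
		// Wrapping again on each iteration mimics a client rebuilt per call.
		client := &http.Client{Transport: counts.Wrap(http.DefaultTransport)}
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
	}
	return counts.Count(http.StatusTooManyRequests), nil
}

func main() {
	n, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("429 responses observed:", n)
}
```

In this sketch a method value such as `counts.Wrap` is what would be handed to `options.WithHTTPTransportWrapper`; whether that fits a given setup depends on how the wrapper is invoked by the provider client.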
+
+In **gateway mode** the wrapper is called on every LLM request because gateway clients are rebuilt each call for short-lived auth tokens. In **direct mode** it is called once at client construction. Rate-limit responses (HTTP 429) are classified as non-retryable by the runtime and cause the model chain to skip to the next fallback, so wrappers that track per-request outcomes will observe these as failures rather than retried calls.
+
+Returning `nil` from your wrapper function is not allowed; docker-agent logs a warning and keeps the original transport instead.
+
 ## Using Different Providers
 
 ```go
diff --git a/docs/tools/rag/index.md b/docs/tools/rag/index.md
index ccc5af3d6..35576f9d4 100644
--- a/docs/tools/rag/index.md
+++ b/docs/tools/rag/index.md
@@ -183,6 +183,11 @@ $ docker agent run config.yaml --debug --log-file debug.log
 
 Look for log tags: `[RAG Manager]`, `[Chunked-Embeddings Strategy]`, `[BM25 Strategy]`, `[RRF Fusion]`, `[Reranker]`.
 
+**Permanent model errors abort early.** If the embedding model, semantic-LLM model, or reranking model returns a permanent error (HTTP 400, 401, 404, or 429 — invalid config, bad auth, unknown model, or rate limit), docker-agent treats the model configuration as invalid and stops immediately rather than retrying doomed requests:
+
+- **Indexing** — the entire indexing run is aborted after the first permanent failure (including 429). The error is surfaced in the logs so you know immediately if a model name or API key is wrong, rather than silently producing incomplete results.
+- **Reranking** — a permanent error (including 429) permanently disables the reranker for the lifetime of the manager. Subsequent queries fall back to un-reranked results. Only transient errors (5xx, timeouts) fall back and retry on the next query.
+
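The permanent-versus-transient split described above can be pictured with a small sketch. This is not docker-agent's implementation: `isPermanent` and `rerankGate` are hypothetical names, and the only thing taken from the text is the list of status codes (400, 401, 404, 429) and the rule that a permanent error disables reranking for good while a transient one does not.

```go
package main

import (
	"fmt"
	"net/http"
)

// isPermanent reports whether status is one of the codes the text above
// lists as permanent: 400, 401, 404, or 429.
func isPermanent(status int) bool {
	switch status {
	case http.StatusBadRequest,
		http.StatusUnauthorized,
		http.StatusNotFound,
		http.StatusTooManyRequests:
		return true
	}
	return false
}

// rerankGate is a hypothetical switch that stays off once a permanent
// error has been observed, and is unaffected by transient errors.
type rerankGate struct {
	disabled bool
}

// Observe records the status of a failed reranker call.
func (g *rerankGate) Observe(status int) {
	if isPermanent(status) {
		g.disabled = true
	}
}

// Enabled reports whether reranking should still be attempted.
func (g *rerankGate) Enabled() bool {
	return !g.disabled
}

func main() {
	gate := &rerankGate{}

	gate.Observe(http.StatusServiceUnavailable) // transient: try again next query
	fmt.Println("after 503, reranker enabled:", gate.Enabled())

	gate.Observe(http.StatusTooManyRequests) // permanent per the text above
	fmt.Println("after 429, reranker enabled:", gate.Enabled())
}
```

Timeouts, which the text also treats as transient, carry no status code and are left out of this sketch.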
Examples