docs: document WithHTTPTransportWrapper and RAG permanent error behavior#3099
Conversation
|
/review |
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
Documentation-only PR adding two new sections. One medium-severity accuracy issue found in the RAG permanent error documentation — HTTP 429 (rate limiting) is grouped with fatal configuration errors (bad auth, unknown model), which contradicts standard HTTP semantics and the PR's own stated motivation. A prior fix commit in the file history suggests this was already flagged once. One low-severity clarity note on the Go SDK code snippet.
|
|
||
| Look for log tags: `[RAG Manager]`, `[Chunked-Embeddings Strategy]`, `[BM25 Strategy]`, `[RRF Fusion]`, `[Reranker]`. | ||
|
|
||
| **Permanent model errors abort early.** If the embedding model, semantic-LLM model, or reranking model returns a permanent error (HTTP 400, 401, 404, or 429 — invalid config, bad auth, unknown model, or rate limit), docker-agent treats the model configuration as invalid and stops immediately rather than retrying doomed requests: |
There was a problem hiding this comment.
[MEDIUM] HTTP 429 misclassified as a permanent/fatal configuration error
The paragraph groups HTTP 429 ("Too Many Requests") together with 400, 401, and 404 and describes it as indicating an "invalid config, bad auth, or unknown model". However 429 is a transient rate-limit signal, not evidence of a misconfigured model. Standard HTTP semantics (RFC 6585) define 429 as retriable (often with a Retry-After header).
The current text says:
- Indexing: aborted entirely on a 429 — user loses all partially-indexed content with no retry
- Reranking: reranker permanently disabled for the manager's lifetime on a single 429 — user must restart the agent to restore reranking
This is doubly alarming because the PR description says PR #3091 was motivated by preventing "a flood of doomed requests" from misconfigured models (wrong model name, bad API key). A transient 429 from a correctly-configured model under load is not a "doomed" scenario.
The file history confirms the issue: commit 9448b75e is titled "fix: 429 misclassified as permanent error in RAG docs (#3099)" — suggesting a prior version of this PR already attempted to correct this, but the fix may not have made it into the current diff.
Suggestion: Remove 429 from the permanent-error list and describe it separately as a transient error that triggers backoff/retry (consistent with 5xx treatment), unless the implementation genuinely treats a 429 as a fatal permanent failure (in which case the PR description and the prior fix commit are misleading and the behaviour itself should be reconsidered).
| }, | ||
| ) | ||
|
|
||
| client, err := openai.NewClient(ctx, modelCfg, env, wrapper) |
There was a problem hiding this comment.
[LOW] Code snippet references modelCfg without definition or type annotation
The snippet ends with:
client, err := openai.NewClient(ctx, modelCfg, env, wrapper)modelCfg is not declared anywhere in the snippet, and its type (*latest.ModelConfig) is not annotated. Readers following along from the Basic Example section may not know how to obtain or construct one.
Suggestion: Either add a brief inline comment such as // modelCfg is a *latest.ModelConfig — see the Basic Example section above or include a minimal declaration like:
modelCfg := &latest.ModelConfig{Model: "gpt-4o"}The earlier openai.NewClient call at line 83 uses an inline struct literal which makes its type self-evident; this snippet could follow the same pattern.
Documentation updates
This PR documents two code changes merged into `main` in the last 36 hours that were not reflected in the `/docs`.
Details
Go SDK HTTP Middleware / Transport Wrappers (#3090)
PR #3090 added `options.WithHTTPTransportWrapper` — a new `Opt` that lets Go SDK embedders inject an HTTP middleware (e.g. tracing, custom auth headers, metrics) into the transport chain of all provider clients. No documentation existed for this API.
Added a new "HTTP Middleware / Transport Wrappers" section to `docs/guides/go-sdk/index.md` covering:
RAG permanent error behavior (#3091)
PR #3091 fixed a bug where a misconfigured embedding or reranking model (e.g. a typo in the model name) caused a flood of doomed requests. The new behavior — aborting indexing on the first permanent failure and permanently disabling the reranker on permanent errors — was not documented.
Added a paragraph to the "Debugging RAG" section of `docs/tools/rag/index.md` explaining:
PRs reviewed and found up to date