
fix(retrievers): propagate non-404 HTTP errors from embedding download loop#1200

Merged
planetf1 merged 3 commits into generative-computing:main from planetf1:fix/hf-production-retry
Jun 4, 2026

@planetf1 (Contributor) commented Jun 4, 2026

Summary

The embedding-parts download loop in download_mtrag_embeddings used an unconditional except urllib.error.HTTPError: break to detect end-of-parts. Since HTTPError is raised for every HTTP error status, not only 404, a transient 429 or 5xx was silently treated as end-of-data: the loop stopped early and returned a truncated (or empty) embedding set without raising an error.

Fix: break only on 404 (the genuine end-of-parts sentinel); re-raise everything else so the caller can decide how to handle it. One-line logic change — no retry, no sleep, no new helpers. Error handling is the caller's responsibility.
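A minimal sketch of the corrected loop, assuming a simple numbered-parts layout. The download_parts name, the URL template, the part file naming, and the injectable fetch parameter are hypothetical; only the except-clause logic (break on 404, re-raise everything else) and the ValueError when the first part is missing come from this PR.

```python
import os
import urllib.error
import urllib.request
from typing import Callable, List


def download_parts(
    url_template: str,
    dest_dir: str,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> List[str]:
    """Download numbered parts until the server answers 404 for the next one.

    Hypothetical stand-in for the loop in download_mtrag_embeddings; the real
    function may build its URLs and file names differently.
    """
    paths: List[str] = []
    part = 0
    while True:
        url = url_template.format(part=part)
        dest = os.path.join(dest_dir, f"part_{part}.bin")
        try:
            fetch(url, dest)
        except urllib.error.HTTPError as e:
            # Before the fix the handler was just `break`, so 429/5xx also
            # ended the loop and the result was silently truncated.
            if e.code == 404:
                # 404 is the end-of-parts sentinel.
                break
            # Any other status (429, 500, 502, ...) goes to the caller.
            raise
        paths.append(dest)
        part += 1
    if not paths:
        # A 404 on the very first part means nothing was found at all.
        raise ValueError(f"no parts found at {url_template.format(part=0)}")
    return paths
```

Injecting fetch keeps the sketch testable without network I/O: a stub that raises urllib.error.HTTPError with code 429 on the second part now surfaces that error instead of returning one part.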

Testing

Unit tests in test/formatters/granite/test_retrievers_util.py (all mocked, no network I/O):

  • download_mtrag_corpus: invalid name raises, skips download if file exists, downloads when missing, propagates HTTP errors

  • download_mtrag_embeddings: 404 on first part raises ValueError, stops correctly after 404 on subsequent part, propagates 429/500/502/503/504 — parametrized

  • uv run pytest test/formatters/granite/test_retrievers_util.py -v passes (11/11)

  • CI green
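The mocked download_mtrag_embeddings tests can be approximated with the standard library alone. This sketch patches urllib.request.urlretrieve with unittest.mock.patch and uses subTest in place of pytest parametrization; fetch_all_parts is a hypothetical stand-in for the real loop, and the test names are illustrative, not the ones in test_retrievers_util.py.

```python
import os
import unittest
import urllib.error
import urllib.request
from unittest import mock


def fetch_all_parts(url_template: str, dest_dir: str) -> list:
    """Hypothetical stand-in for the embedding download loop."""
    paths = []
    part = 0
    while True:
        url = url_template.format(part=part)
        dest = os.path.join(dest_dir, f"part_{part}.bin")
        try:
            # Looked up on the module at call time, so mock.patch replaces it.
            urllib.request.urlretrieve(url, dest)
        except urllib.error.HTTPError as e:
            if e.code == 404:
                break
            raise
        paths.append(dest)
        part += 1
    if not paths:
        raise ValueError("no embedding parts found")
    return paths


def http_error(code: int) -> urllib.error.HTTPError:
    return urllib.error.HTTPError("https://example.invalid", code, "err", None, None)


class DownloadLoopTests(unittest.TestCase):
    def test_stops_after_404_on_later_part(self):
        # Two parts succeed, the third request returns 404.
        side_effect = [None, None, http_error(404)]
        with mock.patch("urllib.request.urlretrieve", side_effect=side_effect) as m:
            paths = fetch_all_parts("https://example.invalid/{part}", "out")
        self.assertEqual(len(paths), 2)
        self.assertEqual(m.call_count, 3)

    def test_404_on_first_part_raises_value_error(self):
        with mock.patch("urllib.request.urlretrieve", side_effect=http_error(404)):
            with self.assertRaises(ValueError):
                fetch_all_parts("https://example.invalid/{part}", "out")

    def test_transient_errors_propagate(self):
        # One sub-test per status code, like a parametrized pytest test.
        for code in (429, 500, 502, 503, 504):
            with self.subTest(code=code):
                side_effect = [None, http_error(code)]
                with mock.patch("urllib.request.urlretrieve", side_effect=side_effect):
                    with self.assertRaises(urllib.error.HTTPError) as ctx:
                        fetch_all_parts("https://example.invalid/{part}", "out")
                self.assertEqual(ctx.exception.code, code)
```

Because side_effect accepts a list whose items may be exception instances, each mocked call either returns None or raises the next queued error, so no files are written and no network is touched.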

Closes #1198

@github-actions (bot) added the bug label Jun 4, 2026
The embedding-parts download loop in download_mtrag_embeddings caught
urllib.error.HTTPError to detect end-of-parts (404 = no more files).
Since HTTPError covers all HTTP errors, a transient 429 or 5xx was
treated as end-of-data, silently producing a truncated embedding set
with no error raised.

Fix: only break on 404; all other HTTP errors now propagate. Also adds
_urlretrieve_with_retry to retry transient errors (429, 5xx) with
exponential back-off before propagating.

Closes generative-computing#1198

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
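The _urlretrieve_with_retry helper described in this commit was removed in a later commit, which left retry handling to callers. For a caller that does want retries, a minimal sketch of the same idea (retry 429/5xx with exponential back-off, fail fast on anything else) might look like this; call_with_retry, RETRYABLE, and the parameter names are hypothetical and are not the removed helper's actual code.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of transient statuses (429 plus common 5xx codes).
RETRYABLE = {429, 500, 502, 503, 504}


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient HTTP errors with exponential back-off."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except urllib.error.HTTPError as e:
            # Non-retryable codes (e.g. 404) and the last attempt fail fast.
            if e.code not in RETRYABLE or attempt == max_attempts:
                raise
            # Waits base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    # Not reachable when max_attempts >= 1; keeps type checkers satisfied.
    raise AssertionError("unreachable")
```

A caller would wrap its own download, e.g. call_with_retry(lambda: urllib.request.urlretrieve(url, dest)). Passing sleep as a parameter lets tests record the delays instead of waiting.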
@planetf1 force-pushed the fix/hf-production-retry branch from f0c2496 to 058ced3 June 4, 2026 08:47
@planetf1 changed the title from "fix(backends): retry with backoff for HuggingFace Hub network calls + embedding loop correctness" to "fix(retrievers): fix 429 silently truncating MTRAG embedding downloads" Jun 4, 2026
@planetf1 requested a review from ajbozarth June 4, 2026 10:32
@planetf1 added the area/backends label Jun 4, 2026
@planetf1 requested a review from jakelorocco June 4, 2026 10:32
@planetf1 marked this pull request as ready for review June 4, 2026 10:32
@planetf1 requested a review from a team as a code owner June 4, 2026 10:32
planetf1 added 2 commits June 4, 2026 11:39
Covers success, per-retryable-code retry, non-retryable 404 fast-fail,
exhausted retries, and max_attempts=1 — all mocked so no network I/O.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
… code

Replace _urlretrieve_with_retry (which embedded application-level retry/sleep
logic) with direct urllib.request.urlretrieve calls. Error handling is the
caller's responsibility per project conventions.

The only behaviour change: download_mtrag_embeddings now breaks only on 404
(end-of-parts signal) and propagates all other HTTP errors (429, 5xx) rather
than silently treating them as end-of-data and returning a truncated result.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 changed the title from "fix(retrievers): fix 429 silently truncating MTRAG embedding downloads" to "fix(retrievers): propagate non-404 HTTP errors from embedding download loop" Jun 4, 2026
@planetf1 enabled auto-merge June 4, 2026 13:03
@planetf1 added this pull request to the merge queue Jun 4, 2026
Merged via the queue into generative-computing:main with commit 1e56745 Jun 4, 2026
13 checks passed
@planetf1 deleted the fix/hf-production-retry branch June 4, 2026 13:56

Labels

area/backends (Provider-specific work: Ollama, HF, LiteLLM, OpenAI, Bedrock, vLLM)
bug (Something isn't working)

Development

Linked issue:

fix: HuggingFace I/O has no retry or graceful degradation on transient failures (429 / network errors)

2 participants