Skip to content

Enable live catalog with region-aware fallback engine#789

Open
baijumeswani wants to merge 3 commits into
mainfrom
baijumeswani/catalog
Open

Enable live catalog with region-aware fallback engine#789
baijumeswani wants to merge 3 commits into
mainfrom
baijumeswani/catalog

Conversation

@baijumeswani

@baijumeswani baijumeswani commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Replaces the static, file-based model catalog with a live Azure Foundry catalog client, and makes both catalog queries and model downloads region-aware. The client now detects the caller's Azure region from cluster headers, queries that region's catalog endpoint, and transparently retries against nearby regions when an endpoint is unhealthy. Downloads resolve their SAS URI from the same region that served the model's catalog entry.

This ports three behaviors from the C# core into the C++ SDK:

  1. Fetch the live catalog instead of reading a baked-in static snapshot.
  2. Detect the region from azureml-served-by-cluster headers (instead of always using eastus).
  3. Region-aware catalog query and model download.

How it works

                        ┌──────────────────────────────────────────┐
                        │  AzureModelCatalog.FetchModels()         │
                        └───────────────────┬──────────────────────┘
                                            │
                                            ▼
                        ┌──────────────────────────────────────────┐
                        │  AzureCatalogClient                      │
                        │                                          │
                        │  1. Detect region                        │
                        │     POST probe ─► read                   │
                        │     "azureml-served-by-cluster" header   │
                        │     vienna-<region>-NN ─► region         │
                        │     (explicit config wins; else eastus)  │
                        └───────────────────┬──────────────────────┘
                                            │ for each device filter set
                                            ▼
                        ┌──────────────────────────────────────────┐
                        │  RegionFallback.Execute(startRegion)     │
                        │                                          │
                        │   chain = [start]                        │
                        │         + proximal[start]                │
                        │         + one random public region       │
                        └───────────────────┬──────────────────────┘
                                            │
                       ┌────────────────────┼─────────────────────┐
                       ▼                     ▼                     ▼
                  try region A ─►       try region B ─►       try region C
                   2xx? ✓ done          retryable(0/408/        ...exhausted
                   permanent(404)? ✓     429/5xx)? ─► next       ─► fail set
                   stop                                          (others continue)
                       │
                       │ page 1 pins the serving region;
                       │ later pages stay on it (tokens are region-specific)
                       ▼
            ┌──────────────────────────────────────────┐
            │  Models stamped with detected_region,    │
            │  committed atomically per filter set     │
            └───────────────────┬──────────────────────┘
                                │  (download)
                                ▼
            ┌──────────────────────────────────────────┐
            │  DownloadManager.DownloadModel(info)     │
            │                                          │
            │  region = explicit config                │
            │         → info.detected_region           │
            │         → eastus                         │
            └───────────────────┬──────────────────────┘
                                ▼
            ┌───────────────────────────────────────────┐
            │  ModelRegistryClient.ResolveModelContainer│
            │  (same RegionFallback chain) ─► SAS URI   │
            └───────────────────┬───────────────────────┘
                                ▼
                          Blob download
  • catalog_region: ""/"auto" → detect from headers; any other value → explicit override.
  • DisableRegionFallback: disables cross-region retries (single-region behavior).

Copilot AI review requested due to automatic review settings June 9, 2026 07:06
@vercel

vercel Bot commented Jun 9, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
foundry-local Ready Ready Preview, Comment Jun 12, 2026 4:21am

Request Review

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the C++ SDK’s embedded/static model catalog with a live Azure Foundry catalog client, adds automatic region detection from azureml-served-by-cluster, and introduces a region-fallback engine used by both catalog queries and model-registry SAS resolution so downloads target the region that served the catalog entry.

Changes:

  • Removed the static snapshot catalog + snapshot generation tool, and always build/use the live Azure catalog client.
  • Added RegionFallback + status-aware HTTP response plumbing to support regional retries and sticky-region behavior.
  • Stamped models with detected_region and used it to choose the model registry region when resolving download SAS URIs (with configurable fallback disable).

Reviewed changes

Copilot reviewed 32 out of 33 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sdk_v2/cpp/tools/catalog_snapshot/main.cc Removed the snapshot generation tool entrypoint (static catalog pipeline retired).
sdk_v2/cpp/tools/catalog_snapshot/CMakeLists.txt Removed CMake target for the snapshot tool.
sdk_v2/cpp/test/sdk_api/shared_test_env.h Test env now always points at the live Azure catalog URL (no static path).
sdk_v2/cpp/test/internal_api/static_catalog_test.cc Removed tests that exercised the embedded static catalog client.
sdk_v2/cpp/test/internal_api/region_fallback_test.cc Added unit tests for region fallback chain building, retryability, sticky behavior, and exhaustion.
sdk_v2/cpp/test/internal_api/model_info_test.cc Added JSON round-trip coverage for ModelInfo.detected_region.
sdk_v2/cpp/test/internal_api/download_test.cc Updated registry client tests for status-aware HTTP + fallback; added download region-resolution tests.
sdk_v2/cpp/test/internal_api/catalog_cache_test.cc Added cache save/load coverage for detected_region.
sdk_v2/cpp/test/internal_api/c_api_test.cc Relaxed catalog population assertion to avoid network-dependent expectations.
sdk_v2/cpp/test/internal_api/azure_catalog_test.cc Expanded live catalog client tests to cover region detection, URL rewriting, stamping, and fallback behavior.
sdk_v2/cpp/test/CMakeLists.txt Updated unit test sources: added region_fallback_test, removed static_catalog_test, always includes azure_catalog_test.
sdk_v2/cpp/src/util/region_fallback.h New public/internal utility interface for region-aware retry/fallback execution.
sdk_v2/cpp/src/util/region_fallback.cc New implementation with proximal-region mapping, retryable classification, sticky region tracking, and diagnostics.
sdk_v2/cpp/src/model_info.h Added detected_region field to model metadata.
sdk_v2/cpp/src/model_info.cc Implemented JSON serialization/deserialization for detected_region (omitted when empty).
sdk_v2/cpp/src/manager.cc Defaulted catalog region to auto and plumbed DisableRegionFallback into download + catalog components.
sdk_v2/cpp/src/http/http_client.h Introduced HttpResponse and non-throwing Http*WithResponse APIs (status + headers + body).
sdk_v2/cpp/src/http/http_client.cc Implemented response-aware GET/POST (including header capture + transport failure as status==0).
sdk_v2/cpp/src/download/model_registry_client.h Updated registry client API to accept per-call region and use response-aware HTTP + fallback.
sdk_v2/cpp/src/download/model_registry_client.cc Implemented region-fallback-driven registry resolution and parsing of response bodies.
sdk_v2/cpp/src/download/download_manager.h Added config-region tracking and disable_region_fallback plumbing.
sdk_v2/cpp/src/download/download_manager.cc Resolved registry region via explicit config → detected region → eastus; wired fallback-enabled registry client.
sdk_v2/cpp/src/catalog/static_catalog_client.cc Removed embedded snapshot-backed catalog implementation.
sdk_v2/cpp/src/catalog/live_catalog_client_stub.cc Removed stub that previously threw when live client sources were absent.
sdk_v2/cpp/src/catalog/catalog_snapshot_data.h Removed snapshot data header (embedded bytes no longer part of build).
sdk_v2/cpp/src/catalog/catalog_client.h Updated contract/docs: MakeCatalogClient now always constructs live Azure client and accepts region/fallback controls.
sdk_v2/cpp/src/catalog/catalog_client.cc Retained cached-model + BYO synthesis helper; removed static-vs-live dispatch logic.
sdk_v2/cpp/src/catalog/azure_model_catalog.h Default catalog URL is now always Azure; added region + fallback configuration fields.
sdk_v2/cpp/src/catalog/azure_model_catalog.cc Plumbed region/fallback into MakeCatalogClient and persisted fetched model infos (incl. region) to cache.
sdk_v2/cpp/src/catalog/azure_catalog_client.h New live Azure catalog client interface with region detection and fallback-aware fetching.
sdk_v2/cpp/src/catalog/azure_catalog_client.cc New live Azure catalog implementation: request shaping, region detection, URL rewriting, filter-set atomicity, and fallback.
sdk_v2/cpp/CMakeLists.txt Removed tools option + static snapshot wiring; always builds live catalog + region fallback sources.

Comment thread sdk_v2/cpp/src/download/model_registry_client.cc Outdated
Comment thread sdk_v2/cpp/src/download/model_registry_client.cc Outdated
Comment thread sdk_v2/cpp/src/catalog/azure_catalog_client.cc Outdated
Comment thread sdk_v2/cpp/src/util/region_fallback.cc Outdated
Comment thread sdk_v2/cpp/src/manager.cc
const auto fallback_it = config_.additional_options.find("DisableRegionFallback");
if (fallback_it != config_.additional_options.cend()) {
const auto value = to_lower(fallback_it->second);
disable_region_fallback = (value == "true" || value == "1" || value == "yes");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary to change in this PR, but no other parameter that gets parsed from the config accepts "yes". I think it's a good idea to keep here, but maybe we should standardize what truthy-values get accepted with a helper function somewhere

const std::string& json_body,
const std::string& user_agent,
std::chrono::milliseconds timeout,
bool close_connection) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe consider changing the order of close_connection and timeout here and in HttpGetWithResponse to align more with HttpRequestRaw

Or, maybe there should be an HttpRequestOptions struct that gets passed in to store user_agent, timeout, and close_connection? Maybe it's fine as is if we don't anticipate any more arguments being added in the future.

/// `status == 0` becomes "transport failure"; otherwise "HTTP <status>". When the body is
/// non-empty it is appended (truncated to `max_body_chars`) so server-side or transport
/// diagnostics are preserved without bloating logs.
std::string DescribeFailure(const HttpResponse& response, std::size_t max_body_chars = 512);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense for consumers; I will need to log ErrorEvent in here when telemetry is added instead of logging later and missing out on some information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants