Skip to content

fix(catalogs): validate extension and preset catalog payload shape#2621

Open
Quratulain-bilal wants to merge 1 commit into
github:mainfrom
Quratulain-bilal:fix/catalog-fetch-validates-payload-type
Open

fix(catalogs): validate extension and preset catalog payload shape#2621
Quratulain-bilal wants to merge 1 commit into
github:mainfrom
Quratulain-bilal:fix/catalog-fetch-validates-payload-type

Conversation

@Quratulain-bilal
Copy link
Copy Markdown
Contributor

What

ExtensionCatalog._fetch_single_catalog and PresetCatalog._fetch_single_catalog only check that the extensions /
presets key is present in the parsed catalog JSON. They don't check that the value is a JSON object, and they don't
check that the root is a JSON object at all. A malformed (or compromised) upstream catalog returning:

{"schema_version": "1.0", "extensions": []}

passes both the "extensions" not in catalog_data check and the JSON parse, gets cached on disk, and then crashes deep
inside _get_merged_extensions (resp. _get_merged_packs) with:

AttributeError: 'list' object has no attribute 'items'

instead of the existing user-facing ExtensionError("Invalid catalog format from <url>") / PresetError("Invalid preset catalog format") that the surrounding code is clearly trying to produce.

Reproducer

from specify_cli.extensions import ExtensionCatalog
# point an ExtensionCatalog at a URL that returns:
#   {"schema_version": "1.0", "extensions": []}
catalog._get_merged_extensions(force_refresh=True)
# AttributeError: 'list' object has no attribute 'items'

Why this matters

  • The error surfaces as a raw Python traceback (AttributeError) instead of the existing ExtensionError /
    PresetError. Users see a Python stack trace and assume the CLI itself is broken, not their catalog.
  • A multi-catalog stack (SPECKIT_CATALOG_URL env var, project-level extension-catalogs.yml) can include
    community-hosted catalogs. A single misedited entry on any catalog in the stack crashes the whole resolve path.
  • The sibling integration-catalog reader at src/specify_cli/integrations/catalog.py:175,186 already validates both the
    root object and the nested mapping — the three catalog fetchers had drifted out of sync.

The change

  • src/specify_cli/extensions.py — add isinstance(catalog_data, dict) and isinstance(catalog_data.get("extensions"), dict) guards in _fetch_single_catalog before iteration, raising the existing ExtensionError("Invalid catalog format from ...") with a more specific suffix when the type is wrong.
  • src/specify_cli/presets.py — same guards in _fetch_single_catalog, raising PresetError.
  • Both call sites mirror the existing integrations/catalog.py pattern exactly, with a short comment explaining the
    precedent so the rationale is visible to the next reader.

Tests

Parametrized regression tests in both tests/test_extensions.py::TestExtensionCatalog and
tests/test_presets.py::TestPresetCatalog cover:

  • root payload is not a JSON object: [], "oops", 42, null
  • root is a dict but extensions / presets value is the wrong type: [], "oops", null, 42

All 8 bad-payload shapes per file (16 total) now raise the expected catalog error.

$ python -m pytest tests/test_extensions.py::TestExtensionCatalog tests/test_extensions.py::TestCatalogStack
tests/test_presets.py::TestPresetCatalog -q
83 passed in 4.63s
$ python -m ruff check src/
All checks passed!

All checks passed!


## Scope

Intentionally narrow: no behaviour change for any valid catalog (real `dict` payloads continue to work). Only malformed
payloads that previously crashed with `AttributeError` now raise the existing `ExtensionError` / `PresetError` that the
surrounding code is already designed for.

`ExtensionCatalog._fetch_single_catalog` and
`PresetCatalog._fetch_single_catalog` only check that the `extensions` /
`presets` key is *present* in the parsed catalog JSON. They don't check
that the value is a JSON object, and they don't check that the root is
a JSON object at all. A malformed (or compromised) upstream catalog
returning:

    {"schema_version": "1.0", "extensions": []}

passes both `"extensions" not in catalog_data` and the subsequent
`response.read()` JSON parse, gets cached on disk, and then crashes
deep inside `_get_merged_extensions` (resp. `_get_merged_packs`) with:

    AttributeError: 'list' object has no attribute 'items'

instead of the existing user-facing
`ExtensionError("Invalid catalog format from <url>")` /
`PresetError("Invalid preset catalog format")` that the surrounding
code is clearly trying to produce.

The sibling integration-catalog reader already validates this — see
`src/specify_cli/integrations/catalog.py` where the fetch path
explicitly checks both `isinstance(catalog_data, dict)` and
`isinstance(catalog_data.get("integrations"), dict)` before returning.
This change mirrors that pattern in the extension and preset readers so
the three catalog fetchers stay consistent and a malformed upstream
surfaces as the user-facing error instead of a raw Python traceback.

Adds parametrized regression tests covering:
- root payload is not a JSON object (list, str, int, null)
- root is a dict but `extensions` / `presets` value is the wrong type
  (list, str, null, int)

All eight bad-payload shapes now raise the expected catalog error.
@Quratulain-bilal Quratulain-bilal requested a review from mnriem as a code owner May 18, 2026 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant