Merged
23 commits
54669d2
removed NS branch dependency
alexandraBara May 28, 2026
0342522
updates
alexandraBara May 28, 2026
958d4bf
updates
alexandraBara May 28, 2026
8fccd06
updates
alexandraBara May 28, 2026
9377b62
updating with details on which GPU has issues
alexandraBara May 28, 2026
f1ad01f
Merge branch 'development' into alex_amdsmi_update
alexandraBara May 29, 2026
0b727fb
registering pluginrecipes with entrypoints
alexandraBara May 29, 2026
591ac90
pytest fix
alexandraBara May 29, 2026
99d8314
updates to allow for partial plugin runs in pre-set plugin configs li…
alexandraBara May 29, 2026
85fa190
catching failure + print
alexandraBara May 29, 2026
46f38a2
Merge pull request #205 from amd/alex_amdsmi_update
alexandraBara Jun 1, 2026
92f76eb
Merge branch 'development' into alex_recipes
alexandraBara Jun 1, 2026
23c5e8e
using PluginConfig for stronger typing
alexandraBara Jun 1, 2026
3b979ea
some updates
alexandraBara Jun 1, 2026
74a4466
test fix
alexandraBara Jun 1, 2026
c64d17a
Merge pull request #206 from amd/alex_recipes
alexandraBara Jun 1, 2026
f233237
docs: Update plugin documentation [automated]
github-actions[bot] Jun 2, 2026
f4a88ae
added author to pyproject
alexandraBara Jun 2, 2026
9f14e29
Merge branch 'development' into automated-plugin-docs-update
alexandraBara Jun 2, 2026
84babb7
Bump setuptools from 70.3.0 to 78.1.1
dependabot[bot] Jun 2, 2026
798fd86
Merge pull request #207 from amd/automated-plugin-docs-update
alexandraBara Jun 2, 2026
be28a9a
Merge branch 'development' into dependabot/pip/setuptools-78.1.1
alexandraBara Jun 2, 2026
63371b7
Merge pull request #209 from amd/dependabot/pip/setuptools-78.1.1
alexandraBara Jun 2, 2026
2 changes: 1 addition & 1 deletion README.md
@@ -116,7 +116,7 @@ options:
Comma-separated built-in names and/or plugin config
JSON paths (e.g. --plugin-
configs=NodeStatus,/path/c.json). Built-ins:
NodeStatus, AllPlugins (default: None)
AllPlugins, NodeStatus (default: None)
--system-config STRING
Path to system config json (default: None)
--connection-config STRING
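
A minimal sketch of how a `--plugin-configs` value might be split, reusing the list comprehension from `_parse_plugin_configs_csv` in `nodescraper/cli/cli.py`. The `classify` helper and its rule (anything ending in `.json` is treated as a path) are assumptions for illustration only, not the CLI's actual resolution logic.

```python
def parse_plugin_configs_csv(value: str) -> list[str]:
    # Same expression as _parse_plugin_configs_csv in cli.py: split on commas,
    # trim whitespace, and drop empty items.
    return [p.strip() for p in value.split(",") if p.strip()]


def classify(items: list[str]) -> dict[str, list[str]]:
    # Hypothetical helper: guess whether each item is a config name or a JSON path.
    result: dict[str, list[str]] = {"names": [], "paths": []}
    for item in items:
        key = "paths" if item.lower().endswith(".json") else "names"
        result[key].append(item)
    return result


items = parse_plugin_configs_csv("NodeStatus, /path/c.json,,AllPlugins ")
print(items)            # ['NodeStatus', '/path/c.json', 'AllPlugins']
print(classify(items))  # {'names': ['NodeStatus', 'AllPlugins'], 'paths': ['/path/c.json']}
```

Stray whitespace and empty items between commas are dropped, so a trailing comma on the command line should be harmless.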
3 changes: 2 additions & 1 deletion docs/PLUGIN_DOC.md
@@ -4,7 +4,7 @@

| Plugin | Collection | Analyzer Args | Collection Args | DataModel | Collector | Analyzer |
| --- | --- | --- | --- | --- | --- | --- |
| AmdSmiPlugin | bad-pages<br>firmware --json<br>list --json<br>metric -g all<br>partition --json<br>process --json<br>ras --cper --folder={folder}<br>ras --afid --cper-file {cper_file}<br>static -g all --json<br>static -g {gpu_id} --json<br>topology<br>version --json<br>xgmi -l<br>xgmi -m | **Analyzer Args:**<br>- `check_static_data`: bool — If True, run static data checks (e.g. driver version, partition mode).<br>- `expected_gpu_processes`: Optional[int] — Expected number of GPU processes.<br>- `expected_max_power`: Optional[int] — Expected maximum power value (e.g. watts).<br>- `expected_driver_version`: Optional[str] — Expected AMD driver version string.<br>- `expected_memory_partition_mode`: Optional[str] — Expected memory partition mode (e.g. sp3, dp).<br>- `expected_compute_partition_mode`: Optional[str] — Expected compute partition mode.<br>- `expected_firmware_versions`: Optional[dict[str, str]] — Expected firmware versions keyed by amd-smi fw_id (e.g. PLDM_BUNDLE).<br>- `l0_to_recovery_count_error_threshold`: Optional[int] — L0-to-recovery count above which an error is raised.<br>- `l0_to_recovery_count_warning_threshold`: Optional[int] — L0-to-recovery count above which a warning is raised.<br>- `vendorid_ep`: Optional[str] — Expected endpoint vendor ID (e.g. for PCIe).<br>- `vendorid_ep_vf`: Optional[str] — Expected endpoint VF vendor ID.<br>- `devid_ep`: Optional[str] — Expected endpoint device ID.<br>- `devid_ep_vf`: Optional[str] — Expected endpoint VF device ID.<br>- `sku_name`: Optional[str] — Expected SKU name string for GPU.<br>- `expected_xgmi_speed`: Optional[list[float]] — Expected xGMI speed value(s) (e.g. link rate).<br>- `analysis_range_start`: Optional[datetime.datetime] — Start of time range for time-windowed analysis.<br>- `analysis_range_end`: Optional[datetime.datetime] — End of time range for time-windowed analysis. | **Collection Args:**<br>- `analysis_firmware_ids`: Optional[list[str]] — amd-smi fw_id values to record in analysis_ref.firmware_versions<br>- `cper_file_path`: Optional[str] — Path to CPER folder or file for RAS AFID collection (ras --afid --cper-file). | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
| AmdSmiPlugin | bad-pages<br>firmware --json<br>list --json<br>metric -g all<br>partition --json<br>process --json<br>ras --cper --folder={folder}<br>ras --afid --cper-file {cper_file}<br>static -g all --json<br>static -g {gpu_id} --json<br>topology<br>version --json<br>xgmi -l<br>xgmi -m | **Analyzer Args:**<br>- `check_static_data`: bool — If True, run static data checks (e.g. driver version, partition mode).<br>- `expected_gpu_processes`: Optional[int] — Expected number of GPU processes.<br>- `expected_max_power`: Optional[int] — Expected maximum power value (e.g. watts).<br>- `expected_power_management`: Optional[str] — Expected amd-smi metric power_management value per GPU (e.g. DISABLED for active/full power, ENABLED for power-manage...<br>- `expected_driver_version`: Optional[str] — Expected AMD driver version string.<br>- `expected_memory_partition_mode`: Optional[str] — Expected memory partition mode (e.g. sp3, dp).<br>- `expected_compute_partition_mode`: Optional[str] — Expected compute partition mode.<br>- `expected_firmware_versions`: Optional[dict[str, str]] — Expected firmware versions keyed by amd-smi fw_id (e.g. PLDM_BUNDLE).<br>- `l0_to_recovery_count_error_threshold`: Optional[int] — L0-to-recovery count above which an error is raised.<br>- `l0_to_recovery_count_warning_threshold`: Optional[int] — L0-to-recovery count above which a warning is raised.<br>- `vendorid_ep`: Optional[str] — Expected endpoint vendor ID (e.g. for PCIe).<br>- `vendorid_ep_vf`: Optional[str] — Expected endpoint VF vendor ID.<br>- `devid_ep`: Optional[str] — Expected endpoint device ID.<br>- `devid_ep_vf`: Optional[str] — Expected endpoint VF device ID.<br>- `sku_name`: Optional[str] — Expected SKU name string for GPU.<br>- `expected_xgmi_speed`: Optional[list[float]] — Expected xGMI speed value(s) (e.g. link rate).<br>- `analysis_range_start`: Optional[datetime.datetime] — Start of time range for time-windowed analysis.<br>- `analysis_range_end`: Optional[datetime.datetime] — End of time range for time-windowed analysis. | **Collection Args:**<br>- `analysis_firmware_ids`: Optional[list[str]] — amd-smi fw_id values to record in analysis_ref.firmware_versions<br>- `cper_file_path`: Optional[str] — Path to CPER folder or file for RAS AFID collection (ras --afid --cper-file). | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) |
| BiosPlugin | sh -c 'cat /sys/devices/virtual/dmi/id/bios_version'<br>wmic bios get SMBIOSBIOSVersion /Value | **Analyzer Args:**<br>- `exp_bios_version`: list[str] — Expected BIOS version(s) to match against collected value (str or list).<br>- `regex_match`: bool — If True, match exp_bios_version as regex; otherwise exact match. | - | [BiosDataModel](#BiosDataModel-Model) | [BiosCollector](#Collector-Class-BiosCollector) | [BiosAnalyzer](#Data-Analyzer-Class-BiosAnalyzer) |
| CmdlinePlugin | cat /proc/cmdline | **Analyzer Args:**<br>- `required_cmdline`: Union[str, List] — Command-line parameters that must be present (e.g. 'pci=bfsort').<br>- `banned_cmdline`: Union[str, List] — Command-line parameters that must not be present.<br>- `os_overrides`: Dict[str, nodescraper.plugins.inband.cmdline.cmdlineconfig.OverrideConfig] — Per-OS overrides for required_cmdline and banned_cmdline (keyed by OS identifier).<br>- `platform_overrides`: Dict[str, nodescraper.plugins.inband.cmdline.cmdlineconfig.OverrideConfig] — Per-platform overrides for required_cmdline and banned_cmdline (keyed by platform). | - | [CmdlineDataModel](#CmdlineDataModel-Model) | [CmdlineCollector](#Collector-Class-CmdlineCollector) | [CmdlineAnalyzer](#Data-Analyzer-Class-CmdlineAnalyzer) |
| DeviceEnumerationPlugin | powershell -Command "(Get-WmiObject -Class Win32_Processor &#124; Measure-Object).Count"<br>lspci -d {vendorid_ep}: &#124; grep -iE 'VGA&#124;Display&#124;3D&#124;Processing accelerators&#124;Co-processor&#124;Accelerator' &#124; grep -vi 'Virtual Function' &#124; wc -l<br>powershell -Command "(wmic path win32_VideoController get name &#124; findstr AMD &#124; Measure-Object).Count"<br>lscpu<br>lshw<br>lspci -d {vendorid_ep}: &#124; grep -i 'Virtual Function' &#124; wc -l<br>powershell -Command "(Get-VMHostPartitionableGpu &#124; Measure-Object).Count" | **Analyzer Args:**<br>- `cpu_count`: Optional[list[int]] — Expected CPU count(s); pass as int or list of ints. Analysis passes if actual is in list.<br>- `gpu_count`: Optional[list[int]] — Expected GPU count(s); pass as int or list of ints. Analysis passes if actual is in list.<br>- `vf_count`: Optional[list[int]] — Expected virtual function count(s); pass as int or list of ints. Analysis passes if actual is in list. | - | [DeviceEnumerationDataModel](#DeviceEnumerationDataModel-Model) | [DeviceEnumerationCollector](#Collector-Class-DeviceEnumerationCollector) | [DeviceEnumerationAnalyzer](#Data-Analyzer-Class-DeviceEnumerationAnalyzer) |
@@ -1755,6 +1755,7 @@ Check sysctl matches expected sysctl details
- **check_static_data**: `bool` — If True, run static data checks (e.g. driver version, partition mode).
- **expected_gpu_processes**: `Optional[int]` — Expected number of GPU processes.
- **expected_max_power**: `Optional[int]` — Expected maximum power value (e.g. watts).
- **expected_power_management**: `Optional[str]` — Expected amd-smi metric power_management value per GPU (e.g. DISABLED for active/full power, ENABLED for power-managed idle).
- **expected_driver_version**: `Optional[str]` — Expected AMD driver version string.
- **expected_memory_partition_mode**: `Optional[str]` — Expected memory partition mode (e.g. sp3, dp).
- **expected_compute_partition_mode**: `Optional[str]` — Expected compute partition mode.
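
To illustrate the new `expected_power_management` argument, the sketch below builds a plugin config mapping and serializes it to JSON. The top-level keys match the `PluginConfig` fields shown later in this diff. The `analysis_args` nesting under the plugin name, the config name, and the chosen value are assumptions for illustration and should be checked against the project's config documentation.

```python
import json

# Hypothetical plugin config. The "analysis_args" key under the plugin name is
# an assumption about the per-plugin layout, not something shown in this diff.
config = {
    "name": "PowerManagementCheck",  # illustrative name
    "desc": "Check the amd-smi power_management value on every GPU",
    "global_args": {},
    "plugins": {
        "AmdSmiPlugin": {
            "analysis_args": {
                # "DISABLED" is the example value given in the argument's description.
                "expected_power_management": "DISABLED",
            }
        }
    },
    "result_collators": {},
}

print(json.dumps(config, indent=2))
```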
29 changes: 14 additions & 15 deletions nodescraper/cli/cli.py
@@ -64,7 +64,7 @@
from nodescraper.connection.redfish.redfish_params import RedfishConnectionParams
from nodescraper.constants import DEFAULT_LOGGER
from nodescraper.enums import ExecutionStatus, SystemInteractionLevel, SystemLocation
from nodescraper.models import PluginConfig, SystemInfo
from nodescraper.models import SystemInfo
from nodescraper.pluginexecutor import PluginExecutor
from nodescraper.pluginregistry import PluginRegistry

@@ -74,17 +74,16 @@ def _parse_plugin_configs_csv(value: str) -> list[str]:
return [p.strip() for p in value.split(",") if p.strip()]


def _config_registry_with_all_plugins(plugin_reg: PluginRegistry) -> ConfigRegistry:
"""Synthetic ``AllPlugins`` config used for CLI help and :func:`build_global_argument_parser`."""
config_reg = ConfigRegistry()
config_reg.configs["AllPlugins"] = PluginConfig(
name="AllPlugins",
desc="Run all registered plugins with default arguments",
global_args={},
plugins={name: {} for name in plugin_reg.plugins},
result_collators={},
)
return config_reg
def _default_config_registry(_plugin_reg: PluginRegistry) -> ConfigRegistry:
"""Build the config registry from bundled JSON and plugin-config entry points.

Args:
_plugin_reg (PluginRegistry): Unused; retained for call-site compatibility.

Returns:
ConfigRegistry: Registry containing bundled and entry-point plugin configs.
"""
return ConfigRegistry()


def _add_cli_root_globals(
@@ -203,7 +202,7 @@ def _add_cli_root_globals(
def build_global_argument_parser(*, add_help: bool = True) -> argparse.ArgumentParser:
"""Globals only (no subcommands), for host CLIs."""
plugin_reg = PluginRegistry()
config_reg = _config_registry_with_all_plugins(plugin_reg)
config_reg = _default_config_registry(plugin_reg)
parser = argparse.ArgumentParser(
description="node scraper CLI (global options only)",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
@@ -406,7 +405,7 @@ def get_cli_top_level_subcommands() -> tuple[str, ...]:
Tuple of ``subcmd`` subparser names; call ``cache_clear()`` if registries change in-process.
"""
plugin_reg = PluginRegistry()
config_reg = _config_registry_with_all_plugins(plugin_reg)
config_reg = _default_config_registry(plugin_reg)
parser, _plugin_subparser_map = build_parser(plugin_reg, config_reg)
return _top_level_subcommand_names(parser)
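
The docstring's advice to call `cache_clear()` suggests that `get_cli_top_level_subcommands` is memoized, presumably with `functools.lru_cache` or `functools.cache`. The decorator is not visible in this hunk, so that is an assumption. A minimal sketch of the pattern, using a hypothetical in-process registry and illustrative subcommand names:

```python
import functools

# Hypothetical stand-in for a registry that can change in-process.
_REGISTRY = ["run-plugins", "describe"]


@functools.lru_cache(maxsize=None)
def top_level_subcommands() -> tuple[str, ...]:
    # Computed once; later registry changes are invisible until the cache is cleared.
    return tuple(_REGISTRY)


first = top_level_subcommands()
_REGISTRY.append("summary")
stale = top_level_subcommands()      # still the cached tuple
top_level_subcommands.cache_clear()  # drop the cached value
fresh = top_level_subcommands()      # recomputed from the updated registry

print(stale == first)  # True
print(fresh)           # ('run-plugins', 'describe', 'summary')
```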

@@ -474,7 +473,7 @@ def main(
arg_input = sys.argv[1:]

plugin_reg = PluginRegistry()
config_reg = _config_registry_with_all_plugins(plugin_reg)
config_reg = _default_config_registry(plugin_reg)
parser, plugin_subparser_map = build_parser(plugin_reg, config_reg)

try:
4 changes: 2 additions & 2 deletions nodescraper/cli/helper.py
@@ -211,8 +211,8 @@ def parse_describe(
if not parsed_args.name:
out: list[str] = []
if parsed_args.type == "config":
out.append("Available built-in configs:")
for name in config_reg.configs:
out.append("Available configs:")
for name in sorted(config_reg.configs):
out.append(f" {name}")
elif parsed_args.type == "plugin":
out.append("Available plugins:")
127 changes: 122 additions & 5 deletions nodescraper/configregistry.py
@@ -23,30 +23,55 @@
# SOFTWARE.
#
###############################################################################
from __future__ import annotations

import importlib.metadata
import inspect
import json
import os
from pathlib import Path
from typing import Optional
from typing import Any, Optional

from pydantic import ValidationError

from nodescraper.models import PluginConfig

PLUGIN_CONFIG_ENTRY_POINT_GROUP = "nodescraper.plugin_configs"


class PluginConfigEntryPointError(RuntimeError):
"""Raised when a ``nodescraper.plugin_configs`` entry point cannot be loaded."""


class ConfigRegistry:
"""Class to load json plugin configs into models"""

INTERNAL_SEARCH_PATH = os.path.join(os.path.dirname(__file__), "configs")

def __init__(self, config_path: Optional[str] = None) -> None:
self.configs = {}
def __init__(
self,
config_path: Optional[str] = None,
load_entry_point_configs: bool = True,
) -> None:
"""Initialize the config registry.

Args:
config_path (Optional[str], optional): Path in which to search for JSON config files.
Defaults to None.
load_entry_point_configs (bool, optional): Whether to load ``nodescraper.plugin_configs``
entry points. Defaults to True.
"""
self.configs: dict[str, PluginConfig] = {}
self.load_configs(config_path)
if load_entry_point_configs:
self.configs.update(self.load_plugin_configs_from_entry_points())

def load_configs(self, config_path: Optional[str] = None):
"""load plugin config json files into pydantic models
"""Load plugin config JSON files into pydantic models.

Args:
config_path (Optional[str], optional): Path in which to search for config files. Defaults to None.
config_path (Optional[str], optional): Path in which to search for config files.
Defaults to None.
"""
if not config_path:
config_path = self.INTERNAL_SEARCH_PATH
@@ -64,3 +89,95 @@ def load_configs(self, config_path: Optional[str] = None):
self.configs[config_file.name] = config_model
except (ValidationError, json.JSONDecodeError):
pass

@staticmethod
def _entry_points_for_group(group: str):
"""Return setuptools entry points for the given group name.

Args:
group (str): Entry point group to query.

Returns:
Iterable: Entry points registered under ``group``.
"""
try:
return importlib.metadata.entry_points(group=group) # type: ignore[call-arg]
except TypeError:
all_eps = importlib.metadata.entry_points() # type: ignore[assignment]
return all_eps.get(group, []) # type: ignore[assignment, attr-defined, arg-type]
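
The `try`/`except TypeError` above covers the two `importlib.metadata.entry_points()` calling conventions: the `group=` keyword accepted on Python 3.10 and later, and the older zero-argument form that returns a dict-like mapping of group name to entry points. A standalone sketch of the same fallback follows; it queries the standard `console_scripts` group so that it runs without nodescraper installed. The function name `entry_points_for_group` is illustrative.

```python
import importlib.metadata


def entry_points_for_group(group: str) -> list:
    try:
        # Python 3.10+: select a group by keyword.
        return list(importlib.metadata.entry_points(group=group))
    except TypeError:
        # Python 3.8/3.9: entry_points() takes no arguments and returns a
        # mapping of group name to entry points.
        return list(importlib.metadata.entry_points().get(group, []))


# Show up to three installed console scripts (the result depends on the environment).
for ep in entry_points_for_group("console_scripts")[:3]:
    print(ep.name, "->", ep.value)

# An unregistered group yields an empty list rather than an error.
print(entry_points_for_group("example.group.that.does.not.exist"))
```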

@staticmethod
def _resolve_entry_point_config(loaded: Any) -> PluginConfig | dict[str, Any] | None:
"""Resolve a loaded entry point object into a plugin config.

Args:
loaded (Any): Object returned by an entry point ``load()`` call.

Returns:
Optional[PluginConfig | dict[str, Any]]: Plugin config, or None if ``loaded`` is unsupported.
"""
if isinstance(loaded, PluginConfig):
return loaded
if isinstance(loaded, dict):
return loaded
if inspect.isclass(loaded) and hasattr(loaded, "plugin_config"):
config_data = loaded.plugin_config()
elif callable(loaded):
config_data = loaded()
else:
return None

if isinstance(config_data, (PluginConfig, dict)):
return config_data
return None
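
`_resolve_entry_point_config` accepts four target shapes: a `PluginConfig` instance, a plain `dict`, a class exposing a `plugin_config()` method, or any other callable that returns one of the first two. The sketch below reproduces that dispatch with a stand-in dataclass named `FakeConfig`, which is hypothetical and used only so the example runs without nodescraper or pydantic. `Recipe` and `factory` are illustrative names as well.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Optional, Union


@dataclass
class FakeConfig:
    # Hypothetical stand-in for nodescraper.models.PluginConfig.
    name: Optional[str] = None
    plugins: dict = field(default_factory=dict)


def resolve(loaded: Any) -> Union[FakeConfig, dict, None]:
    # Same dispatch as _resolve_entry_point_config in the diff.
    if isinstance(loaded, (FakeConfig, dict)):
        return loaded
    if inspect.isclass(loaded) and hasattr(loaded, "plugin_config"):
        data = loaded.plugin_config()
    elif callable(loaded):
        data = loaded()
    else:
        return None
    return data if isinstance(data, (FakeConfig, dict)) else None


class Recipe:
    # plugin_config() is called on the class itself, so it has to be a
    # staticmethod or classmethod.
    @staticmethod
    def plugin_config() -> dict:
        return {"name": "FromClass", "plugins": {"BiosPlugin": {}}}


def factory() -> FakeConfig:
    return FakeConfig(name="FromCallable")


print(resolve({"name": "FromDict"}))  # dict is returned unchanged
print(resolve(Recipe))                # result of Recipe.plugin_config()
print(resolve(factory))               # result of factory()
print(resolve(42))                    # None: unsupported target
```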

@classmethod
def load_plugin_configs_from_entry_points(cls) -> dict[str, PluginConfig]:
"""Load plugin configs registered under ``nodescraper.plugin_configs`` entry points.

Returns:
dict[str, PluginConfig]: Map of config name to loaded :class:`~nodescraper.models.PluginConfig`.

Raises:
PluginConfigEntryPointError: If an entry point target is missing, invalid, or unsupported.
"""
configs: dict[str, PluginConfig] = {}

for entry_point in cls._entry_points_for_group(PLUGIN_CONFIG_ENTRY_POINT_GROUP):
entry_point_name = getattr(entry_point, "name", None)
entry_point_label = entry_point_name or "<unknown>"
try:
loaded = entry_point.load() # type: ignore[attr-defined]
config_data = cls._resolve_entry_point_config(loaded)
if config_data is None:
raise PluginConfigEntryPointError(
f"Failed to load plugin config entry point {entry_point_label!r}: "
f"unsupported target {loaded!r}"
)

config_model = (
config_data
if isinstance(config_data, PluginConfig)
else PluginConfig(**config_data)
)
config_key = entry_point_name or config_model.name
if config_key:
configs[config_key] = config_model
except ModuleNotFoundError as exc:
raise PluginConfigEntryPointError(
f"Failed to load plugin config entry point {entry_point_label!r}: "
f"module not found ({exc}). Check the entry point target in pyproject.toml."
) from exc
except ValidationError as exc:
raise PluginConfigEntryPointError(
f"Failed to load plugin config entry point {entry_point_label!r}: "
"invalid plugin config"
) from exc
except PluginConfigEntryPointError:
raise
except Exception as exc:
raise PluginConfigEntryPointError(
f"Failed to load plugin config entry point {entry_point_label!r}: {exc}"
) from exc

return configs
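
To see how the loader reports failures, the sketch below runs a simplified version of the loop over fake entry points built with `types.SimpleNamespace`. `LoadError`, `load_all`, and the fake entry points are hypothetical, and validation is reduced to a `dict` check in place of `PluginConfig(**config_data)`.

```python
from types import SimpleNamespace


class LoadError(RuntimeError):
    """Hypothetical stand-in for PluginConfigEntryPointError."""


def load_all(entry_points) -> dict:
    configs = {}
    for ep in entry_points:
        label = getattr(ep, "name", None) or "<unknown>"
        try:
            loaded = ep.load()
            if not isinstance(loaded, dict):
                raise LoadError(f"entry point {label!r}: unsupported target {loaded!r}")
            # Prefer the entry point name, then fall back to the config's own name.
            key = getattr(ep, "name", None) or loaded.get("name")
            if key:
                configs[key] = loaded
        except ModuleNotFoundError as exc:
            raise LoadError(f"entry point {label!r}: module not found ({exc})") from exc
        except LoadError:
            raise  # already wrapped, pass through unchanged
        except Exception as exc:
            raise LoadError(f"entry point {label!r}: {exc}") from exc
    return configs


def _missing():
    raise ModuleNotFoundError("No module named 'my_recipes'")


good = SimpleNamespace(name="NodeStatus", load=lambda: {"plugins": {}})
bad = SimpleNamespace(name="Broken", load=_missing)

print(load_all([good]))  # {'NodeStatus': {'plugins': {}}}
try:
    load_all([good, bad])
except LoadError as err:
    print(err)                           # entry point 'Broken': module not found (No module named 'my_recipes')
    print(type(err.__cause__).__name__)  # ModuleNotFoundError
```

Because the error is raised inside the loop, one broken entry point aborts loading for every config in the group. This differs from `load_configs`, which skips JSON files that fail validation or decoding.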
19 changes: 0 additions & 19 deletions nodescraper/configs/node_status.json

This file was deleted.

32 changes: 31 additions & 1 deletion nodescraper/models/pluginconfig.py
@@ -23,7 +23,9 @@
# SOFTWARE.
#
###############################################################################
from typing import Optional
from __future__ import annotations

from typing import Any, Optional

from pydantic import BaseModel, Field

@@ -36,3 +38,31 @@ class PluginConfig(BaseModel):
result_collators: dict[str, dict] = Field(default_factory=dict)
name: Optional[str] = None
desc: Optional[str] = None

@classmethod
def coerce(cls, config: PluginConfig | dict[str, Any]) -> PluginConfig:
"""Return a ``PluginConfig`` instance from a model or mapping."""
if isinstance(config, cls):
return config
return cls.model_validate(config)

@classmethod
def merge(cls, *configs: PluginConfig | dict[str, Any]) -> PluginConfig:
"""Merge recipe plugin configs.

Plugin entries from later configs overwrite earlier ones with the same name.
``name``, ``desc``, ``global_args``, and ``result_collators`` come from the first
config.
"""
normalized = [cls.coerce(config) for config in configs]
merged_plugins: dict[str, dict[str, Any]] = {}
for config in normalized:
merged_plugins.update(config.plugins)
first = normalized[0] if normalized else cls()
return cls(
name=first.name,
desc=first.desc,
global_args=first.global_args,
plugins=merged_plugins,
result_collators=first.result_collators,
)
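
The merge rule can be sketched with a stand-in dataclass (hypothetical `MiniConfig`, used so the example runs without pydantic). The real method additionally coerces dicts through `model_validate`. The `analysis_args` key in the sample data is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MiniConfig:
    # Hypothetical stand-in mirroring the PluginConfig fields in this diff.
    name: Optional[str] = None
    desc: Optional[str] = None
    global_args: dict = field(default_factory=dict)
    plugins: dict = field(default_factory=dict)
    result_collators: dict = field(default_factory=dict)

    @classmethod
    def merge(cls, *configs: "MiniConfig") -> "MiniConfig":
        merged_plugins: dict = {}
        for config in configs:
            # dict.update replaces the whole entry for a plugin name;
            # per-plugin arguments are not deep-merged.
            merged_plugins.update(config.plugins)
        first = configs[0] if configs else cls()
        return cls(
            name=first.name,
            desc=first.desc,
            global_args=first.global_args,
            plugins=merged_plugins,
            result_collators=first.result_collators,
        )


base = MiniConfig(
    name="Base",
    plugins={"BiosPlugin": {"analysis_args": {"regex_match": True}}, "CmdlinePlugin": {}},
)
override = MiniConfig(name="Override", plugins={"BiosPlugin": {}})

merged = MiniConfig.merge(base, override)
print(merged.name)     # Base
print(merged.plugins)  # {'BiosPlugin': {}, 'CmdlinePlugin': {}}
```

Note that the later `BiosPlugin` entry replaces the earlier one wholesale, so the `regex_match` argument from `base` is lost. Only the plugin map is combined; the remaining fields come from the first config.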