Conversation
There was a problem hiding this comment.
Pull request overview
This PR enables additional shim tests for AIE4/NPU3 devices, bringing the total to 42 passing tests. The changes add support for a new kernel type (KERNEL_TYPE_TXN_FULL_ELF_PREEMPT_AIE4), implement NPU3-specific device filtering and configuration, and handle NPU3's lack of telemetry support for preemption verification.
Changes:
- Added NPU3 device filter and virtual context configuration (128 contexts for NPU3)
- Introduced new AIE4 preempt test infrastructure with
elf_preempt_aie4_io_test_bo_setclass - Updated test parameter handling to support device-specific xclbin selection
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/shim_test/shim_test.cpp | Added NPU3 device filter, updated virtual context limits, modified test parameters for multi-context tests, and added new AIE4 preemption test |
| test/shim_test/io_test.cpp | Added AIE4 kernel type support, implemented telemetry availability check for NPU3, and added conditional xclbin selection |
| test/shim_test/io.h | Declared new elf_preempt_aie4_io_test_bo_set class for AIE4 preemption testing |
| test/shim_test/io.cpp | Implemented AIE4 preemption test class with weight offset handling and result verification |
| test/shim_test/hwctx.h | Updated full ELF check to include new AIE4 preempt kernel type |
| test/shim_test/exec_buf.h | Added method for patched 64-bit arguments and updated copyright year |
| test/shim_test/dev_info.h | Added new kernel type enum value for AIE4 full ELF preemption |
| test/shim_test/dev_info.cpp | Added xclbin configuration entries for resnet50.elf on NPU3 devices |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
yes just went 10/10 on silicon, all 42 tests passed each time |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // XDNA driver by default supports 6 virtual context on npu1, 128 on npu3, and 32 virtual context on npu4 | ||
| if (device_id == npu1_device_id) | ||
| num_virt_ctx = 6; | ||
| else if (device_id == npu3_device_id) |
There was a problem hiding this comment.
The condition only checks for npu3_device_id but should also check for npu3_device_id1 to match the new dev_filter_is_npu3 function behavior that handles both device IDs.
| else if (device_id == npu3_device_id) | |
| else if (device_id == npu3_device_id || device_id == npu3_device_id1) |
There was a problem hiding this comment.
can you test if you can get some number? I think VF can.
There was a problem hiding this comment.
Sure, I can test that!
There was a problem hiding this comment.
Added implementation based on Max's suggestion, works well
| goto free_buf; | ||
| } | ||
|
|
||
| print_hex_dump_debug("telemetry: ", DUMP_PREFIX_OFFSET, 16, 4, buff, |
There was a problem hiding this comment.
removing this will make dmesg disabled for debugging. can we have this line unchanged or it is flooding?
There was a problem hiding this comment.
It is flooding, since telemetry info is large it completely fills the dmesg. We can figure something else out in between, maybe just print first couple of lines?
|
@xdavidz addressed several of your comments and made some enhancements, let me know what you think. All tests passing on classic, 1 preemption test fails on VF due to issues we spoke about. |
|
retest this please |
Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>
Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>
Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| { "DPU:dpu", {0xffffffff} }, | ||
| }, | ||
| .workspace = "local_shim_test_data/elf_bad_ctrl_npu3", | ||
| .workspace = "local_shim_test_data/npu3a/bad_ctrl", |
There was a problem hiding this comment.
workspace paths previously had a trailing /, and get_xclbin_path() concatenates wrk + xclbin_name directly. Removing the trailing slash will produce invalid paths (e.g., .../vaddfoo.xclbin). Restore the trailing / here, or change get_xclbin_workspace/get_xclbin_path to join paths safely (ensuring exactly one separator).
| .ip_name2idx = { | ||
| { "DPU:dpu", {0xffffffff} }, | ||
| }, | ||
| .workspace = "local_shim_test_data/npu3a/resnet50", |
There was a problem hiding this comment.
workspace paths previously had a trailing /, and get_xclbin_path() concatenates wrk + xclbin_name directly. Removing the trailing slash will produce invalid paths (e.g., .../vaddfoo.xclbin). Restore the trailing / here, or change get_xclbin_workspace/get_xclbin_path to join paths safely (ensuring exactly one separator).
| { "DPU:dpu", {0xffffffff} }, | ||
| }, | ||
| .workspace = "local_shim_test_data/npu3a/nop/", | ||
| .workspace = "local_shim_test_data/npu3a/nop", |
There was a problem hiding this comment.
workspace paths previously had a trailing /, and get_xclbin_path() concatenates wrk + xclbin_name directly. Removing the trailing slash will produce invalid paths (e.g., .../vaddfoo.xclbin). Restore the trailing / here, or change get_xclbin_workspace/get_xclbin_path to join paths safely (ensuring exactly one separator).
| struct aie4_fw_telemetry { | ||
| uint8_t enabled; | ||
| uint8_t reserved[3]; | ||
| struct aie4_clk_deep_slp deep_slp; | ||
| uint64_t l1_interrupt; | ||
| uint64_t context_starting[AIE4_MAX_NUM_SUPERVISORS + 1]; | ||
| uint64_t scheduler_scheduled[AIE4_MAX_NUM_SUPERVISORS + 1]; | ||
| uint64_t did_dma; | ||
| uint64_t resource_acquired[AIE4_MAX_NUM_SUPERVISORS]; | ||
| struct aie4_telemetry_opcodes opcodes; | ||
| uint64_t preemption_frame_boundary_counter[AIE4_TOTAL_NUM_UC]; | ||
| uint64_t preemption_checkpoint_event_counter[AIE4_TOTAL_NUM_UC]; | ||
| }; |
There was a problem hiding this comment.
These firmware-facing telemetry structs are parsed via reinterpret_cast<aie4_fw_telemetry*> from a raw byte buffer, but the structs are not explicitly packed/aligned to the firmware ABI. Default C++ layout/padding can differ from the firmware layout and will yield incorrect field decoding. Make the structs explicitly packed (and ideally add static_assert checks for expected offsets/sizes), or parse using explicit offsets with memcpy.
| virtual xrt_core::query::rtos_telemetry::result_type | ||
| query_rtos_telemetry(const shim_xdna::pdev& pci_dev) const { | ||
| xrt_core::query::rtos_telemetry::result_type output; |
There was a problem hiding this comment.
The previous implementation gated RTOS telemetry to NPU4 only; that guard has been removed in the base (AIE2) handler. This will now attempt RTOS telemetry queries for any non-AIE4 device using the legacy amdxdna_drm_query_telemetry layout, which can return garbage or fail on devices/firmware that don’t implement those fields. Reintroduce appropriate device/feature gating (either in get() before dispatching, or inside the base query_rtos_telemetry() implementation).
| auto cmd_hdl = std::get<0>(cmd).get()->get(); | ||
| auto cmd_pkt = std::get<1>(cmd); | ||
|
|
||
| if (cmd_pkt->opcode == ERT_CMD_CHAIN) { | ||
| size_t start_idx = cmd_idx * cmds_per_list; | ||
| size_t end_idx = std::min(start_idx + cmds_per_list, bo_set.size()); | ||
| for (size_t i = start_idx; i < end_idx; i++) { | ||
| auto subcmd_bo = bo_set[i]->get_bos()[IO_TEST_BO_CMD].tbo.get(); | ||
| auto subcmd_pkt = reinterpret_cast<ert_start_kernel_cmd *>(subcmd_bo->map()); | ||
| bo_set[i]->restore_cmd_header(subcmd_bo->get(), subcmd_pkt); | ||
| } | ||
| } else { | ||
| bo_set[cmd_idx]->restore_cmd_header(cmd_hdl, cmd_pkt); | ||
| bo_set[cmd_idx]->cache_cmd_header(cmd_hdl, cmd_pkt); | ||
| } | ||
| restore_cmd_headers_before_submit(bo_set, cmd_idx, cmds_per_list, cmd_hdl, cmd_pkt); | ||
|
|
||
| hwq->submit_command(cmd_hdl); |
There was a problem hiding this comment.
The refactor removed cache_cmd_header() calls that previously occurred for non-chain command packets (and also removed initial caching in other paths later in the file). If any io_test_bo_set_base relies on caching the original header before patching/mutation, restore_cmd_header() may restore from an uninitialized/empty cache, causing malformed submits or flaky behavior across iterations. Ensure command headers are cached at least once per BO-set (including non-chain and chain sub-commands) before the first restore/submit loop, or restore the removed caching in the appropriate setup location.
| void | ||
| elf_preempt_aie4_io_test_bo_set:: | ||
| init_cmd(hw_ctx& hwctx, bool dump) | ||
| { | ||
| std::vector<uint32_t> wts_offsets; | ||
| uint32_t offset; | ||
|
|
||
| exec_buf ebuf(*m_bo_array[IO_TEST_BO_CMD].tbo.get(), ERT_START_DPU); |
There was a problem hiding this comment.
elf_preempt_io_test_bo_set::init_cmd() now caches the command header after building the packet, but the new AIE4 variant does not. With the caching removed/refactored elsewhere, this AIE4 path is especially likely to submit packets that can’t be reliably restored between iterations. Add equivalent header caching for IO_TEST_BO_CMD after the packet is constructed in this init_cmd().
|
Closing to create several smaller PRs with these changes |
42 passing shim tests with these changes