Skip to content

Even more aie4 shim test enablement#1025

Closed
hlaccabu wants to merge 3 commits intoamd:mainfrom
hlaccabu:shim
Closed

Even more aie4 shim test enablement#1025
hlaccabu wants to merge 3 commits intoamd:mainfrom
hlaccabu:shim

Conversation

@hlaccabu
Copy link
Copy Markdown
Contributor

42 passing shim tests with these changes

@hlaccabu hlaccabu requested review from Copilot and xdavidz January 23, 2026 01:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables additional shim tests for AIE4/NPU3 devices, bringing the total to 42 passing tests. The changes add support for a new kernel type (KERNEL_TYPE_TXN_FULL_ELF_PREEMPT_AIE4), implement NPU3-specific device filtering and configuration, and handle NPU3's lack of telemetry support for preemption verification.

Changes:

  • Added NPU3 device filter and virtual context configuration (128 contexts for NPU3)
  • Introduced new AIE4 preempt test infrastructure with elf_preempt_aie4_io_test_bo_set class
  • Updated test parameter handling to support device-specific xclbin selection

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/shim_test/shim_test.cpp Added NPU3 device filter, updated virtual context limits, modified test parameters for multi-context tests, and added new AIE4 preemption test
test/shim_test/io_test.cpp Added AIE4 kernel type support, implemented telemetry availability check for NPU3, and added conditional xclbin selection
test/shim_test/io.h Declared new elf_preempt_aie4_io_test_bo_set class for AIE4 preemption testing
test/shim_test/io.cpp Implemented AIE4 preemption test class with weight offset handling and result verification
test/shim_test/hwctx.h Updated full ELF check to include new AIE4 preempt kernel type
test/shim_test/exec_buf.h Added method for patched 64-bit arguments and updated copyright year
test/shim_test/dev_info.h Added new kernel type enum value for AIE4 full ELF preemption
test/shim_test/dev_info.cpp Added xclbin configuration entries for resnet50.elf on NPU3 devices

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/shim_test/io.cpp
xdavidz
xdavidz previously approved these changes Jan 26, 2026
Copy link
Copy Markdown
Contributor

@xdavidz xdavidz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hlaccabu have you verified this on board 10+ times? make sure it is stable enough please.

@hlaccabu
Copy link
Copy Markdown
Contributor Author

@hlaccabu have you verified this on board 10+ times? make sure it is stable enough please.

yes just went 10/10 on silicon, all 42 tests passed each time

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/shim_test/shim_test.cpp Outdated
// XDNA driver by default supports 6 virtual context on npu1, 128 on npu3, and 32 virtual context on npu4
if (device_id == npu1_device_id)
num_virt_ctx = 6;
else if (device_id == npu3_device_id)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition only checks for npu3_device_id but should also check for npu3_device_id1 to match the new dev_filter_is_npu3 function behavior that handles both device IDs.

Suggested change
else if (device_id == npu3_device_id)
else if (device_id == npu3_device_id || device_id == npu3_device_id1)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xdavidz Can VF do same amount of ctxs?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test if you can get some number? I think VF can.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can test that!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added implementation based on Max's suggestion, works well

Comment thread src/driver/amdxdna/aie4_message.c
@hlaccabu hlaccabu requested a review from xdavidz January 28, 2026 23:42
Comment thread src/driver/amdxdna/aie4_message.c
Comment thread src/driver/amdxdna/aie4_pci.c
goto free_buf;
}

print_hex_dump_debug("telemetry: ", DUMP_PREFIX_OFFSET, 16, 4, buff,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this will make dmesg disabled for debugging. can we have this line unchanged or it is flooding?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is flooding, since telemetry info is large it completely fills the dmesg. We can figure something else out in between, maybe just print first couple of lines?

Comment thread src/shim/device.cpp Outdated
Comment thread src/shim/device.cpp Outdated
@hlaccabu
Copy link
Copy Markdown
Contributor Author

@xdavidz addressed several of your comments and made some enhancements, let me know what you think. All tests passing on classic, 1 preemption test fails on VF due to issues we spoke about.

Copilot AI review requested due to automatic review settings January 30, 2026 17:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@maxzhen
Copy link
Copy Markdown
Collaborator

maxzhen commented Feb 6, 2026

retest this please

root and others added 2 commits February 6, 2026 15:13
Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>
Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>
Signed-off-by: Hayden Laccabue <hlaccabu@amd.com>
Copilot AI review requested due to automatic review settings February 8, 2026 04:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

{ "DPU:dpu", {0xffffffff} },
},
.workspace = "local_shim_test_data/elf_bad_ctrl_npu3",
.workspace = "local_shim_test_data/npu3a/bad_ctrl",
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workspace paths previously had a trailing /, and get_xclbin_path() concatenates wrk + xclbin_name directly. Removing the trailing slash will produce invalid paths (e.g., .../vaddfoo.xclbin). Restore the trailing / here, or change get_xclbin_workspace/get_xclbin_path to join paths safely (ensuring exactly one separator).

Copilot uses AI. Check for mistakes.
.ip_name2idx = {
{ "DPU:dpu", {0xffffffff} },
},
.workspace = "local_shim_test_data/npu3a/resnet50",
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workspace paths previously had a trailing /, and get_xclbin_path() concatenates wrk + xclbin_name directly. Removing the trailing slash will produce invalid paths (e.g., .../vaddfoo.xclbin). Restore the trailing / here, or change get_xclbin_workspace/get_xclbin_path to join paths safely (ensuring exactly one separator).

Copilot uses AI. Check for mistakes.
{ "DPU:dpu", {0xffffffff} },
},
.workspace = "local_shim_test_data/npu3a/nop/",
.workspace = "local_shim_test_data/npu3a/nop",
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workspace paths previously had a trailing /, and get_xclbin_path() concatenates wrk + xclbin_name directly. Removing the trailing slash will produce invalid paths (e.g., .../vaddfoo.xclbin). Restore the trailing / here, or change get_xclbin_workspace/get_xclbin_path to join paths safely (ensuring exactly one separator).

Copilot uses AI. Check for mistakes.
Comment thread src/shim/device.cpp
Comment on lines +999 to +1011
struct aie4_fw_telemetry {
uint8_t enabled;
uint8_t reserved[3];
struct aie4_clk_deep_slp deep_slp;
uint64_t l1_interrupt;
uint64_t context_starting[AIE4_MAX_NUM_SUPERVISORS + 1];
uint64_t scheduler_scheduled[AIE4_MAX_NUM_SUPERVISORS + 1];
uint64_t did_dma;
uint64_t resource_acquired[AIE4_MAX_NUM_SUPERVISORS];
struct aie4_telemetry_opcodes opcodes;
uint64_t preemption_frame_boundary_counter[AIE4_TOTAL_NUM_UC];
uint64_t preemption_checkpoint_event_counter[AIE4_TOTAL_NUM_UC];
};
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These firmware-facing telemetry structs are parsed via reinterpret_cast<aie4_fw_telemetry*> from a raw byte buffer, but the structs are not explicitly packed/aligned to the firmware ABI. Default C++ layout/padding can differ from the firmware layout and will yield incorrect field decoding. Make the structs explicitly packed (and ideally add static_assert checks for expected offsets/sizes), or parse using explicit offsets with memcpy.

Copilot uses AI. Check for mistakes.
Comment thread src/shim/device.cpp
Comment on lines +1100 to +1102
virtual xrt_core::query::rtos_telemetry::result_type
query_rtos_telemetry(const shim_xdna::pdev& pci_dev) const {
xrt_core::query::rtos_telemetry::result_type output;
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation gated RTOS telemetry to NPU4 only; that guard has been removed in the base (AIE2) handler. This will now attempt RTOS telemetry queries for any non-AIE4 device using the legacy amdxdna_drm_query_telemetry layout, which can return garbage or fail on devices/firmware that don’t implement those fields. Reintroduce appropriate device/feature gating (either in get() before dispatching, or inside the base query_rtos_telemetry() implementation).

Copilot uses AI. Check for mistakes.
Comment on lines 151 to 156
auto cmd_hdl = std::get<0>(cmd).get()->get();
auto cmd_pkt = std::get<1>(cmd);

if (cmd_pkt->opcode == ERT_CMD_CHAIN) {
size_t start_idx = cmd_idx * cmds_per_list;
size_t end_idx = std::min(start_idx + cmds_per_list, bo_set.size());
for (size_t i = start_idx; i < end_idx; i++) {
auto subcmd_bo = bo_set[i]->get_bos()[IO_TEST_BO_CMD].tbo.get();
auto subcmd_pkt = reinterpret_cast<ert_start_kernel_cmd *>(subcmd_bo->map());
bo_set[i]->restore_cmd_header(subcmd_bo->get(), subcmd_pkt);
}
} else {
bo_set[cmd_idx]->restore_cmd_header(cmd_hdl, cmd_pkt);
bo_set[cmd_idx]->cache_cmd_header(cmd_hdl, cmd_pkt);
}
restore_cmd_headers_before_submit(bo_set, cmd_idx, cmds_per_list, cmd_hdl, cmd_pkt);

hwq->submit_command(cmd_hdl);
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactor removed cache_cmd_header() calls that previously occurred for non-chain command packets (and also removed initial caching in other paths later in the file). If any io_test_bo_set_base relies on caching the original header before patching/mutation, restore_cmd_header() may restore from an uninitialized/empty cache, causing malformed submits or flaky behavior across iterations. Ensure command headers are cached at least once per BO-set (including non-chain and chain sub-commands) before the first restore/submit loop, or restore the removed caching in the appropriate setup location.

Copilot uses AI. Check for mistakes.
Comment thread test/shim_test/io.cpp
Comment on lines +781 to +788
void
elf_preempt_aie4_io_test_bo_set::
init_cmd(hw_ctx& hwctx, bool dump)
{
std::vector<uint32_t> wts_offsets;
uint32_t offset;

exec_buf ebuf(*m_bo_array[IO_TEST_BO_CMD].tbo.get(), ERT_START_DPU);
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elf_preempt_io_test_bo_set::init_cmd() now caches the command header after building the packet, but the new AIE4 variant does not. With the caching removed/refactored elsewhere, this AIE4 path is especially likely to submit packets that can’t be reliably restored between iterations. Add equivalent header caching for IO_TEST_BO_CMD after the packet is constructed in this init_cmd().

Copilot uses AI. Check for mistakes.
@amd amd deleted a comment from Copilot AI Feb 8, 2026
@amd amd deleted a comment from Copilot AI Feb 8, 2026
@amd amd deleted a comment from Copilot AI Feb 8, 2026
@amd amd deleted a comment from Copilot AI Feb 8, 2026
@amd amd deleted a comment from Copilot AI Feb 8, 2026
@amd amd deleted a comment from Copilot AI Feb 9, 2026
@hlaccabu hlaccabu closed this Feb 9, 2026
@hlaccabu
Copy link
Copy Markdown
Contributor Author

hlaccabu commented Feb 9, 2026

Closing to create several smaller PRs with these changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants