
[24.04_linux-nvidia-6.17-next] PCI: mirror PI7C9X3G606GPC Port 4 BAR0 #442

Open
nirmoy wants to merge 1 commit into NVIDIA:24.04_linux-nvidia-6.17-next from nirmoy:codex/pericom-msix-bar-war-6.17

Conversation

@nirmoy
Collaborator

@nirmoy nirmoy commented May 27, 2026

Summary

  • Add a PCI final/early-resume quirk for Pericom/Diodes PI7C9X3G606GPC to mirror the upstream BAR0 value into downstream Port 4 BAR0.
  • Scope the WAR to the Diodes-confirmed OS-visible Tile0/P4 mapping: upstream bus + 1, device 04, function 0.
  • Port 4 BAR0 may read back as zero through normal PCI config space even after a successful write, so the quirk rewrites BAR0 whenever it runs.
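
The device selection described in the summary can be sketched in userspace C. This is only an illustration of the "upstream bus + 1, device 04, function 0" rule, not the code in the patch: `struct sim_pci_addr` and `is_port4_of()` are hypothetical names, and the real quirk also checks that the immediate upstream bridge is the same switch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as the Linux PCI_DEVFN() macro: 5-bit device, 3-bit function. */
#define DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Tile0/P4 location as stated in the summary: device 04, function 0. */
#define PORT4_SLOT 4
#define PORT4_FUNC 0

/* Hypothetical stand-in for a PCI address (domain:bus:device.function). */
struct sim_pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devfn;
};

/*
 * Return true if dev sits where the summary says Port 4 is visible:
 * same domain, the bus directly below the upstream port (upstream bus + 1),
 * device 04, function 0.
 */
bool is_port4_of(const struct sim_pci_addr *upstream,
		 const struct sim_pci_addr *dev)
{
	return dev->domain == upstream->domain &&
	       dev->bus == (uint8_t)(upstream->bus + 1) &&
	       dev->devfn == DEVFN(PORT4_SLOT, PORT4_FUNC);
}
```

With the addresses from the validation log, upstream 0002:a1:00.0 matches 0002:a2:04.0 and does not match the NVMe endpoint at 0002:a3:00.0.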

Validation

  • Current PR head: 1670e403ccc6212fe92c2d678461da87897888b0.
  • Local patch checks passed at current head:
    • scripts/checkpatch.pl --strict --ignore GERRIT_CHANGE_ID --git HEAD
    • git diff --check HEAD~1..HEAD
    • make O=/tmp/nv-kernels-pr442-quirks-build -j$(nproc) drivers/pci/quirks.o
  • Previous 6.17 package validation booted 6.17.0-1019-nvidia-64k on the Quark DUT: OS 172.17.33.143 via jumper 10.22.18.250; BMC 172.17.33.144.
Linux localhost-right 6.17.0-1019-nvidia-64k #19 SMP PREEMPT_DYNAMIC Thu May 28 19:53:59 UTC 2026 aarch64
  • Quirk/topology evidence from the 6.17 booted kernel:
pci 0002:a1:00.0: BAR 0 [mem 0x10300000-0x1037ffff]
pci 0002:a2:04.0: [12d8:c008] type 01 class 0x060400 PCIe Switch Downstream Port
pci 0002:a3:00.0: [1344:51c3] type 00 class 0x010802 PCIe Endpoint
pci 0002:a1:00.0: BAR 0 [mem 0x10c00000-0x10c7ffff]: assigned
pci 0002:a2:04.0: wrote upstream BAR 0 0x10c00000 to Port 4 BAR 0 for PI7C9X3G606GPC BAR0 mirror workaround
  • Retested the equivalent 6.18 PR code after removing the 64-bit upstream BAR0 skip. The Quark DUT booted 6.18.33-pr447-pericom-no64 and showed the quirk still firing:
pci 0002:a1:00.0: BAR 0 [mem 0x10300000-0x1037ffff]: assigned
pci 0002:a2:04.0: wrote upstream BAR 0 0x10300000 to Port 4 BAR 0 for PI7C9X3G606GPC BAR0 mirror workaround
  • Ran a 300s fio randrw smoke on the NVMe-backed rootfs with the no-64-skip test kernel:
pr447-no64-rootfs-smoke: err= 0
READ: bw=251MiB/s, io=73.6GiB, run=300009msec
WRITE: bw=108MiB/s, io=31.6GiB, run=300009msec
  • Post-fio journalctl -b -k scan for BTRFS error, I/O error, nvme.*timeout, device inaccessible, read-only, blk_update_request, and Buffer I/O error returned no matches.
  • Did not use the BMC/I2C BAR0 readback helper for this validation. The Quark platform owner said that helper uses special CPED/CDEP access that is not supported as routine validation on this platform and can put the PCIe switch into a bad state.

References

Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2154457

NVBug: https://nvbugspro.nvidia.com/bug/6205517
NVBug: https://nvbugspro.nvidia.com/bug/6134331

Test artifacts: http://baseos-internal-tools.nvidia.com:8003/

@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.17 branch from 44e1553 to 31881cf Compare May 27, 2026 16:08
@nirmoy nirmoy added the help wanted Extra attention is needed label May 27, 2026
@nirmoy
Collaborator Author

nirmoy commented May 27, 2026

Boro review

Summary

No issues found across the reviewed commits.

Findings: no problems found

Latest watcher review: open review

Kernel deb build: successful (download debs, 4 files)

Head: 1670e403ccc6

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

@github-actions
Contributor

github-actions Bot commented May 27, 2026

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ✅ All checks passed

Details
Checking 1 commits...

Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1670e403ccc6 │ [SAUCE] pci: quirks: mirror pi7c9x3g606gpc port 4 bar0           │ N/A        │ N/A     │ nirmoyd                   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

@nvidia-bfigg
Collaborator

Does this PR need to be applied to the 6.18 reference kernel as well?

@nvidia-bfigg
Collaborator

Do you have tests (scripts) which can verify this patch set is applied and working?

@nirmoy
Collaborator Author

nirmoy commented May 27, 2026

Hi @nvidia-bfigg

Does this PR need to be applied to the 6.18 reference kernel as well?

Yes, we need that for 6.18 too. I will create a PR for that as well

Do you have tests (scripts) which can verify this patch set is applied and working?
The kernel WAR will always be applied at boot time because a config read of the affected port always returns 0, so we should see this message in dmesg: wrote upstream BAR 0 %#x to Port 4 BAR 0 for PI7C

@nvidia-bfigg
Collaborator

Hi @nvidia-bfigg

Does this PR need to be applied to the 6.18 reference kernel as well?

Yes, we need that for 6.18 too. I will create a PR for that as well

Do you have tests (scripts) which can verify this patch set is applied and working?
The kernel WAR will always be applied at boot time because a config read of the affected port always returns 0, so we should see this message in dmesg: wrote upstream BAR 0 %#x to Port 4 BAR 0 for PI7C

So if the patch is not applied to the kernel, that message will not be in dmesg. We should have a test that verifies the message is in dmesg and otherwise fails, reporting that the patch has not been applied, correct?

@nirmoy
Collaborator Author

nirmoy commented May 28, 2026

Hi @nvidia-bfigg

Does this PR need to be applied to the 6.18 reference kernel as well?

Yes, we need that for 6.18 too. I will create a PR for that as well

Do you have tests (scripts) which can verify this patch set is applied and working?
The kernel WAR will always be applied at boot time because a config read of the affected port always returns 0, so we should see this message in dmesg: wrote upstream BAR 0 %#x to Port 4 BAR 0 for PI7C

So if the patch is not applied to the kernel, that message will not be in dmesg. We should have a test that verifies the message is in dmesg and otherwise fails, reporting that the patch has not been applied, correct?

Yes, ACK. We should have a test, maybe a greenlit one, that checks dmesg to verify the patch.
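
A minimal sketch of the core of such a check, assuming the test harness has already captured the kernel log into a string (for example from `dmesg` or `journalctl -b -k`). `find_mirror_message()` is a hypothetical helper, and the two message fragments are copied from the log lines quoted in this PR, so they would need updating if the quirk's message changes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Message fragments taken from the dmesg lines quoted in this PR. */
#define MIRROR_MSG_PREFIX "wrote upstream BAR 0 "
#define MIRROR_MSG_SUFFIX " to Port 4 BAR 0 for PI7C"

/*
 * Scan a captured kernel log for the quirk message.
 * Returns true if found; if bar0_out is non-NULL it receives the
 * BAR0 value that the message reports as written.
 */
bool find_mirror_message(const char *log, uint32_t *bar0_out)
{
	const char *p = log;

	while ((p = strstr(p, MIRROR_MSG_PREFIX)) != NULL) {
		char *end;
		unsigned long val;

		p += strlen(MIRROR_MSG_PREFIX);
		/* Base 0 accepts the "0x" prefix produced by %#x. */
		val = strtoul(p, &end, 0);
		if (end != p &&
		    strncmp(end, MIRROR_MSG_SUFFIX,
			    strlen(MIRROR_MSG_SUFFIX)) == 0) {
			if (bar0_out)
				*bar0_out = (uint32_t)val;
			return true;
		}
	}
	return false;
}
```

A test built on this would fail when the function returns false. It could also compare the extracted value with the upstream BAR 0 address reported earlier in the same log.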

@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.17 branch from 31881cf to 5edb468 Compare May 28, 2026 13:22
@nirmoy nirmoy marked this pull request as ready for review May 28, 2026 14:17
@nirmoy nirmoy marked this pull request as draft May 28, 2026 15:00
@nirmoy nirmoy removed help wanted Extra attention is needed pending_review_comment labels May 28, 2026
@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.17 branch from 5edb468 to e562a4a Compare May 28, 2026 19:33
@nirmoy nirmoy marked this pull request as ready for review May 28, 2026 21:01
@nirmoy nirmoy added help wanted Extra attention is needed pending_review_comment labels May 28, 2026
@clsotog
Collaborator

clsotog commented May 28, 2026

Codex gave me this suggestion, but I am not sure it applies in this environment:

drivers/pci/quirks.c:6318

    DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_PERICOM,
                             PCI_DEVICE_ID_PERICOM_PI7C9X3G606GPC,
                             pci_fixup_pericom_pi7c9x3g606gpc_bar0_mirror);

registers this as a normal RESUME fixup. For a bridge coming back from D3cold, PCI core restores config space, runs pci_fixup_resume_early, and may resume the subordinate bus in pci_pm_resume_noirq() before normal resume fixups run. See pci-driver.c:963-970:

    if (!(skip_bus_pm && pm_suspend_no_platform()))
            pci_pm_default_resume_early(pci_dev);
    pci_fixup_device(pci_fixup_resume_early, pci_dev);
    pcie_pme_root_status_cleanup(pci_dev);
    if (!skip_bus_pm && prev_state == PCI_D3cold)
            pci_pm_bridge_power_up_actions(pci_dev);

and pci-driver.c:581-586:

    /*
     * When powering on a bridge from D3cold, the whole hierarchy may be
     * powered on into D0uninitialized state, resume them to give them a
     * chance to suspend again
     */
    pci_resume_bus(pci_dev->subordinate);

If this BAR mirror is required for devices below Port 4 after the switch loses state, children can resume while Port 4 BAR0 is still stale/zero. I’d move this to DECLARE_PCI_FIXUP_RESUME_EARLY or add an early resume fixup so the mirror is rewritten immediately after pci_restore_state() and before subordinate devices resume.
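
The ordering concern can be illustrated with a toy userspace model. This is not kernel code: `struct sim_switch` and `sim_resume_from_d3cold()` are hypothetical, and the model only encodes the sequence described above (early fixups, then subordinate resume, then normal resume fixups), assuming Port 4 BAR0 is lost across D3cold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which fixup pass the mirror quirk is registered in. */
enum fixup_phase { FIXUP_RESUME_EARLY, FIXUP_RESUME };

/* Hypothetical model of the switch state relevant to the quirk. */
struct sim_switch {
	uint32_t upstream_bar0;   /* assumed non-zero (already assigned) */
	uint32_t port4_bar0;      /* assumed lost across D3cold */
	bool child_saw_mirror;    /* did the child resume with a valid mirror? */
};

static void mirror(struct sim_switch *sw)
{
	sw->port4_bar0 = sw->upstream_bar0;
}

/*
 * Walk the resume sequence from the comment above and report whether
 * the device below Port 4 resumed with the mirror already in place.
 */
bool sim_resume_from_d3cold(struct sim_switch *sw, enum fixup_phase phase)
{
	sw->port4_bar0 = 0;                     /* switch lost state */

	/* noirq stage: config space restored, then early resume fixups */
	if (phase == FIXUP_RESUME_EARLY)
		mirror(sw);

	/* bridge power-up actions resume the subordinate bus */
	sw->child_saw_mirror = (sw->port4_bar0 == sw->upstream_bar0);

	/* later stage: normal resume fixups */
	if (phase == FIXUP_RESUME)
		mirror(sw);

	return sw->child_saw_mirror;
}
```

In this model the child only sees a valid mirror when the quirk runs in the early pass. With the normal resume pass the mirror is eventually written, but only after the child has already resumed.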

@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.17 branch from e562a4a to 377a0dc Compare May 29, 2026 13:23
@nvmochs
Collaborator

nvmochs commented May 29, 2026

@nirmoy Can you confirm that this is limited to 32-bit bars? Is that because 64-bit bars are not possible in this configuration?

@nirmoy
Collaborator Author

nirmoy commented May 29, 2026

Codex gave me this suggestion, but I am not sure it applies in this environment:

drivers/pci/quirks.c:6318

    DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_PERICOM,
                             PCI_DEVICE_ID_PERICOM_PI7C9X3G606GPC,
                             pci_fixup_pericom_pi7c9x3g606gpc_bar0_mirror);

registers this as a normal RESUME fixup. For a bridge coming back from D3cold, PCI core restores config space, runs pci_fixup_resume_early, and may resume the subordinate bus in pci_pm_resume_noirq() before normal resume fixups run. See pci-driver.c:963-970:

    if (!(skip_bus_pm && pm_suspend_no_platform()))
            pci_pm_default_resume_early(pci_dev);
    pci_fixup_device(pci_fixup_resume_early, pci_dev);
    pcie_pme_root_status_cleanup(pci_dev);
    if (!skip_bus_pm && prev_state == PCI_D3cold)
            pci_pm_bridge_power_up_actions(pci_dev);

and pci-driver.c:581-586:

    /*
     * When powering on a bridge from D3cold, the whole hierarchy may be
     * powered on into D0uninitialized state, resume them to give them a
     * chance to suspend again
     */
    pci_resume_bus(pci_dev->subordinate);

If this BAR mirror is required for devices below Port 4 after the switch loses state, children can resume while Port 4 BAR0 is still stale/zero. I’d move this to DECLARE_PCI_FIXUP_RESUME_EARLY or add an early resume fixup so the mirror is rewritten immediately after pci_restore_state() and before subordinate devices resume.

It makes sense to have this. I will update it.

@nirmoy
Collaborator Author

nirmoy commented May 29, 2026

@nirmoy Can you confirm that this is limited to 32-bit bars? Is that because 64-bit bars are not possible in this configuration?

The erratum describes the WAR as copying only BAR0 at offset 0x10. That is sufficient for a 32-bit BAR, but would not be complete for a 64-bit BAR because BAR1 contains the upper address bits. So I am defensively skipping the WAR for 64-bit BARs.

The device can be configured with a 64-bit BAR0/BAR1 pair, but on our platform it is configured as a 32-bit BAR. I confirmed that from the BMC readback of the upstream BAR0 value, where the BAR type bits indicate a 32-bit memory BAR.
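
For reference, the BAR type bits mentioned here can be decoded as in the sketch below. The bit values follow the standard PCI BAR layout (the same values Linux defines in pci_regs.h); the macro names and `bar_is_64bit_mem()` are local to this example and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI BAR flag bits (names are local to this sketch). */
#define BAR_SPACE_IO       0x01u  /* bit 0: 1 = I/O BAR, 0 = memory BAR */
#define BAR_MEM_TYPE_MASK  0x06u  /* bits 2:1: memory BAR type */
#define BAR_MEM_TYPE_64    0x04u  /* 10b: 64-bit, next BAR holds the upper dword */

/* Return true if a raw BAR0 value describes a 64-bit memory BAR. */
bool bar_is_64bit_mem(uint32_t bar)
{
	if (bar & BAR_SPACE_IO)
		return false;
	return (bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}
```

A raw value of 0x10c00000, which has no flag bits set, decodes as a 32-bit memory BAR, while 0x10c00004 would decode as a 64-bit one.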

@nvmochs
Collaborator

nvmochs commented May 29, 2026

@nirmoy Can you confirm that this is limited to 32-bit bars? Is that because 64-bit bars are not possible in this configuration?

The erratum describes the WAR as copying only BAR0 at offset 0x10. That is sufficient for a 32-bit BAR, but would not be complete for a 64-bit BAR because BAR1 contains the upper address bits. So I am defensively skipping the WAR for 64-bit BARs.

The device can be configured with a 64-bit BAR0/BAR1 pair, but on our platform it is configured as a 32-bit BAR. I confirmed that from the BMC readback of the upstream BAR0 value, where the BAR type bits indicate a 32-bit memory BAR.

Thanks for clarifying. If we end up sending this upstream we may need to relax the checking a bit.

@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.17 branch from 377a0dc to c2c4176 Compare May 29, 2026 19:24
Some Pericom/Diodes PI7C9X3G606GPC switches require downstream
Port 4 BAR0 to mirror BAR0 of the immediate upstream port. Firmware may
apply this during boot, but Linux PCI resource assignment can move the
upstream BAR0 and leave Port 4 without the required mirror.

Diodes confirmed that Tile0/P4 is OS-visible as device 04, function 0 on
the bus below the upstream port. Add a final and early resume quirk for
that downstream function. The quirk verifies that the immediate upstream
bridge is the same switch, then writes Port 4 BAR0 from the upstream
BAR0 after resource assignment and during early resume. Port 4 BAR0 may
read back as zero even after a successful write, so the write must be
validated by platform-specific means.

Change-Id: I86222773f9bb321ea3e24df7fb1b4a3f84a008a4
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
@nirmoy nirmoy force-pushed the codex/pericom-msix-bar-war-6.17 branch from c2c4176 to 1670e40 Compare May 29, 2026 19:27
@nvmochs
Collaborator

nvmochs commented May 29, 2026

I looked at the latest version and agree with the change to resume early.

I see that the 64-bit bar check was removed. We can do that, but then I think the code needs to properly handle 32 and 64-bit bars being fixed up - I should have been clearer when I said "relax the checking" in my prior comment.

Here's what Codex has to say:

  • drivers/pci/quirks.c:6291: the updated patch removed the 64-bit BAR guard, but the code still mirrors only PCI_BASE_ADDRESS_0. If upstream BAR0 is ever a 64-bit memory BAR, BAR1 contains the high dword, so this is not a complete mirror. The current unassigned check can also misclassify a valid 64-bit BAR whose low address dword is zero. If Diodes guarantees this switch exposes that BAR only as a 32-bit memory BAR, document that and/or restore the guard; otherwise mirror BAR1 too for the 64-bit case.
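
One way to handle both cases, sketched against a simulated register pair rather than real config space. `struct sim_bars`, `sim_bar_address()` and `sim_mirror_bar()` are hypothetical. Whether writing BAR1 on Port 4 is valid on this switch is an open assumption, since the erratum as described in this thread only covers BAR0 at offset 0x10.

```c
#include <stdbool.h>
#include <stdint.h>

#define BAR_SPACE_IO       0x01u     /* bit 0: 1 = I/O BAR */
#define BAR_MEM_TYPE_MASK  0x06u     /* bits 2:1: memory BAR type */
#define BAR_MEM_TYPE_64    0x04u     /* 64-bit memory BAR */
#define BAR_MEM_ADDR_MASK  (~0x0fu)  /* low 4 bits of a memory BAR are flags */

/* Hypothetical stand-in for the BAR0/BAR1 registers of one port. */
struct sim_bars {
	uint32_t bar0;
	uint32_t bar1;
};

static bool bar0_is_64bit(uint32_t bar0)
{
	return !(bar0 & BAR_SPACE_IO) &&
	       (bar0 & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}

/*
 * Full address of the BAR. Including BAR1 for the 64-bit case means a
 * BAR whose low address dword is zero is not mistaken for unassigned.
 */
uint64_t sim_bar_address(const struct sim_bars *b)
{
	uint64_t addr = b->bar0 & BAR_MEM_ADDR_MASK;

	if (bar0_is_64bit(b->bar0))
		addr |= (uint64_t)b->bar1 << 32;
	return addr;
}

/* Mirror the upstream BAR into Port 4; returns false if nothing was written. */
bool sim_mirror_bar(const struct sim_bars *up, struct sim_bars *port4)
{
	if (up->bar0 & BAR_SPACE_IO)
		return false;            /* not a memory BAR */
	if (sim_bar_address(up) == 0)
		return false;            /* upstream BAR not assigned yet */

	port4->bar0 = up->bar0;
	if (bar0_is_64bit(up->bar0))
		port4->bar1 = up->bar1;  /* upper dword for the 64-bit case */
	return true;
}
```

The 32-bit path is unchanged from mirroring BAR0 alone, so restoring the guard and documenting a Diodes guarantee remains the simpler option if the switch never exposes a 64-bit BAR here.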
