Skip to content

fix: mark xfail test_groundedness_e2e_string_documents#1205

Open
akihikokuroda wants to merge 2 commits into
generative-computing:mainfrom
akihikokuroda:xfail_groundedness
Open

fix: mark xfail test_groundedness_e2e_string_documents#1205
akihikokuroda wants to merge 2 commits into
generative-computing:mainfrom
akihikokuroda:xfail_groundedness

Conversation

@akihikokuroda
Copy link
Copy Markdown
Member

@akihikokuroda akihikokuroda commented Jun 4, 2026

Pull Request

Issue

Description

Work around test failure. Mark xfail a test failing caused by citation response differences between CPU and GPU.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from a team as a code owner June 4, 2026 15:29
@github-actions github-actions Bot added the bug Something isn't working label Jun 4, 2026
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion I can't anchor to the diff: the sibling test test_groundedness_e2e_complex_response just above uses assert isinstance(result.as_bool(), bool) precisely because grounding verdicts are hardware-dependent. If this test is primarily checking that string documents are accepted as input rather than pinning a specific verdict, the same looser assertion would make it deterministic without needing xfail at all.

@pytest.mark.e2e
@pytest.mark.huggingface
@require_gpu(min_vram_gb=8)
@pytest.mark.xfail(reason="CPU/GPU response differences")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this swap is intentional — @require_gpu(min_vram_gb=8) and @pytest.mark.xfail do quite different things. require_gpu is a skip gate; on a CPU-only runner it returns skipif(True, ...) so the test never executes, which is what all eight sibling tests in this file do. xfail is a result modifier — the test still runs, it just tolerates a failure. After this change this becomes the only test in the file that will load ibm-granite/granite-4.0-micro and run inference on a CPU-only runner.

Suggest keeping both. Worth filing an issue for the CPU/GPU divergence and linking it in the reason too — bare xfails without an issue reference tend to get missed in test-health sweeps:

Suggested change
@pytest.mark.xfail(reason="CPU/GPU response differences")
@require_gpu(min_vram_gb=8)
@pytest.mark.xfail(strict=False, reason="CPU/GPU citation response differences — see #NNNN")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requirement processes the citation response in detail. It can not loosen the condition. I restored require_gpu as suggested.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from planetf1 June 4, 2026 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants