fix: mark xfail test_groundedness_e2e_string_documents by akihikokuroda · Pull Request #1205 · generative-computing/mellea

akihikokuroda · 2026-06-04T15:29:52Z

Pull Request

Issue

Description

Work around a test failure: mark test_groundedness_e2e_string_documents as xfail, because it fails due to citation response differences between CPU and GPU.

Testing

Tests added to the respective file if code was changed
New code has 100% coverage if code was added
Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

Component
Requirement
Sampling Strategy
Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

planetf1

One suggestion I can't anchor to the diff: the sibling test test_groundedness_e2e_complex_response just above uses assert isinstance(result.as_bool(), bool) precisely because grounding verdicts are hardware-dependent. If this test is primarily checking that string documents are accepted as input rather than pinning a specific verdict, the same looser assertion would make it deterministic without needing xfail at all.
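The looser assertion described above can be sketched as follows. This is a minimal illustration only: FakeValidationResult and check_accepts_string_documents are hypothetical names, not part of the repository, and the only detail taken from the comment is the assert isinstance(result.as_bool(), bool) check.

```python
from dataclasses import dataclass


@dataclass
class FakeValidationResult:
    """Hypothetical stand-in for the object the real test calls as_bool() on."""

    verdict: bool

    def as_bool(self) -> bool:
        return self.verdict


def check_accepts_string_documents(result) -> None:
    # Pin only the type of the verdict, not its value, so the check
    # passes whether the hardware-dependent verdict is True or False.
    assert isinstance(result.as_bool(), bool)


# Both possible verdicts satisfy the looser check.
for verdict in (True, False):
    check_accepts_string_documents(FakeValidationResult(verdict))
```

The trade-off is that such a check confirms the input is accepted and a verdict is produced, but no longer verifies that the verdict is correct.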

planetf1 · 2026-06-04T18:27:51Z

 @pytest.mark.e2e
 @pytest.mark.huggingface
-@require_gpu(min_vram_gb=8)
+@pytest.mark.xfail(reason="CPU/GPU response differences")


Wondering if this swap is intentional — @require_gpu(min_vram_gb=8) and @pytest.mark.xfail do quite different things. require_gpu is a skip gate; on a CPU-only runner it returns skipif(True, ...) so the test never executes, which is what all eight sibling tests in this file do. xfail is a result modifier — the test still runs, it just tolerates a failure. After this change this becomes the only test in the file that will load ibm-granite/granite-4.0-micro and run inference on a CPU-only runner.

Suggest keeping both. Worth filing an issue for the CPU/GPU divergence and linking it in the reason too — bare xfails without an issue reference tend to get missed in test-health sweeps:

Suggested change

-@pytest.mark.xfail(reason="CPU/GPU response differences")
+@require_gpu(min_vram_gb=8)
+@pytest.mark.xfail(strict=False, reason="CPU/GPU citation response differences — see #NNNN")
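To illustrate the distinction between a skip gate and a result modifier, here is a minimal sketch of the two decorators stacked. The require_gpu below is a hypothetical stand-in, not the repository's helper: it is assumed to return a pytest.mark.skipif marker as the comment describes, and the detected_vram_gb parameter is invented so the sketch runs without probing real hardware.

```python
import pytest


def require_gpu(min_vram_gb: int, detected_vram_gb: float = 0.0):
    """Hypothetical stand-in for the repository's require_gpu helper.

    The detected VRAM is passed in as an argument (defaulting to 0.0,
    which models a CPU-only runner) instead of being probed.
    """
    return pytest.mark.skipif(
        detected_vram_gb < min_vram_gb,
        reason=f"requires a GPU with at least {min_vram_gb} GB of VRAM",
    )


# Skip gate: when the condition is true, the test body never executes.
@require_gpu(min_vram_gb=8)
# Result modifier: the test still runs; a failure is reported as xfail
# and, with strict=False, an unexpected pass does not fail the run.
@pytest.mark.xfail(strict=False, reason="CPU/GPU citation response differences")
def test_string_documents_sketch():
    # Placeholder standing in for the real model load and inference.
    assert True
```

With both markers in place, a CPU-only runner skips the test before any model is loaded, while a GPU runner executes it and tolerates a failure.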

This requirement processes the citation response in detail, so the assertion cannot be loosened. I restored require_gpu as suggested.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

mark xfail test_groundedness_e2e_string_documents

c75b371

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

akihikokuroda requested a review from a team as a code owner June 4, 2026 15:29

akihikokuroda requested review from AngeloDanducci, jakelorocco and planetf1 June 4, 2026 15:29

github-actions bot added the "bug" label Jun 4, 2026

planetf1 requested changes Jun 4, 2026

planetf1 reviewed Jun 4, 2026

review comment

29800e4

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>

akihikokuroda requested a review from planetf1 June 4, 2026 19:13

akihikokuroda wants to merge 2 commits into generative-computing:main from akihikokuroda:xfail_groundedness
