fix: mark xfail test_groundedness_e2e_string_documents #1205
akihikokuroda wants to merge 2 commits into
Conversation
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
planetf1 left a comment
One suggestion I can't anchor to the diff: the sibling test test_groundedness_e2e_complex_response just above uses assert isinstance(result.as_bool(), bool) precisely because grounding verdicts are hardware-dependent. If this test is primarily checking that string documents are accepted as input rather than pinning a specific verdict, the same looser assertion would make it deterministic without needing xfail at all.
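For illustration, the difference between the looser assertion and a pinned verdict can be sketched without loading a model. `StubResult` is a hypothetical stand-in for whatever the real validation call returns; only `as_bool()` comes from the comment above, and nothing here is taken from the actual test file.

```python
class StubResult:
    """Hypothetical stand-in for the validation result; not the library's class."""

    def __init__(self, verdict: bool) -> None:
        self._verdict = verdict

    def as_bool(self) -> bool:
        return self._verdict


def check_shape_only(result) -> None:
    # Looser check: passes for either verdict, as long as a bool comes back.
    assert isinstance(result.as_bool(), bool)


def check_pinned_verdict(result) -> None:
    # Pinned check: fails whenever the hardware-dependent verdict is False.
    assert result.as_bool() is True


# The looser check accepts both outcomes, so it does not depend on hardware.
for verdict in (True, False):
    check_shape_only(StubResult(verdict))
```

The looser form only verifies that the input was accepted and a verdict was produced, so it is appropriate only if the test is not meant to validate the verdict itself.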
  @pytest.mark.e2e
  @pytest.mark.huggingface
- @require_gpu(min_vram_gb=8)
+ @pytest.mark.xfail(reason="CPU/GPU response differences")
Wondering if this swap is intentional — @require_gpu(min_vram_gb=8) and @pytest.mark.xfail do quite different things. require_gpu is a skip gate; on a CPU-only runner it returns skipif(True, ...) so the test never executes, which is what all eight sibling tests in this file do. xfail is a result modifier — the test still runs, it just tolerates a failure. After this change this becomes the only test in the file that will load ibm-granite/granite-4.0-micro and run inference on a CPU-only runner.
Suggest keeping both. Worth filing an issue for the CPU/GPU divergence and linking it in the reason too — bare xfails without an issue reference tend to get missed in test-health sweeps:
- @pytest.mark.xfail(reason="CPU/GPU response differences")
+ @require_gpu(min_vram_gb=8)
+ @pytest.mark.xfail(strict=False, reason="CPU/GPU citation response differences — see #NNNN")
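A minimal sketch of how the two decorators combine, assuming `require_gpu` behaves as described above (a gate that returns a `skipif` mark). `require_gpu_sketch` and `detected_vram_gb` are hypothetical names written for this example, not the project's helpers, and the probe is hard-coded to model a CPU-only runner. `#NNNN` stays a placeholder for the issue number.

```python
import pytest


def detected_vram_gb() -> float:
    """Hypothetical probe, hard-coded to 0 to model a CPU-only runner."""
    return 0.0


def require_gpu_sketch(min_vram_gb: int):
    """Hypothetical stand-in for require_gpu: a skip gate built on skipif."""
    too_small = detected_vram_gb() < min_vram_gb
    return pytest.mark.skipif(
        too_small, reason=f"needs a GPU with at least {min_vram_gb} GB VRAM"
    )


@require_gpu_sketch(min_vram_gb=8)
@pytest.mark.xfail(strict=False, reason="CPU/GPU citation response differences, see #NNNN")
def test_string_documents_sketch():
    """Placeholder body; the real test would load the model and run inference."""
```

With both marks attached, pytest evaluates the skip condition during setup, so on a CPU-only runner the test is reported as skipped and never executes. On a GPU runner the gate opens and `xfail(strict=False)` tolerates either a failure or an unexpected pass.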
This requirement processes the citation response in detail, so the condition cannot be loosened. I restored require_gpu as suggested.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Pull Request
Issue
Description
Work around a test failure: mark as xfail a test that fails because citation responses differ between CPU and GPU.
Testing
Attribution