fix(tests): resolve flaky acceptance tests (app creation, human eval, prompt registry, members invite)#4458

Open
bekossy wants to merge 2 commits into release/v0.100.4 from fix/flaky-acceptance-tests

Conversation

@bekossy
Member

@bekossy bekossy commented May 27, 2026

Summary

  • app/test.ts: Pass force: true when clicking the Popover item. appTemplatesQueryAtom resolves after mount and re-renders the dropdown, briefly detaching the item from the DOM; force skips the stability check that would otherwise keep retrying.
  • human-annotation/tests.ts: Cache the found button inside the expect.poll callback. This removes a TOCTOU race in which the button could disappear between the poll succeeding and a second lookup.
  • prompt-registry/index.ts: Poll for any visible published revision row instead of one specific ID, which may be scrolled out of the viewport in the virtual-scroll table.
  • members/index.ts: Replace waitForLoadState("networkidle"), which never settles on pages with polling/WebSockets, with a targeted wait for the email input. Increase the timeout to 90 s for tests that run a full invite flow as setup. Wait for the form body of the dynamically imported InviteUsersModal (not just the dialog wrapper) before filling.
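
The TOCTOU fix described for human-annotation/tests.ts can be sketched without Playwright. This is a minimal illustration, not the code from the PR: `Handle`, `Lookup`, and `pollForHandle` are hypothetical names, and a real test would hold a Playwright Locator where this sketch holds a plain object. The point it shows is that the value found by the poll is the value returned, so no second lookup can race a re-render.

```typescript
// Hypothetical stand-in for a UI handle; in Playwright this would be a Locator.
type Handle = { label: string };

// Hypothetical lookup: resolves to a handle when one is currently visible, else null.
type Lookup = () => Promise<Handle | null>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `lookup` until it yields a handle and returns that same handle.
// The caller never performs a second lookup, so there is no window in which
// the element can vanish between "found" and "fetched".
async function pollForHandle(
  lookup: Lookup,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<Handle> {
  const deadline = Date.now() + timeoutMs;
  do {
    const found = await lookup();
    if (found !== null) return found; // cache and return in one step
    await sleep(intervalMs);
  } while (Date.now() < deadline);
  throw new Error(`No handle found within ${timeoutMs} ms`);
}
```

With Playwright's expect.poll, the equivalent move is to assign the found button to a variable declared outside the callback and reuse that variable after the poll resolves.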

Pre-existing staged changes also included:

  • deployment/index.ts: Skip skeleton rows in row selector, extend row timeout to 30 s
  • observability/index.ts: Increase test timeouts to 300 s, improve polling loop
  • playground/index.ts: Wait for Compare button to be enabled before clicking
  • apiHelpers/index.ts: Exclude evaluator apps from completion app lookup
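
The apiHelpers change amounts to one extra condition in the lookup. The sketch below assumes a hypothetical `AppRecord` shape with `appType` and `isEvaluator` fields; the real API payload and the actual getApp filter may look different.

```typescript
// Hypothetical app record; the real API payload may use different field names.
interface AppRecord {
  name: string;
  appType: "chat" | "completion" | "custom";
  isEvaluator: boolean;
}

// Picks a completion app for a test, skipping evaluator apps that
// would otherwise also match the completion type.
function findCompletionApp(apps: AppRecord[]): AppRecord | undefined {
  return apps.find((app) => app.appType === "completion" && !app.isEvaluator);
}

const apps: AppRecord[] = [
  { name: "exact-match", appType: "completion", isEvaluator: true },
  { name: "support-chat", appType: "chat", isEvaluator: false },
  { name: "summarizer", appType: "completion", isEvaluator: false },
];

console.log(findCompletionApp(apps)?.name); // prints "summarizer"
```

Without the `isEvaluator` check, the first record would be returned and a test expecting a regular completion app would run against an evaluator.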

Test plan

  • Run members.spec.ts — "should resend an invitation and confirm success" should pass consistently
  • Run app/index.ts — "creates new chat prompt app" should pass consistently
  • Run human-annotation/index.ts — WEB-ACC-HUMAN-001 should pass consistently
  • Run prompt-registry/index.ts — "should open prompt details from prompt registry" should pass consistently

🤖 Generated with Claude Code

…egistry, and members

- app/test.ts: use force:true on popover item click — the CreateAppDropdown
  re-renders when appTemplatesQueryAtom resolves, making the button briefly
  unstable; force bypasses Playwright's stability check that was retrying to
  the 60 s test timeout
- human-annotation/tests.ts: fix TOCTOU race in getHumanEvaluationCreateButton
  — cache the found button inside the expect.poll callback so the second
  getVisibleButtonByLabels call (which could race a re-render) is eliminated
- prompt-registry/index.ts: poll for ANY visible published revision row instead
  of hardcoding revisions[0].id — the completion app accumulates revisions
  across runs; with virtual scrolling the first revision from the API response
  may not be in the DOM viewport
- members/index.ts: remove waitForLoadState("networkidle") from invitePendingMember
  and WEB-ACC-MEMBERS-002 — pages with polling never reach networkidle, consuming
  up to 30 s of the 60 s budget; wait directly for the email input which is the
  correct signal that the dynamic() InviteUsersModal chunk has rendered; raise
  timeout to 90 s for tests 003/004 that run the full invite flow as setup
- deployment/index.ts: skip skeleton rows in row selector, extend row timeout
- observability/index.ts: increase test timeouts to 300 s, improve polling loop
- playground/index.ts: wait for Compare button to be enabled before clicking
- apiHelpers/index.ts: exclude evaluator apps when resolving completion app type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
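
The prompt-registry change in the commit message above (accept any visible published revision instead of revisions[0].id) reduces to a small selection step that can run inside a poll. The sketch is illustrative only: `firstVisibleId` and the set of rendered IDs are hypothetical, standing in for whatever the test reads from the virtual-scroll table.

```typescript
// Returns the first candidate ID that is currently rendered, or undefined.
// `candidateIds` models the revision IDs from the API response;
// `renderedIds` models the rows the virtual-scroll table has in the DOM.
function firstVisibleId(
  candidateIds: string[],
  renderedIds: ReadonlySet<string>,
): string | undefined {
  return candidateIds.find((id) => renderedIds.has(id));
}

// The API lists rev-9 first, but only older rows are in the viewport.
const fromApi = ["rev-9", "rev-3", "rev-1"];
const inViewport = new Set(["rev-1", "rev-3"]);

console.log(firstVisibleId(fromApi, inViewport)); // prints "rev-3"
```

A test that insisted on "rev-9" would time out here even though a usable row is on screen, which matches the failure the commit message describes.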
@vercel

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | May 27, 2026 5:18pm |

@dosubot dosubot Bot added the size:XS (This PR changes 0-9 lines, ignoring generated files.) and tests labels May 27, 2026
@coderabbitai

coderabbitai Bot commented May 27, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 46685dbc-b0e9-4560-b8d9-d347952dd406

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This PR bumps the project version from 0.100.3 to 0.100.4 across all packages (backend services, Python clients/SDKs, Kubernetes Helm chart, and web packages), alongside targeted Playwright test flakiness fixes to replace timing-based waits with explicit UI readiness checks, and updates the app type filter to exclude evaluator apps from the completion type.

Changes

Release 0.100.4 and Stabilization

| Layer | File(s) | Summary |
| --- | --- | --- |
| Version bump across all packages | api/pyproject.toml, clients/python/pyproject.toml, hosting/kubernetes/helm/Chart.yaml, sdks/python/pyproject.toml, services/pyproject.toml, web/ee/package.json, web/oss/package.json, web/package.json, web/packages/agenta-api-client/package.json | Project and package versions updated from 0.100.3 to 0.100.4 across Python backend services, client libraries, SDKs, Kubernetes deployment manifests, and all web packages. |
| EE Members invite flow UI readiness and timeout improvements | web/ee/tests/playwright/acceptance/members/index.ts | Replaced networkidle-based waits with explicit checks for Invite Members button visibility and email input readiness; increased test timeouts from 60s to 90s for resend and removal scenarios to accommodate modal interactions. |
| OSS test Popover and dialog interaction stability | web/oss/tests/playwright/acceptance/app/test.ts, web/oss/tests/playwright/acceptance/deployment/index.ts | Added forced click to bypass Popover re-rendering stability checks in app creation drawer; filtered skeleton rows from deployment dialog and extended row visibility timeout to 30s. |
| OSS test async polling and timeout improvements | web/oss/tests/playwright/acceptance/human-annotation/tests.ts, web/oss/tests/playwright/acceptance/observability/index.ts | Refactored human evaluation button polling to cache the button within expect.poll callback; replaced single long trace-row wait with bounded polling that triggers manual Refresh Data every 20s; increased observability test timeouts from 180s to 300s. |
| OSS test UI readiness and interactability verification | web/oss/tests/playwright/acceptance/playground/index.ts, web/oss/tests/playwright/acceptance/prompt-registry/index.ts | Updated playground test to assert Compare button is enabled before clicking and set 120s timeout; refactored prompt registry to poll for visible published revision rows matching API-provided IDs, handling virtual scrolling accumulation across test runs. |
| Completion type filter evaluator app exclusion | web/tests/tests/fixtures/base.fixture/apiHelpers/index.ts | Updated getApp filter to exclude evaluator apps from completion type selection in addition to existing chat and custom app exclusions. |
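
The observability change summarized above (bounded polling that triggers a manual refresh at a fixed interval) can be sketched as a generic helper. This is an assumption-laden illustration, not the PR's code: `pollWithRefresh`, `PollOptions`, and the `check`/`refresh` callbacks are hypothetical. In the real test, `check` would look for the trace row and `refresh` would click Refresh Data.

```typescript
interface PollOptions {
  timeoutMs: number; // overall budget for the wait
  intervalMs: number; // delay between checks
  refreshEveryMs: number; // minimum time between manual refreshes
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Repeatedly runs `check`; whenever `refreshEveryMs` has elapsed since the
// last refresh, calls `refresh` before waiting again. Resolves to true as
// soon as `check` passes, or false once the overall budget is spent.
async function pollWithRefresh(
  check: () => Promise<boolean>,
  refresh: () => Promise<void>,
  { timeoutMs, intervalMs, refreshEveryMs }: PollOptions,
): Promise<boolean> {
  const start = Date.now();
  let lastRefresh = start;
  while (Date.now() - start < timeoutMs) {
    if (await check()) return true;
    if (Date.now() - lastRefresh >= refreshEveryMs) {
      await refresh();
      lastRefresh = Date.now();
    }
    await sleep(intervalMs);
  }
  return false;
}
```

Under the figures quoted in the summary, a call might use `refreshEveryMs: 20000` with a `timeoutMs` somewhat below the 300 s test timeout, so the helper reports failure itself instead of being cut off by the test runner.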

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Agenta-AI/agenta#4328: Performs similar version metadata bumps across shared packaging files.
  • Agenta-AI/agenta#4308: Also modifies the EE Members Playwright acceptance test to improve invite flow determinism by replacing timing-based waits.
  • Agenta-AI/agenta#4301: Performs release-style version bumps across the same backend and web package manifests.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: resolving flaky acceptance tests across multiple test files (app creation, human eval, prompt registry, members invite) with specific techniques. |
| Description check | ✅ Passed | The description is clearly related to the changeset, providing detailed explanations of test fixes and the rationale behind each change (re-render stability, TOCTOU races, virtual scrolling, WebSocket polling issues). |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@bekossy bekossy changed the base branch from main to release/v0.100.4 May 27, 2026 11:12
@github-actions
Contributor

github-actions Bot commented May 27, 2026

Railway Preview Environment

Status Destroyed (PR converted to draft)

Updated at 2026-05-27T17:27:04.579Z

@bekossy bekossy marked this pull request as draft May 27, 2026 17:26
@bekossy bekossy marked this pull request as ready for review May 27, 2026 18:17