diff --git a/AGENTS.md b/AGENTS.md index a1ca9cd5c..a526d5651 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -167,6 +167,8 @@ Command-only flags (like `find --first`) that do not flow to the platform layer - Do not add ad-hoc stderr/file logging where diagnostics helpers apply. - Normalize user-facing failures via `src/utils/errors.ts` (`normalizeError`). - Failure payload contract: `code`, `message`, `hint`, `diagnosticId`, `logPath`, `details`. +- User-facing errors should be short and actionable: say what failed, why when known, and how to recover. Put recovery steps in `hint` when the action is not obvious, for example restart/retry, use plain screenshot when AX state is unavailable, navigate with coordinates, or inspect logs. +- If an interaction unexpectedly takes 5+ seconds, inspect the relevant daemon log before attributing it to the app. Check the session `--state-dir` `daemon.log` or the failure `logPath` for runner restart, stale session recovery, AX failure, transport retry, or command timeout evidence. - Preserve `hint`, `diagnosticId`, `logPath` when wrapping/rethrowing errors. - `--debug` is canonical; `--verbose` is backward-compatible alias. - Keep redaction centralized in diagnostics helpers. diff --git a/CONTEXT.md b/CONTEXT.md index 56e6a281d..020b2307d 100644 --- a/CONTEXT.md +++ b/CONTEXT.md @@ -16,6 +16,8 @@ - Command surface: catalog of public command identity, interface exposure, adapter policy, and shared command metadata across CLI, Node.js, MCP, and batch entrypoints. - Daemon command registry: daemon-side source of truth for command route ownership and request-policy traits, including admission exemptions, session locking, selector validation, replay-scoped actions, recording invalidation, Android dialog guards, and request provider device resolution. 
- Runner command traits: the iOS XCTest runner's per-command-type classification across three independent axes — interaction (gates the foreground-guard and stabilization preflight), read-only (gates the session-invalidating retry; the alert command is read-only only for its `get` action), and runner-lifecycle (skips the app-activation preflight). One source of truth keyed by command type, distinct from the public command surface and daemon command registry. +- Coordinate-first resolved element activation: iOS/macOS runner interaction pattern where a selector or text query resolves the semantic `XCUIElement`, then activation uses the element's resolved center coordinate when a frame is available. This keeps target selection semantic while avoiding `XCUIElement.tap()` post-action element re-resolution after normal navigation. tvOS remains focus/remote-driven. +- AX-unavailable target invalidation: iOS/macOS runner behavior where a root accessibility snapshot failure such as `kAXErrorIllegalArgument` marks the cached `XCUIApplication` target handle suspect. The runner fails closed for degraded interactive snapshots, clears the cached target, and lets the next command reacquire the app through normal activation. ## Testing Principles diff --git a/docs/adr/0005-ios-runner-interaction-lifecycle.md b/docs/adr/0005-ios-runner-interaction-lifecycle.md new file mode 100644 index 000000000..59e4f27a3 --- /dev/null +++ b/docs/adr/0005-ios-runner-interaction-lifecycle.md @@ -0,0 +1,76 @@ +# ADR 0005: iOS Runner Interaction Lifecycle + +## Status + +Accepted + +## Context + +The iOS runner is a long-lived XCTest process with an HTTP command loop. A command can appear to +complete at the daemon boundary while XCTest is already tearing down the test runner. + +This was reproduced in the React Navigation playground with navigation-causing selector taps such +as `Navigate to Details` and `Back to home`. 
The runner resolved the button and synthesized the tap, +the app navigated, and then XCTest tried to re-resolve the original `XCUIElement`. Because the +element had disappeared, xcodebuild recorded `Failed to get matching snapshot` and ended the test +with `** TEST EXECUTE FAILED **`. The daemon had already received a successful tap response, so the +next read-only command inherited a stale cached runner. + +Three older assumptions were wrong: + +- A recent successful runner response proves the runner is still healthy. +- `XCUIElement.tap()` is the safest selector-tap primitive once a selector has resolved. +- A cached `XCUIApplication` target remains safe after XCTest reports that the app's accessibility + tree cannot be serialized. + +## Decision + +Coordinate-first resolved element activation is the iOS/macOS selector-tap model. The runner still +uses selectors or text queries to find the semantic `XCUIElement`, but when the element has a frame, +activation taps the resolved center point instead of calling `XCUIElement.tap()`. tvOS remains +focus/remote-driven because tvOS does not support normal coordinate input. + +Ready runner sessions are probed with a short `uptime` preflight before command send. The daemon +does not keep or consult a "recent success" health cache. Read-only startup commands still skip that +preflight because the first successful command is the readiness proof for a newly launched runner. +Readiness probe commands skip preflight to avoid recursion. + +`uptime` is a direct runner listener probe. It is answered before command journaling, the serial +command execution queue, app activation, and main-thread XCTest dispatch. It should measure only +whether the runner is alive and accepting new HTTP requests. + +Dead cached runner processes are invalidated without graceful `shutdown`. A process that already +stopped cannot answer the shutdown request, so graceful cleanup only adds stale-listener delay. 
+ +When XCTest reports a root accessibility snapshot failure such as `kAXErrorIllegalArgument`, the +runner treats the cached app target as suspect. Interactive snapshots fail closed to a truncated +root-only payload instead of issuing more flat fallback queries against the same broken tree, and +the cached `XCUIApplication` handle is cleared so the next command reacquires the target through the +normal activation path. + +The snapshot surface intentionally has two AX-failure shapes. Interactive fast snapshots return a +truncated success payload with `runnerFatal` so agents can still see that AX state is unavailable +and recover with a plain screenshot plus coordinate navigation. Raw or strict snapshot paths keep +returning an error because those callers requested a faithful tree, not a lossy recovery payload. + +## Consequences + +Navigation-causing selector taps no longer couple command success to XCTest's post-tap element +bookkeeping. If the target disappears because navigation happened, the tap remains a normal +successful interaction and the runner should stay alive. + +If xcodebuild still exits for another reason, the next command detects the stale runner through +process/liveness checks and avoids the old 15-second graceful-shutdown wait. The remaining latency is +fresh xcodebuild runner startup, not a stale transport stall. + +The daemon no longer models recent success as a runner-health signal. That adds one cheap `uptime` +request before ready-session commands, but it removes a false health signal that was observed to be +unsafe. + +Apps with broken accessibility trees may still be impossible for XCTest to inspect deeply, but one +failed snapshot no longer teaches the runner to keep using a suspect cached app target or to amplify +the failure by walking every interactive element query. 
+ +Future optimization work should only reduce these preflights after the runner exposes status in a +way that survives command-induced XCTest teardown and can prove the session is still serving new +requests. diff --git a/docs/ios-runner-protocol-optimizations.md b/docs/ios-runner-protocol-optimizations.md index 82e520436..6c07f6e75 100644 --- a/docs/ios-runner-protocol-optimizations.md +++ b/docs/ios-runner-protocol-optimizations.md @@ -41,15 +41,18 @@ iOS simulator validation: ### 2. Adaptive `uptime` preflight policy -Goal: stop paying eager `uptime` before low-risk mutating commands when the runner has recently -completed a command, relying on status-before-invalidate recovery for the rare ambiguous transport -failure. +Status: superseded by ADR 0005 for ready-session command execution. + +Goal: reduce unnecessary readiness probes only when another health signal proves the runner is still +serving new requests. A recent successful command response is not sufficient proof: React Navigation +dogfood showed XCTest can return a successful tap response and then immediately fail the test runner +while re-resolving a navigation-disappeared element. Acceptance criteria: - Existing first-command/startup readiness behavior is preserved. - Existing failed-preflight stale-session recovery is preserved. -- Repeated hot interactions skip `uptime` when the runner has a recent successful response. +- Repeated hot interactions do not skip `uptime` based on cached recent-success state. - Commands that still need conservative readiness checks remain preflighted until measured. - A transport failure after skipping preflight runs status recovery before invalidation. - Diagnostics expose whether a command used, skipped, or recovered from a readiness preflight. @@ -58,9 +61,8 @@ iOS simulator validation: - Start a fresh simulator session and run one interaction: verify the first mutating command still preflights. 
-- Run a hot loop of repeated selector interactions against the same visible control: verify only - the first command pays `uptime`, subsequent commands emit `ios_runner_readiness_preflight_skipped`, - and the UI still responds correctly. +- Run a hot loop of repeated selector interactions against the same visible control: verify the + runner remains healthy and diagnostics explain any readiness probe that was skipped. - Compare median command latency for a hot interaction loop before and after the change. A useful threshold is at least one fewer runner request per hot command and no increase in failure rate. diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerSynthesizedGesture.h b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerSynthesizedGesture.h index 1d536f514..a16c11223 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerSynthesizedGesture.h +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerSynthesizedGesture.h @@ -21,6 +21,10 @@ NS_ASSUME_NONNULL_BEGIN y2:(double)y2 durationMs:(double)durationMs; ++ (NSString * _Nullable)synthesizeTapWithApplication:(id)application + x:(double)x + y:(double)y; + @end NS_ASSUME_NONNULL_END diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerSynthesizedGesture.m b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerSynthesizedGesture.m index d66277076..7f5aedf7e 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerSynthesizedGesture.m +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerSynthesizedGesture.m @@ -53,6 +53,10 @@ static id RunnerSwipePointerPath( CGPoint end, double durationMs ); +static id RunnerTapPointerPath( + const RunnerXCTestEventBridge *bridge, + CGPoint point +); static CGPoint RunnerPointerPointAt( double x, double y, @@ -115,6 +119,18 @@ + (NSString * _Nullable)synthesizeSwipeWithApplication:(id)application } } ++ (NSString * 
_Nullable)synthesizeTapWithApplication:(id)application + x:(double)x + y:(double)y { + @try { + return [self trySynthesizeTapWithApplication:application x:x y:y]; + } @catch (NSException *exception) { + NSString *name = exception.name ?: @"NSException"; + NSString *reason = exception.reason ?: @"private XCTest event synthesis failed"; + return [NSString stringWithFormat:@"%@: %@", name, reason]; + } +} + + (NSString * _Nullable)trySynthesizeTransformWithApplication:(id)application x:(double)x y:(double)y @@ -224,6 +240,48 @@ + (NSString * _Nullable)trySynthesizeSwipeWithApplication:(id)application return nil; } ++ (NSString * _Nullable)trySynthesizeTapWithApplication:(id)application + x:(double)x + y:(double)y { + RunnerXCTestEventBridge bridge; + NSString *missing = RunnerResolveXCTestEventBridge(application, &bridge); + if (missing != nil) { + return missing; + } + + NSInteger interfaceOrientation = + ((RunnerMsgSendInteger)objc_msgSend)(application, bridge.interfaceOrientationSelector); + NSInteger targetProcessID = ((RunnerMsgSendInteger)objc_msgSend)(application, bridge.processIDSelector); + if (targetProcessID <= 0) { + return @"private XCTest event synthesis unavailable: could not resolve target process ID"; + } + + id record = ((RunnerMsgSendInitRecord)objc_msgSend)( + [bridge.recordClass alloc], + bridge.initRecordSelector, + @"agent-device-tap", + interfaceOrientation + ); + if (record == nil) { + return @"private XCTest event synthesis failed: could not create event record"; + } + ((RunnerMsgSendSetInteger)objc_msgSend)(record, bridge.setTargetProcessIDSelector, targetProcessID); + + id path = RunnerTapPointerPath(&bridge, CGPointMake(x, y)); + if (path == nil) { + return @"private XCTest event synthesis failed: could not create pointer path"; + } + ((RunnerMsgSendAddPath)objc_msgSend)(record, bridge.addPathSelector, path); + + NSError *error = nil; + BOOL ok = ((RunnerMsgSendSynthesize)objc_msgSend)(record, bridge.synthesizeSelector, &error); + if (!ok) 
{ + NSString *detail = error.localizedDescription ?: @"synthesizeWithError returned false"; + return [NSString stringWithFormat:@"private XCTest event synthesis failed: %@", detail]; + } + return nil; +} + static NSString * _Nullable RunnerResolveXCTestEventBridge( id application, RunnerXCTestEventBridge *bridge @@ -368,6 +426,19 @@ static id RunnerSwipePointerPath( return path; } +static id RunnerTapPointerPath( + const RunnerXCTestEventBridge *bridge, + CGPoint point +) { + id path = + ((RunnerMsgSendInitPath)objc_msgSend)([bridge->pathClass alloc], bridge->initPathSelector, point, 0.0); + if (path == nil) { + return nil; + } + ((RunnerMsgSendPathOffset)objc_msgSend)(path, bridge->liftSelector, 0.05); + return path; +} + static CGPoint RunnerPointerPointAt( double x, double y, diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandExecution.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandExecution.swift index 8f2e9d3f6..ede603478 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandExecution.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandExecution.swift @@ -147,10 +147,64 @@ extension RunnerTests { return Response(ok: true, data: data) } + func testGestureResponseIncludesSynthesizedTapFallbackDiagnostics() { + let response = gestureResponse( + message: "tapped", + timing: (gestureStartUptimeMs: 1, gestureEndUptimeMs: 2), + fallback: GestureFallback( + strategy: "xctest-coordinate-tap", + message: "Runner synthesized coordinate tap is unavailable", + hint: "Using XCTest coordinate tap fallback." 
+ ) + ) + + XCTAssertEqual(response.ok, true) + XCTAssertEqual(response.data?.gestureFallback, "xctest-coordinate-tap") + XCTAssertEqual( + response.data?.gestureFallbackMessage, + "Runner synthesized coordinate tap is unavailable" + ) + XCTAssertEqual(response.data?.gestureFallbackHint, "Using XCTest coordinate tap fallback.") + } + + func testXCTestRecordedFailureResponseFailsMutatingSuccesses() throws { + let command = try runnerCommandFixture(#"{"command":"tap","commandId":"tap-1"}"#) + let response = Response(ok: true, data: DataPayload(message: "tapped")) + + let failureResponse = xctestRecordedFailureResponse(command: command, response: response) + + XCTAssertEqual(failureResponse?.ok, false) + XCTAssertEqual(failureResponse?.error?.code, "XCTEST_RECORDED_FAILURE") + XCTAssertEqual( + failureResponse?.error?.message, + "XCTest recorded a failure while executing tap; the action may not have been performed." + ) + } + + func testXCTestRecordedFailureResponseDoesNotWrapReadOnlyOrRunnerFatalResponses() throws { + let snapshotCommand = try runnerCommandFixture(#"{"command":"snapshot","commandId":"snapshot-1"}"#) + let tapCommand = try runnerCommandFixture(#"{"command":"tap","commandId":"tap-1"}"#) + let runnerFatalResponse = Response( + ok: true, + data: DataPayload(runnerFatal: true, runnerFatalReason: "ax_snapshot_unavailable") + ) + + XCTAssertNil( + xctestRecordedFailureResponse( + command: snapshotCommand, + response: Response(ok: true, data: DataPayload(nodes: [], truncated: false)) + ) + ) + XCTAssertNil(xctestRecordedFailureResponse(command: tapCommand, response: runnerFatalResponse)) + } + func execute(command: Command) throws -> Response { if command.command == .status { return executeStatus(command: command) } + if command.command == .uptime { + return executeUptime() + } commandJournal.accept(command: command) return try executeAccepted(command: command) } @@ -185,6 +239,13 @@ extension RunnerTests { return Response(ok: true, data: 
commandJournal.status(commandId: statusCommandId)) } + func executeUptime() -> Response { + Response( + ok: true, + data: DataPayload(currentUptimeMs: currentUptimeMs()) + ) + } + private func executeDispatched(command: Command) throws -> Response { if Thread.isMainThread { return try executeOnMainSafely(command: command) @@ -229,6 +290,7 @@ extension RunnerTests { while true { var response: Response? var swiftError: Error? + let failureCountBefore = currentXCTestFailureCount() let exceptionMessage = RunnerObjCExceptionCatcher.catchException({ do { response = try self.executeOnMain(command: command) @@ -238,8 +300,7 @@ extension RunnerTests { }) if let exceptionMessage { - currentApp = nil - currentBundleId = nil + invalidateCachedTarget(reason: "objc_exception") if !hasRetried, shouldRetryException(command, message: exceptionMessage) { NSLog( "AGENT_DEVICE_RUNNER_RETRY command=%@ reason=objc_exception", @@ -265,14 +326,19 @@ extension RunnerTests { userInfo: [NSLocalizedDescriptionKey: "command returned no response"] ) } + if didRecordXCTestFailure(since: failureCountBefore), + let failureResponse = xctestRecordedFailureResponse(command: command, response: response) + { + invalidateCachedTarget(reason: "xctest_recorded_failure") + return failureResponse + } if !hasRetried, shouldRetryCommand(command), shouldRetryResponse(response) { NSLog( "AGENT_DEVICE_RUNNER_RETRY command=%@ reason=response_unavailable", command.command.rawValue ) hasRetried = true - currentApp = nil - currentBundleId = nil + invalidateCachedTarget(reason: "response_unavailable") sleepFor(retryCooldown) continue } @@ -282,7 +348,9 @@ extension RunnerTests { private func executeOnMain(command: Command) throws -> Response { var activeApp = currentApp ?? 
app - if !isRunnerLifecycleCommand(command.command) { + if shouldSkipAppActivationPreflight(command) { + activeApp = resolveAppWithoutActivation(command: command) + } else if !isRunnerLifecycleCommand(command.command) { let normalizedBundleId = command.appBundleId? .trimmingCharacters(in: .whitespacesAndNewlines) let requestedBundleId = (normalizedBundleId?.isEmpty == true) ? nil : normalizedBundleId @@ -408,10 +476,7 @@ extension RunnerTests { return Response(ok: false, error: ErrorPayload(message: "failed to stop recording: \(error.localizedDescription)")) } case .uptime: - return Response( - ok: true, - data: DataPayload(currentUptimeMs: currentUptimeMs()) - ) + return executeUptime() case .tap: if let selectorKey = command.selectorKey, let selectorValue = command.selectorValue { let match = findElement( @@ -425,6 +490,7 @@ extension RunnerTests { } if let element = match.element { let frame = element.frame + let isTextEntry = isTextEntryElement(element) let touchFrame = frame.isEmpty ? nil : resolvedTouchVisualizationFrame(app: activeApp, x: frame.midX, y: frame.midY) @@ -440,7 +506,9 @@ extension RunnerTests { if let response = unsupportedResponse(for: outcome) { return response } - waitForTextEntryReadinessAfterTap(app: activeApp, element: element) + if isTextEntry { + waitForTextEntryReadinessAfterTap(app: activeApp, element: element) + } return gestureResponse( message: match.usedNonHittableFallback ? "tapped via non-hittable coordinate fallback" : "tapped", timing: timing, @@ -462,12 +530,27 @@ extension RunnerTests { return Response(ok: false, error: ErrorPayload(message: "element not found")) } if let x = command.x, let y = command.y { + var fallback: GestureFallback? 
+ if command.synthesized == true { + let (timing, outcome) = performGesture(activeApp, idleTimeout: false) { + synthesizedTapAt(app: activeApp, x: x, y: y) + } + if case .performed = outcome { + return gestureResponse(message: "tapped", timing: timing) + } + fallback = gestureFallback(strategy: "xctest-coordinate-tap", from: outcome) + } let touchFrame = resolvedTouchVisualizationFrame(app: activeApp, x: x, y: y) let (timing, outcome) = performGesture(activeApp) { tapAt(app: activeApp, x: x, y: y) } if let response = unsupportedResponse(for: outcome) { return response } - return gestureResponse(message: "tapped", timing: timing, frame: .touch(touchFrame)) + return gestureResponse( + message: "tapped", + timing: timing, + frame: .touch(touchFrame), + fallback: fallback + ) } return Response(ok: false, error: ErrorPayload(message: "tap requires text or x/y")) case .mouseClick: @@ -736,6 +819,7 @@ extension RunnerTests { needsPostSnapshotInteractionDelay = true return Response(ok: true, data: payload) } catch let failure as SnapshotCaptureFailure { + invalidateCachedTarget(reason: "ax_snapshot_failure") // Other thrown errors fall through to executeOnMainSafely's generic error response. return Response( ok: false, @@ -935,6 +1019,65 @@ extension RunnerTests { } } + private func currentXCTestFailureCount() -> Int { + return testRun?.failureCount ?? 0 + } + + private func didRecordXCTestFailure(since failureCountBefore: Int) -> Bool { + return currentXCTestFailureCount() > failureCountBefore + } + + private func xctestRecordedFailureResponse(command: Command, response: Response) -> Response? 
{ + guard response.ok else { return nil } + if response.data?.runnerFatal == true { + return nil + } + guard !isReadOnlyCommand(command), !isRunnerLifecycleCommand(command.command) else { + return nil + } + return Response( + ok: false, + error: ErrorPayload( + code: "XCTEST_RECORDED_FAILURE", + message: "XCTest recorded a failure while executing \(command.command.rawValue); the action may not have been performed.", + hint: "The iOS runner session will be restarted. Retry after a fresh snapshot, or use screenshot plus coordinate commands when the accessibility tree is unavailable." + ) + ) + } + + private func runnerCommandFixture(_ json: String) throws -> Command { + try JSONDecoder().decode(Command.self, from: Data(json.utf8)) + } + + private func shouldSkipAppActivationPreflight(_ command: Command) -> Bool { +#if os(iOS) + // Coordinate-only synthesized taps can run after an AX-fatal screen because they do not need + // app activation, window lookup, keyboard lookup, or element resolution. Selector/text taps + // intentionally stay on the normal AX path because they need an element query. + return command.command == .tap + && command.synthesized == true + && command.x != nil + && command.y != nil + && command.text == nil + && command.selectorKey == nil +#else + return false +#endif + } + + private func resolveAppWithoutActivation(command: Command) -> XCUIApplication { + guard let bundleId = command.appBundleId? + .trimmingCharacters(in: .whitespacesAndNewlines), + !bundleId.isEmpty + else { + return currentApp ?? 
app + } + if currentBundleId == bundleId, let currentApp { + return currentApp + } + return XCUIApplication(bundleIdentifier: bundleId) + } + private func executeTypeCommand(activeApp: XCUIApplication, command: Command) -> Response { guard let text = command.text else { return Response(ok: false, error: ErrorPayload(message: "type requires text")) diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandJournal.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandJournal.swift index 16faa4ccc..dee4606fd 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandJournal.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+CommandJournal.swift @@ -154,6 +154,17 @@ final class RunnerCommandJournal { } extension RunnerTests { + func testUptimeBypassesCommandJournal() throws { + let command = runnerJournalCommand("uptime", id: "uptime-probe") + + let response = try execute(command: command) + let status = commandJournal.status(commandId: "uptime-probe") + + XCTAssertEqual(response.ok, true) + XCTAssertNotNil(response.data?.currentUptimeMs) + XCTAssertEqual(status.lifecycleState, RunnerCommandLifecycleState.notAccepted.rawValue) + } + func testCommandJournalRetentionPolicy() throws { let journal = RunnerCommandJournal() diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Exceptions.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Exceptions.swift index 76d47e43c..d12648548 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Exceptions.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Exceptions.swift @@ -10,10 +10,7 @@ extension RunnerTests { /// exception telemetry later. `RunnerObjCExceptionCatcher.catchException` takes a non-escaping /// block, so `block` may capture `inout` state. 
func safely<T>(_ tag: String, _ fallback: T, _ block: () -> T) -> T { - var result = fallback - let exceptionMessage = RunnerObjCExceptionCatcher.catchException({ - result = block() - }) + let (result, exceptionMessage) = catchingObjCException(fallback: fallback, block) if let exceptionMessage { NSLog("AGENT_DEVICE_RUNNER_%@_IGNORED_EXCEPTION=%@", tag, exceptionMessage) return fallback @@ -21,6 +18,17 @@ extension RunnerTests { return result } + func catchingObjCException<T>( + fallback: T, + _ block: () -> T + ) -> (result: T, exceptionMessage: String?) { + var result = fallback + let exceptionMessage = RunnerObjCExceptionCatcher.catchException({ + result = block() + }) + return (result, exceptionMessage) + } + /// Optional-returning convenience: returns `nil` on exception (matching the common /// `var x: T?` + catch-and-return-nil shape). func safely<T>(_ tag: String, _ block: () -> T?) -> T? { diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Interaction.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Interaction.swift index b59a36228..31c7cdd5f 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Interaction.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Interaction.swift @@ -670,6 +670,32 @@ extension RunnerTests { #endif } + func synthesizedTapAt(app: XCUIApplication, x: Double, y: Double) -> RunnerInteractionOutcome { +#if os(iOS) + if let message = RunnerSynthesizedGesture.synthesizeTap( + withApplication: app, + x: x, + y: y + ) { + return .unsupported( + message: message, + hint: "Falling back to XCTest coordinate tap may be slower and can still need a healthy accessibility tree." 
+ ) + } + return .performed +#elseif os(tvOS) + return .unsupported( + message: "coordinate tap is not supported on tvOS; move focus with swipe or scroll, then select the focused element", + hint: "tvOS has no coordinate input; move focus with swipe/scroll to the target, then select it." + ) +#else + return .unsupported( + message: "synthesized coordinate tap is not supported on macOS", + hint: "macOS automation has no touchscreen; use mouse-driven interactions instead." + ) +#endif + } + func keyboardAvoidingDragPoints( app: XCUIApplication, x: Double, diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Lifecycle.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Lifecycle.swift index 5ad33a75b..8c73d1c11 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Lifecycle.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Lifecycle.swift @@ -87,6 +87,14 @@ extension RunnerTests { currentBundleId = nil } + func invalidateCachedTarget(reason: String) { + if currentApp != nil || currentBundleId != nil { + NSLog("AGENT_DEVICE_RUNNER_TARGET_CACHE_INVALIDATE reason=%@", reason) + } + currentApp = nil + currentBundleId = nil + } + func targetNeedsActivation(_ target: XCUIApplication) -> Bool { let state = target.state #if os(macOS) diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Models.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Models.swift index b835b853a..e1ecdcf9b 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Models.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Models.swift @@ -193,6 +193,8 @@ struct DataPayload: Codable { let gestureFallback: String? let gestureFallbackMessage: String? let gestureFallbackHint: String? + let runnerFatal: Bool? + let runnerFatalReason: String? init( message: String? 
= nil, @@ -224,7 +226,9 @@ struct DataPayload: Codable { orientation: String? = nil, gestureFallback: String? = nil, gestureFallbackMessage: String? = nil, - gestureFallbackHint: String? = nil + gestureFallbackHint: String? = nil, + runnerFatal: Bool? = nil, + runnerFatalReason: String? = nil ) { self.message = message self.text = text @@ -256,6 +260,8 @@ struct DataPayload: Codable { self.gestureFallback = gestureFallback self.gestureFallbackMessage = gestureFallbackMessage self.gestureFallbackHint = gestureFallbackHint + self.runnerFatal = runnerFatal + self.runnerFatalReason = runnerFatalReason } } diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Snapshot.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Snapshot.swift index 58ae222b5..dd0cb42a8 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Snapshot.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Snapshot.swift @@ -3,7 +3,7 @@ import XCTest extension RunnerTests { private static let axSnapshotErrorCode = "IOS_AX_SNAPSHOT_FAILED" private static let axSnapshotHint = - "XCTest could not serialize this iOS accessibility tree. Try a smaller read such as snapshot -s -d 8, use direct selector commands such as find id click, or use screenshot/logs/appstate in the same session. If you own the app and need full-tree inspection, consider flagging this screen for accessibility-tree simplification: reduce unnecessary accessible wrapper nesting and expose stable ids on actionable controls." + "Snapshot state is unavailable because XCTest could not serialize this iOS accessibility tree. This can be specific to the current screen. Use plain screenshot, not screenshot --overlay-refs, as visual truth; navigate with coordinate commands if needed; then retry snapshot -i after reaching another screen. 
If you own the app and need full-tree inspection, simplify this screen's accessibility tree and expose stable ids on actionable controls." private static let collapsedTabCandidateTypes: Set = [ .button, .link, @@ -97,8 +97,7 @@ extension RunnerTests { do { context = try makeSnapshotTraversalContext(app: app, options: options) } catch let failure as SnapshotCaptureFailure where options.interactiveOnly { - NSLog("AGENT_DEVICE_RUNNER_SNAPSHOT_FLAT_FALLBACK=%@", failure.message) - return snapshotFlatInteractive(app: app, options: options) + return snapshotAccessibilityUnavailable(failure: failure) } guard let context else { @@ -110,11 +109,11 @@ extension RunnerTests { if let cachedDescendantElements { return cachedDescendantElements } - let fetched = safeSnapshotElementsQuery { + let result = snapshotElementsQuery { context.queryRoot.descendants(matching: .any).allElementsBoundByIndex } - cachedDescendantElements = fetched - return fetched + cachedDescendantElements = result.elements + return result.elements } var nodes: [SnapshotNode] = [] @@ -328,6 +327,40 @@ extension RunnerTests { return DataPayload(nodes: nodes, truncated: truncated) } + private func snapshotAccessibilityUnavailable(failure: SnapshotCaptureFailure) -> DataPayload { + NSLog("AGENT_DEVICE_RUNNER_SNAPSHOT_AX_UNAVAILABLE=%@", failure.message) + invalidateCachedTarget(reason: "ax_snapshot_unavailable") + return DataPayload( + message: failure.message, + nodes: [compactInteractiveRootNode(rect: .zero)], + truncated: true, + runnerFatal: true, + runnerFatalReason: "ax_snapshot_unavailable" + ) + } + + func testSnapshotAccessibilityUnavailableMarksSparseSnapshotRunnerFatal() { + currentApp = app + currentBundleId = "com.example.app" + + let payload = snapshotAccessibilityUnavailable( + failure: SnapshotCaptureFailure( + code: "IOS_AX_SNAPSHOT_FAILED", + message: "iOS XCTest snapshot failed while serializing the accessibility tree.", + hint: Self.axSnapshotHint + ) + ) + + XCTAssertEqual(payload.message, 
"iOS XCTest snapshot failed while serializing the accessibility tree.") + XCTAssertEqual(payload.nodes?.count, 1) + XCTAssertEqual(payload.nodes?.first?.type, "Application") + XCTAssertEqual(payload.truncated, true) + XCTAssertEqual(payload.runnerFatal, true) + XCTAssertEqual(payload.runnerFatalReason, "ax_snapshot_unavailable") + XCTAssertNil(currentApp) + XCTAssertNil(currentBundleId) + } + private func compactInteractiveRootNode(rect: CGRect) -> SnapshotNode { SnapshotNode( index: 0, @@ -818,10 +851,6 @@ extension RunnerTests { return containerLabel == label && containerIdentifier == identifier } - private func safeSnapshotElementsQuery(_ fetch: () -> [XCUIElement]) -> [XCUIElement] { - safely("SNAPSHOT_QUERY", [], fetch) - } - private func flatInteractiveElements( app: XCUIApplication, deadline: Date @@ -852,13 +881,32 @@ extension RunnerTests { NSLog("AGENT_DEVICE_RUNNER_SNAPSHOT_FLAT_FALLBACK_DEADLINE") break } - elements.append(contentsOf: safeSnapshotElementsQuery { + let result = snapshotElementsQuery { query.allElementsBoundByIndex - }) + } + elements.append(contentsOf: result.elements) + if result.axUnavailable { + break + } } return elements } + private func snapshotElementsQuery( + _ fetch: () -> [XCUIElement] + ) -> (elements: [XCUIElement], axUnavailable: Bool) { + let (elements, exceptionMessage) = catchingObjCException(fallback: [], fetch) + guard let exceptionMessage else { + return (elements, false) + } + NSLog("AGENT_DEVICE_RUNNER_SNAPSHOT_QUERY_IGNORED_EXCEPTION=%@", exceptionMessage) + if Self.isAxIllegalArgument(exceptionMessage) { + invalidateCachedTarget(reason: "ax_snapshot_query_unavailable") + return ([], true) + } + return ([], false) + } + private func flatSnapshotNode( element: XCUIElement, index: Int, diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Transport.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Transport.swift index 2092bb438..1216272f7 100644 --- 
a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Transport.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Transport.swift @@ -112,6 +112,10 @@ extension RunnerTests { completion((jsonResponse(status: 200, response: executeStatus(command: command)), false)) return } + if command.command == .uptime { + completion((jsonResponse(status: 200, response: executeUptime()), false)) + return + } commandJournal.accept(command: command) commandExecutionQueue.async { do { diff --git a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+TvRemote.swift b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+TvRemote.swift index 9e1edaed5..12e879975 100644 --- a/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+TvRemote.swift +++ b/ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+TvRemote.swift @@ -81,7 +81,18 @@ extension RunnerTests { if let outcome = selectFocusedTvElement(app: app, element: element, action: action) { return outcome } +#if os(tvOS) return performElementTap(element) +#else + let frame = element.frame + if !frame.isEmpty { + // XCUIElement.tap() can fail the whole XCTest after navigation because it + // re-resolves the tapped element even after the app removed it. Keep the + // selector target semantic, then activate its resolved stable screen point. + return tapAt(app: app, x: frame.midX, y: frame.midY) + } + return performElementTap(element) +#endif } func selectFocusedTvElement(app: XCUIApplication, point: CGPoint, action: String) -> RunnerInteractionOutcome? 
{ diff --git a/package.json b/package.json index 3f444f71e..2fce95503 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "agent-device", - "version": "0.16.13", + "version": "0.16.14", "description": "Agent-native CLI for AI mobile testing and app automation across iOS, Android, tvOS, Android TV, macOS, and Linux.", "mcpName": "io.github.callstackincubator/agent-device", "license": "MIT", diff --git a/server.json b/server.json index d6ea6cfd7..bd54c13cb 100644 --- a/server.json +++ b/server.json @@ -7,12 +7,12 @@ "url": "https://github.com/callstackincubator/agent-device", "source": "github" }, - "version": "0.16.13", + "version": "0.16.14", "packages": [ { "registryType": "npm", "identifier": "agent-device", - "version": "0.16.13", + "version": "0.16.14", "transport": { "type": "stdio" } diff --git a/src/platforms/ios/__tests__/index.test.ts b/src/platforms/ios/__tests__/index.test.ts index 47f2c7855..9e8ac8b60 100644 --- a/src/platforms/ios/__tests__/index.test.ts +++ b/src/platforms/ios/__tests__/index.test.ts @@ -169,6 +169,54 @@ test('iosRunnerOverrides gives fling a short default XCUITest drag hold', async }); }); +test('iosRunnerOverrides uses synthesized iOS coordinate taps', async () => { + mockRunIosRunnerCommand.mockResolvedValue({}); + + const { overrides } = iosRunnerOverrides(IOS_TEST_SIMULATOR, { + appBundleId: 'com.example.App', + }); + + await overrides.tap(100, 200); + await overrides.focus(110, 210); + + assert.deepEqual(mockRunIosRunnerCommand.mock.calls[0]?.[1], { + command: 'tap', + x: 100, + y: 200, + synthesized: true, + appBundleId: 'com.example.App', + }); + assert.deepEqual(mockRunIosRunnerCommand.mock.calls[1]?.[1], { + command: 'tap', + x: 110, + y: 210, + synthesized: true, + appBundleId: 'com.example.App', + }); +}); + +for (const [name, device] of [ + ['macOS', MACOS_TEST_DEVICE], + ['tvOS', TVOS_TEST_SIMULATOR], +] as const) { + test(`iosRunnerOverrides keeps ${name} coordinate taps on the standard path`, async 
() => { + mockRunIosRunnerCommand.mockResolvedValue({}); + + const { overrides } = iosRunnerOverrides(device, { + appBundleId: 'com.example.App', + }); + + await overrides.tap(100, 200); + + assert.deepEqual(mockRunIosRunnerCommand.mock.calls[0]?.[1], { + command: 'tap', + x: 100, + y: 200, + appBundleId: 'com.example.App', + }); + }); +} + test('iosRunnerOverrides maps swipe to synthesized iOS drag duration', async () => { mockRunIosRunnerCommand.mockResolvedValue({}); diff --git a/src/platforms/ios/__tests__/runner-client.test.ts b/src/platforms/ios/__tests__/runner-client.test.ts index d671bfc65..a1b3438f2 100644 --- a/src/platforms/ios/__tests__/runner-client.test.ts +++ b/src/platforms/ios/__tests__/runner-client.test.ts @@ -654,6 +654,34 @@ test('parseRunnerResponse preserves iOS AX snapshot failure code and hint', asyn ); }); +test('parseRunnerResponse preserves XCTest recorded failure code and hint', async () => { + const hint = 'The iOS runner session will be restarted.'; + const response = new Response( + JSON.stringify({ + ok: false, + error: { + code: 'XCTEST_RECORDED_FAILURE', + message: + 'XCTest recorded a failure while executing tap; the action may not have been performed.', + hint, + }, + }), + ); + const session = { ready: true }; + + await assert.rejects( + () => parseRunnerResponse(response, session, '/tmp/runner.log'), + (error: unknown) => { + assert.ok(error instanceof AppError); + assert.equal(error.code, 'XCTEST_RECORDED_FAILURE'); + assert.match(error.message, /may not have been performed/); + assert.equal(error.details?.hint, hint); + assert.equal(isRetryableRunnerError(error), false); + return true; + }, + ); +}); + test('parseRunnerResponse emits diagnostics for runner gesture fallbacks', async () => { const response = new Response( JSON.stringify({ diff --git a/src/platforms/ios/__tests__/runner-command-retry.test.ts b/src/platforms/ios/__tests__/runner-command-retry.test.ts index ee85813ef..5c085c4bc 100644 --- 
a/src/platforms/ios/__tests__/runner-command-retry.test.ts +++ b/src/platforms/ios/__tests__/runner-command-retry.test.ts @@ -133,7 +133,9 @@ test('prepareIosRunner retries a fresh launch session when the health check cann const stuckSession = makeRunnerSession({ port: 8100 }); const relaunchedSession = makeRunnerSession({ port: 8101 }); - mockEnsureRunnerSession.mockResolvedValueOnce(stuckSession).mockResolvedValueOnce(relaunchedSession); + mockEnsureRunnerSession + .mockResolvedValueOnce(stuckSession) + .mockResolvedValueOnce(relaunchedSession); mockExecuteRunnerCommandWithSession .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'Runner did not accept connection')) .mockResolvedValueOnce({ uptimeMs: 42 }); @@ -190,7 +192,9 @@ test('prepareIosRunner does not force a rebuild when the relaunched fresh sessio xctestrunArtifact: exactArtifact, }); - mockEnsureRunnerSession.mockResolvedValueOnce(stuckSession).mockResolvedValueOnce(relaunchedSession); + mockEnsureRunnerSession + .mockResolvedValueOnce(stuckSession) + .mockResolvedValueOnce(relaunchedSession); mockExecuteRunnerCommandWithSession .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'Runner did not accept connection')) .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'Runner did not accept connection')); @@ -427,35 +431,6 @@ test('mutating commands recover cached responses before invalidating after comma assert.equal(statusCommand.statusCommandId, sentCommand.commandId); }); -test('mutating commands run status recovery after transport failure when readiness preflight was skipped', async () => { - const session = makeRunnerSession({ port: 8100, ready: true }); - - mockEnsureRunnerSession.mockResolvedValueOnce(session); - mockExecuteRunnerCommandWithSession - .mockRejectedValueOnce( - new AppError('COMMAND_FAILED', 'fetch failed', { - runnerReadinessPreflightSkipped: true, - runnerReadinessPreflightSkipReason: 'recent_successful_response', - }), - ) - .mockResolvedValueOnce({ - lifecycleState: 
'completed', - lifecycleResponseJson: JSON.stringify({ ok: true, data: { message: 'tapped' } }), - }); - - const diagnostics = await captureDiagnostics(async () => { - const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }); - assert.deepEqual(result, { message: 'tapped' }); - }); - - assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0); - assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2); - assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2].command, 'status'); - assert.match(diagnostics, /ios_runner_command_status_recovery/); - assert.match(diagnostics, /"readinessPreflightSkipped":true/); - assert.match(diagnostics, /"readinessPreflightSkipReason":"recent_successful_response"/); -}); - test('mutating commands keep invalidating when status cannot find the command', async () => { const session = makeRunnerSession({ port: 8100, ready: true }); @@ -627,36 +602,6 @@ test('mutating commands report recovery guidance when completed status has no re }); }); -test('mutating commands include skipped readiness context in lost-response guidance', async () => { - const session = makeRunnerSession({ port: 8100, ready: true }); - - mockEnsureRunnerSession.mockResolvedValueOnce(session); - mockExecuteRunnerCommandWithSession - .mockRejectedValueOnce( - new AppError('COMMAND_FAILED', 'fetch failed', { - runnerReadinessPreflightSkipped: true, - runnerReadinessPreflightSkipReason: 'recent_successful_response', - runnerReadinessPreflightSkippedAgeMs: 4, - }), - ) - .mockResolvedValueOnce({ lifecycleState: 'completed' }); - - await assert.rejects( - () => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }), - (error: unknown) => { - assert.ok(error instanceof AppError); - assert.equal(error.details?.recovery, 'completed_without_retained_response'); - assert.equal(error.details?.readinessPreflightSkipped, true); - assert.equal(error.details?.readinessPreflightSkipReason, 
'recent_successful_response'); - assert.equal(error.details?.readinessPreflightSkippedAgeMs, 4); - assert.match(String(error.details?.hint), /skipped the uptime preflight/); - assert.match(String(error.details?.hint), /status recovery confirmed/); - assert.match(String(error.details?.hint), /snapshot -i/); - return true; - }, - ); -}); - test('mutating commands preserve runner failure details from status recovery', async () => { const session = makeRunnerSession({ port: 8100, ready: true }); diff --git a/src/platforms/ios/__tests__/runner-session.test.ts b/src/platforms/ios/__tests__/runner-session.test.ts index e8cefbe3b..a67721187 100644 --- a/src/platforms/ios/__tests__/runner-session.test.ts +++ b/src/platforms/ios/__tests__/runner-session.test.ts @@ -154,6 +154,56 @@ test('runner session executes read-only commands without uptime preflight', asyn assert.equal(mockSendRunnerCommandOnce.mock.calls.length, 0); }); +test('runner session probes readiness before ready read-only commands', async () => { + const session = makeRunnerSession({ ready: true }); + mockWaitForRunner + .mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })) + .mockResolvedValueOnce(runnerResponse({ nodes: [], truncated: false })); + + const result = await executeRunnerCommandWithSession( + IOS_SIMULATOR, + session, + { command: 'snapshot', appBundleId: 'com.example.demo' }, + '/tmp/runner.log', + 30_000, + ); + + assert.deepEqual(result, { nodes: [], truncated: false }); + assert.equal(mockWaitForRunner.mock.calls.length, 2); + assertRunnerCommand(mockWaitForRunner.mock.calls[0]?.[2], { command: 'uptime' }); + assert.equal(mockWaitForRunner.mock.calls[0]?.[4], 1_000); + assertRunnerCommand(mockWaitForRunner.mock.calls[1]?.[2], { + command: 'snapshot', + appBundleId: 'com.example.demo', + }); + assert.equal(mockSendRunnerCommandOnce.mock.calls.length, 0); +}); + +test('runner session marks read-only readiness preflight failures before command send', async () => { + const session = 
makeRunnerSession({ ready: true }); + mockWaitForRunner.mockRejectedValueOnce(new Error('fetch failed')); + + await assert.rejects( + () => + executeRunnerCommandWithSession( + IOS_SIMULATOR, + session, + { command: 'snapshot', appBundleId: 'com.example.demo' }, + '/tmp/runner.log', + 30_000, + ), + (error: unknown) => { + assert.ok(error instanceof AppError); + assert.equal(error.details?.runnerReadinessPreflightFailed, true); + return true; + }, + ); + + assert.equal(mockWaitForRunner.mock.calls.length, 1); + assertRunnerCommand(mockWaitForRunner.mock.calls[0]?.[2], { command: 'uptime' }); + assert.equal(mockSendRunnerCommandOnce.mock.calls.length, 0); +}); + test('runner session executes status command as read-only lifecycle command', async () => { const session = makeRunnerSession({ ready: true }); mockWaitForRunner.mockResolvedValueOnce( @@ -234,8 +284,9 @@ test('runner session emits reason diagnostics when readiness preflight is used', assert.match(diagnostics, /ios_runner_readiness_preflight/); }); -test('runner session skips readiness preflight for tap commands after a recent successful response', async () => { - const session = makeRunnerSession({ ready: true, lastSuccessfulRunnerResponseAtMs: Date.now() }); +test('runner session probes readiness for ready tap commands', async () => { + const session = makeRunnerSession({ ready: true }); + mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); mockSendRunnerCommandOnce.mockResolvedValueOnce(runnerResponse({ tapped: true })); const result = await executeRunnerCommandWithSession( @@ -247,12 +298,15 @@ test('runner session skips readiness preflight for tap commands after a recent s ); assert.deepEqual(result, { tapped: true }); - assert.equal(mockWaitForRunner.mock.calls.length, 0); + assert.equal(mockWaitForRunner.mock.calls.length, 1); + assertRunnerCommand(mockWaitForRunner.mock.calls[0]?.[2], { command: 'uptime' }); + assert.equal(mockWaitForRunner.mock.calls[0]?.[4], 1_000); 
assert.equal(mockSendRunnerCommandOnce.mock.calls.length, 1); }); -test('runner session emits explicit diagnostics when readiness preflight is skipped', async () => { - const session = makeRunnerSession({ ready: true, lastSuccessfulRunnerResponseAtMs: Date.now() }); +test('runner session emits explicit diagnostics when ready sessions are probed', async () => { + const session = makeRunnerSession({ ready: true }); + mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); mockSendRunnerCommandOnce.mockResolvedValueOnce(runnerResponse({ tapped: true })); const diagnostics = await captureDiagnostics(async () => { @@ -265,14 +319,14 @@ test('runner session emits explicit diagnostics when readiness preflight is skip ); }); - assert.match(diagnostics, /ios_runner_readiness_preflight_skipped/); - assert.match(diagnostics, /"reason":"recent_successful_response"/); - assert.doesNotMatch(diagnostics, /ios_runner_readiness_preflight_used/); + assert.match(diagnostics, /ios_runner_readiness_preflight/); + assert.match(diagnostics, /"reason":"ready_session"/); + assert.doesNotMatch(diagnostics, /ios_runner_readiness_preflight_skipped/); }); -test('runner session marks transport failures after skipped readiness preflight', async () => { - const session = makeRunnerSession({ ready: true, lastSuccessfulRunnerResponseAtMs: Date.now() }); - mockSendRunnerCommandOnce.mockRejectedValueOnce(new Error('fetch failed')); +test('runner session marks preflight failures for ready mutating commands', async () => { + const session = makeRunnerSession({ ready: true }); + mockWaitForRunner.mockRejectedValueOnce(new Error('fetch failed')); await assert.rejects( () => @@ -285,15 +339,16 @@ test('runner session marks transport failures after skipped readiness preflight' ), (error: unknown) => { assert.ok(error instanceof AppError); - assert.equal(error.details?.runnerReadinessPreflightSkipped, true); - assert.equal(error.details?.runnerReadinessPreflightSkipReason, 
'recent_successful_response'); + assert.equal(error.details?.runnerReadinessPreflightFailed, true); return true; }, ); + assert.equal(mockSendRunnerCommandOnce.mock.calls.length, 0); }); -test('runner session does not mark runner response failures as skipped preflight transport failures', async () => { - const session = makeRunnerSession({ ready: true, lastSuccessfulRunnerResponseAtMs: Date.now() }); +test('runner session preserves runner response failures after successful readiness preflight', async () => { + const session = makeRunnerSession({ ready: true }); + mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); mockSendRunnerCommandOnce.mockResolvedValueOnce( runnerError({ code: 'COMMAND_FAILED', @@ -313,14 +368,14 @@ test('runner session does not mark runner response failures as skipped preflight (error: unknown) => { assert.ok(error instanceof AppError); assert.equal(error.message, 'Runner failed after receiving command'); - assert.equal(error.details?.runnerReadinessPreflightSkipped, undefined); return true; }, ); }); -test('runner session skips readiness preflight for selector taps after a recent successful response', async () => { - const session = makeRunnerSession({ ready: true, lastSuccessfulRunnerResponseAtMs: Date.now() }); +test('runner session probes readiness for ready selector taps', async () => { + const session = makeRunnerSession({ ready: true }); + mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); mockSendRunnerCommandOnce.mockResolvedValueOnce(runnerResponse({ tapped: true })); const result = await executeRunnerCommandWithSession( @@ -337,12 +392,15 @@ test('runner session skips readiness preflight for selector taps after a recent ); assert.deepEqual(result, { tapped: true }); - assert.equal(mockWaitForRunner.mock.calls.length, 0); + assert.equal(mockWaitForRunner.mock.calls.length, 1); + assertRunnerCommand(mockWaitForRunner.mock.calls[0]?.[2], { command: 'uptime' }); + 
assert.equal(mockWaitForRunner.mock.calls[0]?.[4], 1_000); assert.equal(mockSendRunnerCommandOnce.mock.calls.length, 1); }); -test('runner session skips readiness preflight for tapSeries after a recent successful response', async () => { - const session = makeRunnerSession({ ready: true, lastSuccessfulRunnerResponseAtMs: Date.now() }); +test('runner session probes readiness for ready tapSeries commands', async () => { + const session = makeRunnerSession({ ready: true }); + mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); mockSendRunnerCommandOnce.mockResolvedValueOnce(runnerResponse({ tapped: true })); const result = await executeRunnerCommandWithSession( @@ -360,35 +418,15 @@ test('runner session skips readiness preflight for tapSeries after a recent succ 30_000, ); - assert.deepEqual(result, { tapped: true }); - assert.equal(mockWaitForRunner.mock.calls.length, 0); - assert.equal(mockSendRunnerCommandOnce.mock.calls.length, 1); -}); - -test('runner session keeps readiness preflight for tap commands when ready but never proven fresh', async () => { - const session = makeRunnerSession({ ready: true }); - mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); - mockSendRunnerCommandOnce.mockResolvedValueOnce(runnerResponse({ tapped: true })); - - const result = await executeRunnerCommandWithSession( - IOS_SIMULATOR, - session, - { command: 'tap', x: 120, y: 240, appBundleId: 'com.example.demo' }, - '/tmp/runner.log', - 30_000, - ); - assert.deepEqual(result, { tapped: true }); assert.equal(mockWaitForRunner.mock.calls.length, 1); assertRunnerCommand(mockWaitForRunner.mock.calls[0]?.[2], { command: 'uptime' }); + assert.equal(mockWaitForRunner.mock.calls[0]?.[4], 1_000); assert.equal(mockSendRunnerCommandOnce.mock.calls.length, 1); }); -test('runner session keeps readiness preflight for tap commands when marked ready but stale', async () => { - const session = makeRunnerSession({ - ready: true, - 
lastSuccessfulRunnerResponseAtMs: Date.now() - 11_000, - }); +test('runner session keeps readiness preflight for ready tap commands without prior command state', async () => { + const session = makeRunnerSession({ ready: true }); mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); mockSendRunnerCommandOnce.mockResolvedValueOnce(runnerResponse({ tapped: true })); @@ -407,7 +445,7 @@ test('runner session keeps readiness preflight for tap commands when marked read }); test('runner session keeps readiness preflight for non-tap mutating commands when marked ready', async () => { - const session = makeRunnerSession({ ready: true, lastSuccessfulRunnerResponseAtMs: Date.now() }); + const session = makeRunnerSession({ ready: true }); mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); mockSendRunnerCommandOnce.mockResolvedValueOnce(runnerResponse({ pressed: true })); @@ -453,6 +491,73 @@ test('runner session preserves structured runner failures', async () => { ); }); +test('runner session invalidates after runner-fatal ok payloads', async () => { + const device = { ...IOS_SIMULATOR, id: 'runner-session-fatal-payload-sim' }; + const session = await ensureRunnerSession(device, {}); + mockWaitForRunner.mockClear(); + mockWaitForRunner.mockResolvedValueOnce( + runnerResponse({ + message: 'iOS XCTest snapshot failed with kAXErrorIllegalArgument.', + nodes: [], + truncated: true, + runnerFatal: true, + runnerFatalReason: 'ax_snapshot_unavailable', + }), + ); + + const result = await executeRunnerCommandWithSession( + device, + session, + { command: 'snapshot', appBundleId: 'com.example.demo' }, + '/tmp/runner.log', + 30_000, + ); + + assert.equal(result.runnerFatal, true); + assert.equal(result.runnerFatalReason, 'ax_snapshot_unavailable'); + assert.equal(getRunnerSessionSnapshot(device.id), null); + assert.equal( + mockRunAppleToolCommand.mock.calls.some((call) => call[0] === 'pkill'), + true, + ); +}); + +test('runner session 
invalidates after XCTest recorded mutation failures', async () => { + const device = { ...IOS_SIMULATOR, id: 'runner-session-xctest-failure-sim' }; + const session = await ensureRunnerSession(device, {}); + mockWaitForRunner.mockClear(); + mockWaitForRunner.mockResolvedValueOnce(runnerResponse({ uptimeMs: 42 })); + mockSendRunnerCommandOnce.mockResolvedValueOnce( + runnerError({ + code: 'XCTEST_RECORDED_FAILURE', + message: + 'XCTest recorded a failure while executing tap; the action may not have been performed.', + }), + ); + + await assert.rejects( + () => + executeRunnerCommandWithSession( + device, + session, + { command: 'tap', x: 120, y: 240, appBundleId: 'com.example.demo' }, + '/tmp/runner.log', + 30_000, + ), + (error: unknown) => { + assert.ok(error instanceof AppError); + assert.equal(error.code, 'XCTEST_RECORDED_FAILURE'); + assert.match(error.message, /may not have been performed/); + return true; + }, + ); + assert.equal(getRunnerSessionSnapshot(device.id), null); + assert.equal( + mockRunAppleToolCommand.mock.calls.some((call) => call[0] === 'pkill'), + true, + ); +}); + test('runner session starts xcodebuild through provider seams and reuses an alive session', async () => { const device = { ...IOS_SIMULATOR, id: 'runner-session-start-sim' }; @@ -518,6 +623,25 @@ test('runner session restarts alive runner when expected xctestrun artifact chan assert.equal(mockRunCmdBackground.mock.calls.length, 2); }); +test('runner session restarts dead runner without graceful shutdown', async () => { + const device = { ...IOS_SIMULATOR, id: 'runner-session-dead-sim' }; + + const session = await ensureRunnerSession(device, {}); + mockWaitForRunner.mockClear(); + mockIsProcessAlive.mockReturnValue(false); + + const restarted = await ensureRunnerSession(device, {}); + + assert.notEqual(restarted, session); + assert.equal(mockRunCmdBackground.mock.calls.length, 2); + assert.equal(mockWaitForRunner.mock.calls.length, 0); + 
assert.deepEqual(mockCleanupTempFile.mock.calls, [ + ['/tmp/session-runner.xctestrun'], + ['/tmp/session-runner.json'], + ]); + assert.equal(mockRedirectRelease.mock.calls.length, 1); +}); + test('runner session keeps boot and stale bundle cleanup available when needed', async () => { const device = { ...IOS_SIMULATOR, id: 'runner-session-clean-sim', booted: false }; @@ -534,7 +658,10 @@ test('runner session keeps boot and stale bundle cleanup available when needed', true, ); const uninstallCalls = mockRunXcrun.mock.calls.filter((call) => call[0]?.includes('uninstall')); - assert.equal(uninstallCalls.every((call) => call[1]?.timeoutMs === 10_000), true); + assert.equal( + uninstallCalls.every((call) => call[1]?.timeoutMs === 10_000), + true, + ); }); test('runner session stale bundle cleanup is best-effort when simctl stalls', async () => { diff --git a/src/platforms/ios/interactions.ts b/src/platforms/ios/interactions.ts index c7277f018..bd254ef65 100644 --- a/src/platforms/ios/interactions.ts +++ b/src/platforms/ios/interactions.ts @@ -74,11 +74,7 @@ export function iosRunnerOverrides( runnerOpts, overrides: { tap: async (x, y) => { - return await runIosRunnerCommand( - device, - { command: 'tap', x, y, appBundleId: ctx.appBundleId }, - runnerOpts, - ); + return await runIosRunnerCommand(device, iosTapCommand(device, ctx, x, y), runnerOpts); }, tapElementSelector: async (selector) => { return await runIosRunnerCommand( @@ -150,11 +146,7 @@ export function iosRunnerOverrides( ); }, focus: async (x, y) => { - return await runIosRunnerCommand( - device, - { command: 'tap', x, y, appBundleId: ctx.appBundleId }, - runnerOpts, - ); + return await runIosRunnerCommand(device, iosTapCommand(device, ctx, x, y), runnerOpts); }, type: async (text, delayMs) => { await runIosRunnerCommand( @@ -258,6 +250,21 @@ export function iosRunnerOverrides( }; } +function iosTapCommand( + device: DeviceInfo, + ctx: RunnerContext, + x: number, + y: number, +): RunnerCommand { + return { + 
command: 'tap', + x, + y, + ...(device.platform === 'ios' && device.target !== 'tv' ? { synthesized: true } : {}), + appBundleId: ctx.appBundleId, + }; +} + function iosDragCommand( device: DeviceInfo, ctx: RunnerContext, diff --git a/src/platforms/ios/runner-command-recovery.ts b/src/platforms/ios/runner-command-recovery.ts index e93645991..486d71b11 100644 --- a/src/platforms/ios/runner-command-recovery.ts +++ b/src/platforms/ios/runner-command-recovery.ts @@ -23,12 +23,6 @@ type RunnerTransportRecoveryContext = { invalidateSession: (session: RunnerSession, reason: string) => Promise; }; -type RunnerReadinessPreflightRecoveryDetails = { - readinessPreflightSkipped?: boolean; - readinessPreflightSkipReason?: string; - readinessPreflightSkippedAgeMs?: number; -}; - const RUNNER_STATUS_RECOVERY_TIMEOUT_MS = 3_000; export async function handleRunnerTransportErrorAfterCommandSend(params: { @@ -131,7 +125,6 @@ async function tryRecoverRunnerCommandAfterTransportError( signal?: AbortSignal, ): Promise { if (command.command === 'status' || !command.commandId?.trim()) return undefined; - const readinessPreflight = readReadinessPreflightRecoveryDetails(transportError); let status: Record; try { status = await executeRunnerCommandWithSession( @@ -150,7 +143,6 @@ async function tryRecoverRunnerCommandAfterTransportError( command: command.command, commandId: command.commandId, error: error instanceof Error ? 
error.message : String(error), - ...readinessPreflight, }, }); return { type: 'retainInvalidation', reason: 'status_probe_failed' }; @@ -164,7 +156,6 @@ async function tryRecoverRunnerCommandAfterTransportError( command: command.command, commandId: command.commandId, lifecycleState, - ...readinessPreflight, }, }); return handleRunnerCommandStatusRecovery( @@ -249,7 +240,6 @@ function handleCompletedRunnerStatus( lifecycleState: 'completed', }; } - const readinessPreflight = readReadinessPreflightRecoveryDetails(transportError); return { type: 'skipInvalidation', reason: 'completed_without_retained_response', @@ -262,8 +252,7 @@ function handleCompletedRunnerStatus( commandId: command.commandId, lifecycleState: 'completed', recovery: 'completed_without_retained_response', - ...readinessPreflight, - hint: completedWithoutRetainedResponseHint(command.command, readinessPreflight), + hint: completedWithoutRetainedResponseHint(command.command), logPath: options.logPath, transportError: transportError.message, }, @@ -286,7 +275,6 @@ function runnerStatusFailureError( : 'Runner command failed'; const hint = typeof status.lifecycleErrorHint === 'string' ? status.lifecycleErrorHint : undefined; - const readinessPreflight = readReadinessPreflightRecoveryDetails(transportError); return new AppError( toAppErrorCode(errorCode), errorMessage, @@ -295,8 +283,7 @@ function runnerStatusFailureError( commandId: command.commandId, lifecycleState: 'failed', recovery: 'runner_reported_failure', - ...readinessPreflight, - hint: hint ?? runnerReportedFailureHint(command.command, readinessPreflight), + hint: hint ?? 
runnerReportedFailureHint(command.command), logPath: options.logPath, transportError: transportError.message, }, @@ -313,7 +300,6 @@ function runnerStatusInFlightError( if (isReadOnlyRunnerCommand(command.command)) { return transportError; } - const readinessPreflight = readReadinessPreflightRecoveryDetails(transportError); return new AppError( 'COMMAND_FAILED', `Runner command "${command.command}" is still ${lifecycleState} after the transport response was lost.`, @@ -322,8 +308,7 @@ function runnerStatusInFlightError( commandId: command.commandId, lifecycleState, recovery: 'command_still_in_flight', - ...readinessPreflight, - hint: inFlightAfterLostResponseHint(command.command, lifecycleState, readinessPreflight), + hint: inFlightAfterLostResponseHint(command.command, lifecycleState), logPath: options.logPath, transportError: transportError.message, }, @@ -349,33 +334,16 @@ function parseLifecycleResponsePayload(value: string): LifecycleResponsePayload return {}; } -function completedWithoutRetainedResponseHint( - command: string, - readinessPreflight: RunnerReadinessPreflightRecoveryDetails = {}, -): string { - return `${lostResponseReadinessContext(readinessPreflight)}The runner is still reachable and reports "${command}" already completed, so agent-device kept the session open and will not replay it. Run snapshot -i to inspect the current UI, then continue from that observed state.`; +function completedWithoutRetainedResponseHint(command: string): string { + return `The runner is still reachable and reports "${command}" already completed, so agent-device kept the session open and will not replay it. 
Run snapshot -i to inspect the current UI, then continue from that observed state.`; } -function runnerReportedFailureHint( - command: string, - readinessPreflight: RunnerReadinessPreflightRecoveryDetails = {}, -): string { - return `${lostResponseReadinessContext(readinessPreflight)}The runner is still reachable and reports "${command}" failed after the transport response was lost, so agent-device kept the session open and did not replay it. Run snapshot -i to inspect the current UI and retry with a selector visible in that snapshot.`; +function runnerReportedFailureHint(command: string): string { + return `The runner is still reachable and reports "${command}" failed after the transport response was lost, so agent-device kept the session open and did not replay it. Run snapshot -i to inspect the current UI and retry with a selector visible in that snapshot.`; } -function inFlightAfterLostResponseHint( - command: string, - lifecycleState: string, - readinessPreflight: RunnerReadinessPreflightRecoveryDetails = {}, -): string { - return `${lostResponseReadinessContext(readinessPreflight)}The runner is still reachable and reports "${command}" is ${lifecycleState}, so agent-device kept the session open and will not replay it. Wait briefly, run snapshot -i to inspect the current UI, then continue from that observed state.`; -} - -function lostResponseReadinessContext( - readinessPreflight: RunnerReadinessPreflightRecoveryDetails, -): string { - if (readinessPreflight.readinessPreflightSkipped !== true) return ''; - return 'This hot command skipped the uptime preflight because the runner had just responded; status recovery confirmed the runner still observed it. '; +function inFlightAfterLostResponseHint(command: string, lifecycleState: string): string { + return `The runner is still reachable and reports "${command}" is ${lifecycleState}, so agent-device kept the session open and will not replay it. 
Wait briefly, run snapshot -i to inspect the current UI, then continue from that observed state.`; } function unknownLifecycleStateHint(command: string): string { @@ -406,31 +374,3 @@ function emitRunnerInvalidationDecision(params: { }, }); } - -function readBooleanDetail(error: AppError, key: string): boolean | undefined { - const value = error.details?.[key]; - return typeof value === 'boolean' ? value : undefined; -} - -function readStringDetail(error: AppError, key: string): string | undefined { - const value = error.details?.[key]; - return typeof value === 'string' ? value : undefined; -} - -function readNumberDetail(error: AppError, key: string): number | undefined { - const value = error.details?.[key]; - return typeof value === 'number' ? value : undefined; -} - -function readReadinessPreflightRecoveryDetails( - error: AppError, -): RunnerReadinessPreflightRecoveryDetails { - const details: RunnerReadinessPreflightRecoveryDetails = {}; - const skipped = readBooleanDetail(error, 'runnerReadinessPreflightSkipped'); - if (skipped !== undefined) details.readinessPreflightSkipped = skipped; - const reason = readStringDetail(error, 'runnerReadinessPreflightSkipReason'); - if (reason !== undefined) details.readinessPreflightSkipReason = reason; - const ageMs = readNumberDetail(error, 'runnerReadinessPreflightSkippedAgeMs'); - if (ageMs !== undefined) details.readinessPreflightSkippedAgeMs = ageMs; - return details; -} diff --git a/src/platforms/ios/runner-session-types.ts b/src/platforms/ios/runner-session-types.ts index 4a268455b..05c42a8ca 100644 --- a/src/platforms/ios/runner-session-types.ts +++ b/src/platforms/ios/runner-session-types.ts @@ -14,7 +14,6 @@ export type RunnerSession = { child: ExecBackgroundResult['child']; ready: boolean; startupTimeoutMs?: number; - lastSuccessfulRunnerResponseAtMs?: number; startupTimings?: Record; startupTimingsReported?: boolean; simulatorSetRedirect?: { release: () => Promise }; diff --git 
a/src/platforms/ios/runner-session.ts b/src/platforms/ios/runner-session.ts index fcac661d7..68332f383 100644 --- a/src/platforms/ios/runner-session.ts +++ b/src/platforms/ios/runner-session.ts @@ -51,25 +51,18 @@ const runnerSessions = new Map(); const runnerSessionLocks = new Map>(); const RUNNER_STOP_WAIT_TIMEOUT_MS = 10_000; const RUNNER_INVALIDATE_WAIT_TIMEOUT_MS = 1_000; -const RUNNER_READY_PREFLIGHT_TIMEOUT_MS = 5_000; -const RUNNER_TAP_PREFLIGHT_SKIP_FRESHNESS_MS = 10_000; +const RUNNER_READY_PREFLIGHT_TIMEOUT_MS = 1_000; const RUNNER_SHUTDOWN_TIMEOUT_MS = 15_000; const RUNNER_STALE_BUNDLE_UNINSTALL_TIMEOUT_MS = 10_000; type RunnerReadinessPreflightDecision = | { action: 'run'; - reason: - | 'startup' - | 'conservative_command' - | 'no_successful_response' - | 'successful_response_stale'; - lastSuccessfulRunnerResponseAgeMs?: number; + reason: 'startup' | 'ready_session'; } | { action: 'skip'; - reason: 'recent_successful_response'; - lastSuccessfulRunnerResponseAgeMs: number; + reason: 'read_only_startup_command' | 'readiness_probe_command'; }; function withRunnerSessionLock(deviceId: string, task: () => Promise): Promise { @@ -200,7 +193,10 @@ async function resolveReusableRunnerSession( ): Promise { if (!isRunnerProcessAlive(existing.child.pid)) { await measureRunnerStartupStep({}, 'stop_stale_session', async () => { - await stopRunnerSessionInternal(device.id, existing); + await stopRunnerSessionInternal(device.id, existing, { + graceful: false, + waitTimeoutMs: RUNNER_INVALIDATE_WAIT_TIMEOUT_MS, + }); }); return null; } @@ -514,91 +510,149 @@ export async function executeRunnerCommandWithSession( emitRunnerStartupTimings(session, command.command); const runnerCommand = withRunnerCommandId(command); const readOnlyCommand = isReadOnlyRunnerCommand(runnerCommand.command); - if (readOnlyCommand) { - const response = await withDiagnosticTimer( - 'ios_runner_command_send', + const deadline = Deadline.fromTimeoutMs(timeoutMs); + const preflightDecision = 
resolveRunnerReadinessPreflightDecision(session, runnerCommand); + if (preflightDecision.action === 'run') { + await runRunnerReadinessPreflight({ + device, + session, + runnerCommand, + logPath, + deadline, + signal, + decision: preflightDecision, + }); + } else { + emitRunnerReadinessPreflightSkipped(runnerCommand, session, preflightDecision); + } + + const response = await sendRunnerCommandAfterPreflight({ + device, + session, + runnerCommand, + logPath, + deadline, + timeoutMs, + signal, + readOnlyCommand, + }); + try { + const data = await parseRunnerResponse(response, session, logPath); + const runnerFatalReason = resolveRunnerFatalReason(data); + if (runnerFatalReason) { + await invalidateRunnerSession(session, runnerFatalReason); + } + return data; + } catch (error) { + const runnerFatalReason = resolveRunnerFatalErrorReason(error); + if (runnerFatalReason) { + await invalidateRunnerSession(session, runnerFatalReason); + } + throw error; + } +} + +async function sendRunnerCommandAfterPreflight(params: { + device: DeviceInfo; + session: RunnerSession; + runnerCommand: RunnerCommand; + logPath: string | undefined; + deadline: Deadline; + timeoutMs: number; + signal: AbortSignal | undefined; + readOnlyCommand: boolean; +}): Promise { + const { device, session, runnerCommand, logPath, deadline, timeoutMs, signal, readOnlyCommand } = + params; + const remainingMs = deadline.remainingMs(); + if (remainingMs <= 0) { + throw new AppError('COMMAND_FAILED', 'Runner command deadline exceeded', { timeoutMs }); + } + const diagnosticData = readOnlyCommand + ? 
{ + command: runnerCommand.command, + commandId: runnerCommand.commandId, + readOnly: true, + sessionReady: session.ready, + timeoutMs: remainingMs, + } + : { command: runnerCommand.command, commandId: runnerCommand.commandId }; + + return await withDiagnosticTimer( + 'ios_runner_command_send', + async () => { + if (readOnlyCommand) { + return await waitForRunner( + device, + session.port, + runnerCommand, + logPath, + remainingMs, + session, + signal, + ); + } + return await sendRunnerCommandOnce(device, session.port, runnerCommand, remainingMs, signal); + }, + diagnosticData, + ); +} + +async function runRunnerReadinessPreflight(params: { + device: DeviceInfo; + session: RunnerSession; + runnerCommand: RunnerCommand; + logPath: string | undefined; + deadline: Deadline; + signal: AbortSignal | undefined; + decision: Extract; +}): Promise { + const { device, session, runnerCommand, logPath, deadline, signal, decision } = params; + const readinessTimeoutMs = session.ready + ? Math.min(RUNNER_READY_PREFLIGHT_TIMEOUT_MS, deadline.remainingMs()) + : Math.min(readRunnerStartupTimeoutMs(session), deadline.remainingMs()); + try { + const readinessResponse = await withDiagnosticTimer( + 'ios_runner_readiness_preflight', async () => await waitForRunner( device, session.port, - runnerCommand, + withRunnerCommandId({ command: 'uptime' }), logPath, - timeoutMs, + readinessTimeoutMs, session, signal, ), { command: runnerCommand.command, commandId: runnerCommand.commandId, - readOnly: true, + reason: decision.reason, sessionReady: session.ready, - timeoutMs, + timeoutMs: readinessTimeoutMs, }, ); - return await parseRunnerResponse(response, session, logPath); + await parseRunnerResponse(readinessResponse, session, logPath); + } catch (error) { + throw markRunnerReadinessPreflightError(error); } +} - const deadline = Deadline.fromTimeoutMs(timeoutMs); - const preflightDecision = resolveRunnerReadinessPreflightDecision(session, runnerCommand); - if (preflightDecision.action === 
'run') { - const readinessTimeoutMs = session.ready - ? Math.min(RUNNER_READY_PREFLIGHT_TIMEOUT_MS, deadline.remainingMs()) - : Math.min(readRunnerStartupTimeoutMs(session), deadline.remainingMs()); - try { - const readinessResponse = await withDiagnosticTimer( - 'ios_runner_readiness_preflight', - async () => - await waitForRunner( - device, - session.port, - withRunnerCommandId({ command: 'uptime' }), - logPath, - readinessTimeoutMs, - session, - signal, - ), - { - command: runnerCommand.command, - commandId: runnerCommand.commandId, - lastSuccessfulRunnerResponseAgeMs: preflightDecision.lastSuccessfulRunnerResponseAgeMs, - reason: preflightDecision.reason, - sessionReady: session.ready, - timeoutMs: readinessTimeoutMs, - }, - ); - await parseRunnerResponse(readinessResponse, session, logPath); - } catch (error) { - throw markRunnerReadinessPreflightError(error); - } - } else { - emitDiagnostic({ - level: 'debug', - phase: 'ios_runner_readiness_preflight_skipped', - data: { - command: command.command, - commandId: runnerCommand.commandId, - lastSuccessfulRunnerResponseAgeMs: preflightDecision.lastSuccessfulRunnerResponseAgeMs, - reason: preflightDecision.reason, - sessionReady: session.ready, - }, - }); - } - const remainingMs = deadline.remainingMs(); - if (remainingMs <= 0) { - throw new AppError('COMMAND_FAILED', 'Runner command deadline exceeded', { timeoutMs }); - } - const response = await withDiagnosticTimer( - 'ios_runner_command_send', - async () => - await sendRunnerCommandOnce(device, session.port, runnerCommand, remainingMs, signal), - { command: runnerCommand.command, commandId: runnerCommand.commandId }, - ).catch((error: unknown) => { - if (preflightDecision.action === 'skip') { - throw markRunnerSkippedReadinessPreflightError(error, preflightDecision); - } - throw error; +function emitRunnerReadinessPreflightSkipped( + runnerCommand: RunnerCommand, + session: RunnerSession, + decision: Extract, +): void { + emitDiagnostic({ + level: 'debug', + 
phase: 'ios_runner_readiness_preflight_skipped', + data: { + command: runnerCommand.command, + commandId: runnerCommand.commandId, + reason: decision.reason, + sessionReady: session.ready, + }, }); - return await parseRunnerResponse(response, session, logPath); } type RunnerResponsePayload = { @@ -609,7 +663,7 @@ type RunnerResponsePayload = { export async function parseRunnerResponse( response: Response, - session: Pick, + session: Pick, logPath?: string, ): Promise> { const text = await response.text(); @@ -640,7 +694,6 @@ export async function parseRunnerResponse( }); } session.ready = true; - session.lastSuccessfulRunnerResponseAtMs = Date.now(); if (json.data && typeof json.data === 'object' && !Array.isArray(json.data)) { const data = json.data as Record; emitRunnerResponseDiagnostics(data); @@ -664,58 +717,56 @@ function emitRunnerResponseDiagnostics(data: Record): void { }); } +function resolveRunnerFatalReason(data: Record): string | undefined { + if (data.runnerFatal !== true) return undefined; + return typeof data.runnerFatalReason === 'string' && data.runnerFatalReason.trim().length > 0 + ? 
data.runnerFatalReason + : 'runner_reported_fatal_response'; +} + +function resolveRunnerFatalErrorReason(error: unknown): string | undefined { + if (!(error instanceof AppError)) return undefined; + if (error.code === 'IOS_AX_SNAPSHOT_FAILED') return 'ax_snapshot_failure'; + if (error.code === 'XCTEST_RECORDED_FAILURE') return 'xctest_recorded_failure'; + return undefined; +} + function resolveRunnerReadinessPreflightDecision( session: RunnerSession, command: RunnerCommand, ): RunnerReadinessPreflightDecision { + const readOnlyCommand = isReadOnlyRunnerCommand(command.command); if (!session.ready) { + if (readOnlyCommand) { + return { + action: 'skip', + reason: 'read_only_startup_command', + }; + } return { action: 'run', reason: 'startup', }; } - if (command.command !== 'tap' && command.command !== 'tapSeries') { + if (isRunnerReadinessProbeCommand(command.command)) { return { - action: 'run', - reason: 'conservative_command', - }; - } - const lastSuccessAt = session.lastSuccessfulRunnerResponseAtMs; - if (lastSuccessAt === undefined) { - return { - action: 'run', - reason: 'no_successful_response', - }; - } - const lastSuccessfulRunnerResponseAgeMs = Date.now() - lastSuccessAt; - if (lastSuccessfulRunnerResponseAgeMs > RUNNER_TAP_PREFLIGHT_SKIP_FRESHNESS_MS) { - return { - action: 'run', - reason: 'successful_response_stale', - lastSuccessfulRunnerResponseAgeMs, + action: 'skip', + reason: 'readiness_probe_command', }; } return { - action: 'skip', - reason: 'recent_successful_response', - lastSuccessfulRunnerResponseAgeMs, + action: 'run', + reason: 'ready_session', }; } -function markRunnerReadinessPreflightError(error: unknown): AppError { - return markRunnerPreflightError(error, { - runnerReadinessPreflightFailed: true, - }); +function isRunnerReadinessProbeCommand(command: RunnerCommand['command']): boolean { + return command === 'uptime' || command === 'status'; } -function markRunnerSkippedReadinessPreflightError( - error: unknown, - decision: Extract, 
-): AppError { +function markRunnerReadinessPreflightError(error: unknown): AppError { return markRunnerPreflightError(error, { - runnerReadinessPreflightSkipped: true, - runnerReadinessPreflightSkipReason: decision.reason, - runnerReadinessPreflightSkippedAgeMs: decision.lastSuccessfulRunnerResponseAgeMs, + runnerReadinessPreflightFailed: true, }); } diff --git a/src/utils/__tests__/args.test.ts b/src/utils/__tests__/args.test.ts index c74da8e02..3c5a02240 100644 --- a/src/utils/__tests__/args.test.ts +++ b/src/utils/__tests__/args.test.ts @@ -1104,6 +1104,9 @@ test('usageForCommand resolves workflow help topic', () => { assert.match(help, /There is no open-url command/); assert.match(help, /direct URL open can report success while leaving the runner\/shell focused/); assert.match(help, /verify with snapshot -i after opening/); + assert.match(help, /snapshot returns a sparse\/AX-unavailable state/); + assert.match(help, /Use plain screenshot, not screenshot --overlay-refs/); + assert.match(help, /retry snapshot -i after reaching another screen/); assert.match(help, /agent-device open exp:\/\/127\.0\.0\.1:8081 --platform android/); assert.match(help, /apps lookup misses the project but shows Expo Go\/dev-client/); assert.match(help, /metro prepare --kind expo/); diff --git a/src/utils/__tests__/output.test.ts b/src/utils/__tests__/output.test.ts index d7859abf9..ecc91e657 100644 --- a/src/utils/__tests__/output.test.ts +++ b/src/utils/__tests__/output.test.ts @@ -1296,7 +1296,7 @@ test('formatSnapshotText prints snapshot warnings ahead of empty output', () => assert.match(text, /Interactive snapshot is empty after filtering 42 raw Android nodes/); }); -test('formatSnapshotText hints to use screenshot overlay refs for sparse snapshots', () => { +test('formatSnapshotText hints to use plain screenshot for sparse snapshots', () => { const text = withNoColor(() => formatSnapshotText({ nodes: [ @@ -1314,7 +1314,10 @@ test('formatSnapshotText hints to use screenshot 
overlay refs for sparse snapsho assert.match(text, /Snapshot: 1 node/); assert.match(text, /Hint: sparse accessibility snapshot returned 1 node/); - assert.match(text, /screenshot --overlay-refs/); + assert.match(text, /snapshot state is invalid or unavailable/i); + assert.match(text, /Use plain screenshot, not screenshot --overlay-refs/); + assert.match(text, /If screenshot shows the Home Screen or another app, run open/); + assert.match(text, /retry snapshot -i on the next screen/); }); test('formatSnapshotText suppresses sparse snapshot hint for scoped reads', () => { diff --git a/src/utils/cli-help.ts b/src/utils/cli-help.ts index 0687b2554..027b9dc69 100644 --- a/src/utils/cli-help.ts +++ b/src/utils/cli-help.ts @@ -208,6 +208,7 @@ Validation and evidence: Prefer provided testIDs/ids/selectors for verification; use visible text when no durable selector is provided. If task says snapshot, use snapshot. If it asks visual evidence, use screenshot. Icon/tappable visual proof: screenshot --overlay-refs. Flag is --overlay-refs. + If snapshot returns a sparse/AX-unavailable state, refs are not reliable. Use plain screenshot, not screenshot --overlay-refs, navigate with coordinates if needed, then retry snapshot -i after reaching another screen; the AX failure may be screen-specific. Startup/frame health/CPU/memory: perf --json or metrics. Replay maintenance: replay -u ./flow.ad. Recording: record start/stop. By default, stop burns touch overlays into the video; use record start --hide-touches for the fastest raw recording. Android adb screenrecord has a 180s platform limit, so longer Android recordings are returned as multiple MP4 chunks. For gesture-heavy iOS simulator proof videos, prefer --hide-touches because overlay timing depends on a stable runner session while gestures are executing. Tracing: trace start ./trace.log, trace stop ./trace.log. Paths are positional. Stable known flow: batch ./steps.json, not workflow batch. 
@@ -399,6 +400,7 @@ Overlays and busy RN UIs: Do not manually press warning/error text bodies, collapsed banner bodies, full-screen warning parents, or broad LogBox/RedBox refs. The dismiss-overlay command owns the narrow LogBox/RedBox targeting policy. Report the overlay in the final summary. Use screenshot --overlay-refs before dismissing only if visual evidence is required. If snapshot times out because the UI never becomes idle, Android accessibility may be blocked by busy or continuously changing app UI. After that timeout, use screenshot as visual truth instead of repeatedly retrying snapshots. + If iOS snapshot reports AX unavailable or returns only a sparse root, the current screen's accessibility state is invalid. Use plain screenshot as visual truth, coordinate navigation to leave the bad screen, then take a fresh snapshot -i before returning to selector/@ref commands. Android runtime permission dialogs and native alerts are handled by alert wait/accept/dismiss. If alert reports no alert, treat the visible surface as app-owned UI and use snapshot -i plus press by label/ref. React DevTools routing: diff --git a/src/utils/output.ts b/src/utils/output.ts index 1588d641a..6e0800b6e 100644 --- a/src/utils/output.ts +++ b/src/utils/output.ts @@ -637,7 +637,7 @@ function formatSparseSnapshotHint( ): string | null { if (options.scoped === true || options.depthLimited === true || nodes.length > 3) return null; const noun = nodes.length === 1 ? 'node' : 'nodes'; - return `Hint: sparse accessibility snapshot returned ${nodes.length} ${noun}. The app may expose limited accessibility metadata; run screenshot --overlay-refs for visual context.`; + return `Hint: sparse accessibility snapshot returned ${nodes.length} ${noun}; snapshot state is invalid or unavailable for this screen. Use plain screenshot, not screenshot --overlay-refs, as visual truth. If screenshot shows the Home Screen or another app, run open for this app again first. 
Then navigate away with coordinates if needed and retry snapshot -i on the next screen.`; } function readSnapshotWarnings(data: Record): string[] { diff --git a/test/integration/provider-scenarios/ios-world.ts b/test/integration/provider-scenarios/ios-world.ts index 06d2c81a3..f713f9477 100644 --- a/test/integration/provider-scenarios/ios-world.ts +++ b/test/integration/provider-scenarios/ios-world.ts @@ -42,7 +42,13 @@ export async function createIosSettingsWorld(): Promise { command: 'ios.runner.tap', deviceId: PROVIDER_SCENARIO_IOS_SIMULATOR.id, platform: 'ios', - request: { command: 'tap', x: 196, y: 122, appBundleId: 'com.apple.Preferences' }, + request: { + command: 'tap', + x: 196, + y: 122, + synthesized: true, + appBundleId: 'com.apple.Preferences', + }, result: { tapped: true }, }, { diff --git a/test/skillgym/suites/agent-device-smoke-suite.ts b/test/skillgym/suites/agent-device-smoke-suite.ts index d087c22f6..a7ca67e6f 100644 --- a/test/skillgym/suites/agent-device-smoke-suite.ts +++ b/test/skillgym/suites/agent-device-smoke-suite.ts @@ -1284,6 +1284,23 @@ const SKILL_GUIDANCE_CASES: Case[] = [ outputs: [plannedCommand('screenshot'), /--overlay-refs/i], forbiddenOutputs: [/snapshot --raw/i], }), + makeCase({ + id: 'ios-ax-unavailable-screenshot-coordinate-recovery', + contract: [ + 'App name: Agent Device Tester', + 'Platform: iOS simulator', + 'Current screen returned a sparse snapshot: Snapshot: 1 node (truncated)', + 'The hint says snapshot state is invalid or unavailable for this screen', + 'If screenshot shows the Home Screen or another app, the hint says to open the app again first', + 'The visible screenshot shows the next tab target centered near x=124 y=817', + 'Accessibility may work again after leaving this screen', + ], + task: 'Plan fallback commands to recover from the AX-unavailable snapshot state: capture visual truth, navigate out using the visible coordinate, then try AX again on the next screen.', + outputs: 
[plannedCommand('screenshot'), /(?:click|press)\s+124\s+817/i, /snapshot -i/i], + forbiddenOutputs: [/--overlay-refs/i, /@e\d+/i, /(?:find|wait|is|get)\b/i, /snapshot --raw/i], + strictFinalOutput: true, + allowOnlyLocalCliHelpCommands: true, + }), makeCase({ id: 'perf-session-metrics', contract: [