Fix pdb / breakpoint() hang in workflow code #1568
sys.breakpointhook = _temporal_workflow_breakpoint_hook


def _relax_sandbox_for_debugger(workflow_runner: WorkflowRunner) -> WorkflowRunner:
There's sufficient logic involved here that I think it would be a good idea to move this to a new file. _debugger.py perhaps?
os.environ, etc.) aren't blocked either -- without permanently dropping
sandbox checks for the rest of workflow execution.
"""
from temporalio.worker.workflow_sandbox._runner import SandboxedWorkflowRunner
Avoid inline imports unless there is a good reason. I think pdb has sufficient justification to do so, but probably not this internal import.
Whoops, thanks! Moved to the top of the file.
When debug_mode=True (or TEMPORAL_DEBUG=1), breakpoint() inside workflow code now opens an interactive pdb prompt -- including from a sandboxed workflow run under pytest. Four pieces:

- Inline dispatch on the asyncio main thread (via loop.call_soon to avoid nesting inside the dispatch task's __step() and tripping Python 3.14's task-entry validation).
- breakpoint removed from the sandbox's invalid builtins so the call reaches the worker hook. Nothing else is relaxed.
- A Pdb subclass that lands at the workflow's own frame, suspends sandbox checks during each REPL interaction, and overrides q/Ctrl-D to continue the workflow instead of failing it with BdbQuit.
- A defensive sys.breakpointhook that raises a clear RuntimeError when breakpoint() is called from a workflow worker thread without debug_mode, replacing the previous silent hang.

When debug_mode is not set, the worker's dispatch and sandbox config are unchanged.

Adds a README subsection on debugging workflows and five tests at tests/worker/test_breakpoint_hang.py. Verified on Python 3.13 and 3.14.

Closes temporalio#1104.
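As an illustration of the third piece, here is a minimal sketch of a Pdb subclass whose q and Ctrl-D continue execution instead of raising BdbQuit. The class name ContinueOnQuitPdb and the helper run_with_scripted_debugger are hypothetical and not taken from this PR; the sketch does not attempt the frame selection or the sandbox suspension the PR describes.

```python
import io
import pdb


class ContinueOnQuitPdb(pdb.Pdb):
    """Hypothetical Pdb subclass: quitting resumes the program."""

    def do_quit(self, arg):
        # Delegate to `continue` so the debugged code keeps running
        # instead of being aborted with BdbQuit.
        return self.do_continue("")

    # Keep the short aliases pointing at the override.
    do_q = do_quit
    do_exit = do_quit

    def do_EOF(self, arg):
        # Ctrl-D (end of input) also resumes rather than quitting.
        self.message("")
        return self.do_continue("")


def run_with_scripted_debugger(commands: str):
    """Run a small function under the debugger, feeding it scripted input."""
    out = io.StringIO()
    debugger = ContinueOnQuitPdb(
        stdin=io.StringIO(commands),  # scripted commands instead of a TTY
        stdout=out,
        nosigint=True,   # do not install a SIGINT handler on continue
        readrc=False,    # ignore any .pdbrc so the run is deterministic
    )

    def target():
        x = 41
        debugger.set_trace()
        return x + 1

    result = target()
    return result, out.getvalue()


if __name__ == "__main__":
    result, output = run_with_scripted_debugger("q\n")
    print(result)
```

With a stock pdb.Pdb, the scripted q would raise BdbQuit out of target(); with the override the function returns normally.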
self._deadlock_timeout_seconds = None if debug_mode else 2
self._deadlock_timeout_seconds = None if self._debug_mode else 2

_install_workflow_breakpoint_hook()
This should probably only happen during debug mode as well. It may not make any actual difference, but it would be good to give that assurance that nothing is changing outside of debug mode.
The hook is intentionally always installed. The only case it catches is breakpoint() called from workflow code without debug_mode set, which is #1104's original silent hang. Gating on debug_mode would remove the error in exactly the scenario we want to convert from silent hang to loud error.
When debug_mode is on, the dispatch fix routes the workflow to MainThread, so the hook's temporal_workflow_* check never matches and it just delegates. No observable change.
Should I add a code comment making the always-on rationale explicit?
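For readers following the always-on argument, here is a rough sketch of a hook that raises on workflow worker threads and delegates everywhere else. It assumes the check is a thread-name prefix test; the constant WORKFLOW_THREAD_PREFIX and both function names are hypothetical, and the PR's actual hook may differ.

```python
import threading

# Assumption: workflow worker threads are named with this prefix. The thread
# above only mentions a "temporal_workflow_*" check, not the exact string.
WORKFLOW_THREAD_PREFIX = "temporal_workflow_"


def make_guarded_breakpoint_hook(original_hook):
    """Wrap a breakpoint hook so it fails loudly on workflow worker threads."""

    def guarded_hook(*args, **kwargs):
        if threading.current_thread().name.startswith(WORKFLOW_THREAD_PREFIX):
            # Without debug_mode, pdb cannot read the TTY from this thread,
            # so raise instead of hanging silently.
            raise RuntimeError(
                "breakpoint() called from a workflow worker thread; "
                "enable debug_mode to debug workflow code"
            )
        # Any other thread: behave exactly like the wrapped hook.
        return original_hook(*args, **kwargs)

    return guarded_hook


def call_hook_in_thread(hook, thread_name):
    """Call the hook on a thread with the given name and capture the outcome."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = hook()
        except RuntimeError as err:
            outcome["error"] = err

    thread = threading.Thread(target=runner, name=thread_name)
    thread.start()
    thread.join()
    return outcome


# Installing it would look like:
#   sys.breakpointhook = make_guarded_breakpoint_hook(sys.breakpointhook)
```

Because the wrapper only delegates when the thread name does not match, installing it unconditionally leaves every non-workflow breakpoint() call unchanged.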
What was changed
When debug_mode=True on the Worker (or TEMPORAL_DEBUG=1), breakpoint() inside workflow code now opens an interactive pdb prompt, including from a sandboxed workflow run under pytest. Four pieces:
- Inline dispatch on the asyncio main thread (via loop.call_soon to avoid nesting inside the dispatch task's __step()), so pdb's input() reaches the TTY.
- breakpoint is removed from the sandbox's invalid builtins so the call can reach the worker hook. Nothing else is relaxed.
- A Pdb subclass that lands at the workflow's own frame, suspends sandbox checks during each REPL interaction, and overrides q / Ctrl-D to continue the workflow instead of failing it with BdbQuit.
- A defensive sys.breakpointhook. Calling breakpoint() from a workflow worker thread without debug_mode raises a clear RuntimeError instead of silently hanging.

When debug_mode is not set, the worker's dispatch and sandbox config are unchanged. The defensive hook replaces a silent hang with a clear error: strictly an improvement, not a change to working code.

Why?
breakpoint() and pdb.set_trace() inside workflow code silently hang today. Three overlapping issues:
- Workflow code runs on a ThreadPoolExecutor thread, so pdb's input() can't read the controlling TTY.
- The sandbox treats breakpoint as non-deterministic, so the call doesn't reach the debugger.
- pdb's cmdloop touches more sandbox-restricted internals at runtime (e.g. readline.get_completer), so relaxing the builtin alone isn't enough.

Direct synchronous activation from the dispatch coroutine doesn't work on Python 3.14: the dispatch task is mid-__step() when workflow.activate tries to step the workflow's own task, and 3.14 refuses. Doing await future after loop.call_soon suspends the dispatch task first.

Complements #1249 (sandbox passthrough for IDE debuggers). Independent change, different debugger.
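The loop.call_soon plus await future pattern mentioned in the Why section can be sketched in isolation. This is not the worker's dispatch code; run_on_loop_thread and activate are hypothetical stand-ins that only show the callback running on the loop's thread, outside any task's step, while the calling task is suspended.

```python
import asyncio
import threading


async def run_on_loop_thread(func):
    """Run func as a plain loop callback and wait for its result."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def callback():
        # Runs directly from the event loop, not inside any task's step.
        try:
            done.set_result(func())
        except Exception as err:
            done.set_exception(err)

    loop.call_soon(callback)
    # Awaiting suspends the calling task before the callback runs.
    return await done


def activate():
    # Hypothetical stand-in for synchronous workflow activation.
    return threading.current_thread().name, asyncio.current_task()


async def main():
    caller_thread = threading.current_thread().name
    callback_thread, task_in_callback = await run_on_loop_thread(activate)
    same_thread = caller_thread == callback_thread
    no_task_running = task_in_callback is None
    return same_thread, no_task_running


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Since no task is being stepped when the callback runs, code inside it is free to step another task, which is the situation the description says direct synchronous activation cannot provide on 3.14.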
Checklist
Closes #1104 (Setting debug_mode in a Worker still doesn't allow the user of breakpoints)
How was this tested:
- tests/worker/test_breakpoint_hang.py: five tests covering thread placement (both modes), a breakpoint in a sandboxed workflow landing at the user's frame with locals visible, q / Ctrl-D continuing cleanly, and the defensive hook raising. 5/5 pass on Python 3.13 and 3.14.
- Manually: insert breakpoint() into any workflow's run() body, run via pytest -s (or a standalone python script), and confirm the (Pdb) prompt opens at the user's frame with locals in scope.
- Docs: added a breakpoint() / pdb subsection to the README under Workflow Sandbox, with a runnable example and the workflow-task-timeout caveat.