Guard spawn agent activity labels #454

Open
itkonen wants to merge 3 commits into editor-code-assistant:master from itkonen:fix/spawn-agent-activity-label

Conversation

@itkonen
Contributor

@itkonen itkonen commented May 8, 2026

Sometimes Codex provides sub-agent activity labels that are extremely long, nonsensical, or repeat the same text over and over. This bloats the dialogue and can also pollute the tool-call history that later model turns may see. This PR adds a small guardrail to prevent that issue.


The summary below was generated by AI.

  • I added an entry to the changelog under the unreleased section.
  • This is not AI slop.

Summary

  • Normalize spawn_agent activity arguments by trimming, collapsing whitespace, truncating overly long labels, and omitting blank or non-string labels.
  • Apply normalization before pre-tool-call hooks and again after hook-modified arguments so stored/replayed tool-call arguments stay clean.
  • Format subagent summaries without a trailing activity suffix when the activity is omitted.
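
The normalization steps listed above can be sketched roughly as follows. This is an illustrative Python stand-in, not the PR's Clojure code: the function name `normalize_activity`, the 80-character limit, and the trailing ellipsis are all assumptions, since the description does not state the actual limit or truncation marker.

```python
from typing import Any, Optional

# Assumed limit; the PR description does not state the real value.
MAX_ACTIVITY_CHARS = 80


def normalize_activity(value: Any, max_chars: int = MAX_ACTIVITY_CHARS) -> Optional[str]:
    """Return a cleaned activity label, or None when the label should be omitted."""
    if not isinstance(value, str):
        return None  # non-string labels are omitted
    # str.split() with no arguments trims both ends and splits on any whitespace run,
    # so joining with single spaces collapses the whitespace.
    collapsed = " ".join(value.split())
    if not collapsed:
        return None  # blank labels are omitted
    if len(collapsed) <= max_chars:
        return collapsed
    # Truncate and mark the cut with an ellipsis (hypothetical choice of marker).
    return collapsed[: max_chars - 1].rstrip() + "…"
```

With these assumptions, a 13000-character looping label would shrink to at most 80 characters, while an ordinary short label passes through unchanged.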

Verification

  • clojure -M:test --focus eca.features.tools.agent-test
  • clj-kondo --lint src/eca/features/tools/agent.clj src/eca/features/chat/tool_calls.clj test/eca/features/tools/agent_test.clj

@ericdallo
Member

@itkonen Can you give me an example from the eca stderr or something, so I can better understand how Codex behaves?

@itkonen
Contributor Author

itkonen commented May 9, 2026

@ericdallo There's actually nothing in the stderr.

The main issue is with the user-facing labels printed out in the chat. Sometimes the label is just gibberish or exotic unicode symbols, but sometimes it's really, really long. Like here in this subagent call the label was over 13000 characters, filling multiple pages on my screen:

reviewer: reviewing label guardrail fixes upstream quality and downstream model context contamination risks with very long label handling and schema side effects repeated repeated repeated repeated repeated ... [about 13000 characters in this label!] ... repeated repeated repeated (6 steps) ✅ 4m 30s
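
For illustration, here is a rough Python sketch of how a summary line of this shape might be assembled so that an omitted label leaves no dangling suffix. The function name, its parameters, and the exact layout without a label are assumptions inferred from the example line above, not taken from the ECA code.

```python
from typing import Optional


def format_subagent_summary(agent: str, activity: Optional[str], steps: int, elapsed: str) -> str:
    """Build a one-line summary; skip the ': activity' part when there is no label."""
    # Hypothetical layout: "<agent>: <activity> (<n> steps) ✅ <elapsed>"
    head = f"{agent}: {activity}" if activity else agent
    return f"{head} ({steps} steps) ✅ {elapsed}"


print(format_subagent_summary("reviewer", "reviewing label guardrail fixes", 6, "4m 30s"))
print(format_subagent_summary("reviewer", None, 6, "4m 30s"))
```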

A secondary issue is that this long label might get passed forward in later chat turns, filling up the LLM context window.

My guess is that Codex uses a small specialized model to generate the tool calls that ECA receives, and that model knows only how to write the tool call payload but easily fails at writing sensible action labels. And that model might get stuck in a loop repeating a word or a phrase over and over again.

So this PR implements minimal guardrails that limit the length of the label and prevent it from bloating further model requests. The user will still see the gibberish, but at least it will not harm the workflow.

itkonen and others added 3 commits May 11, 2026 15:31
Normalize model-generated spawn_agent activity labels before they are displayed, stored, or replayed so pathological labels do not leak into UI or downstream context.

🤖 Generated with [eca](https://eca.dev)

Co-Authored-By: eca-agent <git@eca.dev>
🤖 Generated with [eca](https://eca.dev)

Co-Authored-By: eca-agent <git@eca.dev>
The PR check failed on macOS while the spawn handler had already reached completion output. Keep the regression coverage but allow more time for the end-to-end chat/prompt path and include state details if it times out.

🤖 Generated with [eca](https://eca.dev)

Co-Authored-By: eca-agent <git@eca.dev>
@itkonen itkonen force-pushed the fix/spawn-agent-activity-label branch from 7649d1d to 0c85a4a Compare May 11, 2026 12:36
@itkonen itkonen marked this pull request as ready for review May 11, 2026 12:45
@ericdallo
Member

Ah got it, yeah that's crazy! but makes sense

@ericdallo
Member

@itkonen But I'm not sure this is the best way to solve this. IIUC, the problem is that some summary texts from the LLM can be huge, affecting tool call labels. Instead of normalizing them, I think we could just limit the summary text here. WDYT?

@itkonen
Contributor Author

itkonen commented May 11, 2026

@ericdallo I think that would only fix the UI problem.

My understanding is that the misbehaving activity label would still get stored in chat history, which could later be replayed back to the Responses API as part of the tool_call arguments, filling up the model context. But I cannot read the code well enough to say for sure.
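
To make the concern concrete, here is a hedged Python sketch of cleaning the arguments both before and after pre-tool-call hooks, so that whatever ends up stored (and possibly replayed) carries only a bounded label. The names `clean_args` and `prepare_tool_call`, the `task` key, and the 80-character limit are hypothetical; only the `activity` argument comes from this PR.

```python
from typing import Any, Callable, Dict, List

Args = Dict[str, Any]
Hook = Callable[[Args], Args]

# Assumed limit; the real value is not stated in this thread.
MAX_ACTIVITY_CHARS = 80


def clean_args(args: Args) -> Args:
    """Return a copy of args with the 'activity' entry normalized or dropped."""
    cleaned = dict(args)
    activity = cleaned.pop("activity", None)
    if isinstance(activity, str):
        # Collapse whitespace, then cap the length.
        label = " ".join(activity.split())[:MAX_ACTIVITY_CHARS].rstrip()
        if label:
            cleaned["activity"] = label
    return cleaned


def prepare_tool_call(args: Args, hooks: List[Hook]) -> Args:
    """Normalize, run the hooks, then normalize again before the result is stored."""
    current = clean_args(args)
    for hook in hooks:
        current = hook(current)  # a hook may rewrite the arguments, including the label
    return clean_args(current)
```

The second pass matters only if a hook can reintroduce a long label; a UI-only limit would leave the stored arguments untouched.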

@ericdallo
Member

Hmm, yes, I thought it would be OK for the UI, but since this goes to the LLM again and could be huge for some reason, I think it's OK to strip it.

@zikajk
Member

zikajk commented May 11, 2026

I absolutely agree that this is a problem.
But I'm wondering if it wouldn't be better to improve the parameter descriptions and reject the tool call if they aren't followed. I haven't experimented with it yet, but I'm a bit worried we'd just end up trimming the nonsense that Codex sends our way. WDYT?
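
As a rough Python sketch of what the rejection approach could look like (untested against ECA, as noted above): the function name, the error wording, and the 80-character limit are all hypothetical.

```python
from typing import Any, Optional

# Assumed limit that the parameter description would also have to state.
MAX_ACTIVITY_CHARS = 80


def activity_rejection_reason(value: Any, max_chars: int = MAX_ACTIVITY_CHARS) -> Optional[str]:
    """Return an error message for the model, or None when the label is acceptable."""
    if not isinstance(value, str) or not value.strip():
        return "activity must be a short non-empty string"
    if len(value) > max_chars:
        return f"activity must be at most {max_chars} characters, got {len(value)}"
    return None
```

A non-None result would be returned to the model as a tool-call error instead of running the tool, which forces a retry of the whole call.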

@itkonen
Contributor Author

itkonen commented May 11, 2026

@zikajk That's a valid point, but the weird thing is that the actual arguments in these tool calls might be completely sane, even if the activity label is gibberish. I wouldn't want to break the workflow because of the silly labels. But it might be worth thinking about whether something could be done to prevent the gibberish, like giving more context for the activity label.

3 participants