Part of the Forge backlog. Effort: S (2–3 engineer-days). Risk: low (additive config; behavior change opt-in via config). Depends on: nothing. Follow-up to FWS-4 (#88) — surfaced during manual testing of cancellation scenarios.
Scope
Three orthogonal fixes to the per-IP A2A rate limiter built in issue #31:
Make RateLimitConfig configurable via forge.yaml + CLI flags. Today it's a struct in forge-cli/server/a2a_server.go with no surface — operators can't override without forking.
Raise the default WriteBurst so that bursty orchestrator dispatch (5 parallel tasks, a cron firing several jobs at once) doesn't immediately throttle.
Exempt tasks/cancel from the write bucket (or give it its own permissive bucket). Cancellation is the most rate-limit-sensitive surface in the whole protocol — throttling it amplifies the problem it's trying to solve.
The current code (per defaultRateLimitConfig in forge-cli/server/a2a_server.go:413):
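A sketch of what those defaults look like, reconstructed from the "Old default" column of the defaults table in this issue rather than copied from the source file. The field types and the struct layout are assumptions; only the field names and values come from the issue.

```go
package main

import "fmt"

// RateLimitConfig uses the field names mentioned in this issue.
// The types (float64 rates, int bursts) are assumed, not taken from a2a_server.go.
type RateLimitConfig struct {
	ReadRPS    float64 // sustained read requests per second
	ReadBurst  int     // read bucket capacity
	WriteRPS   float64 // sustained write requests per second
	WriteBurst int     // write bucket capacity
}

// defaultRateLimitConfig returns the pre-change defaults:
// 60 req/min reads, 10 req/min writes.
func defaultRateLimitConfig() RateLimitConfig {
	return RateLimitConfig{
		ReadRPS:    1.0,
		ReadBurst:  10,
		WriteRPS:   10.0 / 60.0,
		WriteBurst: 3,
	}
}

func main() {
	cfg := defaultRateLimitConfig()
	fmt.Printf("reads: %.0f/min burst %d, writes: %.0f/min burst %d\n",
		cfg.ReadRPS*60, cfg.ReadBurst, cfg.WriteRPS*60, cfg.WriteBurst)
}
```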
These defaults match the design intent from #31 (60 req/min reads, 10 req/min writes) and the read side is fine. The write side defaults are too aggressive once you consider orchestrated workloads — see "Why this matters" below.
Why this matters
1. Parallel workflows
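A platform orchestrator firing N parallel agent calls (initializ WS-3) blows past WriteBurst=3 after the third dispatch. Each subsequent call has to wait for a refill, at 1/WriteRPS = 6 seconds per token. A 10-step parallel stage becomes a serialized stage of 42 seconds if the bucket started full (7 throttled calls × 6 seconds), or 60 seconds if it started empty.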
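The arithmetic can be checked with a small helper. This is a sketch under stated assumptions: the limiter behaves like a plain token bucket, the caller retries until each call is admitted, and `drainSeconds` is a hypothetical name that does not exist in Forge.

```go
package main

import "fmt"

// drainSeconds returns how long it takes for n simultaneous requests to all be
// admitted by a token bucket that currently holds `tokens` tokens and refills
// at `rps` tokens per second. It assumes callers retry until admitted.
func drainSeconds(n, tokens int, rps float64) float64 {
	if n <= tokens {
		return 0 // the whole batch fits in the available burst
	}
	return float64(n-tokens) / rps
}

func main() {
	oldRPS := 10.0 / 60.0 // 10 writes per minute

	fmt.Printf("old defaults, full bucket:  %.0fs\n", drainSeconds(10, 3, oldRPS))
	fmt.Printf("old defaults, empty bucket: %.0fs\n", drainSeconds(10, 0, oldRPS))
	fmt.Printf("new defaults, full bucket:  %.0fs\n", drainSeconds(10, 20, 1.0))
}
```

With the old defaults a 10-call stage takes 42 seconds from a full bucket and 60 seconds from an empty one; with the proposed defaults (1 req/sec, burst 20) the same stage is admitted immediately.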
2. Cron bursts
A forge.yaml schedule with multiple cron entries that share a firing minute will see the 4th+ task throttled. The agent silently drops tasks the operator scheduled.
3. Cost-ceiling cancel bursts (the FWS-4 case)
When a workflow's cost ceiling is hit, the orchestrator wants to fire tasks/cancel against every in-flight agent in the workflow — possibly dozens. With tasks/cancel sharing the same bucket as tasks/send, the cancels are throttled at exactly the moment cancellation matters most. The FWS-4 manual test surfaced this: after running 4 cancellation scenarios in ~10 seconds, the 5th got -32603: rate limit exceeded from the middleware before reaching the cancel handler.
This is the most concerning of the three because cancellation is the recovery mechanism. Throttling the recovery mechanism turns a recoverable cost overrun into an extended one.
4. Per-IP grouping breaks behind a service IP
In k8s, multiple orchestrator pods sit behind a single service IP (or all hit the agent through one ingress IP). With per-IP rate limiting, the entire orchestrator fleet shares one bucket. The agent's effective dispatch capacity is 10 req/min total, regardless of how many orchestrator replicas are running.
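A minimal illustration of why the fleet collapses into one bucket, assuming the limiter keys on the host part of the connection's remote address. `bucketKey` and `countBuckets` are hypothetical helpers, and the addresses are made up for the example.

```go
package main

import (
	"fmt"
	"net"
)

// bucketKey derives the limiter key from a remote address of the form
// "host:port". Hypothetical: the real middleware may derive its key differently.
func bucketKey(remoteAddr string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr // no port present; use the address as-is
	}
	return host
}

// countBuckets reports how many distinct limiter buckets a set of
// connections would occupy.
func countBuckets(remoteAddrs []string) int {
	seen := map[string]struct{}{}
	for _, addr := range remoteAddrs {
		seen[bucketKey(addr)] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Three orchestrator replicas reaching the agent through one ingress IP.
	viaIngress := []string{"10.0.0.7:41812", "10.0.0.7:41950", "10.0.0.7:42033"}
	// The same replicas if the agent saw their pod IPs directly.
	direct := []string{"10.1.2.11:41812", "10.1.3.12:41950", "10.1.4.13:42033"}

	fmt.Println(countBuckets(viaIngress), countBuckets(direct)) // prints "1 3"
}
```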
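Deliverables
1. Configurability via forge.yaml
A new top-level block alongside cors_origins.
Resolution order (mirror cors_origins):
CLI: --rate-limit-write-rps, --rate-limit-write-burst, --rate-limit-read-rps, --rate-limit-read-burst, --rate-limit-cancel-exempt
Env: FORGE_RATE_LIMIT_WRITE_RPS, FORGE_RATE_LIMIT_WRITE_BURST, etc.
server.rate_limit in forge.yaml
Defaults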
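The precedence (CLI flag, then environment variable, then forge.yaml, then built-in default) could be resolved per field along these lines. This is a sketch, not the Forge implementation: `resolveInt` and `lookupFunc` are hypothetical names, nil pointers stand in for "not set at this layer", and falling through on an unparseable environment value is a choice made here for brevity.

```go
package main

import (
	"fmt"
	"strconv"
)

// lookupFunc has the same signature as os.LookupEnv, so the real environment
// can be passed in production code and a fake one in tests.
type lookupFunc func(key string) (string, bool)

// resolveInt returns the first value that is set, in the order
// CLI flag > environment variable > forge.yaml > built-in default.
func resolveInt(flagVal *int, lookup lookupFunc, envKey string, yamlVal *int, def int) int {
	if flagVal != nil {
		return *flagVal
	}
	if raw, ok := lookup(envKey); ok {
		if v, err := strconv.Atoi(raw); err == nil {
			return v
		}
		// Unparseable value: this sketch falls through to the next layer.
	}
	if yamlVal != nil {
		return *yamlVal
	}
	return def
}

func main() {
	fakeEnv := map[string]string{"FORGE_RATE_LIMIT_WRITE_BURST": "12"}
	withEnv := func(k string) (string, bool) { v, ok := fakeEnv[k]; return v, ok }
	noEnv := func(string) (string, bool) { return "", false }

	const key = "FORGE_RATE_LIMIT_WRITE_BURST"
	yamlBurst, flagBurst := 8, 5

	fmt.Println(resolveInt(&flagBurst, withEnv, key, &yamlBurst, 20)) // 5: flag beats env
	fmt.Println(resolveInt(nil, withEnv, key, &yamlBurst, 20))        // 12: env beats yaml
	fmt.Println(resolveInt(nil, noEnv, key, &yamlBurst, 20))          // 8: yaml beats default
	fmt.Println(resolveInt(nil, noEnv, key, nil, 20))                 // 20: built-in default
}
```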
2. Bumped defaults
| Field | Old default | New default | Reason |
| --- | --- | --- | --- |
| ReadRPS | 1.0 (60/min) | 1.0 (60/min) | unchanged; fine for status polling |
| ReadBurst | 10 | 10 | unchanged |
| WriteRPS | 10/60 (10/min) | 1.0 (60/min) | parallel workflow dispatch needs 1/sec sustained |
| WriteBurst | 3 | 20 | absorbs orchestrator dispatch bursts + cron-fire bursts without silent drops |
| CancelExempt | n/a (bug) | true | see deliverable 3 |
These still protect against unauthenticated-client DoS (60/min is one task per second sustained) while not breaking normal orchestrated use. Operators can lock down further via config if their threat model is stricter.
3. Cancel exemption
The rate-limiter middleware classifies methods as read vs. write today. Add a third class: cancel. tasks/cancel goes through a separate (much more permissive) bucket — or skips the limiter entirely.
Rationale: cancel is "stop doing work." It's idempotent (FWS-4 made it so), it's cheap (no LLM dispatch, just signal a registered cancel func), and it's most needed exactly when something is going wrong. Sharing the budget with tasks/send turns cost-ceiling enforcement into a serialized 6-second-per-cancel scan, which is a footgun.
Implement as either:
(a) cancel_exempt: true skips the limiter for tasks/cancel entirely (simplest, recommended).
(b) Separate cancel_rps / cancel_burst bucket (more knobs, less footgun-protection).
(a) is simpler and matches the threat model: cancel is internally rate-limited by the registry (an unknown task ID returns instantly without doing work), so DoS via cancel-spam is bounded by the cost of looking up a map entry — not a real concern.
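Option (a) could look roughly like the following. This is a sketch under stated assumptions, not the existing middleware: the `class`, `bucket`, and `limiter` types are hypothetical, the bucket omits refill to keep the depletion case easy to see, and treating every method other than tasks/send and tasks/cancel as a read is an assumption (the issue names only those two methods).

```go
package main

import "fmt"

type class int

const (
	classRead class = iota
	classWrite
	classCancel
)

// classify maps an A2A method name to a limiter class.
// Assumption: anything that is not tasks/send or tasks/cancel is a read.
func classify(method string) class {
	switch method {
	case "tasks/cancel":
		return classCancel
	case "tasks/send":
		return classWrite
	default:
		return classRead
	}
}

// bucket is a simplified token bucket with no refill, enough to show depletion.
type bucket struct{ tokens int }

func (b *bucket) allow() bool {
	if b.tokens == 0 {
		return false
	}
	b.tokens--
	return true
}

type limiter struct {
	read, write  bucket
	cancelExempt bool
}

// allow reports whether a request for the given method may proceed.
func (l *limiter) allow(method string) bool {
	switch classify(method) {
	case classCancel:
		if l.cancelExempt {
			return true // option (a): cancel skips the limiter entirely
		}
		return l.write.allow() // pre-change behavior: cancel shares the write bucket
	case classWrite:
		return l.write.allow()
	default:
		return l.read.allow()
	}
}

func main() {
	l := &limiter{read: bucket{10}, write: bucket{20}, cancelExempt: true}
	for i := 0; i < 20; i++ {
		l.allow("tasks/send") // drain the write bucket
	}
	fmt.Println(l.allow("tasks/send"), l.allow("tasks/cancel")) // prints "false true"
}
```

Setting `cancelExempt` to false in the same scenario makes the cancel fail as well, which is the behavior the FWS-4 manual test ran into.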
Architectural notes
No protocol change. The A2A wire format is unchanged; this is purely server-side resource control.
The 429 response shape stays the same. Existing clients that retry on 429 keep working.
Per-IP grouping (item 4 under "Why this matters") is not addressed here. Fixing it means keying buckets on caller identity (auth.user_id or similar), which is a larger redesign that should reference issue #31's original threat model. File separately if/when the per-IP behavior becomes a practical problem.
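Files expected to change
| File | Change |
| --- | --- |
| forge-cli/server/a2a_server.go | Update defaultRateLimitConfig; add CancelExempt field; thread cancel-method classification through the limiter middleware |
| forge-core/types/forge_yaml.go (or wherever ForgeConfig lives) | Add Server.RateLimit block to the YAML schema |
| forge-cli/runtime/runner.go | Resolve RateLimitConfig from CLI > env > yaml > defaults; pass into ServerConfig.RateLimit |
| forge-cli/cmd/run.go | Add --rate-limit-* flags |
| docs/reference/forge-yaml-schema.md | Document the new server.rate_limit block |
| CHANGELOG.md | Changed entry for the new defaults + Added for the config surface |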
Tests
forge-cli/server/a2a_server_test.go — extend existing TestRateLimitMiddleware_* tests to cover:
CancelExempt=true → tasks/cancel always passes regardless of write bucket state
Config-from-forge.yaml round-trip
CLI flag overrides yaml
New: TestRateLimitMiddleware_NewDefaults_AllowsBurst20 to lock in the new defaults so an accidental future change doesn't silently re-tighten.
Acceptance criteria
An agent started with no config matches today's read behavior but new write defaults (60/min, burst 20, cancel exempt).
The forge.yaml server.rate_limit block round-trips through restart.
The FWS-4 manual test (/tmp/forge-fws4/test.sh from issue #88's verification) passes all 4 scenarios with the 20-second warm-up sleep and the 13-second inter-iteration sleep removed. This is the operational regression check: the test should not need to dodge the rate limiter.
tasks/cancel fires successfully even after the write bucket is fully depleted by tasks/send.
Documentation explains the trade-off and shows the most common stricter-than-default config (e.g. for a public-facing agent on the open internet).
Anti-patterns to avoid
Removing the rate limiter entirely. It exists for a reason — anonymous public-facing agents need DoS protection.
Auth-aware buckets in this issue. Out of scope; separate redesign.
Making tasks/cancel totally unlimited without any internal protection. The registry lookup is the natural rate limit (O(1) map access on unknown ID).
Bumping defaults so high that they offer no protection (e.g. 1000/sec). 60/min is a reasonable balance.
Background
This was surfaced during manual testing of issue #88 (FWS-4, cancellation signal handling). The FWS-4 test script ran 4 scenarios of tasks/send + tasks/cancel pairs and hit -32603: rate limit exceeded from the limiter middleware on the cancel side — exactly the case where rate-limiting cancellation amplifies the problem cancellation is trying to solve.
Original rate limiter design: issue #31 (defaults at the time: 60 req/min reads, 10 req/min writes). Those defaults predate parallel workflow execution (WS-3) and cost-ceiling cancellation (FWS-4); they're now out of step with how Forge is operationally driven.