feat(runtime): token usage + execution duration emission (closes #87, FWS-3) #99
Open
initializ-mk wants to merge 1 commit into
… FWS-3)

Every llm_call audit event now carries OTel-aligned token counts (input_tokens / output_tokens), model, provider, duration_ms, and a provider-specific request_id, captured at the LLM call site for the four supported providers (Anthropic, OpenAI, Ollama via the OpenAI-compatible path, OpenAI Responses). When a provider returns no usage metadata (some self-hosted Ollama setups), the emitter sets tokens_unavailable=true rather than emitting silent zeros, so billing consumers can distinguish "not measured" from "zero tokens used."

Each tool_exec event gains duration_ms plus structured arg-shape metadata (args_size, result_size). Raw arg values are not emitted; that's FWS-8's payload-stripping concern, not FWS-3's.

A new invocation_complete audit event closes every A2A invocation with the wall-clock duration and aggregated input_tokens_total / output_tokens_total / llm_call_count.

A2A REST responses carry the same per-invocation totals inline as X-Forge-Tokens-In / X-Forge-Tokens-Out / X-Forge-Duration-Ms / X-Forge-Model / X-Forge-Provider headers, so an orchestrator can ceiling-check cost during parallel workflow execution without subscribing to the audit stream. The headers are populated regardless of whether OTel tracing is enabled: they're the orchestration channel, not the observability channel.

Cost calculation is deliberately not in Forge. Forge emits token counts; the platform applies price tables to compute dollar amounts. Price tables change frequently and shouldn't require agent redeploys.

Schema additivity: all new fields use *int / *int64 pointers plus the omitempty JSON tag, so events that don't set them (session_start / session_end / etc.) serialize byte-identically to before, and pre-FWS-3 audit consumers see an unchanged shape.

Internal API rename: the llm.UsageInfo fields PromptTokens → InputTokens and CompletionTokens → OutputTokens (JSON tags too) now align with the OTel GenAI semconv. The type is internal to forge-core/llm and not consumed outside that package.
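A minimal Go sketch of the schema-additivity and `tokens_unavailable` rules above. `AuditEvent`, `Usage`, `newLLMCallEvent`, and `mustJSON` are trimmed-down, hypothetical stand-ins (the real event in `forge-core/runtime/audit.go` has more fields), and modelling "provider reported no usage" as a nil `*Usage` is an assumption of the sketch. The point it shows: a nil pointer tagged `omitempty` disappears from the JSON, while a pointer to `0` still serializes, so "measured zero" and "not measured" stay distinguishable.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AuditEvent is a hypothetical, trimmed-down audit event. New numeric
// fields are pointers with omitempty so events that never set them keep
// their pre-existing JSON shape.
type AuditEvent struct {
	Event             string `json:"event"`
	InputTokens       *int   `json:"input_tokens,omitempty"`
	OutputTokens      *int   `json:"output_tokens,omitempty"`
	TokensUnavailable bool   `json:"tokens_unavailable,omitempty"`
	DurationMs        *int64 `json:"duration_ms,omitempty"`
}

// Usage mirrors the renamed InputTokens/OutputTokens fields. In this
// sketch a nil *Usage means the provider returned no usage metadata.
type Usage struct {
	InputTokens  int
	OutputTokens int
}

// newLLMCallEvent flags tokens_unavailable instead of emitting silent
// zeros when usage is missing.
func newLLMCallEvent(u *Usage, durationMs int64) AuditEvent {
	ev := AuditEvent{Event: "llm_call", DurationMs: &durationMs}
	if u == nil {
		ev.TokensUnavailable = true
		return ev
	}
	in, out := u.InputTokens, u.OutputTokens
	ev.InputTokens, ev.OutputTokens = &in, &out
	return ev
}

// mustJSON marshals v or panics; fine for a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(AuditEvent{Event: "session_start"}))
	fmt.Println(mustJSON(newLLMCallEvent(&Usage{}, 12)))
	fmt.Println(mustJSON(newLLMCallEvent(nil, 40)))
}
```

The three lines printed are, in order, a `session_start` event with no new fields, an `llm_call` with explicit zero counts, and an `llm_call` carrying only `tokens_unavailable` and `duration_ms`.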
Bonus simplification: JSON-RPC tasks/send now delegates to executeTask (~120 lines of duplicated audit/guardrail logic removed), so both JSON-RPC and REST paths share the same usage-accumulator wiring. See docs/security/audit-logging.md#token-usage-and-execution-duration for the full event shape and header contract.
Summary
- `input_tokens` / `output_tokens` (OTel-aligned naming), `model`, `provider`, `duration_ms`, and `request_id` on every `llm_call` audit event, captured directly from provider response metadata across all four providers (Anthropic, OpenAI, Ollama via OpenAI-compatible, OpenAI Responses).
- `X-Forge-Tokens-In`, `X-Forge-Tokens-Out`, `X-Forge-Duration-Ms`, `X-Forge-Model`, `X-Forge-Provider` headers on A2A REST responses, so orchestrators can enforce cost ceilings inline during parallel workflow execution without subscribing to the audit stream.
- New `invocation_complete` audit event with wall-clock duration + aggregated token totals at every A2A request boundary.
- A `tokens_unavailable=true` flag distinguishes "provider did not report usage" (some self-hosted Ollama setups) from "you used zero tokens", so downstream billing doesn't undercount.
- `tool_exec` events gain `duration_ms` plus structured arg-shape metadata (`args_size`, `result_size`). Raw arg values are deliberately not emitted; that's FWS-8's payload-stripping concern.

Pre-work inventory (per issue body)
Confirmed Architecture A before coding:
- `Client` interface: `forge-core/llm/client.go`
- `ChatResponse` with `Usage UsageInfo`: already present
- `forge-core/runtime/loop.go:245` calls `e.client.Chat` on `llm.Client`
- `tokens_unavailable`

Decision-tree Row 1 → original S (3–5 days) estimate held.
Architectural notes
- `AuditLogger.EmitLLMCall` is the single capture point for token/duration/model/provider/request_id. The OTel tracing initiative (`FORGE_OTEL_TRACING.md`) can hook into the same point to populate `gen_ai.usage.*` span attributes without redoing per-provider extraction. Same data, captured once, fanned out to multiple emission targets with independent failure domains.
- `input_tokens` / `output_tokens` naming matches `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens`. It is aligned once at FWS-3; after that, Forge's audit schema stays Forge-owned and shouldn't churn with upstream OTel renames. Consumers correlate via the `trace_id` / `span_id` cross-link that the OTel work adds later.
- New fields are `*int` / `*int64` + `omitempty`, so pre-FWS-3 audit consumers parsing `session_start` / `session_end` / etc. see a byte-identical JSON shape.

Wiring
| Change | Location |
| --- | --- |
| `UsageInfo` field names | `forge-core/llm/types.go` + 4 providers |
| `AuditEvent` extension + `EmitLLMCall` / `EmitToolExec` / `EmitInvocationComplete` | `forge-core/runtime/audit.go` |
| `HookContext` | `forge-core/runtime/hooks.go` + `loop.go` |
| … | `loop.go` + audit hook in `runner.go` |
| `LLMUsageAccumulator` (thread-safe) | `forge-core/runtime/usage_accumulator.go` (new) |
| `invocation_complete` emission + `X-Forge-*` headers | `forge-cli/runtime/runner.go` + `forge_usage_headers.go` (new) |
| JSON-RPC `tasks/send` simplified to delegate to `executeTask` | `forge-cli/runtime/runner.go` (~120 lines deleted) |

Tests
- `forge-core/runtime/audit_llm_test.go`: 6 tests (full usage, `tokens_unavailable` Ollama path, cancelled → `llm_call_cancelled`, OTel naming check, backward-compat omission for non-LLM events, `tool_exec` + `invocation_complete` shape)
- `forge-core/runtime/usage_accumulator_test.go`: 8 tests, including a 500-call concurrent-add race regression
- `forge-cli/runtime/forge_usage_headers_test.go`: 3 tests (full stamping, short-circuited invocation, missing model/provider omission)
- `forge-core/llm/providers/usage_extraction_test.go`: Anthropic / OpenAI / Ollama-no-usage wire-shape tests

Docs
- `docs/security/audit-logging.md`: new event-type rows (`llm_call_cancelled`, `invocation_complete`), an expanded `llm_call` description, and a new "Token usage and execution duration" section with field table, header table, and design notes
- `CHANGELOG.md`: Unreleased entry above the FWS-1 entry, with the internal `UsageInfo` rename called out

Test plan
- `go test -race -count=1 ./forge-core/... ./forge-cli/runtime/... ./forge-cli/server/...`: all 28 packages pass
- `golangci-lint run` across `forge-core/...` + `forge-cli/...`: 0 issues
- `gofmt -l` clean

Out of scope (deliberately)
- `llm_call_cancelled` emission: the event constant and the `EmitLLMCall(args.Cancelled)` path exist, but `ExecuteStream` currently wraps non-streaming `Chat`, so the path doesn't fire today. It's ready for whenever Forge adopts true client-side streaming.
- `embedder.go` already uses `UsageInfo` (now OTel-aligned); per-call audit emission for embeddings is a follow-up that mirrors the `llm_call` pattern.

Closes #87.