refactor: thin GraphQL client, remove chat, surface commands + issue explorer #74
Open
smithclay wants to merge 20 commits into
Replace the waste/quality/compliance/sync tabs with product-surface tabs that mirror the webapp navigation, grouped Control Plane (Policies, Issues, Checks) and Data Plane (Services, Log events, Edge instances).

- add internal/app/statusbar/surfaces with a shared non-interactive Model
- keep syncStatus wired into the statusbar lifecycle though it is no longer a drawer tab (sync dot + sync-error toasts)
- remove the superseded waste/quality/compliance/policytab packages
- update docs/domains/statusbar.md
Plan to drop PowerSync and make the CLI a thin GraphQL client (reads via direct GraphQL queries, writes via inline mutations). Control plane has moved off PowerSync; CLI synced schema is stale. Option A (stateless) confirmed.
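As a rough illustration of the stateless option, the sketch below shows one read as a single GraphQL-over-HTTP POST with no local store. The `services` query text, the bearer-token header, and every function name here are assumptions made for illustration; the PR itself relies on genqlient-generated operations rather than a hand-rolled client.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// gqlRequest is the usual GraphQL-over-HTTP JSON envelope.
type gqlRequest struct {
	Query     string         `json:"query"`
	Variables map[string]any `json:"variables,omitempty"`
}

// gqlResponse carries the data payload and any GraphQL-level errors.
type gqlResponse struct {
	Data   json.RawMessage `json:"data"`
	Errors []struct {
		Message string `json:"message"`
	} `json:"errors"`
}

// encodeRequest builds the POST body for one operation.
func encodeRequest(query string, vars map[string]any) (string, error) {
	b, err := json.Marshal(gqlRequest{Query: query, Variables: vars})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeResponse unmarshals the "data" field into out and surfaces the
// first GraphQL error, if any.
func decodeResponse(body []byte, out any) error {
	var resp gqlResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return fmt.Errorf("decode response: %w", err)
	}
	if len(resp.Errors) > 0 {
		return fmt.Errorf("graphql: %s", resp.Errors[0].Message)
	}
	if len(resp.Data) == 0 {
		return errors.New("graphql: response has no data")
	}
	return json.Unmarshal(resp.Data, out)
}

// query performs one stateless read. The endpoint and token are supplied
// by the caller; the Authorization scheme is an assumption.
func query(ctx context.Context, c *http.Client, endpoint, token, q string, vars map[string]any, out any) error {
	body, err := encodeRequest(q, vars)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	res, err := c.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	raw, err := io.ReadAll(res.Body)
	if err != nil {
		return err
	}
	if res.StatusCode != http.StatusOK {
		return fmt.Errorf("graphql: unexpected HTTP status %s", res.Status)
	}
	return decodeResponse(raw, out)
}

func main() {
	// Hypothetical query text; the real schema may differ.
	body, err := encodeRequest("query { services { name } }", nil)
	fmt.Println(body, err)

	var out struct {
		Services []struct {
			Name string `json:"name"`
		} `json:"services"`
	}
	err = decodeResponse([]byte(`{"data":{"services":[{"name":"checkout"}]}}`), &out)
	fmt.Println(out.Services, err)
}
```

Keeping encode and decode separate from the HTTP call makes the envelope handling testable without a running control plane.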
Regenerate gen/schema.graphql against the running control plane (was ~2 months stale). Surfaces the breaking changes driving the PowerSync removal:

- chat (conversations/messages) is gone from the control-plane GraphQL; chat is now ephemeral/in-memory
- policy approve/dismiss moved to the Issue model (ignoreIssue, createLogEventPolicy)
- updateService -> setServiceEnabled; workspaces query removed
- issues/checks/edgeInstances now first-class GraphQL entities

NOTE: generated.go is intentionally NOT regenerated yet. genqlient is all-or-nothing across operations, so the client regen is done together with the operation + consumer migration (tracked) to keep the tree green. Re-run 'task generate:client' as part of that step (control plane must be up).
Migrate queries/*.graphql to the current control-plane schema:

- services: updateService -> setServiceEnabled (op names preserved)
- accounts/organizations/datadog: input type renames; drop removed workspace field from org bootstrap result
- delete conversation/message ops (chat ephemeral), policy approve/dismiss (moved to Issue model), and workspaces (concept removed)

generated.go NOT yet regenerated: genqlient is blocked on GetDatadogAccountStatus because DatadogAccountStatus was restructured from flat metrics to a nested model (health/readiness/coverage/current/preview/effective). That read-model remap + its ~8 consumers is the next unit and lands together with the regen. Tree still builds (generated.go unchanged).
Reconcile the genqlient operations and Go consumers with the latest control-plane API: rename create inputs (Organization/Account/Datadog), remap DatadogAccountStatus to the nested readiness/coverage model, switch service enable/disable to setServiceEnabled, and drop the removed conversation/message/workspace surfaces. Tear out the upload package and its conversation/message/policy handlers now that chat is ephemeral and writes move to inline mutations. Stub policy approve/dismiss (moved to the Issue model) and synthesize a single default workspace from the account pending the workspace->account mapping. Build, vet, and tests are green.
Add genqlient operations, domain types, and services for the product surfaces that previously read from the local PowerSync projection:

- GetIssueSummary: active-issue count plus server-computed priority facet
- ListChecks: product-check catalog with account-scoped posture and per-domain (cost/compliance) counts
- ListEdgeInstances: edge fleet with total and last-sync recency

All counts come from control-plane aggregates; the CLI never sums rows locally. Wires the new services into ServiceSet with table tests.
Chat conversations and messages are no longer persisted. The in-memory core session is already the source of truth for request history, so the SQLite-backed persistence collaborators become in-memory stand-ins:

- AssistantPersister / ToolLoop mint local message IDs (no DB writes)
- OrphanMessageCleaner is a no-op (the message list drops cancelled rounds in the UI; there is no store to reconcile)
- conversation IDs are minted locally on first message

NewRuntimeDeps no longer takes a database. Chat effect closures stop importing sqlite. Tests now read history from the session instead of the database. The chat empty-state summary still reads the local projection; that read moves to GraphQL with the other status surfaces.
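The in-memory stand-ins described above might look roughly like the following sketch. The `idMinter`, `session`, and `noopCleaner` types and their method signatures are hypothetical; the commit names AssistantPersister and OrphanMessageCleaner but does not show their interfaces.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// idMinter hands out process-local IDs; nothing is written to a database.
// (Hypothetical type, not taken from the PR.)
type idMinter struct{ next atomic.Uint64 }

// Mint returns the next ID with the given prefix, e.g. "msg-2".
func (m *idMinter) Mint(prefix string) string {
	return fmt.Sprintf("%s-%d", prefix, m.next.Add(1))
}

// session is an assumed in-memory conversation: the history lives here
// and is the only source of truth.
type session struct {
	ids            *idMinter
	conversationID string
	messages       []string
}

// Append records a message, minting the conversation ID lazily on the
// first message, and returns the new message's local ID.
func (s *session) Append(text string) string {
	if s.conversationID == "" {
		s.conversationID = s.ids.Mint("conv")
	}
	s.messages = append(s.messages, text)
	return s.ids.Mint("msg")
}

// noopCleaner satisfies a cleaner-shaped interface without touching any
// store; there is nothing persisted to reconcile.
type noopCleaner struct{}

// CleanOrphans does nothing and always succeeds.
func (noopCleaner) CleanOrphans(context.Context, string) error { return nil }

func main() {
	s := &session{ids: &idMinter{}}
	msgID := s.Append("hello")
	fmt.Println(s.conversationID, msgID, len(s.messages))

	err := noopCleaner{}.CleanOrphans(context.Background(), s.conversationID)
	fmt.Println(err)
}
```

Because the IDs only need to be unique within one process, a shared atomic counter is enough in this sketch; a real implementation might prefer UUIDs.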
Move the status-bar drawer tabs off the local PowerSync projection and onto direct GraphQL reads:

- Issues surface reads the active-issue summary (priority facet)
- Checks surface reads the product-check catalog grouped by domain
- Edge instances surface shows the real fleet with sync recency
- Log events surface reads datadog status coverage
- Services tab reads service status summaries and per-service log events via new ListServiceStatuses / ListServiceLogEvents queries

Add a StatusService + GetAccountStatusSummary that maps the nested datadog status into the account summary the surfaces render. Tab data injection switches from SetDB(sqlite.DB) to SetServices(ServiceSet); the sync dot and workspace count still read the runtime db. The chat empty-state summary is repointed too, removing chat's last db use.

Remove the redundant Policies tab (policies moved to the Issue model; Issues is the canonical review queue).
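The switch from SetDB to SetServices is essentially dependency injection through a narrow read interface. A minimal sketch, assuming a hypothetical `IssueReader` interface and made-up field names (the real ServiceSet and surface model are not shown in this PR text):

```go
package main

import (
	"context"
	"fmt"
)

// IssueSummary mirrors "active-issue count plus priority facet";
// the field names are assumptions.
type IssueSummary struct {
	Active     int
	ByPriority map[string]int
}

// IssueReader is the narrow read interface a surface depends on.
type IssueReader interface {
	Summary(ctx context.Context) (IssueSummary, error)
}

// ServiceSet bundles the GraphQL-backed readers (hypothetical shape).
type ServiceSet struct {
	Issues IssueReader
}

// issuesSurface is a non-interactive tab model.
type issuesSurface struct {
	svc ServiceSet
}

// SetServices replaces the old database injection point.
func (s *issuesSurface) SetServices(set ServiceSet) { s.svc = set }

// Render fetches the summary and formats one line. Counts are shown as
// returned by the service and are never recomputed locally.
func (s *issuesSurface) Render() string {
	if s.svc.Issues == nil {
		return "issues: unavailable"
	}
	sum, err := s.svc.Issues.Summary(context.Background())
	if err != nil {
		return "issues: error: " + err.Error()
	}
	return fmt.Sprintf("issues: %d active (%d high)", sum.Active, sum.ByPriority["high"])
}

// fakeIssues stands in for the GraphQL-backed service in tests.
type fakeIssues struct{ sum IssueSummary }

// Summary returns the canned summary.
func (f fakeIssues) Summary(context.Context) (IssueSummary, error) { return f.sum, nil }

func main() {
	surface := &issuesSurface{}
	fmt.Println(surface.Render())

	surface.SetServices(ServiceSet{Issues: fakeIssues{sum: IssueSummary{
		Active:     3,
		ByPriority: map[string]int{"high": 1, "low": 2},
	}}})
	fmt.Println(surface.Render())
}
```

Injecting an interface rather than a database handle is also what lets the surfaces be covered by table tests with fakes.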
The set_service_enabled chat tool now calls the control-plane setServiceEnabled mutation directly instead of writing through the local PowerSync outbox. Remove the obsolete approve_policy tool and its domain types: policy approval moved to the issue model and is no longer a chat action. Drop the now-dead conversation-title persistence write (chat is ephemeral).
The chat agent's read surface moves off the local SQLite catalog onto control-plane GraphQL. Replace the arbitrary-SQL query tool and the policy show card (the policy model is gone) with structured action tools:

- list_services, list_issues, list_checks, list_edge_instances, account_status

Add a ListIssues query and Issues.List for individual active issues. The tool registry is now just GraphQL-backed action tools (no special query/show UI). Removes the query/show tool UI packages and the embedded SQL schema.
The CLI is now a thin GraphQL client with no local database or sync engine. Remove:

- internal/powersync (sync engine, CGo extension + binaries, db/crud)
- internal/boundary/powersync (sync client)
- internal/sqlite (local store, storage service, generated surfaces)
- the onboarding sync gate + sync status indicator
- powersync/sqlite wiring in app, statusbar, cmd, and the internal powersync capture/sanitize debug commands

Onboarding now completes at workspace selection (no 'waiting for first sync' step) and drops straight into chat. The app runtime opens a session context and scopes the GraphQL services to the account instead of opening a database and starting a syncer. Workspace selection becomes the terminal bootstrap transition; the workspace concept itself is removed in a follow-up. Tidy go.mod and the Taskfile's dead generate/replay/capture tasks.
Workspaces were removed from the control plane. Drop the concept end to end:

- onboarding completes after datadog setup (the workspace-select gate is gone); EventDatadogReady / EventDatadogDiscoveryDone are terminal
- bootstrap State/Completion/OnboardingComplete/PreflightState lose their Workspace fields; WorkspaceSelected event and GateWorkspaceSelect removed
- delete the onboarding/workspaces step and the synthetic GraphQL WorkspaceService stub
- chat, app, and the status bar key off the account; the org/workspace status segment shows the org only
- remove domain.Workspace/WorkspaceID and the org-preference default_workspace_id accessors

The account is now the sole post-org working context.
Onboarding hung after the Datadog check on any resumed session. The user identity was only set by the auth gate, which preflight skips when the token is already valid, so completion (which requires a user) silently no-op'd and the flow never reached chat. Capture the user id during preflight's auth check and thread it through PreflightState into bootstrap state, mirroring how the auth gate populates it on a fresh login. Verified live: onboarding now advances datadog_check -> complete -> chat.
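The shape of that fix can be sketched as follows. The struct fields and function names are hypothetical (only the names PreflightState and State appear in the commits), and returning an explicit error from completion is an illustrative choice; the commit describes the old behavior as a silent no-op.

```go
package main

import (
	"errors"
	"fmt"
)

// PreflightState carries what preflight learned. UserID is the field the
// fix threads through (field names are assumptions).
type PreflightState struct {
	TokenValid bool
	UserID     string
}

// State is a simplified stand-in for the bootstrap state.
type State struct {
	UserID string
	Done   bool
}

// runPreflight checks the token. With the fix, the user ID is captured
// here whenever the token is already valid, because the auth gate that
// used to set it is skipped in that case.
func runPreflight(tokenValid bool, tokenUserID string) PreflightState {
	pf := PreflightState{TokenValid: tokenValid}
	if tokenValid {
		pf.UserID = tokenUserID
	}
	return pf
}

// stateFromPreflight seeds bootstrap state from the preflight result,
// mirroring what the auth gate does on a fresh login.
func stateFromPreflight(pf PreflightState) State {
	return State{UserID: pf.UserID}
}

var errNoUser = errors.New("completion requires a user")

// complete marks onboarding done. It fails loudly when no user is set.
func complete(s State) (State, error) {
	if s.UserID == "" {
		return s, errNoUser
	}
	s.Done = true
	return s, nil
}

func main() {
	// Resumed session: token already valid, auth gate skipped.
	st, err := complete(stateFromPreflight(runPreflight(true, "user_123")))
	fmt.Println(st.Done, err)

	// Without a captured user, completion cannot proceed.
	_, err = complete(State{})
	fmt.Println(err)
}
```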
Add traditional, scriptable commands that read the control plane over GraphQL for the current account, so product data is reachable without the chat TUI:

- tero status: account health, service/event counts, cost, open issues
- tero issues: active issues (priority, id, service, title)
- tero checks: product checks with findings and cost
- tero services: enabled services with volume and cost
- tero edge: registered edge instances

Drop the cost field from the issues read: the deployed control-plane Issue type does not expose it (schema-mirror drift). Verified all five commands against live prd.
The chat backend is decommissioned, so the chat-first TUI is dead. After onboarding the CLI now opens a minimal, read-only issue explorer that lists the account's active issues (priority, id, service, title) with arrow navigation and refresh, backed by the GraphQL issue reads. Remove the chat subsystem wholesale: internal/app/chat, internal/app/chattools, internal/boundary/chat, internal/core/chat, the chat client/tool-registry/runtime-deps wiring in the app runtime, the ChatEndpoint config and its WorkOS token audience. The status drawer's 'ask Tero' prompt hook is now inert. Verified live: onboarding completes into the explorer and loads issues against prd.
Add a persistent --output/-o flag (table default, json) on the root command, inherited by every subcommand. A shared emit() helper routes each command's result: --output=json writes indented JSON, otherwise the table renderer runs. Each command marshals a stable, snake_case output struct (raw numbers, omitempty for unmeasured costs) rather than internal types, so the JSON is clean and scriptable. Covers status, issues, checks, services, and edge. Verified live against prd in both formats.
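A minimal sketch of how such an emit() helper could route output, using only the standard library. The `serviceOut` struct, its field names, the pointer-typed cost, and the `servicesTable` renderer are assumptions for illustration; the PR's actual helper and structs are not shown here, and the flag wiring on the root command is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
	"text/tabwriter"
)

// serviceOut is a stable, snake_case output struct (hypothetical fields).
// CostUSD is a pointer so an unmeasured cost is omitted rather than
// rendered as 0.
type serviceOut struct {
	Name       string   `json:"name"`
	EventCount int64    `json:"event_count"`
	CostUSD    *float64 `json:"cost_usd,omitempty"`
}

// emit writes v as indented JSON when format is "json"; for "table" (or
// an empty format) it runs the supplied table renderer instead.
func emit(w io.Writer, format string, v any, table func(io.Writer) error) error {
	switch format {
	case "json":
		enc := json.NewEncoder(w)
		enc.SetIndent("", "  ")
		return enc.Encode(v)
	case "", "table":
		return table(w)
	default:
		return fmt.Errorf("unknown output format %q (want table or json)", format)
	}
}

// servicesTable returns a renderer that prints rows as aligned columns.
func servicesTable(rows []serviceOut) func(io.Writer) error {
	return func(w io.Writer) error {
		tw := tabwriter.NewWriter(w, 0, 4, 2, ' ', 0)
		fmt.Fprintln(tw, "NAME\tEVENTS\tCOST")
		for _, r := range rows {
			cost := "-"
			if r.CostUSD != nil {
				cost = fmt.Sprintf("$%.2f", *r.CostUSD)
			}
			fmt.Fprintf(tw, "%s\t%d\t%s\n", r.Name, r.EventCount, cost)
		}
		return tw.Flush()
	}
}

// render is a convenience wrapper that captures emit's output as a string.
func render(format string, rows []serviceOut) (string, error) {
	var sb strings.Builder
	err := emit(&sb, format, rows, servicesTable(rows))
	return sb.String(), err
}

func main() {
	cost := 12.5
	rows := []serviceOut{
		{Name: "checkout", EventCount: 1200, CostUSD: &cost},
		{Name: "search", EventCount: 40}, // cost unmeasured
	}
	for _, format := range []string{"table", "json"} {
		if err := emit(os.Stdout, format, rows, servicesTable(rows)); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Marshaling a dedicated output struct instead of the internal domain type keeps the JSON contract stable even when internal types change.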
- gen-check: drop the deleted `go generate ./internal/sqlite` step (the package no longer exists)
- powersync-replay: remove the job, its workflow input, the gate dependency, and the nightly invocation (the replay test is gone)
- lint: fix SA4000 (snapshotKey determinism check uses two vars), remove the now-unused chat boundary assertion helpers, and delete the unused ptr/deref helpers
- unit: add ripgrep to hermit so the event/naming lint scripts have `rg` in CI (they silently found nothing without it)

task lint is clean (0 issues) and the full suite passes.
These failures are unrelated to the PowerSync removal; they are dependency/toolchain hygiene that fails any current PR.

- Workflow Lint: re-pin reviewdog/action-actionlint to a resolvable v1 SHA (the old pinned commit no longer exists upstream).
- Security/govulncheck: bump the go directive to 1.25.11 so the security job (setup-go from go.mod) builds against a patched standard library. The listed CVEs (net/textproto, crypto/x509, net, net/http, crypto/tls) are fixed in the 1.25.9-1.25.11 patches; the 1.26 line is not patched yet. Hermit jobs keep using go 1.26.0 (>= 1.25.11).
- Security/govulncheck + OSV: bump golang.org/x/net to v0.56.0 and golang.org/x/sys to v0.46.0 to clear GO-2026-4918 and GO-2026-5024-5030.

govulncheck now reports only standard-library findings (cleared in 1.25.11), osv-scanner exits clean, and build/vet/test are green.
The README described the old chat-first interface, which no longer exists. Rewrite it to reflect what the CLI actually does today — connect Datadog and read your account's issues, checks, services, and status — and drop the removed chat 'block waste / edit code' flows. Organize by Diátaxis: Getting started (tutorial), How-to guides, Reference (commands, flags, UI keys, env vars, files), and Concepts. Command/flag reference verified against the built binary.
Summary
Updates to the CLI to reflect new APIs.
What changed
- … `DatadogAccountStatus`, `setServiceEnabled`, removed conversation/message/workspace surfaces).
- … `StatusService`; re-pointed the status-bar surfaces off the local projection. Removed the redundant Policies tab.
- … `set_service_enabled` is now an inline `setServiceEnabled` mutation; the upload outbox is deleted.
- … `core/chat`, and the `ChatEndpoint` config/audience.
- … `internal/powersync`, `internal/boundary/powersync`, `internal/sqlite` (incl. the CGo extension binaries), the sync gate and sync-status indicator, and all related wiring.
- … `tero status | issues | checks | services | edge`, each with a persistent `-o/--output table|json` flag.

Net diff: roughly +13k / −28k across ~280 files (mostly deletions).
Verified live (prd)
- `tero` onboarding resumes and lands in the issue explorer; the explorer loads real issues.
- `tero status / issues / checks / services / edge` return real account data; `-o json` works.
- `go build`, `go vet`, and the full test suite are green.

Includes a fix for a real onboarding hang: the user identity is now resolved during preflight, so completion succeeds on a resumed session (the old flow set the user only via the skipped auth gate).
Known gaps / risks (please read before merging)
- … `tero datadog connect` command yet.

Test plan
- `go build ./... && go vet ./... && go test ./...`
- `tero` resumes onboarding → explorer (live, prd)
- `tero status|issues|checks|services|edge` + `-o json` (live, prd)