refactor: thin GraphQL client, remove chat, surface commands + issue explorer#74

Open
smithclay wants to merge 20 commits into master from clay/drop-powersync

smithclay commented Jun 12, 2026

Summary

Updates the CLI to match the current control-plane GraphQL API.

What changed

  • GraphQL migration — reconciled all genqlient operations with the refreshed schema (renamed create inputs, nested DatadogAccountStatus, setServiceEnabled, removed conversation/message/workspace surfaces).
  • Reads over GraphQL — added issue/check/edge/service-status reads + domain types and a StatusService; re-pointed the status-bar surfaces off the local projection. Removed the redundant Policies tab.
  • Writes: set_service_enabled is now an inline setServiceEnabled mutation; the upload outbox is deleted.
  • Ephemeral chat → removed entirely — dropped message persistence first, then deleted the chat TUI, boundary, tools, core/chat, and the ChatEndpoint config/audience.
  • Deleted PowerSync + SQLite: internal/powersync, internal/boundary/powersync, internal/sqlite (incl. the CGo extension binaries), the sync gate and sync-status indicator, and all related wiring.
  • Workspace → account — workspaces removed end to end; the account is now the sole post-org working context. Onboarding completes after Datadog setup.
  • New presentation layer (chat replacement)
    • CLI commands: tero status | issues | checks | services | edge, each with a persistent -o/--output table|json flag.
    • A minimal read-only issue explorer TUI as the default post-onboarding view.

Net diff: roughly +13k / −28k across ~280 files (mostly deletions).

Verified live (prd)

  • tero onboarding resumes and lands in the issue explorer; the explorer loads real issues.
  • tero status / issues / checks / services / edge return real account data; -o json works.
  • go build, go vet, and the full test suite are green.

Includes a fix for a real onboarding hang: the user identity is now resolved during preflight, so completion succeeds on a resumed session (the old flow set the user only via the skipped auth gate).

Known gaps / risks (please read before merging)

  • Datadog connect is interactive-only — no headless tero datadog connect command yet.
  • The explorer is intentionally minimal — read-only issue list, no detail view or actions.

Test plan

  • go build ./... && go vet ./... && go test ./...
  • tero resumes onboarding → explorer (live, prd)
  • tero status|issues|checks|services|edge + -o json (live, prd)

smithclay added 16 commits June 8, 2026 14:43
Replace the waste/quality/compliance/sync tabs with product-surface tabs
that mirror the webapp navigation, grouped Control Plane (Policies, Issues,
Checks) and Data Plane (Services, Log events, Edge instances).

- add internal/app/statusbar/surfaces with a shared non-interactive Model
- keep syncStatus wired into the statusbar lifecycle though it is no longer
  a drawer tab (sync dot + sync-error toasts)
- remove the superseded waste/quality/compliance/policytab packages
- update docs/domains/statusbar.md
Plan to drop PowerSync and make the CLI a thin GraphQL client (reads via
direct GraphQL queries, writes via inline mutations). Control plane has moved
off PowerSync; CLI synced schema is stale. Option A (stateless) confirmed.
Regenerate gen/schema.graphql against the running control plane (was ~2 months
stale). Surfaces the breaking changes driving the PowerSync removal:

- chat (conversations/messages) is gone from the control-plane GraphQL; chat is
  now ephemeral/in-memory
- policy approve/dismiss moved to the Issue model (ignoreIssue,
  createLogEventPolicy)
- updateService -> setServiceEnabled; workspaces query removed
- issues/checks/edgeInstances now first-class GraphQL entities

NOTE: generated.go is intentionally NOT regenerated yet. genqlient is
all-or-nothing across operations, so the client regen is done together with the
operation + consumer migration (tracked) to keep the tree green. Re-run
'task generate:client' as part of that step (control plane must be up).
Migrate queries/*.graphql to the current control-plane schema:
- services: updateService -> setServiceEnabled (op names preserved)
- accounts/organizations/datadog: input type renames; drop removed workspace
  field from org bootstrap result
- delete conversation/message ops (chat ephemeral), policy approve/dismiss
  (moved to Issue model), and workspaces (concept removed)

generated.go NOT yet regenerated: genqlient is blocked on GetDatadogAccountStatus
because DatadogAccountStatus was restructured from flat metrics to a nested
model (health/readiness/coverage/current/preview/effective). That read-model
remap + its ~8 consumers is the next unit and lands together with the regen.
Tree still builds (generated.go unchanged).
Reconcile the genqlient operations and Go consumers with the latest
control-plane API: rename create inputs (Organization/Account/Datadog),
remap DatadogAccountStatus to the nested readiness/coverage model, switch
service enable/disable to setServiceEnabled, and drop the removed
conversation/message/workspace surfaces.

Tear out the upload package and its conversation/message/policy handlers
now that chat is ephemeral and writes move to inline mutations. Stub
policy approve/dismiss (moved to the Issue model) and synthesize a single
default workspace from the account pending the workspace->account mapping.

Build, vet, and tests are green.
Add genqlient operations, domain types, and services for the product
surfaces that previously read from the local PowerSync projection:

- GetIssueSummary: active-issue count plus server-computed priority facet
- ListChecks: product-check catalog with account-scoped posture and
  per-domain (cost/compliance) counts
- ListEdgeInstances: edge fleet with total and last-sync recency

All counts come from control-plane aggregates; the CLI never sums rows
locally. Wires the new services into ServiceSet with table tests.
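
The "never sums rows locally" rule above can be illustrated with a small decoding sketch. The response shape, field names, and the `parseIssueSummary` helper are assumptions made for illustration; the real schema and generated genqlient types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// issueSummaryResponse mirrors a HYPOTHETICAL response shape for an
// issue-summary query; the real field names are not shown in this PR text.
type issueSummaryResponse struct {
	Issues struct {
		TotalCount    int `json:"totalCount"`
		PriorityFacet []struct {
			Priority string `json:"priority"`
			Count    int    `json:"count"`
		} `json:"priorityFacet"`
	} `json:"issues"`
}

// IssueSummary is an illustrative domain type a surface could render.
type IssueSummary struct {
	Active     int
	ByPriority map[string]int
}

// parseIssueSummary copies the server-computed aggregates as-is.
// Active comes from totalCount; it is never derived by adding up facets.
func parseIssueSummary(raw []byte) (IssueSummary, error) {
	var resp issueSummaryResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return IssueSummary{}, fmt.Errorf("decode issue summary: %w", err)
	}
	out := IssueSummary{
		Active:     resp.Issues.TotalCount,
		ByPriority: make(map[string]int),
	}
	for _, f := range resp.Issues.PriorityFacet {
		out.ByPriority[f.Priority] = f.Count
	}
	return out, nil
}

func main() {
	// The facet counts deliberately do not add up to totalCount, to show
	// that the total is taken from the server rather than recomputed.
	raw := []byte(`{"issues":{"totalCount":5,"priorityFacet":[{"priority":"high","count":2},{"priority":"low","count":1}]}}`)
	s, err := parseIssueSummary(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.Active, s.ByPriority["high"], s.ByPriority["low"])
}
```

Keeping the total and the facet as separate server-provided values means the CLI stays correct even when the facet covers only a subset of issues.
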
Chat conversations and messages are no longer persisted. The in-memory
core session is already the source of truth for request history, so the
SQLite-backed persistence collaborators become in-memory stand-ins:

- AssistantPersister / ToolLoop mint local message IDs (no DB writes)
- OrphanMessageCleaner is a no-op (the message list drops cancelled
  rounds in the UI; there is no store to reconcile)
- conversation IDs are minted locally on first message

NewRuntimeDeps no longer takes a database. Chat effect closures stop
importing sqlite. Tests now read history from the session instead of the
database.

The chat empty-state summary still reads the local projection; that read
moves to GraphQL with the other status surfaces.
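
As a rough sketch of what "in-memory stand-ins" could look like for this intermediate commit: the types below (`localIDs`, `noopCleaner`, `session`) are hypothetical simplifications, not the actual AssistantPersister or OrphanMessageCleaner implementations.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// localIDs mints process-local IDs; nothing is written to a store.
// (Hypothetical helper; the ID format is an assumption.)
type localIDs struct{ next atomic.Uint64 }

// Mint returns a new ID such as "msg_3". Safe for concurrent use.
func (g *localIDs) Mint(prefix string) string {
	return fmt.Sprintf("%s_%d", prefix, g.next.Add(1))
}

// OrphanCleaner is an illustrative interface for the cleanup collaborator.
type OrphanCleaner interface {
	CleanOrphans(conversationID string) error
}

// noopCleaner satisfies OrphanCleaner without touching any store, since
// there is nothing persisted to reconcile.
type noopCleaner struct{}

func (noopCleaner) CleanOrphans(string) error { return nil }

// session keeps request history in memory and is the only source of truth.
type session struct {
	ids            *localIDs
	conversationID string
	history        []string
}

// Append records a message, minting the conversation ID on first use.
func (s *session) Append(text string) string {
	if s.conversationID == "" {
		s.conversationID = s.ids.Mint("conv")
	}
	s.history = append(s.history, text)
	return s.ids.Mint("msg")
}

func main() {
	s := &session{ids: &localIDs{}}
	var cleaner OrphanCleaner = noopCleaner{}

	id := s.Append("hello")
	fmt.Println(s.conversationID, id, len(s.history))
	fmt.Println(cleaner.CleanOrphans(s.conversationID))
}
```
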
Move the status-bar drawer tabs off the local PowerSync projection and
onto direct GraphQL reads:

- Issues surface reads the active-issue summary (priority facet)
- Checks surface reads the product-check catalog grouped by domain
- Edge instances surface shows the real fleet with sync recency
- Log events surface reads datadog status coverage
- Services tab reads service status summaries and per-service log
  events via new ListServiceStatuses / ListServiceLogEvents queries

Add a StatusService + GetAccountStatusSummary that maps the nested
datadog status into the account summary the surfaces render. Tab data
injection switches from SetDB(sqlite.DB) to SetServices(ServiceSet); the
sync dot and workspace count still read the runtime db. The chat
empty-state summary is repointed too, removing chat's last db use.

Remove the redundant Policies tab (policies moved to the Issue model;
Issues is the canonical review queue).
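
The switch from database injection to service injection might look roughly like the sketch below. `IssueReader`, `IssuesSurface`, and the shape of `ServiceSet` are assumptions for illustration; only the SetDB to SetServices change itself comes from the commit message.

```go
package main

import "fmt"

// IssueReader is a hypothetical narrow read interface that a
// GraphQL-backed service would implement.
type IssueReader interface {
	ActiveIssueCount() (int, error)
}

// ServiceSet bundles the read services a surface may need
// (illustrative shape, not the real struct).
type ServiceSet struct {
	Issues IssueReader
}

// IssuesSurface renders a status-bar tab from injected services rather
// than from a database handle.
type IssuesSurface struct {
	services ServiceSet
}

// SetServices takes the place of the older SetDB-style injection.
func (s *IssuesSurface) SetServices(set ServiceSet) { s.services = set }

// View returns the one-line summary the tab would display.
func (s *IssuesSurface) View() string {
	if s.services.Issues == nil {
		return "issues: unavailable"
	}
	n, err := s.services.Issues.ActiveIssueCount()
	if err != nil {
		return "issues: error: " + err.Error()
	}
	return fmt.Sprintf("issues: %d active", n)
}

// fakeIssues is a test double standing in for the GraphQL-backed service.
type fakeIssues struct{ count int }

func (f fakeIssues) ActiveIssueCount() (int, error) { return f.count, nil }

func main() {
	var tab IssuesSurface
	fmt.Println(tab.View())

	tab.SetServices(ServiceSet{Issues: fakeIssues{count: 7}})
	fmt.Println(tab.View())
}
```

Depending on an interface rather than a concrete store is what lets the surfaces be exercised in table tests with a fake, without a database or a live control plane.
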
The set_service_enabled chat tool now calls the control-plane
setServiceEnabled mutation directly instead of writing through the local
PowerSync outbox. Remove the obsolete approve_policy tool and its domain
types: policy approval moved to the issue model and is no longer a chat
action. Drop the now-dead conversation-title persistence write (chat is
ephemeral).
The chat agent's read surface moves off the local SQLite catalog onto
control-plane GraphQL. Replace the arbitrary-SQL query tool and the
policy show card (the policy model is gone) with structured action
tools:

- list_services, list_issues, list_checks, list_edge_instances,
  account_status

Add a ListIssues query and Issues.List for individual active issues.
The tool registry is now just GraphQL-backed action tools (no special
query/show UI). Removes the query/show tool UI packages and the embedded
SQL schema.
The CLI is now a thin GraphQL client with no local database or sync
engine. Remove:

- internal/powersync (sync engine, CGo extension + binaries, db/crud)
- internal/boundary/powersync (sync client)
- internal/sqlite (local store, storage service, generated surfaces)
- the onboarding sync gate + sync status indicator
- powersync/sqlite wiring in app, statusbar, cmd, and the internal
  powersync capture/sanitize debug commands

Onboarding now completes at workspace selection (no 'waiting for first
sync' step) and drops straight into chat. The app runtime opens a session
context and scopes the GraphQL services to the account instead of opening
a database and starting a syncer.

Workspace selection becomes the terminal bootstrap transition; the
workspace concept itself is removed in a follow-up. Tidy go.mod and the
Taskfile's dead generate/replay/capture tasks.
Workspaces were removed from the control plane. Drop the concept end to
end:

- onboarding completes after datadog setup (the workspace-select gate is
  gone); EventDatadogReady / EventDatadogDiscoveryDone are terminal
- bootstrap State/Completion/OnboardingComplete/PreflightState lose their
  Workspace fields; WorkspaceSelected event and GateWorkspaceSelect removed
- delete the onboarding/workspaces step and the synthetic GraphQL
  WorkspaceService stub
- chat, app, and the status bar key off the account; the org/workspace
  status segment shows the org only
- remove domain.Workspace/WorkspaceID and the org-preference
  default_workspace_id accessors

The account is now the sole post-org working context.
Onboarding hung after the Datadog check on any resumed session. The user
identity was only set by the auth gate, which preflight skips when the
token is already valid, so completion (which requires a user) silently
no-op'd and the flow never reached chat.

Capture the user id during preflight's auth check and thread it through
PreflightState into bootstrap state, mirroring how the auth gate
populates it on a fresh login. Verified live: onboarding now advances
datadog_check -> complete -> chat.
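
The shape of this fix can be sketched as follows. All types and functions here are hypothetical simplifications of the bootstrap flow, and returning an error from `complete` (instead of a silent no-op) is an illustrative choice, not necessarily what the PR does.

```go
package main

import (
	"errors"
	"fmt"
)

// PreflightState carries what preflight learned before any gate runs.
// (Simplified; the real struct has more fields.)
type PreflightState struct {
	TokenValid bool
	UserID     string // captured during the auth check, even if the auth gate is skipped
}

// BootstrapState is the state that completion inspects.
type BootstrapState struct {
	UserID string
}

// preflight validates the token and, when it is valid, records the user id
// so a resumed session does not depend on the skipped auth gate.
// An empty tokenUserID stands in for "no valid token" in this sketch.
func preflight(tokenUserID string) PreflightState {
	if tokenUserID == "" {
		return PreflightState{}
	}
	return PreflightState{TokenValid: true, UserID: tokenUserID}
}

// seed threads the preflight result into bootstrap state.
func seed(p PreflightState) BootstrapState {
	return BootstrapState{UserID: p.UserID}
}

// complete requires a user; here it reports the problem instead of
// silently doing nothing.
func complete(s BootstrapState) error {
	if s.UserID == "" {
		return errors.New("cannot complete onboarding: no user in bootstrap state")
	}
	return nil
}

func main() {
	resumed := seed(preflight("user_123"))
	fmt.Println("resumed session:", complete(resumed))

	missing := seed(preflight(""))
	fmt.Println("no token:", complete(missing))
}
```
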
Add traditional, scriptable commands that read the control plane over
GraphQL for the current account, so product data is reachable without
the chat TUI:

- tero status   — account health, service/event counts, cost, open issues
- tero issues   — active issues (priority, id, service, title)
- tero checks   — product checks with findings and cost
- tero services — enabled services with volume and cost
- tero edge     — registered edge instances

Drop the cost field from the issues read: the deployed control-plane
Issue type does not expose it (schema-mirror drift). Verified all five
commands against live prd.
The chat backend is decommissioned, so the chat-first TUI is dead. After
onboarding the CLI now opens a minimal, read-only issue explorer that
lists the account's active issues (priority, id, service, title) with
arrow navigation and refresh, backed by the GraphQL issue reads.

Remove the chat subsystem wholesale: internal/app/chat,
internal/app/chattools, internal/boundary/chat, internal/core/chat, the
chat client/tool-registry/runtime-deps wiring in the app runtime, the
ChatEndpoint config and its WorkOS token audience. The status drawer's
'ask Tero' prompt hook is now inert.

Verified live: onboarding completes into the explorer and loads issues
against prd.
Add a persistent --output/-o flag (table default, json) on the root
command, inherited by every subcommand. A shared emit() helper routes
each command's result: --output=json writes indented JSON, otherwise the
table renderer runs. Each command marshals a stable, snake_case output
struct (raw numbers, omitempty for unmeasured costs) rather than internal
types, so the JSON is clean and scriptable.

Covers status, issues, checks, services, and edge. Verified live against
prd in both formats.
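
The output routing described above could be sketched like this, using only the standard library. `serviceOutput`, its fields, and the helper names are hypothetical stand-ins, and the wiring of the persistent flag on the root command is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
	"text/tabwriter"
)

// serviceOutput is a HYPOTHETICAL stable output struct: snake_case keys,
// raw numbers, and omitempty so an unmeasured cost is left out of the JSON.
type serviceOutput struct {
	Name    string   `json:"name"`
	Events  int      `json:"events"`
	CostUSD *float64 `json:"cost_usd,omitempty"`
}

// emit writes v as indented JSON when format is "json"; otherwise it
// delegates to the table renderer supplied by the calling command.
func emit(w io.Writer, format string, v any, table func(io.Writer) error) error {
	if format == "json" {
		enc := json.NewEncoder(w)
		enc.SetIndent("", "  ")
		return enc.Encode(v)
	}
	return table(w)
}

// servicesTable returns a table renderer for the given rows.
func servicesTable(rows []serviceOutput) func(io.Writer) error {
	return func(w io.Writer) error {
		tw := tabwriter.NewWriter(w, 0, 4, 2, ' ', 0)
		fmt.Fprintln(tw, "NAME\tEVENTS")
		for _, r := range rows {
			fmt.Fprintf(tw, "%s\t%d\n", r.Name, r.Events)
		}
		return tw.Flush()
	}
}

// render runs emit into a string, which keeps the helper easy to test.
func render(format string, rows []serviceOutput) (string, error) {
	var sb strings.Builder
	err := emit(&sb, format, rows, servicesTable(rows))
	return sb.String(), err
}

func main() {
	cost := 12.5
	rows := []serviceOutput{
		{Name: "checkout", Events: 1200, CostUSD: &cost},
		{Name: "search", Events: 300}, // cost not measured: omitted from JSON
	}
	for _, format := range []string{"table", "json"} {
		out, err := render(format, rows)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Print(out)
	}
}
```

Marshalling a dedicated output struct, rather than an internal domain type, keeps the JSON contract stable for scripts even if internal types change.
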
smithclay changed the title from "Drop PowerSync: thin GraphQL client, remove chat, surface commands + issue explorer" to "refactor: thin GraphQL client, remove chat, surface commands + issue explorer" on Jun 12, 2026
- gen-check: drop the deleted `go generate ./internal/sqlite` step (the
  package no longer exists)
- powersync-replay: remove the job, its workflow input, the gate
  dependency, and the nightly invocation (the replay test is gone)
- lint: fix SA4000 (snapshotKey determinism check uses two vars), remove
  the now-unused chat boundary assertion helpers, and delete the unused
  ptr/deref helpers
- unit: add ripgrep to hermit so the event/naming lint scripts have `rg`
  in CI (they silently found nothing without it)

task lint is clean (0 issues) and the full suite passes.
These failures are unrelated to the PowerSync removal; they are
dependency/toolchain hygiene that fails any current PR.

- Workflow Lint: re-pin reviewdog/action-actionlint to a resolvable v1
  SHA (the old pinned commit no longer exists upstream).
- Security/govulncheck: bump the go directive to 1.25.11 so the security
  job (setup-go from go.mod) builds against a patched standard library —
  the listed CVEs (net/textproto, crypto/x509, net, net/http, crypto/tls)
  are fixed in the 1.25.9-1.25.11 patches; the 1.26 line is not patched
  yet. Hermit jobs keep using go 1.26.0 (>= 1.25.11).
- Security/govulncheck + OSV: bump golang.org/x/net to v0.56.0 and
  golang.org/x/sys to v0.46.0 to clear GO-2026-4918 and GO-2026-5024-5030.

govulncheck now reports only standard-library findings (cleared in 1.25.11),
osv-scanner exits clean, and build/vet/test are green.
The README described the old chat-first interface, which no longer exists.
Rewrite it to reflect what the CLI actually does today — connect Datadog
and read your account's issues, checks, services, and status — and drop
the removed chat 'block waste / edit code' flows.

Organize by Diátaxis: Getting started (tutorial), How-to guides,
Reference (commands, flags, UI keys, env vars, files), and Concepts.
Command/flag reference verified against the built binary.