feat(cli): improve agent discoverability and add headless auth login #291

Draft
rafa-thayto wants to merge 14 commits into main from empty-legume

@rafa-thayto
Contributor

Summary

Closes the five agentcli-bench gaps (D3, A4, P3, P2, T7) and adds a clerk auth login --token <key> flow for CI / agents.

  • Top-level help discoverability: clerk --help now renders an Examples: block and an Environment: section listing the five CLERK_* env vars the binary actually reads (CLERK_SECRET_KEY, CLERK_MODE, CLERK_CONFIG_DIR, CLERK_UPDATE_CHANNEL, CLERK_NO_UPDATE_CHECK). Implemented via a new setEnvVars() declaration-merging helper in lib/help.ts, mirroring the existing setExamples() pattern.
  • --json field documentation: apps list|create, users list|create, and doctor --json now describe the JSON shape in the option description so agents/consumers know what to expect.
  • Headless authentication (clerk auth login --token <key>): accepts a Clerk PLAPI access token inline or via - (stdin). Validates JWT shape, size cap (8 KB), and azp audience claim locally before the userinfo network call, then persists with no refresh token. A --token - invocation on a TTY refuses up-front instead of hanging on EOF. Sibling awaitConcurrentRefresh skips the race-detection loop for token-only sessions so two parallel logins don't collide on the empty-refresh sentinel.
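The local pre-check described in the last bullet can be sketched as below. This is a minimal illustration, not the PR's code: the function name `precheckAccessToken`, the error messages, and the exact "soft" `azp` rule (a missing `azp` is accepted for back-compat, a mismatched one is rejected) are assumptions. It only checks shape, size and audience. It does not verify the signature; the userinfo call remains the real authority.

```typescript
// Hypothetical local JWT pre-check run before any network call.
// Assumed: names, messages, and the soft azp rule shown here.

const MAX_TOKEN_BYTES = 8 * 1024; // 8 KB cap, as described in the PR

type JwtPayload = Record<string, unknown>;

function decodeJsonSegment(segment: string): unknown {
  // JWT segments are base64url-encoded; the payload segment is JSON.
  return JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));
}

function precheckAccessToken(raw: string, expectedAzp?: string): JwtPayload {
  const token = raw.trim();
  if (Buffer.byteLength(token, "utf8") > MAX_TOKEN_BYTES) {
    throw new Error("token exceeds the 8 KB size limit");
  }
  // Shape check: header.payload.signature, each a non-empty base64url string.
  const parts = token.split(".");
  if (parts.length !== 3 || !parts.every((p) => /^[A-Za-z0-9_-]+$/.test(p))) {
    throw new Error("expected a JWT (header.payload.signature)");
  }
  let payload: unknown;
  try {
    payload = decodeJsonSegment(parts[1]);
  } catch {
    throw new Error("expected a JWT: payload is not valid base64url JSON");
  }
  if (typeof payload !== "object" || payload === null || Array.isArray(payload)) {
    throw new Error("expected a JWT: payload is not a JSON object");
  }
  const claims = payload as JwtPayload;
  // Soft audience check: only reject when azp is present and disagrees.
  if (expectedAzp !== undefined && typeof claims.azp === "string" && claims.azp !== expectedAzp) {
    throw new Error(`token was issued for ${claims.azp}, not ${expectedAzp}`);
  }
  return claims;
}
```

With this ordering, `clerk auth login --token sk_test_xxx` fails on the shape check (a secret key has no dots) before any request is made, which matches the behavior listed in the test plan.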

Background: the vault handoff doc compares the Clerk CLI's agentcli-bench score (44.5) with resend's (55.7) and prescribes the five gap closures landed here. The expected overall score after this PR is roughly 52–55.

A property test guards the Environment: list against drift — every documented CLERK_* name must actually be read in cli-core/src/.
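A drift check like this can be sketched as a pure function over source text. This is an illustration only: the helper names are hypothetical, file discovery under `cli-core/src/` is left out (sources are passed in as strings), and the regex only recognizes direct `process.env.NAME` and `process.env["NAME"]` reads, so a read routed through a wrapper helper would not be detected.

```typescript
// Hypothetical drift check: every documented CLERK_* variable must be read
// somewhere in the given source texts.

// Matches process.env.NAME and process.env["NAME"] / process.env['NAME'].
const ENV_READ = /process\.env(?:\.([A-Z0-9_]+)|\[\s*["']([A-Z0-9_]+)["']\s*\])/g;

function collectEnvReads(sources: string[]): Set<string> {
  const names = new Set<string>();
  for (const text of sources) {
    // matchAll clones the regex, so the shared /g lastIndex is not mutated.
    for (const m of text.matchAll(ENV_READ)) {
      const name = m[1] ?? m[2];
      if (name) names.add(name);
    }
  }
  return names;
}

// Returns documented names that no source file reads (should be empty).
function findUnreadEnvVars(documented: string[], sources: string[]): string[] {
  const read = collectEnvReads(sources);
  return documented.filter((name) => !read.has(name));
}
```

A property test would then assert that `findUnreadEnvVars(documentedNames, sourceTexts)` returns an empty array.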

Test plan

  • bun run format / bun run lint / bun run typecheck / bun run test pass locally (CI will re-run)
  • clerk --help renders the new Examples: and Environment: sections
  • clerk auth login --help shows --token <key> and references CLERK_SECRET_KEY for per-instance API access
  • clerk auth login --token <jwt> with a fresh valid token logs in without OAuth and persists the session
  • clerk auth login --token sk_test_xxx rejects with a clear "expected a JWT" message before hitting the network
  • clerk auth login --token - refuses on a TTY and reads from stdin when piped
  • cat token.txt | clerk auth login --token - works end-to-end
  • clerk users list --json and clerk apps list --json still emit pipeable JSON
  • Re-run agentcli-bench against the freshly-compiled binary and confirm D3, A4, P3, P2, T7 improve as expected

@changeset-bot

changeset-bot (bot) commented May 15, 2026

🦋 Changeset detected

Latest commit: 72b3544

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:
  • clerk (Minor)

rafa-thayto reopened this May 19, 2026

Closes the five agentcli-bench gaps (D3, A4, P3, P2, T7) and adds a
`clerk auth login --token <key>` flow for CI / agents:

- Top-level `Examples:` block on `clerk --help` (D3)
- New `Environment:` help section via `setEnvVars()`, documenting the
  five `CLERK_*` env vars the binary actually reads (A4)
- `--json` field descriptions on `apps list|create`, `users list|create`,
  and `doctor --json` so consumers know the shape (P3)
- Verified `--json` + `isAgent()` coverage across data-returning
  subcommands (P2)
- `clerk auth login --token <key>` for headless auth: accepts a Clerk
  PLAPI access token (or `-` for stdin), validates JWT shape and
  audience (`azp` claim, soft check with back-compat) locally before
  the userinfo call, persists with no refresh token. Sibling
  `awaitConcurrentRefresh` skips the race-detection loop for token-only
  sessions so two parallel logins don't collide on the empty-refresh
  sentinel (T7)
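The `--token -` handling (read from stdin, but refuse on an interactive TTY rather than block waiting for EOF) could look roughly like this. The function name, the `TokenInput` shape and the error text are hypothetical; stdin reading is injected so the logic stays testable.

```typescript
// Hypothetical resolver for `--token <key>`; "-" means "read from stdin".
interface TokenInput {
  isTTY: boolean; // whether stdin is an interactive terminal
  readStdin: () => string; // injected reader, only called when piped
}

function resolveTokenArg(arg: string, input: TokenInput): string {
  if (arg !== "-") return arg.trim();
  if (input.isTTY) {
    // Refuse up-front: on a TTY, reading stdin would hang until EOF.
    throw new Error("--token - expects the token on stdin; pipe it in, e.g. cat token.txt | clerk auth login --token -");
  }
  const token = input.readStdin().trim();
  if (token === "") throw new Error("no token received on stdin");
  return token;
}
```

In Node this would typically be wired as `resolveTokenArg(arg, { isTTY: Boolean(process.stdin.isTTY), readStdin: () => readFileSync(0, "utf8") })`, using `readFileSync` from `node:fs` to read file descriptor 0.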

A property test guards the `Environment:` list against drift: every
documented `CLERK_*` name must be one the CLI actually reads.

login.ts now imports storeAccessToken, assertValidAccessToken, and
getJwtAuthorizedParty from credential-store.ts. The shared test stubs
were missing these exports, causing login.test.ts to fail with
"Export named 'storeAccessToken' not found" when Bun resolved the
mocked module.

D8: agentcli-bench rubric. Surface canonical docs links so agents
discovering the CLI from --help can find usage references.

P9: agentcli-bench rubric. Pair --quiet with existing --verbose so
agents can pin log verbosity in either direction. Sets log level to
'error' which keeps fatal output but silences info/warn/success.
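A sketch of how the two flags might map onto a log threshold. Only "`--quiet` sets the level to 'error'" comes from the commit; the `debug` level for `--verbose`, the numeric ranks, and rejecting both flags together are assumptions for illustration.

```typescript
// Hypothetical flag-to-level mapping; "--quiet => error" is from the commit.
const RANK = { debug: 10, info: 20, success: 20, warn: 30, error: 40 } as const;

type Severity = keyof typeof RANK;
type LogLevel = "debug" | "info" | "error";

function resolveLogLevel(opts: { quiet?: boolean; verbose?: boolean }): LogLevel {
  if (opts.quiet && opts.verbose) {
    throw new Error("--quiet and --verbose cannot be combined"); // assumed policy
  }
  if (opts.quiet) return "error"; // keeps fatal output only
  if (opts.verbose) return "debug"; // assumed
  return "info";
}

// A message is printed when its severity is at or above the active level.
function shouldLog(level: LogLevel, severity: Severity): boolean {
  return RANK[severity] >= RANK[level];
}
```

Under `--quiet`, `shouldLog("error", "warn")` and `shouldLog("error", "success")` are false while errors still print, which is the "silences info/warn/success" behavior described above.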
P8: agentcli-bench rubric. Color was emitted unconditionally; now
gated on stdout TTY detection, the NO_COLOR env var, and the new
--no-color global flag. Inline highlight() and tag-prefix codes in
log.ts honor the same gate. log.test.ts explicitly forces color on
since its assertions inspect ANSI sequences.
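The color gate combines three inputs. Below is a minimal sketch with hypothetical names (`shouldUseColor`, `ColorContext`). It follows the NO_COLOR convention that any non-empty value disables color. In Commander, declaring `--no-color` defines a `color` option that defaults to `true` and becomes `false` when the flag is passed, so `noColorFlag` here corresponds to `color === false`.

```typescript
// Hypothetical color gate: --no-color flag, NO_COLOR env var, stdout TTY.
interface ColorContext {
  stdoutIsTTY: boolean; // e.g. Boolean(process.stdout.isTTY)
  env: Record<string, string | undefined>; // e.g. process.env
  noColorFlag: boolean; // true when --no-color was passed
}

function shouldUseColor(ctx: ColorContext): boolean {
  if (ctx.noColorFlag) return false;
  // NO_COLOR convention: present and non-empty disables color.
  const noColor = ctx.env.NO_COLOR;
  if (noColor !== undefined && noColor !== "") return false;
  // Otherwise color only when writing to an interactive terminal.
  return ctx.stdoutIsTTY;
}

// Inline highlighting honors the same gate (cyan on, default foreground off).
function highlight(text: string, colorEnabled: boolean): string {
  return colorEnabled ? `\u001b[36m${text}\u001b[39m` : text;
}
```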
R5: agentcli-bench rubric. Agents probing state need a structured
whoami; the existing text output was email-only and 'not_logged_in'
came back as a thrown error. The --json branch returns
{authenticated, user, linked, app, appName} unconditionally so
unauthenticated state is a value, not an error code.
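An illustration of the "state is a value" idea for `whoami --json`. The five field names come from the commit; their types, the `user` sub-shape, and the `Session`/`Link` inputs are hypothetical.

```typescript
// Field names from the commit; types and input shapes are assumptions.
interface WhoamiJson {
  authenticated: boolean;
  user: { id: string; email: string } | null;
  linked: boolean;
  app: string | null;
  appName: string | null;
}

interface Session { userId: string; email: string }
interface Link { appId: string; appName: string }

// Always returns the full shape; logged-out state is data, not a throw.
function buildWhoami(session: Session | null, link: Link | null): WhoamiJson {
  return {
    authenticated: session !== null,
    user: session ? { id: session.userId, email: session.email } : null,
    linked: link !== null,
    app: link?.appId ?? null,
    appName: link?.appName ?? null,
  };
}
```

An agent can then read `.authenticated` from stdout instead of interpreting an exit code or parsing an error message.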
P5: agentcli-bench rubric. Bumps EXIT_CODE.USAGE from 2 to 64 and
adds DATAERR(65), UNAVAILABLE(69), SOFTWARE(70), TEMPFAIL(75),
NOPERM(77) for use by retryable/transient error classification.

Wires program.exitOverride() so Commander's unknownOption /
unknownCommand / missingArgument errors funnel through runProgram
and exit with EX_USAGE instead of Commander's default 1. Agents can
now branch on exit code alone:
  64  bad invocation        — fix the command
  75  transient/network     — retry
  77  auth                  — re-authenticate

Tests that use EXIT_CODE.USAGE symbolically are unaffected by the
numeric bump.
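The exit-code contract can be sketched as follows. The numeric values are the sysexits.h codes named in the commit; the `SUCCESS`/`GENERIC` entries and both helper functions are hypothetical. With `program.exitOverride()`, Commander throws a `CommanderError` carrying `code` (e.g. `commander.unknownOption`) and `exitCode` instead of exiting, so a top-level runner can remap usage errors.

```typescript
// sysexits.h values named in the commit; SUCCESS/GENERIC are assumed names.
const EXIT_CODE = {
  SUCCESS: 0,
  GENERIC: 1,
  USAGE: 64, // EX_USAGE
  DATAERR: 65, // EX_DATAERR
  UNAVAILABLE: 69, // EX_UNAVAILABLE
  SOFTWARE: 70, // EX_SOFTWARE
  TEMPFAIL: 75, // EX_TEMPFAIL
  NOPERM: 77, // EX_NOPERM
} as const;

// Commander error codes that indicate a bad invocation.
const USAGE_ERROR_CODES = new Set([
  "commander.unknownOption",
  "commander.unknownCommand",
  "commander.missingArgument",
]);

// Remap usage errors to EX_USAGE; keep Commander's own code otherwise
// (e.g. 0 for commander.helpDisplayed).
function exitCodeForCommanderError(code: string, commanderExitCode: number): number {
  return USAGE_ERROR_CODES.has(code) ? EXIT_CODE.USAGE : commanderExitCode;
}

// Agent-side branching on the exit code alone.
type AgentAction = "done" | "fix-command" | "retry" | "reauthenticate" | "inspect";

function actionForExitCode(code: number): AgentAction {
  switch (code) {
    case EXIT_CODE.SUCCESS: return "done";
    case EXIT_CODE.USAGE: return "fix-command";
    case EXIT_CODE.TEMPFAIL: return "retry";
    case EXIT_CODE.NOPERM: return "reauthenticate";
    default: return "inspect";
  }
}
```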
R3 + R7: agentcli-bench rubric. Every outputJsonError() now emits
{code, message, retryable, nextStep, docsUrl?, errors?}.

  retryable: HTTP 408/425/429/5xx, plus network ECONNREFUSED/RESET/
             ETIMEDOUT/EAI_AGAIN/'fetch failed', are flagged true so
             agents can implement a single retry loop.
  nextStep:  per-class remedy ('retry with backoff', 'check
             connectivity with clerk doctor', 'run clerk --help').
  exitCode:  4xx auth → EX_NOPERM (77); 5xx → EX_UNAVAILABLE (69);
             429/408/425 → EX_TEMPFAIL (75); other → 1/SOFTWARE.

Combined R3+R7 because both extend the same JSON shape — splitting
would have made the second commit a single-field add.
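The classification rules above can be sketched as one function feeding the JSON error body. This is illustrative: "4xx auth" is assumed to mean 401/403, the auth `nextStep` wording and the `ErrorInfo` input shape are hypothetical, network failures use EX_TEMPFAIL per the "75 transient/network" row of the exit-code list, and "other" falls back to 1.

```typescript
// Illustrative classifier; see lead-in for which rules are assumed.
const TEMPFAIL_STATUS = new Set([408, 425, 429]);
const NETWORK_CODES = ["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"];

interface ErrorInfo { status?: number; code?: string; message: string }

interface JsonError {
  code: string;
  message: string;
  retryable: boolean;
  nextStep: string;
  docsUrl?: string;
  errors?: unknown[];
}

interface Classified { retryable: boolean; nextStep: string; exitCode: number }

function classify(err: ErrorInfo): Classified {
  const { status } = err;
  const isNetwork =
    (err.code !== undefined && NETWORK_CODES.includes(err.code)) || err.message === "fetch failed";
  if (isNetwork) {
    return { retryable: true, nextStep: "check connectivity with clerk doctor", exitCode: 75 };
  }
  if (status === 401 || status === 403) {
    return { retryable: false, nextStep: "re-authenticate with clerk auth login", exitCode: 77 };
  }
  if (status !== undefined && TEMPFAIL_STATUS.has(status)) {
    return { retryable: true, nextStep: "retry with backoff", exitCode: 75 };
  }
  if (status !== undefined && status >= 500) {
    return { retryable: true, nextStep: "retry with backoff", exitCode: 69 };
  }
  return { retryable: false, nextStep: "run clerk --help", exitCode: 1 };
}

function toJsonError(code: string, err: ErrorInfo, docsUrl?: string): { body: JsonError; exitCode: number } {
  const c = classify(err);
  const body: JsonError = { code, message: err.message, retryable: c.retryable, nextStep: c.nextStep };
  if (docsUrl) body.docsUrl = docsUrl; // optional field, omitted when absent
  return { body, exitCode: c.exitCode };
}
```

With this shape, an agent's retry loop only needs to check `retryable` and honor `nextStep`.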
D4: agentcli-bench rubric. Agents that don't want to parse --help can
walk the JSON shape produced by 'clerk schema' to discover every
subcommand, argument, and option (with choices, defaults, flags).

Returns {cli, version, schemaVersion, command} where command is a
recursive SchemaCommand node. schemaVersion=1 is the stable contract;
breaking shape changes bump it.
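A recursive walker producing this envelope might look like the sketch below. The structural `CommandLike` interface loosely mirrors the members that the later typing commit says it captures (name/aliases/description/registeredArguments/options/commands), simplified here. The `SchemaCommand` field names (`arguments`, `subcommands`, `default`, `choices`) are assumptions, since the commit does not spell out the node shape.

```typescript
// Simplified structural types; a Commander Command is assumed to fit them.
interface ArgumentLike { name(): string; description: string; required: boolean }
interface OptionLike { flags: string; description: string; defaultValue?: unknown; argChoices?: string[] }
interface CommandLike {
  name(): string;
  aliases(): string[];
  description(): string;
  registeredArguments: readonly ArgumentLike[];
  options: readonly OptionLike[];
  commands: readonly CommandLike[];
}

// Hypothetical node shape for the recursive schema.
interface SchemaCommand {
  name: string;
  aliases: string[];
  description: string;
  arguments: { name: string; description: string; required: boolean }[];
  options: { flags: string; description: string; default?: unknown; choices?: string[] }[];
  subcommands: SchemaCommand[];
}

function toSchema(cmd: CommandLike): SchemaCommand {
  return {
    name: cmd.name(),
    aliases: cmd.aliases(),
    description: cmd.description(),
    arguments: cmd.registeredArguments.map((a) => ({
      name: a.name(),
      description: a.description,
      required: a.required,
    })),
    options: cmd.options.map((o) => ({
      flags: o.flags,
      description: o.description,
      // Only emit default/choices when the option actually defines them.
      ...(o.defaultValue !== undefined ? { default: o.defaultValue } : {}),
      ...(o.argChoices ? { choices: o.argChoices } : {}),
    })),
    subcommands: cmd.commands.map(toSchema), // recurse into the tree
  };
}

function buildSchemaDocument(root: CommandLike, version: string) {
  return { cli: root.name(), version, schemaVersion: 1, command: toSchema(root) };
}
```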
P10: agentcli-bench rubric. JSON shape becomes
{data, hasMore, nextCursor, pagination: {offset, limit}}. nextCursor
encodes the next offset so agents can paginate forward without
knowing the scheme — pass it back as --offset. Existing hasMore is
retained as the canonical 'done?' signal.
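A sketch of the page envelope and the agent-side loop it enables. The envelope fields come from the commit. Encoding the cursor as the next offset in decimal string form, computing it from the number of items returned, and having the caller supply `hasMore` are assumptions.

```typescript
// Envelope fields from the commit; cursor encoding is an assumption.
interface Page<T> {
  data: T[];
  hasMore: boolean;
  nextCursor: string | null;
  pagination: { offset: number; limit: number };
}

function buildPage<T>(data: T[], offset: number, limit: number, hasMore: boolean): Page<T> {
  return {
    data,
    hasMore,
    // Next offset = where this page ended; null once the listing is done.
    nextCursor: hasMore ? String(offset + data.length) : null,
    pagination: { offset, limit },
  };
}

// Agent side: feed nextCursor back as --offset until hasMore is false.
function collectAll<T>(fetchPage: (offset: number) => Page<T>): T[] {
  const all: T[] = [];
  let offset = 0;
  for (;;) {
    const page = fetchPage(offset);
    all.push(...page.data);
    if (!page.hasMore || page.nextCursor === null) return all;
    offset = Number(page.nextCursor);
  }
}
```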
R8: agentcli-bench rubric. apps create is non-idempotent by default
— re-running creates duplicates. --if-not-exists looks up an app by
name first and returns it (with reused:true in --json output)
instead of creating a duplicate. The default behavior is preserved;
agents that need idempotency opt in explicitly.
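The opt-in lookup-then-create logic can be sketched synchronously against an injected API. `AppsApi`, `createApp` and the `App` shape are hypothetical; the real calls would be asynchronous and possibly paginated. Note that lookup-then-create is not atomic: two concurrent runs can both miss the lookup and create duplicates, so this gives idempotency for retries, not for races.

```typescript
// Hypothetical API surface; real calls would be async HTTP requests.
interface App { id: string; name: string }

interface AppsApi {
  list(): App[];
  create(name: string): App;
}

function createApp(api: AppsApi, name: string, ifNotExists: boolean): { app: App; reused: boolean } {
  if (ifNotExists) {
    // Reuse the first app whose name matches exactly.
    const existing = api.list().find((a) => a.name === name);
    if (existing) return { app: existing, reused: true };
  }
  // Default behavior (flag off) always creates, preserving old semantics.
  return { app: api.create(name), reused: false };
}
```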
R6: agentcli-bench rubric. Onboarding subcommands already print
next-step hints (init/link/auth); top-level --help didn't. Adds a
Next: block listing 'auth login', 'init', 'doctor' as the canonical
first commands for new users (or agents discovering the CLI cold).
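A help-footer block like `Next:` (or `Environment:`) is essentially an aligned two-column list. Below is a minimal formatter sketch. The function name and the descriptions in the usage example are hypothetical. In Commander the resulting text could be attached with `program.addHelpText("after", "\n" + text)`.

```typescript
// Hypothetical formatter for "Title:" help sections with aligned columns.
function formatHelpSection(title: string, rows: [string, string][]): string {
  if (rows.length === 0) return "";
  // Pad the left column to the widest entry so descriptions line up.
  const width = Math.max(...rows.map(([left]) => left.length));
  const lines = rows.map(([left, right]) => `  ${left.padEnd(width)}  ${right}`);
  return [`${title}:`, ...lines].join("\n");
}

const nextBlock = formatHelpSection("Next", [
  ["clerk auth login", "Sign in to Clerk"],
  ["clerk init", "Set up a project"],
  ["clerk doctor", "Check your setup"],
]);
```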
Commander's recursive parent chain has concrete generic parameters
that don't unify across heterogeneous subcommands, so importing
Command<Args, Opts, GlobalOpts> for typing the walker fails strict
typecheck. Replace with a CommandLike interface that captures only
the introspection surface we need (name/aliases/description/
registeredArguments/options/commands).