| created | 2026-05-10 |
|---|---|
| last_modified | 2026-05-10 |
| revisions | 1 |
| doc_type | |

How m-cli was built, in chronological order. This is archaeology — read the
README for the as-is, and docs/guide.md for the
comprehensive user-facing reference. This document exists so that decisions
remain auditable and so future contributors can understand why things are
shaped the way they are without having to reverse-engineer commit history.
- Origin: the four-tier strategy
- Tier 1 — closing the inner-loop gaps
- Tier 2 — quality gates and team scaling
- Cross-cutting — LSP, scaffolding, plugins
- Performance milestones
- Deferred items and known quirks
- Retirements
- Renames / namespace moves
- Engine refactor follow-ups
- Bootstrap substrate
m-cli grew out of m-tools — the
archived seed of the entire m-dev-tools organization. The driving documents
(gap-analysis-and-remediation-strategy.md,
m-tool-gap-analysis.md,
m-tooling-tier1.md)
ranked the missing developer-experience capabilities for the M (MUMPS) language
across both major engines (IRIS and YottaDB), validated against DORA /
Accelerate research, and produced four prioritised tiers:
| Tier | Theme | Capabilities |
|---|---|---|
| 1 | Inner loop | test runner · lint (logic) · format · single-test selection · watcher |
| 2 | Quality gates / team scaling | CI script · coverage · style lint · pre-commit hooks · debugger |
| 3 | Project scaffolding | new · run · build · doc · doctor |
| 4 | Library ecosystem | versioning · dependency management · package registry |
m-cli is the executor. The naming convention (m <subcommand>, mirroring
go/cargo/git) and the breakdown by subcommand both come from that
strategy.
Shipped: identity round-trip first, then layered hygiene + translation rules.
- Step 1.0 — identity round-trip. Full parse → emit cycle that produces byte-identical output for already-canonical input. Validation gate: VistA round-trip 38,954 / 39,330 routines (99.04%) — the residual 0.96% match the tree-sitter-m corpus boundary.
- Canonical hygiene rules. --rules=canonical adds trim-trailing-whitespace and uppercase-command-keywords. Idempotent and AST-shape-preserving over the full VistA corpus.
- Phase A translation rules. Six AST-preserving, case-preserving expand/compact rules ride alongside canonical hygiene: expand-command-keywords (S→SET), compact-command-keywords (SET→S), expand-intrinsic-functions ($L→$LENGTH), compact-intrinsic-functions, expand-special-variables ($T→$TEST), compact-special-variables. Three case-folding companions (lowercase-command-keywords, lowercase-intrinsic-functions, lowercase-special-variables). Bundled into three presets (pythonic, pythonic-lower, compact) that translate between VistA-compact and canonical-name forms for developers coming from Python or other modern languages without the M tradition of one-/two-character abbreviations. All three are normalizing (idempotent on already-normalized input) rather than fully invertible.
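
The difference between "normalizing" and "invertible" can be illustrated with a minimal sketch. The three-entry abbreviation table and both function names below are hypothetical stand-ins, not m-cli's actual rule implementation, which works on the parse tree rather than on bare words.

```python
# Hypothetical abbreviation table; a real rule set would cover every M command.
EXPAND = {"S": "SET", "W": "WRITE", "Q": "QUIT"}
COMPACT = {long: short for short, long in EXPAND.items()}


def expand_keyword(word: str) -> str:
    """Expand an abbreviated command keyword, preserving the input's case."""
    full = EXPAND.get(word.upper())
    if full is None:
        return word  # already expanded, or not a known abbreviation
    return full.lower() if word.islower() else full


def compact_keyword(word: str) -> str:
    """Compact a full command keyword, preserving the input's case."""
    short = COMPACT.get(word.upper())
    if short is None:
        return word  # already compact, or not a known keyword
    return short.lower() if word.islower() else short
```

Expanding twice gives the same result as expanding once, which is the idempotence property. Because both S and SET expand to SET, a later compact pass cannot tell which spelling the author originally wrote, so the round trip is not an inverse.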
Shipped breadth-first then deepened with cross-routine analysis, control-flow rules, and the M-MOD modernization track.
- Step 2.0: engine-neutral lint engine. Rules register against a profile registry; opinionated rule sets ship as named profiles (not as a fixed baseline). The dividing line between the engine and the rule packs is formalized in src/m_cli/lint/profiles.py so adding a non-VA-flavoured rule family doesn't require renaming any config.
- Step 2.1: XINDEX port. 42 of XINDEX's 66 rules ported to engine-neutral AST checks (M-XINDX-NN). Validation gate: full VistA corpus lint baseline.
- Step 2.x: M-MOD modernization track. 30 engine-neutral, dialect-neutral rules derived from contemporary M idioms (M-MOD-NN). Includes length/complexity, concurrency, transactions, control-flow correctness, engine-aware portability, docs/style polish. Calibration corpus: m-modern-corpus. On a 4 K-routine non-VA corpus the curated default profile (M-MOD minus four pedantic rules) produces ~3 findings/routine, which is usable daily; the full modern profile produces ~57 findings/routine, mostly from the four pedantic rules now split into pedantic.
- Profile split. The default lint profile changed from xindex to the curated M-MOD subset after modern-corpus validation showed XINDEX's SAC legacy rules generate ~62 K findings on non-VA modern code, mostly from SAC mandates around lowercase variables/commands that aren't followed outside the VA. VA shops opt back in via --rules=xindex or --rules=xindex,vista.
- pythonic profile. Same rules as modern plus tighter thresholds (line_length=100, commands_per_line=1, cyclomatic=10, cognitive=15, dot_block_depth=3, label_lines=30). Preset for Python-influenced developers.
- Cross-routine + control-flow + engine-targeting. Workspace context (LintContext) flowed through to context-aware rules; --target-engine silences engine-portability false positives on engine-specific code.
Shipped: parser-aware discovery, YottaDB runner, three output formats.
- Discovery walks *TST.m files and t<UpperCase>(pass,fail) labels via the tree-sitter parse tree. The first label in a file (the routine entry) is never a test even if it accidentally matches.
- Runner shells out to ydb -run ^SUITE (whole suite) or ydb -run %XCMD (single label). The runner is injected via RunnerFn so unit tests don't need a live engine.
- Output dialects: text (human), tap (TAP v13, one point per assertion), json (CI-friendly).
- TESTRUN protocol: parser keys off PASS desc / FAIL desc lines, the Results: N tests P passed F failed summary, and the All tests passed. / <n> test(s) FAILED. banner.
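
As an illustration of the TESTRUN protocol described above, here is a minimal parsing sketch. The regular expressions are assumptions inferred from the line shapes quoted in this document, and SuiteResult / parse_testrun are hypothetical names, not the names used in m-cli's source.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumed line shapes, inferred from the protocol description in this document.
ASSERT_RE = re.compile(r"^(PASS|FAIL)\s+(.*)$")
SUMMARY_RE = re.compile(
    r"^Results:\s+(\d+)\s+tests\W+(\d+)\s+passed\W+(\d+)\s+failed"
)


@dataclass
class SuiteResult:
    assertions: list = field(default_factory=list)  # (status, description) pairs
    summary: Optional[tuple] = None  # (tests, passed, failed) once seen


def parse_testrun(output: str) -> SuiteResult:
    """Collect per-assertion lines and the summary from raw runner output."""
    result = SuiteResult()
    for raw in output.splitlines():
        line = raw.strip()
        if m := ASSERT_RE.match(line):
            result.assertions.append((m.group(1), m.group(2)))
        elif m := SUMMARY_RE.match(line):
            result.summary = tuple(int(g) for g in m.groups())
    return result
```

A result whose summary is still None after parsing means the engine produced no recognisable output, which a caller may want to report differently from a suite that ran with zero assertions.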
Folded into Step 3. m test FILE.m::tLabel invokes ^%XCMD with a
synthesised driver that calls just the requested label.
Shipped: polling watcher with source→suite affinity.
- Polling, not inotify. os.stat-based change detection at 0.5 s default interval. Pure-Python; no watchdog / entr / inotify dependency.
- Affinity rule. <X>.m source change → <X>TST.m suite if it exists; otherwise every suite re-runs (defensive default). Suite-file edits map to themselves only.
- Discovery dedup. Overlapping path arguments (e.g. routines/ and routines/tests/) discover each suite exactly once via Path.resolve().
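
The affinity rule and the dedup step can be sketched as one pure function. This is an illustration only: suites_to_run is a hypothetical name, and matching a source to its suite by file name alone (regardless of directory) is an assumption the text above does not spell out.

```python
from pathlib import Path


def suites_to_run(changed: Path, suites: list) -> list:
    """Map one changed file to the suites that should re-run."""
    # Path.resolve() collapses overlapping arguments to one entry per suite.
    known = {Path(s).resolve() for s in suites}
    changed = Path(changed).resolve()
    if changed in known:
        return [changed]  # suite-file edits map to themselves only
    wanted = f"{changed.stem}TST.m"
    matches = sorted(s for s in known if s.name == wanted)
    if matches:
        return matches  # <X>.m changed and <X>TST.m exists
    return sorted(known)  # defensive default: re-run every suite
```

Returning a sorted list keeps the run order stable between polls, which makes watcher output easier to compare across runs.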
Tier 1 closure: 2026-04-27. All four §3.5 validation gates pass (VistA round-trip, single-engine smoke, CI dogfooding, performance under budget).
YDB built-in view "TRACE" instead of N ZBREAKs per label — one trace pass
covers the whole run. Trace third-subscript decoded: offset N from a label
maps to absolute line label_decl_line + N, so per-line hit counts are
precise. Output: text (default), text --lines (per-routine label + line
columns), json, lcov (genhtml / Codecov / Coveralls compatible).
--branch flag adds AST-driven branch-point identification (IF/ELSE/FOR
keywords + postconditionals); branches collected only when caller opts in
so default payloads stay byte-stable.
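
The offset arithmetic is small enough to show directly. The sketch below assumes the trace has already been read into (label, offset, count) tuples, which is an illustrative shape rather than the raw layout YottaDB writes; line_hits and to_lcov are hypothetical names. The SF / DA / end_of_record lines follow the standard lcov tracefile format.

```python
from collections import Counter


def line_hits(label_lines: dict, trace: list) -> Counter:
    """Fold (label, offset, count) entries into per-line hit counts.

    label_lines maps each label to the line on which it is declared, so
    offset N from a label lands on absolute line label_decl_line + N.
    """
    hits = Counter()
    for label, offset, count in trace:
        hits[label_lines[label] + offset] += count
    return hits


def to_lcov(source_path: str, hits: Counter) -> str:
    """Render one routine's hit counts as an lcov tracefile record."""
    lines = [f"SF:{source_path}"]
    lines += [f"DA:{line},{count}" for line, count in sorted(hits.items())]
    lines.append("end_of_record")
    return "\n".join(lines) + "\n"
```

Sorting the DA lines by line number keeps the output deterministic, in the same spirit as the byte-stable default payloads mentioned above.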
.pre-commit-hooks.yaml exposes m-fmt-check,
m-fmt, and m-lint. Schema gated by tests. Downstream usage in
docs/pre-commit.md.
Style rules ride alongside logic rules in m lint. --rules=sac selects
the SAC-tagged subset; severity overrides via [lint.severity] config.
DAP integration is its own engineering project; both engines ship ZBREAK
at the engine level. Not on the near-term roadmap.
Built incrementally in stages over a single foundation (pygls-based stdio
server, optional [lsp] extra). Per
m-tooling-tier1.md §5.4
the stage cadence was:
| Stage | Capability |
|---|---|
| 1 | Diagnostics push (didOpen/didChange/didSave/didClose) |
| 2 | Document formatting (textDocument/formatting) |
| 3 | Code actions (Quick Fix from fixer_id) |
| 4 | Hover + completion + --rules filter |
| 4b | Document symbols, code lenses (▶ Run test), folding, signature help, document highlight |
| B | Workspace symbol index + go-to-definition |
Editor wiring lives in
tree-sitter-m-vscode
which spawns m lsp on activation and registers m-cli.runTest for
code-lens click-to-run.
.m-cli.toml (preferred) and [tool.m-cli] in pyproject.toml (fallback)
drive m fmt, m lint, and m lsp. Discovery walks up from the working
directory; stops at .git. Schema: [lint] rules / disable / severity,
[fmt] rules, [lint.thresholds], [lint.taint]. CLI flags override
config; unknown keys ignored; invalid values raise.
m_cli.workspace.WorkspaceIndex maps routine_name (uppercased) → list[LabelLocation] for every .m file in the workspace. Backs
textDocument/definition, textDocument/references, workspace/symbol.
Stays fresh via didChangeWatchedFiles + didSave. Cross-routine lint
rules consume the same index.
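
A rough sketch of the index shape follows. Only the routine_name (uppercased) → list[LabelLocation] mapping comes from the description above; the LabelLocation fields, the WorkspaceIndexSketch class, and its method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelLocation:  # field names are illustrative, not m-cli's
    label: str
    path: str
    line: int


class WorkspaceIndexSketch:
    """Routine name (uppercased) -> label locations, refreshed per file."""

    def __init__(self) -> None:
        self._by_routine = defaultdict(list)

    def update_file(self, routine: str, locations: list) -> None:
        """Replace a routine's entries, as a didSave refresh might."""
        self._by_routine[routine.upper()] = list(locations)

    def definition(self, routine: str, label: str) -> list:
        """Look up LABEL^ROUTINE; the routine key is case-insensitive."""
        entries = self._by_routine.get(routine.upper(), [])
        return [loc for loc in entries if loc.label == label]
```

Replacing a routine's whole entry list on each refresh avoids stale labels surviving a rename, at the cost of re-walking that one file.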
- m new: project scaffolder (Makefile, .m-cli.toml, tests/, CI)
- m run: ad-hoc routine execution
- m build: compile / package (retired 2026-05-11; see "Retirements" below. The M runtime auto-compiles on first call, so this was redundant with m test and named after compile-mandatory toolchains that don't fit MUMPS)
- m doctor: environment self-check (ydb, parser, m-standard, manifests)
- m doc / m search / m manifest / m examples / m errors: m-stdlib documentation surface, manifest-driven

m plugins lists out-of-tree subcommands registered via the
m_cli.plugins entry-point group.
m-cli-extras is the first
consumer (ships m corpus-stats). Contract documented in
docs/plugin-development.md.
The lint perf budget per m-tooling-tier1.md §3.5 is 120 s for the full VistA corpus. The table shows the naive baseline followed by the optimisation passes:
| Phase | Time on full VistA corpus | Speedup vs prior | Notes |
|---|---|---|---|
| Original (Step 2.1, naive walk per rule) | ~1458 s | n/a | 12× over budget |
| Single-pass NodeIndex | 166 s | 8.7× | Walk once per file; bucket by node type; dispatch off the bucket |
| --jobs N ProcessPool | 22.6 s | 7.3× | 16-core host; 5.3× under budget; 64.5× faster than the original |

Findings byte-identical at every step (62,806 total / 42 fatal / 24,877 flagged). Cached parsed trees for incremental lint are deferred until the LSP daemon makes them meaningful.
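
The single-pass idea can be sketched as follows. Node here is a stand-in for a tree-sitter node, and build_index / run_rules are hypothetical names; the real NodeIndex may differ in structure.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:  # stand-in for a tree-sitter node
    type: str
    children: list = field(default_factory=list)


def build_index(root: Node) -> dict:
    """Walk the tree once, bucketing every node by its type."""
    index = defaultdict(list)
    stack = [root]
    while stack:
        node = stack.pop()
        index[node.type].append(node)
        stack.extend(reversed(node.children))  # keep document order
    return index


def run_rules(root: Node, rules: dict) -> list:
    """rules maps a node type to check callables that return findings."""
    index = build_index(root)
    findings = []
    for node_type, checks in rules.items():
        for node in index.get(node_type, []):
            for check in checks:
                findings.extend(check(node))
    return findings
```

With R rules and N nodes, the naive approach walks the tree R times, while this version walks it once and each rule only visits the nodes in its own bucket.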
- More data-flow lint rules. The remaining ~24 deferred XINDEX rules (uninitialized variable read, naked references, kill of read-only var, etc.) need data-flow / scope tracking. The infrastructure shipped with LintContext makes each easier to add incrementally.
- m test --watch / JUnit XML / per-label results in whole-suite mode. Per-label whole-suite reporting is blocked on TESTRUN.m not emitting per-label headers: either modify TESTRUN or have whole-suite runs internally invoke each label separately.
- Inotify watcher. Polling burns CPU on idle for large trees. Swap Poller for a watchdog-based implementation behind the same interface; affinity / CLI don't need to change.
- Watcher debounce. Fast saves (editor backups, formatter passes) fire several events in a row; today each becomes a separate run. A 200–300 ms debounce would batch them.
- Cross-routine call graph for richer affinity. When foo.m changes, re-run any suite whose source calls ^foo. Needs a simple call-graph index; out of scope for Tier 1.
- LSP workspace/configuration round-trip: per-rule disable / severity remap via the LSP protocol rather than --rules. Intentionally deferred; the CLI flag covers the immediate need without async plumbing.
- CodeLens resolveProvider: lazy command resolution if eager populate becomes a perf concern.
- hover-on-diagnostic: show rule descriptions in the hover popup when over a diagnostic squiggle.

- Branch is master, not main, which differs from most repos under the org.
- 376 / 39,330 VistA routines fail to parse; these match the tree-sitter-m corpus boundary. Skipped from both round-trip and lint gates.
- 8 currently-silent registered XINDEX rules (M-XINDX-002, 015, 018, 021, 027, 028, 031, 054) fire on patterns rare in VistA but common in other corpora. Left registered for use against more diverse codebases.
- scripts/lint_bench.py has a hardcoded ~/vista-meta/vista/vista-m-host/Packages/... path. It's a maintainer microbenchmark, not part of the user surface; portability across machines is not a goal.

Commands that shipped in earlier phases but were later removed because experience showed they didn't earn their place in the surface. Recorded here so the history of why something is no longer there is as legible as the history of why something shipped.
What it was. Shipped in Phase 3a (2026-05-06) as one of the
six quick-win subcommands from plans/language-cli-survey.md
§6.2 (rank 9). It walked .m files
in the given paths and invoked ydb <file> on each — YottaDB's
MUMPS compiler, which emits a sibling .o object file. A --check
mode cleaned up generated .o files for CI use.
Why it was removed. Honest accounting after the
cli-menu-system.md frequency-rating
exercise showed m build doesn't earn a slot in the daily-use
surface:
- Redundant with m test. YottaDB auto-compiles on first reference ($ZRO). Every routine your tests touch is compiled anyway: a syntax error in a referenced routine surfaces as a test-time compile failure, same exit code, same diagnostic. No incremental signal from running m build separately.
- Wrong language analogy. MUMPS is interpreted (like Python, not like Go or Rust). Python's python -m compileall parallels what m build does (bytecode warmup) but it's almost never used in daily Python dev. Python's namespaced "build" command (python -m build) is for PEP 517 package distribution, a different concept entirely. m build sat in a naming-and-frequency gap that doesn't exist for an interpreted language.
- Tree-sitter-m + m lint already catch more. The parser-driven linter flags real problems the YottaDB compiler accepts (style, portability, dead code, untested patterns). Compile-rejects are a tiny remainder once lint is green.
- Narrow remaining use cases don't justify a daily-loop verb. The legitimate scenarios (CI syntax-gate over untested code, post-bulk-refactor sanity sweep, pre-deploy .o warmup) are either rare interactive use (≤ once per quarter) or scripted (CI / CD pipelines). For those, ydb <file> directly is five lines of bash; the m-cli convenience layer was thin.

What replaces it. Nothing on the m-cli surface. For the remaining narrow use cases:
- CI syntax gate over untested code: shell into a loop (for f in $(find . -name '*.m'); do ydb "$f" || exit 1; done) or wait for the eventual m lint --strict / m check proposal to integrate the YottaDB compiler as one validator among several.
- Post-refactor sanity sweep: same one-liner.
- Pre-deploy .o warmup: belongs in the deploy pipeline, not in the developer-facing CLI.

Mechanical changes. src/m_cli/build/ package removed;
m build subparser unwired from src/m_cli/cli.py;
tests/test_build.py removed; the m build row dropped from
tests/test_cli_ux_contract.py; dist/commands.json regenerated;
references scrubbed from README.md, AGENTS.md,
docs/cli-menu-system.md, docs/guide.md,
docs/worked-example-accsum.md. The docs/plans/ historical
documents (language-cli-survey, iris-ydb-portability,
cli-ux-conventions-remediation) are left as-is — they're frozen
plan records, not as-is references.
Commands that shipped earlier under one name but were later moved into a different shape (typically a namespace). The behavior is preserved; only the invocation changes.
What changed. The 5 m-stdlib reference commands were lifted out
of the top-level namespace and grouped under a single m stdlib
parent dispatcher:
| Before | After |
|---|---|
| m doc SYMBOL | m stdlib doc SYMBOL |
| m search QUERY | m stdlib search QUERY |
| m examples [MODULE] | m stdlib examples [MODULE] |
| m errors | m stdlib errors |
| m manifest [PATH] | m stdlib manifest [PATH] |

Why. Cognitive and logical grouping. Five distinct top-level
verbs all served the same purpose (read the m-stdlib manifest in
different views), but their names didn't make that relationship
visible. m doc could have meant "doc the project" (m-cli's own
docs), "doc one routine", or "doc m-stdlib" — only the description
disambiguated. Grouping under m stdlib mirrors the existing
m engine <verb> and m ci <verb> patterns: when a cluster of
commands shares a domain, name the domain.
Mechanical changes. New src/m_cli/stdlib_cli.py registers
the stdlib subparser + 5 sub-actions (mirroring
m_cli.engine_cli.add_engine_arguments but without the
required=True anti-pattern — bare m stdlib prints a gh-style
overview). The 5 top-level parsers were removed from
src/m_cli/cli.py. Underlying handlers in m_cli.doc.* are
unchanged; only the registration site moved. Contract tests
updated: TestUnknownFlagRoutesToSubparser now has a separate
parametrize for m stdlib <verb>; TestDomainFailuresExit1
passes ["stdlib", verb]; new
test_stdlib_bare_exits_0_with_overview. dist/commands.json
regenerated.
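
The parent-dispatcher shape can be sketched with plain argparse. This is not the contents of stdlib_cli.py: build_parser and route are hypothetical names, and the string return values stand in for real handler dispatch.

```python
import argparse

STDLIB_VERBS = ("doc", "search", "examples", "errors", "manifest")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="m")
    commands = parser.add_subparsers(dest="command")
    stdlib = commands.add_parser("stdlib", help="m-stdlib reference views")
    # No required=True here, so a bare `m stdlib` parses with verb=None.
    verbs = stdlib.add_subparsers(dest="verb")
    for verb in STDLIB_VERBS:
        verbs.add_parser(verb)
    return parser


def route(argv: list) -> str:
    """Return the action a given argv would dispatch to."""
    args = build_parser().parse_args(argv)
    if args.command != "stdlib":
        return "help"
    return args.verb or "overview"
```

Under this layout route(["stdlib"]) yields the overview, route(["stdlib", "doc"]) yields the doc verb, and the old top-level spelling route(["doc"]) exits with argparse's invalid-choice error (exit code 2).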
No backward-compat shim. Per project convention (CLAUDE.md
"Don't use feature flags or backwards-compatibility shims when
you can just change the code"), m doc etc. now return
argparse's invalid choice error. Users who relied on the old
names see a clean error directing them to the new namespace.
Top-level count. 14 commands (down from 18). m stdlib
adds 5 sub-verbs; total distinct invocations: 28 (unchanged).
The engine-phase3 work (merged 2026-05-11) introduced
detect_engine() as the canonical resolver across local / docker /
SSH, made docker the default, and grew the m engine verb family.
But the runtime tools (m test, m coverage, m run) were never
migrated to the new resolver — they continued calling
read_connection() directly, which only returns an SSHEngine.
On docker-only hosts (the canonical default after 4f4b88c) this
meant those tools silently worked only if a stale vista-meta
conn.env happened to exist; on hosts without one they returned
"vista-meta connection not configured" despite a healthy
m-test-engine container.
This section tracks the migration of those tools to
detect_engine().
What changed. m run now resolves its transport via
detect_engine() and dispatches through a new
engine.build_run_cmd(entryref, extras, stage) method on each
Engine class (LocalEngine / DockerEngine / SSHEngine). Behaviour
is identical for the user: m run "^FOO" -- arg1 arg2 runs the
routine and feeds $ZCMDLINE. What's different is where it
runs — host process on a local-YDB box; docker exec m-test-engine bash -lc 'mumps -run ^FOO arg1 arg2' on a
docker-only box; SSH hop on a vista-meta-configured box.
Mechanics.
- LocalEngine.build_run_cmd returns ["env", "ydb_routines=...", "mumps", "-run", entryref, *extras].
- DockerEngine.build_run_cmd shell-quotes every arg via shlex.quote so spaces / quotes / dollar signs survive the bash -lc hop, then wraps in docker exec <container> bash -lc.
- SSHEngine.build_run_cmd does the same shell-quoting then routes through _ssh_argv / _remote_script.
- m_cli/run/cli.py rewritten: drops the legacy resolve_ydb_binary path, calls detect_engine(), engine.stage_routines(cwd), then engine.build_run_cmd(...). Missing-engine now returns 1 (DOMAIN_FAILURE) per CLI-UX guide §3.7 (previously 2, usage error); matches the PR-4 pattern.
- Legacy helpers in m_cli/run/runner.py (resolve_ydb_binary, build_env, build_command) preserved for library backcompat, since some downstream tooling may still import them. Tests cover both surfaces.

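
The quoting hop for the docker transport can be sketched in a few lines. docker_run_cmd is a hypothetical free function rather than the real DockerEngine.build_run_cmd method, and it leaves out the staging argument.

```python
import shlex


def docker_run_cmd(container: str, entryref: str, extras: list) -> list:
    """Build an argv whose arguments survive the `bash -lc` hop."""
    words = ["mumps", "-run", entryref, *extras]
    # bash -lc takes one string, so each word is quoted before joining.
    inner = " ".join(shlex.quote(word) for word in words)
    return ["docker", "exec", container, "bash", "-lc", inner]
```

Splitting the final element with shlex.split recovers the original words, including an argument such as "two words", which is the property the smoke test below exercises end to end.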
Smoke test. Live host (docker-only, m-test-engine running)
ran m run "^HELLO" -- arg1 "two words" successfully — output
"hello from m run via docker, $ZCMDLINE=arg1 two words",
exit 0. The shell-quoting hop preserved the spaces in
"two words" through docker exec → bash -lc → mumps -run.
Pre-existing tests. All 19 test_run.py tests rewritten to
inject a FakeEngine via monkeypatch.setattr on
m_cli.run.cli.detect_engine instead of a fake ydb binary. The
pure-helper tests (parse_entryref, resolve_ydb_binary,
build_env, build_command) are kept untouched as library-API
regression gates.
Same migration as m run, applied to the two remaining runtime
tools that still called read_connection() directly. Pre-fix
symptom: on docker-only hosts (the canonical default since
4f4b88c), m test and m coverage silently produced "0/0 passed"
output — the user reads it as "tests ran but had no assertions",
when in reality the SSH transport to a non-running vista-meta
host returned empty output and the parser reported zero results.
Mechanical changes.
- 5 call sites + 3 imports across src/m_cli/test/{cli,runner}.py and src/m_cli/coverage/{cli,runner}.py: each conn = read_connection() becomes conn = detect_engine(); imports updated accordingly. The polymorphic dispatcher wrappers (build_suite_ssh_cmd / build_xcmd_ssh_cmd / build_direct_ssh_cmd) already accepted any Engine, so call sites that pass conn into them didn't need updating; only the construction of conn changed.
- Type annotations on run_suite / run_case (in test/runner.py) and run_coverage (in coverage/runner.py) broadened from conn: Connection | None (SSHEngine-only) to conn: Engine | None (any transport).
- m_cli.engine.seed_for_paths retyped to accept any Engine and default to detect_engine() instead of read_connection().

The runner-side bug uncovered during the smoke test. The
runtime tools also hardcoded stage = remote_stage(suite.path) —
which returns the SSH-style $HOME/export/seed/<proj> path that
makes no sense for docker. The fix: a new pure
stage_path(start) method on each Engine class (LocalEngine /
DockerEngine / SSHEngine). For Local/Docker it returns the same
path their existing stage_routines() does, with no side effects.
For SSH it returns the legacy remote_stage(start) without the
SCP. The runners now use conn.stage_path(...) per suite, while
seed_for_paths keeps using stage_routines() (with the SCP
side-effect) once upfront.
Smoke test (live on the host that uncovered the bug). With
~/data/vista-meta/conn.env moved aside so detect_engine
unambiguously resolved DockerEngine, ran m test HELLOTST.m
against a self-contained smoke suite in $HOME/m-work — got
"1 suite(s), 1 passed, 2/2 assertions passed". Pre-migration:
"1 failed, 0/0 assertions" (silent SSH-unreachable). m coverage
ran cleanly without the conn.env error. conn.env restored
after the test.
Pre-existing tests all kept passing (1521 / 1 skipped) — they
inject RunnerFn at the runner boundary and bypass
detect_engine() / read_connection() entirely.
m-cli's parser, formatter, and lint rules were calibrated during initial
development against the VistA corpus running on the vista-meta YottaDB
container. That bootstrap relationship is now historical — the default
test substrate is m-test-engine
(a minimal Docker YottaDB container), and the calibration corpus is
m-modern-corpus. The
vista-meta SSH path remains as an opt-in fallback for the maintainer's
existing setup.
For the full bootstrap account and the explicit independence verification,
see vista-meta-bootstrap.md.