[WIP] feature: shell integration 💻 by tnaum-ms · Pull Request #508 · microsoft/vscode-documentdb

tnaum-ms · 2026-02-17T14:28:11Z

Shell Integration — DocumentDB Query Language & Autocomplete

Umbrella PR for the shell integration feature: a custom documentdb-query Monaco language with intelligent autocomplete, hover docs, and validation across all query editor surfaces (filter, project, sort, aggregation, scratchpad).

Work is organized as incremental steps, each delivered via a dedicated sub-PR merged into feature/shell-integration.

Progress

Key Architecture Decisions

| Decision | Outcome |
| --- | --- |
| Language strategy | `documentdb-query` custom language: JS Monarch tokenizer, no TS worker (~400-600 KB saved) |
| Completion providers | Single `CompletionItemProvider` + URI routing (`documentdb://{editorType}/{sessionId}`) |
| Completion data | `documentdb-constants` bundled at build time; field data pushed via tRPC subscription |
| Validation | `acorn.parseExpressionAt()` for syntax errors; `acorn-walk` + `documentdb-constants` for identifier validation |
| Document editors | Stay on `language="json"` with JSON Schema validation |
| Scratchpad | `language="documentdb-scratchpad"` referencing built-in JS grammar; in-process eval with `@mongosh` packages reusing existing `MongoClient` |
| Interactive Shell (future) | Separate from scratchpad; REPL with persistent eval context and `CommandInterceptor` |
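
The URI routing decision can be illustrated with a small sketch. It assumes the model URI has exactly the shape `documentdb://{editorType}/{sessionId}` and that the editor types match the surfaces listed in the description. `parseEditorUri`, `EditorRoute`, and the `EditorType` union are hypothetical names for illustration, not the extension's actual API.

```typescript
// Hypothetical editor surfaces, taken from the list in the PR description.
type EditorType = "filter" | "project" | "sort" | "aggregation" | "scratchpad";

interface EditorRoute {
  editorType: EditorType;
  sessionId: string;
}

const EDITOR_TYPES: readonly string[] = ["filter", "project", "sort", "aggregation", "scratchpad"];

// Parses documentdb://{editorType}/{sessionId}; returns undefined for anything else.
function parseEditorUri(uri: string): EditorRoute | undefined {
  const match = /^documentdb:\/\/([^/]+)\/([^/]+)$/.exec(uri);
  if (!match) return undefined;
  const editorType = match[1] ?? "";
  const sessionId = match[2] ?? "";
  if (!EDITOR_TYPES.includes(editorType)) return undefined;
  return { editorType: editorType as EditorType, sessionId };
}
```

A single completion provider could call a function like this on the model URI and branch on `editorType`, which is one way to avoid registering a provider per editor surface.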

… stats bugs

Group A of SchemaAnalyzer refactor:
- Fix A1: array element stats overwrite bug (isNewTypeEntry)
- Fix A2: probability >100% for array-embedded objects (x-documentsInspected)
- Rename folder: src/utils/json/mongo/ → src/utils/json/data-api/
- Rename enum: MongoBSONTypes → BSONTypes
- Rename file: MongoValueFormatters → ValueFormatters
- Add 9 new tests for array stats and probability

Group B of SchemaAnalyzer refactor:
- B1: SchemaAnalyzer class with addDocument(), getSchema(), reset(), getDocumentCount()
- B2: clone() method using structuredClone for schema branching
- B3: addDocuments() batch convenience method
- B4: static fromDocument()/fromDocuments() factories (replaces getSchemaFromDocument)
- B5: Migrate ClusterSession to use SchemaAnalyzer instance
- B6-B7: Remove old free functions (updateSchemaWithDocument, getSchemaFromDocument)
- Keep getPropertyNamesAtLevel, getSchemaAtPath, buildFullPaths as standalone exports

…x properties type

Group C of SchemaAnalyzer refactor:
- C1: Add typed x-minValue, x-maxValue, x-minLength, x-maxLength, x-minDate, x-maxDate, x-trueCount, x-falseCount, x-minItems, x-maxItems, x-minProperties, x-maxProperties to JSONSchema interface
- C2: Fix properties type: properties?: JSONSchema → properties?: JSONSchemaMap
- C3: Fix downstream type errors in SchemaAnalyzer.test.ts (JSONSchemaRef casts)

…temBsonType

Group D of SchemaAnalyzer refactor:
- D1: Add bsonType to FieldEntry (dominant BSON type from x-bsonType)
- D2: Add bsonTypes[] for polymorphic fields (2+ distinct types)
- D3: Add isOptional flag (x-occurrence < parent x-documentsInspected)
- D4: Add arrayItemBsonType for array fields (dominant element BSON type)
- D5: Sort results: _id first, then alphabetical by path
- D6: Verified generateMongoFindJsonSchema still works (additive changes)
- G4: Add 7 getKnownFields tests covering all new fields
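
The ordering rule in D5 could be sketched roughly as follows. `FieldEntrySketch` is a hypothetical stand-in that carries only the `path` property; the real FieldEntry has more fields, and the actual comparator in the package may differ.

```typescript
// Minimal stand-in for FieldEntry; only `path` matters for ordering.
interface FieldEntrySketch {
  path: string;
}

// Returns a sorted copy: `_id` first, everything else alphabetical by path.
function sortFieldEntries<T extends FieldEntrySketch>(entries: readonly T[]): T[] {
  return [...entries].sort((a, b) => {
    if (a.path === "_id") return b.path === "_id" ? 0 : -1;
    if (b.path === "_id") return 1;
    return a.path.localeCompare(b.path);
  });
}
```

Sorting a copy rather than the input keeps the function free of side effects, which matters if the caller caches the unsorted list.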

… toFieldCompletionItems)

Group E of SchemaAnalyzer refactor:
- E1: generateDescriptions(): post-processor adding human-readable description strings with type info, occurrence percentage, and min/max stats
- E2: toTypeScriptDefinition(): generates TypeScript interface strings from JSONSchema for shell addExtraLib() integration
- E3: toFieldCompletionItems(): converts FieldEntry[] to CompletionItemProvider-ready FieldCompletionData[] with insert text escaping and $ references

Also:
- Rename isOptional → isSparse in FieldEntry and FieldCompletionData (all fields are implicitly optional in MongoDB API / DocumentDB API; isSparse is a statistical observation, not a constraint)
- Fix lint errors (inline type specifiers)
- 18 new tests for transformers + updated existing tests

- Add 5 tests for clone(), reset(), fromDocument(), fromDocuments(), addDocuments()
- Mark all checklist items A-G as complete, F1-F2 as deferred
- Add Manual Test Plan section (§14) with 5 end-to-end test scenarios
- Document clone() limitation with BSON Binary types (structuredClone)

- Add monotonic version counter to SchemaAnalyzer (incremented on mutations)
- Cache getKnownFields() with version-based staleness check
- Add ClusterSession.getKnownFields() accessor (delegates to cached analyzer)
- Wire collectionViewRouter to use session.getKnownFields() instead of standalone function
- Add ext.outputChannel.trace for schema accumulation and reset events
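
A minimal sketch of version-based cache invalidation, assuming the pattern is "bump a counter on every mutation, recompute only when the cached version is stale". `VersionedFieldCache` and its `computeCount` probe are hypothetical; the sketch tracks top-level keys only and is not the SchemaAnalyzer implementation.

```typescript
class VersionedFieldCache {
  private version = 0; // incremented on every mutation
  private cachedVersion = -1; // version the cache was built from
  private cachedFields: string[] = [];
  private readonly fields = new Set<string>();
  public computeCount = 0; // demonstration only: counts recomputations

  addDocument(doc: Record<string, unknown>): void {
    for (const key of Object.keys(doc)) this.fields.add(key);
    this.version++;
  }

  reset(): void {
    this.fields.clear();
    this.version++;
  }

  // Recomputes the sorted field list only if a mutation happened since the last call.
  getKnownFields(): string[] {
    if (this.cachedVersion !== this.version) {
      this.cachedFields = [...this.fields].sort();
      this.cachedVersion = this.version;
      this.computeCount++;
    }
    return this.cachedFields;
  }
}
```

Comparing two integers is cheaper than diffing schemas, so repeated completion requests between query runs can reuse the cached list.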

Move SchemaAnalyzer, JSONSchema types, BSONTypes, ValueFormatters, and getKnownFields into packages/schema-analyzer as @vscode-documentdb/schema-analyzer.
- Set up npm workspaces (packages/*) and TS project references
- Update all extension-side imports to use the new package
- Configure Jest multi-project for both extension and package tests
- Remove @vscode/l10n dependency from core (replaced with plain Error)
- Fix strict-mode type issues (localeCompare bug, index signatures)
- Update .gitignore to include root packages/ directory
- Add packages/ to prettier glob

…itions

The bsonToTypeScriptMap emits non-built-in type names (ObjectId, Binary, Timestamp, etc.) without corresponding import statements or declare stubs. Currently harmless since the output is for display/hover only, but should be addressed if the TS definition is ever consumed by a real TS language service.

Addresses PR #506 review comment from copilot.

…ion names
- Prefix with _ when PascalCase result starts with a digit (e.g. '123abc' → '_123abcDocument')
- Fall back to 'CollectionDocument' when name is empty or only separators
- Filter empty segments from split result
- Add tests for edge cases

Addresses PR #506 review comment from copilot.
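
The naming rules above might look like this in code. This is a sketch under assumptions: `toInterfaceName` is a hypothetical name, and treating every non-alphanumeric character as a separator is a guess at what the commit means by "separators".

```typescript
// Derives a TypeScript interface name from a collection name.
function toInterfaceName(collectionName: string): string {
  // Assumption: any run of non-alphanumeric characters separates segments.
  const segments = collectionName.split(/[^A-Za-z0-9]+/).filter((s) => s.length > 0);
  if (segments.length === 0) return "CollectionDocument";

  const pascal = segments.map((s) => s.charAt(0).toUpperCase() + s.slice(1)).join("");
  // Identifiers cannot start with a digit, so prefix with an underscore.
  const safe = /^[0-9]/.test(pascal) ? `_${pascal}` : pascal;
  return `${safe}Document`;
}
```

Filtering empty segments before the emptiness check is what makes names such as `--` fall through to the `CollectionDocument` fallback.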

Add comment explaining why the cast to JSONSchema is safe: our SchemaAnalyzer never produces boolean schema refs. Notes that a typeof guard should be added if the function is ever reused with externally-sourced schemas. Addresses PR #506 review comment from copilot.

…lashes
- Replace SPECIAL_CHARS_PATTERN with JS_IDENTIFIER_PATTERN for proper identifier validity check (catches dashes, brackets, digits, quotes, etc.)
- Escape embedded double quotes and backslashes when quoting insertText
- Add tests for all edge cases (dashes, brackets, digits, quotes, backslashes)
- Mark future-work item #1 as resolved; item #2 (referenceText/$getField) remains open for aggregation completion provider phase

Addresses PR #506 review comment from copilot.
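
As an illustration of the quoting rule, here is a sketch that assumes an ASCII-only identifier pattern. The regular expression in the actual package is not shown in this PR and may be broader (for example, it may accept Unicode letters); `toInsertText` is a hypothetical helper name.

```typescript
// Assumed ASCII approximation of a valid JS identifier.
const JS_IDENTIFIER_PATTERN = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Returns the field name as-is when it is a valid identifier,
// otherwise a double-quoted string with backslashes and quotes escaped.
function toInsertText(fieldName: string): string {
  if (JS_IDENTIFIER_PATTERN.test(fieldName)) return fieldName;
  // Escape backslashes first so the backslashes added for quotes are not doubled.
  const escaped = fieldName.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `"${escaped}"`;
}
```

Checking for identifier validity instead of looking for a fixed set of special characters covers cases such as leading digits that a deny-list would miss.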

- Create workerTypes.ts with typed IPC message protocol (MainToWorkerMessage, WorkerToMainMessage)
- Create scratchpadWorker.ts with init/eval/shutdown/tokenRequest handlers
- Add scratchpadWorker webpack entry point to webpack.config.ext.js
- Worker lazy-imports @mongosh/* packages (same pattern as current evaluator)
- Supports both SCRAM and Entra ID auth (OIDC via IPC token callback)
- Worker logs lifecycle events to main thread via 'log' messages

Step 6.2 WI-1, Phase 1 of 3.

Rewrite ScratchpadEvaluator to route all execution through the worker thread:
- ScratchpadEvaluator now manages worker lifecycle (spawn, kill, shutdown, dispose)
- Worker state machine: idle → spawning → ready → executing → ready (or terminated)
- Request/response correlation via requestId UUID map
- Timeout enforced via worker.terminate(), which actually stops infinite loops
- Help command stays in main thread (static text, no @mongosh needed)
- Cluster switch detection: kills worker and respawns with new credentials
- Entra ID OIDC token requests delegated via IPC to main thread
- Worker logging routed to ext.outputChannel
- executeScratchpadCode.ts: cancellable progress notification

In-process eval path is fully replaced, with no feature flag.

Step 6.2 WI-1, Phase 2 of 3.
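
The state machine and the request/response correlation could be sketched as below. All names are hypothetical. The terminated → spawning edge is an assumption based on the respawn behavior the commit mentions. The commit uses UUIDs for `requestId`; the sketch accepts any id generator and defaults to a counter so it stays dependency-free.

```typescript
type WorkerState = "idle" | "spawning" | "ready" | "executing" | "terminated";

// Allowed transitions; terminated → spawning (respawn) is an assumption.
const TRANSITIONS: Record<WorkerState, readonly WorkerState[]> = {
  idle: ["spawning"],
  spawning: ["ready", "terminated"],
  ready: ["executing", "terminated"],
  executing: ["ready", "terminated"],
  terminated: ["spawning"],
};

function canTransition(from: WorkerState, to: WorkerState): boolean {
  return TRANSITIONS[from].includes(to);
}

interface PendingRequest {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

class RequestCorrelator {
  private readonly pending = new Map<string, PendingRequest>();
  private counter = 0;
  private readonly nextId: () => string;

  constructor(nextId?: () => string) {
    this.nextId = nextId ?? (() => `req-${++this.counter}`);
  }

  // Registers a request; the caller posts requestId to the worker and awaits response.
  create(): { requestId: string; response: Promise<unknown> } {
    const requestId = this.nextId();
    const response = new Promise<unknown>((resolve, reject) => {
      this.pending.set(requestId, { resolve, reject });
    });
    return { requestId, response };
  }

  // Resolves the matching request; returns false when the id is unknown.
  settle(requestId: string, result: unknown): boolean {
    const entry = this.pending.get(requestId);
    if (!entry) return false;
    this.pending.delete(requestId);
    entry.resolve(result);
    return true;
  }

  // Rejects everything still in flight, for example after the worker is terminated.
  rejectAll(reason: Error): void {
    for (const entry of this.pending.values()) entry.reject(reason);
    this.pending.clear();
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Keeping the map keyed by request id lets a single `message` handler route init, eval, and shutdown responses without assuming they arrive in order.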

- Export disposeEvaluator() from executeScratchpadCode.ts for clean worker shutdown
- Wire evaluator disposal into extension deactivation via ext.context.subscriptions
- Worker thread is properly terminated when the extension deactivates

Completes the SCRAM auth + kill/respawn wiring (credential passthrough was already implemented in Phase 2's buildInitMessage).

Step 6.2 WI-1, Phase 3 of 3.

SchemaStore integration:
- Cap schema feeding at 100 documents (randomly sampled via Fisher-Yates)
- Prevents unbounded IPC/memory usage for large result sets

Connection state synchronization:
- Shutdown worker when scratchpad connection is cleared (disconnect)
- Shutdown worker when the last .documentdb editor tab closes
- Worker respawns lazily on next Run

Export shutdownEvaluator() for graceful worker cleanup.

Step 6.2 WI-2, Phase 5.

…lysis

Replace JSON.parse with EJSON.parse when deserializing worker eval results. This preserves BSON types (ObjectId, Date, Decimal128) so that SchemaAnalyzer correctly identifies field types for autocompletion.

With JSON.parse, BSON types became plain objects ({'$oid': '...'}) causing SchemaAnalyzer to create wrong field paths (e.g. '_id.$oid' instead of '_id') and wrong types (object instead of objectid).

Benchmarked on 100 documents with ~100 fields each:
- JSON.parse: 2.2ms parse, broken types (wrong paths + types)
- EJSON.parse: 9.9ms parse, correct types (only Int32/Long→Double)
- Both are negligible vs actual query time (100-5000ms)

Int32 and Long are still collapsed to Double (JavaScript number). This is a fundamental EJSON limitation, not a serialization bug.

Show distinct progress messages during scratchpad execution:
- 'Initializing scratchpad runtime…': worker thread being created
- 'Authenticating with {clusterName}…': MongoClient connecting + auth
- 'Running query…': user code being evaluated

On subsequent runs (worker already alive), only 'Running query…' is shown. The progress notification remains cancellable (Cancel kills worker).

Added onProgress callback parameter to ScratchpadEvaluator.evaluate().

Log levels:
- Worker init/shutdown → debug (lifecycle, not user-visible)
- Eval start/end → trace (verbose diagnostic)
- Errors/uncaught exceptions → error
- MongoClient close failure → warn
- Worker exit, connection clear, editors close → debug
- Route worker IPC log messages to matching LogOutputChannel methods (trace/debug/info/warn/error) instead of appendLine for all

Progress UX:
- Title changed to 'DocumentDB Scratchpad' (static bold prefix)
- Phase messages show as: 'Initializing…', 'Authenticating…', 'Running query…'
- Removed cluster name from authenticating phase (redundant in context)

Cancel handling:
- Suppress error notification when user explicitly cancels execution
- Track cancelled state to avoid showing 'Worker terminated' error panel

Fix regression: worker not shutting down when scratchpad editors close.
- Switched from onDidCloseTextDocument to tabGroups.onDidChangeTabs
- onDidCloseTextDocument fires before tab state updates (race condition)
- onDidChangeTabs fires after tabs are removed, state is consistent

Add 'Show Schema Store Stats' diagnostics command:
- Shows collection count, document count, field count in output channel
- Per-collection breakdown with key, doc count, field count
- Available via Command Palette: 'DocumentDB: Show Schema Store Stats'

…e EJSON serialization

CursorIterationResult from @mongosh is an Array subclass with extra properties (cursorHasMore, documents). EJSON.serialize treats it as a plain object and includes those properties, producing:

    { cursorHasMore: true, documents: [...] }

instead of just:

    [doc1, doc2, ...]

This caused:
1. Output showed cursor wrapper object instead of document array
2. resultFormatter showed 'Result: Cursor' instead of 'N documents returned'
3. SchemaStore never received documents (no _id at top level of wrapper object)

Fix: Array.from(shellResult.printable) before EJSON.stringify normalizes Array subclasses to plain Arrays, preserving correct serialization.

@mongosh's CursorIterationResult extends ShellApiValueClass (not Array). Its asPrintable() returns { ...this } which produces:

    { cursorHasMore: true, documents: [doc1, doc2, ...] }

This wrapper object was passed through as-is, causing:
1. Output showed { cursorHasMore, documents: [...] } instead of just [...]
2. Header showed 'Result: Cursor' instead of 'N documents returned'
3. SchemaStore never received documents (wrapper has no _id field)

Fix: Add unwrapCursorResult() helper that extracts the documents array from the { documents: [...] } wrapper. Applied in both:
- resultFormatter.ts: for display formatting and document count
- executeScratchpadCode.ts: for SchemaStore feeding

Use @mongosh's result type instead of guessing from array shape:
- Cursor results: 'Result: Cursor (20 documents)' (type + batch count)
- Other typed results: 'Result: Document', 'Result: string', etc.
- No type: no header line (plain JS values)

Previous behavior tried to detect 'documents' by checking Array.isArray which was fragile and didn't communicate what kind of result it was.

…count)
- .toArray() returns type=null with an Array: show 'N results'
- .count() returns type=null with a number: no special header (value shown)
- Cursor: 'Result: Cursor (N documents)' (unchanged)
- Typed results: 'Result: Document', etc. (unchanged)

1. Untyped array results (e.g. .toArray()): 'Result: Array (5 elements)'
2. Worker eval log: include line count '(3 lines, 51 chars, db: demo_data)'
3. Schema stats: show 'db/collection' instead of internal clusterId::db::coll
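
Taken together, the header rules from the last three commits could be sketched like this. `formatResultHeader` is a hypothetical name, and the sketch assumes the cursor value has already been unwrapped to a plain array of documents before formatting.

```typescript
// `type` mirrors the result type string reported by the shell; null means a plain JS value.
// Returns undefined when no header line should be printed.
function formatResultHeader(type: string | null, value: unknown): string | undefined {
  if (type === "Cursor") {
    // Assumption: the cursor wrapper was already unwrapped to its documents array.
    const count = Array.isArray(value) ? value.length : 0;
    return `Result: Cursor (${count} documents)`;
  }
  if (type !== null) return `Result: ${type}`;
  if (Array.isArray(value)) return `Result: Array (${value.length} elements)`;
  return undefined;
}
```

Branching on the reported type first, and on the value's shape only for untyped results, is what separates a cursor batch from the array returned by `.toArray()`.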

- Connect instruction dialog is now modal with title/detail separation
- Scratchpad template includes note: 'only the last result is displayed'
- Help text Tips section includes same note
- Wording differs between template and help (not identical)

Replace 'MongoClient' with 'client' or 'database client' in:
- Worker log messages visible in the output channel
- JSDoc comments describing worker behavior
- Code comments in worker and evaluator

Type references (MongoClient, MongoClientOptions) are unchanged; these are the actual driver API names. The extension is a DocumentDB tool using the MongoDB API wire protocol. User-facing text should not reference MongoDB implementation details.

tnaum-ms · 2026-03-26T13:45:56Z

Step 6.2 — Persistent Worker Eval (Option F)

PR #540 implements Step 6.2: persistent worker thread for scratchpad code evaluation.

Key changes:

- Scratchpad eval moves from in-process vm.runInContext() to a lazy persistent worker_threads Worker
- Worker owns its own database client (isolated from Collection View)
- worker.terminate() provides real infinite loop protection (the old Promise.race timeout couldn't preempt a blocked event loop)
- Entra ID auth via IPC token callback (VS Code session reused, no re-authorization)
- EJSON.parse for BSON type fidelity in schema analysis
- Phased progress notifications (Initializing → Authenticating → Running)
- Cancellable execution with clean worker shutdown
- CursorIterationResult unwrapping for correct output formatting
- Schema stats diagnostics command

See #540 for full details.

…state

If buildInitMessage() throws or the worker reports initResult { success: false }, spawnWorker() now calls terminateWorker() before rethrowing. This returns the evaluator to 'idle' state so the next evaluate() call can respawn a fresh worker instead of being stuck in the 'spawning' state indefinitely.

…numeric types

Switch from relaxed to canonical EJSON (relaxed: false) for the worker-to-main IPC payload. Canonical EJSON preserves Int32, Long, Double, and Decimal128 type wrappers so that EJSON.parse on the main thread reconstructs actual BSON instances. This allows SchemaAnalyzer.inferType() to correctly distinguish numeric subtypes, which feeds into type-aware operator ranking in completions.

Also drops the space/indent parameter from EJSON.stringify since the IPC payload is never displayed to users, which reduces transfer size.

Fixes the incorrect comment that claimed Int32/Long collapse was a fundamental EJSON limitation (it was caused by using relaxed mode).

Mark displayBatchSize in workerTypes.ts and ScratchpadEvaluator.ts with TODO(F11) comments noting the field is sent but not yet read by the worker. References future-work.md §F11 for the plan to wire documentDB.mongoShell.batchSize.

…aluator

Wrap the three error strings most likely to reach users in l10n.t():
- 'No credentials found for cluster {0}'
- 'Worker is not running'
- 'Execution timed out after {0} seconds'

These flow through to vscode.window.showErrorMessage via the catch in executeScratchpadCode.ts, so non-English users now see translated details.

…nResult shape

The unwrapCursorResult() check and feedResultToSchemaStore() unwrap now require both 'cursorHasMore' (boolean) and 'documents' (array) before unwrapping. Previously, any object with a 'documents' array field would be unwrapped, which could false-positive on user documents with a 'documents' field.
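
The stricter shape check might look like the following sketch. The function name matches the helper named in the commit, but the body is an illustration written from the description above, not the code from the repository.

```typescript
// Returns the documents array for a { cursorHasMore: boolean, documents: [...] } wrapper,
// and the value unchanged for everything else.
function unwrapCursorResult(value: unknown): unknown {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return value;
  const candidate = value as Record<string, unknown>;
  // Require both markers so a user document that merely has a `documents` field is left alone.
  if (typeof candidate["cursorHasMore"] === "boolean" && Array.isArray(candidate["documents"])) {
    return candidate["documents"];
  }
  return value;
}
```

A user document that happens to carry both a boolean `cursorHasMore` and an array `documents` would still be unwrapped; the two-field check lowers the false-positive rate but does not remove it.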

…quest timeout

The timeout in sendRequest() is used for init, eval, and shutdown. The previous message 'Execution timed out' was misleading when init hangs. Changed to 'Operation timed out after {0} seconds' which is accurate for all callers.

tnaum-ms and others added 27 commits February 16, 2026 20:16

Initial plan

633f0b4

refactor: remove debug console.log statements from tests

74eeeac

Co-authored-by: tnaum-ms <171359267+tnaum-ms@users.noreply.github.com>

refactor: remove debug console.log statements from SchemaAnalyzer tes…

eb71916

…ts (#507)

test: add comprehensive tests for SchemaAnalyzer versioning and cachi…

d8d0709

…ng behavior

refactor: remove console.log statements from test files for cleaner o…

ebdde30

…utput

refactor: enhance handling of special characters in field names for T…

c23b604

…ypeScript definitions and completion items

docs: add README and bump schema-analyzer to v1.0.0

2fec69d

build: add prebuild and prejesttest scripts for workspace package builds

a667c35

chore: bump schema-analyzer version to 1.0.0 in package-lock.json

cbaa573

Refactor code structure for improved readability and maintainability

f1d006d

refactor: streamline TypeScript definition tests for improved readabi…

35a13a1

…lity

docs: add terminology guidelines for DocumentDB and MongoDB API usage

1cb9e5b

refactor: replace 'console' assert with 'node:assert/strict' for impr…

43915a5

…oved consistency

refactor: update documentation to consistently reference DocumentDB A…

75536e9

…PI alongside MongoDB API

refactor: SchemaAnalyzer class + enhanced FieldEntry + new schema tra…

5094ca6

…nsformers (#506)

tnaum-ms linked an issue Feb 17, 2026 that may be closed by this pull request

Improve Scrapbook Experience (shell integration) 🚀 #66

Open

tnaum-ms removed a link to an issue Feb 17, 2026

Improve Scrapbook Experience (shell integration) 🚀 #66

Open

tnaum-ms added this to the 0.8.0 - February 2026 milestone Feb 17, 2026

tnaum-ms added 16 commits March 24, 2026 14:46

style: prettier formatting and l10n bundle update

6fd8948

tnaum-ms mentioned this pull request Mar 26, 2026

feat(scratchpad): persistent worker thread for scratchpad execution (Step 6.2) #540

Merged

tnaum-ms added 12 commits March 26, 2026 15:55

chore: l10n

7eba164

feat(scratchpad): enhance telemetry for scratchpad execution and work…

137ddc1

…er lifecycle

feat(scratchpad): add runMode parameter to executeScratchpadCode for …

f30c7b6

…improved telemetry

fix(scratchpad): pass actual duration to formatError instead of hardc…

ed12046

…oded 0 Capture startTime before evaluate() and compute elapsed time on failure. The error output panel now shows the real duration instead of 'Executed in 0ms'.

perf(scratchpad): use partial Fisher-Yates for randomSample

f9cfe22

Only perform count (100) swaps instead of shuffling the entire array. Same output distribution, simpler loop bounds.
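
A partial Fisher-Yates sample can be sketched as follows. This is an illustration of the technique named in the commit, not the repository's randomSample. The injectable `random` parameter is an assumption added for testability, and the sketch copies the input so the caller's array is not mutated.

```typescript
// Picks up to `count` items uniformly at random without replacement.
// Only the first `count` positions are shuffled, so the loop does `count` swaps.
function randomSample<T>(items: readonly T[], count: number, random: () => number = Math.random): T[] {
  const pool = [...items];
  const n = Math.min(Math.max(count, 0), pool.length);
  for (let i = 0; i < n; i++) {
    // Choose j uniformly from [i, pool.length) and swap it into position i.
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i] as T;
    pool[i] = pool[j] as T;
    pool[j] = tmp;
  }
  return pool.slice(0, n);
}
```

Each of the first `n` positions is drawn uniformly from the elements not yet chosen, which is why stopping early yields the same distribution as a full shuffle followed by a slice.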

feat(scratchpad): persistent worker thread for scratchpad execution (…

94acf44

…Step 6.2) (#540)

tnaum-ms wants to merge 202 commits into next from feature/shell-integration

2 participants