Skip to content

winui-search: batched CLI, background refresh, ranking + Gallery/Toolkit fetch upgrades#83

Open
lei9444 wants to merge 7 commits into
stagingfrom
leilzh/winui-search-improvements
Open

winui-search: batched CLI, background refresh, ranking + Gallery/Toolkit fetch upgrades#83
lei9444 wants to merge 7 commits into
stagingfrom
leilzh/winui-search-improvements

Conversation

@lei9444
Copy link
Copy Markdown

@lei9444 lei9444 commented May 13, 2026

Description

  • Batched CLI: search / get take multiple args; results grouped by control with up to 3 scenarios each.
  • Gallery fetch : parses each control's .xaml.cs and emits per-example code-behind (event handlers, x:Bind targets, x:DataType) instead of empty stubs.
  • Toolkit fetch: dedicated keywords field from .md frontmatter, #if WINAPPSDK folding, per-sample header/description extraction.
  • Background refresh (BackgroundUpdater.cs): hot-path commands never block on GitHub. Detached child refreshes if cache > 7 d; atomic lock, 10-min TTL, 1-h failure backoff. WINUI_SEARCH_NO_BACKGROUND=1 to opt out.
  • Ranking: weighted BM25 fields (header / description / tags / keywords), header-vs-description deduplication, score floor + coverage gate, expanded synonyms, tag-noise filter, control-family hints.
  • CacheVersion.cs: single source of truth; mismatch → embedded JSON.

Type of Change

  • ♻️ Refactoring

Affected area

  • Tool: winui-search

Checklist

  • Tested locally on Windows (build + agent invocation if applicable)
  • If a CLI command or agent invocation changed: README.md updated

Additional Notes

AI Description

This section is auto-generated by AI when the PR is opened or updated. To opt out, delete this entire section including the marker comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR significantly upgrades the src/tools/winui-search CLI to be faster on the hot path (embedded-first), more resilient (background cache refresh), and more useful to agents (grouped/batched results plus richer scenario metadata extracted from Gallery/Toolkit sources).

Changes:

  • Refactors Gallery/Toolkit fetching to serve embedded snapshots immediately and move GitHub refresh into an explicit update path (and background refresh).
  • Adds batched CLI behavior (search/get) and grouped ranking output with improved BM25 weighting, stopword/synonym handling, and scenario/metadata formatting.
  • Extends Toolkit metadata with curated keywords (embedded resource + frontmatter extraction) and adds a unified cache schema version (CacheVersion.cs).

Reviewed changes

Copilot reviewed 19 out of 22 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/tools/winui-search/Program.cs Batched CLI commands, background refresh hook, updated update flow
src/tools/winui-search/BackgroundUpdater.cs New stale-while-revalidate background refresh + lock/backoff/logging
src/tools/winui-search/SearchEngine.cs Grouped search + ranking/formatting changes and new fallback behavior
src/tools/winui-search/GalleryFetcher.cs Embedded-first load + GitHub refresh path; richer Gallery extraction incl. code-behind extraction
src/tools/winui-search/ToolkitFetcher.cs Embedded-first load + GitHub refresh path; frontmatter keywords + description extraction
src/tools/winui-search/CacheVersion.cs Single cache schema/version source of truth
src/tools/winui-search/DataLoader.cs Loads embedded Toolkit keywords dictionary
src/tools/winui-search/Models.cs Scenario schema extended with description/metadata fields
src/tools/winui-search/BM25.cs Tokenization tweaks + coverage gate support (CountHits)
src/tools/winui-search/StopWords.cs Expanded stopwords + tag-dictionary cleaning helpers
src/tools/winui-search/winui-search.csproj Embeds new toolkit-keywords.json resource
src/tools/winui-search/Data/toolkit-keywords.json New embedded curated keywords payload
src/tools/winui-search/README.md, DATA_SOURCES.md Documentation updates for new behavior and data sources
plugins/winui/skills/winui-design/SKILL.md Skill guidance updated to match new batched CLI usage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/tools/winui-search/BackgroundUpdater.cs
Comment thread src/tools/winui-search/Program.cs Outdated
Comment thread src/tools/winui-search/Program.cs
Comment thread src/tools/winui-search/GalleryFetcher.cs Outdated
Comment thread src/tools/winui-search/SearchEngine.cs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 22 changed files in this pull request and generated 1 comment.

Comment thread src/tools/winui-search/ToolkitFetcher.cs Outdated
@lei9444 lei9444 requested a review from nmetulev May 13, 2026 09:55
@nmetulev
Copy link
Copy Markdown
Member

🤖 Automated review — generated by GitHub Copilot CLI (Claude Opus 4.7) using the repo's pr-review skill, with a multi-model cross-check pass on GPT-5.4. Findings are advisory; please apply your own judgment.

Summary

Critical High Medium Low
0 3 3 0

Coverage

Dimension Result
skill-content ⚠ 1 finding (folded into M1)
skill-tool-boundary ⚠ 1 finding (M1; multi-model downgraded H→M)
tool-correctness ⚠ 1 finding (H1; recommendation widened by multi-model)
payloads-and-tests ✓ clean
docs-and-manifests ⚠ 2 findings
multi-model (gpt-5.4) ✓ 2/2 critical+high cross-checked + 2 new high

Findings

ID Location Domain One-liner
H1 src/tools/winui-search/GalleryFetcher.cs:107-123, ToolkitFetcher.cs:209-226 tool-correctness Non-atomic cache writes; corrupted JSON silently overwritten by stale embedded fallback
H2 src/tools/winui-search/BackgroundUpdater.cs:38-40,82-116,201-227, Program.cs:28,381-390 multi-model Foreground update bypasses the background-refresh lock and deletes update.lock itself — races with in-flight background updater
H3 src/tools/winui-search/Program.cs:82-91, SearchEngine.cs:24-37,145-148,272-286 multi-model Source-collision fix incomplete: tags/keywords still keyed by bare controlId (gallery↔toolkit bleed; reproduced with debug "sliders")
M1 plugins/winui/skills/winui-design/SKILL.md:10-16, Program.cs:397-402 skill-tool-boundary, skill-content Tier-1 ↔ Tier-3 drift: SKILL.md dropped update from the command block; CLI still implements & advertises it
M2 plugins/winui/plugin.json:4, .github/plugin/marketplace.json:15 docs-and-manifests User-visible CLI/background changes shipped without a plugin version bump (still 0.3.0)
M3 src/tools/winui-search/DATA_SOURCES.md docs-and-manifests New 140-line doc not linked from any README/skill — won't be discovered

Details

H1 — Non-atomic cache writes

  • Severity: high · Confidence: high · Tier: 1
  • Files: src/tools/winui-search/GalleryFetcher.cs:107-123, src/tools/winui-search/ToolkitFetcher.cs:209-226
  • Multi-model: confirmed (with a scoping clarification)

RefreshFromGitHub() performs four sequential File.WriteAllText calls (scenarios JSON → tags JSON → timestamp → version). A crash mid-write leaves a truncated JSON. On next run the deserialize exception is swallowed (GalleryFetcher.cs:78-84, ToolkitFetcher.cs:171-184), embedded fallback data is loaded, and that fallback then overwrites the corrupted cache (GalleryFetcher.cs:92-103, ToolkitFetcher.cs:192-205) — silently discarding any salvageable GitHub data and pinning the user to stale embedded data until the next successful refresh.

// GalleryFetcher.cs:107-123 (RefreshFromGitHub)
File.WriteAllText(scenarioCache, JsonSerializer.Serialize(scenarios, JsonContext.Default.ScenarioArray));
File.WriteAllText(tagCache,      JsonSerializer.Serialize(tags,      JsonContext.Default.DictionaryStringStringArray));
File.WriteAllText(timestampFile, DateTime.UtcNow.ToString("o"));
File.WriteAllText(versionFile,   CacheVersion.Current);

Recommendation: Use temp-file + File.Move(.., overwrite: true) (atomic on Windows for same-volume moves) for each of the four files in both RefreshFromGitHub() and the Load() fallback-write paths. For full set-atomicity, write the four temp files first and rename them in this order: scenarios → tags → version → timestamp last, so a stale-detector treats a partially-renamed set as still-stale rather than fresh-but-truncated.

H2 — Foreground update bypasses the background-refresh lock

  • Severity: high · Confidence: high · Tier: 1
  • Files: src/tools/winui-search/BackgroundUpdater.cs:38-40,82-116,201-227, src/tools/winui-search/Program.cs:28,381-390
  • Multi-model: new (confirmed via code trace)

Background coordination uses %LOCALAPPDATA%\winui-search\cache\update.lock (BackgroundUpdater.cs:38-40,201-227); TryKickoffIfStale() acquires it before spawning update --background (BackgroundUpdater.cs:82-116). Foreground update never participates: Program.cs:28 calls ForceFetch(), which at Program.cs:381-390 recursively deletes the entire cache root including update.lock itself, plus last-github-update.txt and last-github-attempt.txt.

If a hot-path command has already spawned a background updater, a concurrent manual update defeats the single-writer guard and both processes write the same cache files concurrently.

Recommendation: Make the foreground update path participate in the same lock protocol — acquire update.lock (or fail fast / wait) before clearing or rewriting cache state. Either reuse BackgroundUpdater.AcquireLock(), or replace the recursive-delete in ForceFetch() with per-file atomic rewrites under the lock so the lock file itself is never wiped.

H3 — Source-collision fix incomplete (tags/keywords)

  • Severity: high · Confidence: high · Tier: 1
  • Files: src/tools/winui-search/Program.cs:82-91, src/tools/winui-search/SearchEngine.cs:24-37,145-148,272-286
  • Multi-model: new (reproduced live with dotnet run -c Release -- debug "sliders")

Scenarios are correctly separated by "{Source}:{ControlId}" (SearchEngine.cs:24-37), but enrichment tags and curated keywords are still keyed by bare controlId (SearchEngine.cs:145-148,272-286). Worse, Program.cs:82-91 merges gallery and toolkit tag dictionaries into one bare-keyed map — so the second source's tags overwrite the first's:

// Program.cs:82-91 — single bare-controlId tag map fed to SearchEngine
var tags = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase);
foreach (var kv in galleryTags) tags[kv.Key] = kv.Value;
foreach (var kv in toolkitTags) tags[kv.Key] = kv.Value; // silently overwrites

Real collision on this branch: both sources expose colorpicker. Data/gallery-tags.json lacks sliders; Data/toolkit-tags.json includes it. dotnet run -c Release -- debug "sliders" ranks [gallery] ColorPicker even though sliders exists only in toolkit metadata.

Recommendation: Key tags and curated keywords by the same composite "{Source}:{ControlId}" used for scenarios. Stop merging source metadata into a single bare-keyed map in Program.cs:82-91; thread Source through and look up tags/keywords with the composite key throughout SearchEngine.cs.

M1 — SKILL.md ↔ CLI drift on update

  • Severity: medium (originally high; multi-model downgrade) · Confidence: high · Tier: 3 + 1
  • Files: plugins/winui/skills/winui-design/SKILL.md:10-16, src/tools/winui-search/Program.cs:397-402

SKILL.md removed the .\winui-search.exe update line from the documented command block, but Program.cs:402 still prints update Force refresh from GitHub (clears cache; auto-runs in background when stale) and the implementation is intact. Manual update is still a legitimate "refresh now" escape hatch — background refresh is lazy, stale-only, and disableable via WINUI_SEARCH_NO_BACKGROUND=1 — so the tool is right and the skill is incomplete.

Recommendation: Re-add a single concise line to the SKILL.md command block, e.g.

.\winui-search.exe update    # force refresh now (cache also auto-refreshes in background when stale)

Do not delete the update implementation.

M2 — Plugin version not bumped

  • Severity: medium · Confidence: high
  • Files: plugins/winui/plugin.json:4, .github/plugin/marketplace.json:15

PR ships a substantial user-visible change (batched CLI surface, background-refresh subsystem, ranking changes), but both manifests are still 0.3.0. Bump them in lockstep (e.g. 0.4.0).

M3 — DATA_SOURCES.md is dead weight

  • Severity: medium · Confidence: high
  • File: src/tools/winui-search/DATA_SOURCES.md

The new 140-line doc is shipped but linked from nowhere — no reference in top-level README.md, src/tools/winui-search/README.md, the winui-design skill, or the agent file. Add a "Data sources" link from src/tools/winui-search/README.md (and optionally a one-line pointer from the top-level README's tools table).


Coverage notes

  • skill-content — Reviewed full winui-design/SKILL.md frontmatter and changed command/workflow block; verified Program.cs still implements update, batched search/get; counted Data scenarios (~379 patterns claim is accurate); checked sibling skills for duplicated wording. Finding folded into M1.
  • skill-tool-boundary — Reviewed SKILL.md diff for misplaced catalog/rule prose; reviewed Synonyms.cs / StopWords.cs / Notes.cs for tier drift (none — these are correct Tier 1 placement); confirmed batched CLI is a Tier 1 win and the SKILL.md edit is a thin wrapper around it; grepped plugins/winui/ and .github/skills/ for orphaned winui-search update references.
  • tool-correctness — Verified AOT discipline: all JsonSerializer use source-gen contexts, no reflection, no Assembly.Location (uses Environment.ProcessPath), AOT publish 0 warnings. Reviewed BackgroundUpdater.cs lock/process semantics, GalleryFetcher.cs/ToolkitFetcher.cs HTTP+cache, Program.cs CLI batching (0/many/duplicate IDs handled), SearchEngine.cs ranking boundary conditions (empty / all-stopword query, BM25 divide-by-zero), CacheVersion.cs consumption, DataLoader.cs embedded-resource wiring (toolkit-keywords.json correctly registered).
  • payloads-and-testswinui-search.exe refreshed alongside source ✓. CI smoke tests list and search "tabview" only — does NOT call update, so the removal-or-keep decision doesn't break CI. Size check is committed-PR-exe vs CI-built-exe with 10% tolerance; a local build-tools.ps1 reproduction produced byte-identical output (0% delta). All five Data/*.json are wired as <EmbeddedResource> (new toolkit-keywords.json confirmed). No analyzer changes. No winui-search test project exists (pre-existing gap, not this PR's responsibility).
  • docs-and-manifests — Read README.md, both manifests, the agent file, all winui-design SKILL.md content, winui-search/README.md, the new DATA_SOURCES.md, pr-validation.yml, and the changed C# entry points. Tool README accurately reflects the new CLI surface; agent file makes no winui-search invocation claims that drifted.
  • multi-model (gpt-5.4) — Cross-checked H1 (confirmed, with a scoping note that the recommendation must also cover the Load() fallback writes and order the renames so timestamp lands last), and the original H2-drift (downgraded to M1). Independently discovered the two new high-severity bugs above (H2, H3) — H3 reproduced live with dotnet run -- debug "sliders".

@nmetulev nmetulev changed the base branch from main to staging May 14, 2026 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants