Boyuc/observatory draft demo by quic-boyuc · Pull Request #19288 · pytorch/executorch

quic-boyuc · 2026-05-05T04:06:10Z

[RFC Draft] Observatory: a unified debugging framework for ExecuTorch

This is the reference POC accompanying the Observatory RFC — draft PR opened so reviewers can read the RFC and run the code side by side.

Summary

Observatory turns per-backend debugging scripts into one shared flow: each backend contributes lenses (Python extensions that implement debugging logic end-to-end — capture, analyze, render); the framework handles the session lifecycle, report assembly, and the outputs. fx_viewer is a standalone, dependency-free FX-graph renderer used as Observatory's graph view, also usable by anyone with a torch.fx graph.
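To make the capture, analyze, render split concrete, here is a minimal sketch of what a three-phase lens could look like. It is an illustration only: the `Lens` protocol, `NodeCountLens`, `run_session`, and the method names `observe`, `analyze`, and `render` are assumptions for this example, not the contracts defined in interfaces.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, Protocol


class Lens(Protocol):
    """Hypothetical three-phase lens shape (not the PR's actual protocol)."""

    name: str

    def observe(self, record_name: str, artifact: Any) -> Dict[str, Any]: ...
    def analyze(self, digests: Dict[str, Dict[str, Any]]) -> Dict[str, Any]: ...
    def render(self, analysis: Dict[str, Any]) -> str: ...


@dataclass
class NodeCountLens:
    """Toy lens: counts the items in each captured artifact."""

    name: str = "node_count"

    def observe(self, record_name, artifact):
        # Capture phase: reduce the artifact to a small serializable digest.
        return {"count": len(artifact)}

    def analyze(self, digests):
        # Analyze phase: look across all records at once.
        counts = {rec: d["count"] for rec, d in digests.items()}
        return {"counts": counts, "largest": max(counts, key=counts.get)}

    def render(self, analysis):
        # Render phase: turn the analysis into something a report can show.
        rows = [f"{rec}: {n}" for rec, n in analysis["counts"].items()]
        return "\n".join(rows + [f"largest: {analysis['largest']}"])


def run_session(lens, artifacts):
    """Stand-in for the framework: drives one lens through all three phases."""
    digests = {rec: lens.observe(rec, art) for rec, art in artifacts.items()}
    return lens.render(lens.analyze(digests))


report = run_session(
    NodeCountLens(),
    {"Exported Float": ["conv", "relu", "fc"], "Edge": ["conv", "fc"]},
)
print(report)
```

The point of the split is that the framework, not the lens, decides when each phase runs, so a digest captured once can be re-analyzed and re-rendered later.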

Design rationale, architecture, and extension patterns are in the RFC. This PR is what the design looks like when you build it.

Relationship to the RFC

The RFC describes the ideal design. It treats APIs (json_frontend, --compare) and features (Analyzed Report JSON, cross-time regression) as first-class design elements even where the code hasn't caught up yet. RFC §8 is the canonical reference for what's implemented vs proposed.
This PR implements a subset. What ships here is enough to demo the full end-to-end flow on Qualcomm and XNNPACK with per-layer accuracy analysis. The not-yet-implemented pieces are tracked as follow-up work, not blockers for reviewing the design.
We welcome design-level feedback in this PR. If you disagree with the Lens protocol shape, the runtime/analyzed split (RFC §4.5), or the two-frontend plan, surface it here — easier to change the design before the proposed pieces land.

What's in this PR (current demo scope)

Core framework — devtools/observatory/
- observatory.py — session lifecycle, nested config stack, capture store, report assembly
- interfaces.py — Lens protocol, typed frontend block contracts
- graph_hub.py — base graph + analyze-phase overlay merge; fx_viewer bridge
- cli.py — generic CLI with collect and visualize modes
- observe_pass.py — @observe_pass decorator
FX viewer — devtools/fx_viewer/ — FX extraction, Sugiyama layout, extension-layer API, canvas-based JS runtime
Seven common lenses — devtools/observatory/lenses/: graph, metadata, accuracy, per_layer_accuracy, stack_trace, pipeline_graph_collector, graph_color
Backend CLIs as worked examples:
- backends/qualcomm/debugger/observatory/
- backends/xnnpack/debugger/observatory/
Invocation surfaces: generic CLI, backend CLI, Observatory.enable_context(...) context manager, @observe_pass decorator, direct Observatory.collect(name, artifact)
Exports:
- HTML Report (self-contained, for reviewers)
- Raw Capture (JSON) (export_json + visualize reload path)

What's in the RFC but NOT in this PR

See RFC §8 for the full list with rationale. Headliners:

json_frontend + Analyzed Report (JSON) — the second frontend hook for LLM triage, CI analytics, and dashboards.
--compare CLI mode — cross-time regression over archived Raw Captures.
Runtime / delegated-graph accuracy lens — port of qnn_intermediate_debugger.py.
Additional lenses — partition color layer, qparams audit, .pte diff, size analysis, ETDump-fed runtime lenses, ADB capture.
Non-FX graph formats (PyTorch graph, QNN graph, TOSA) as first-class fx_viewer exporters.
Nightly-regression CI recipe packaging the --compare flow.
Live debugging dashboard built on fx_viewer for streaming event use cases.

Each item has a natural landing point in the Lens protocol or CLI; no breaking changes to what ships here.

How to try it

pip3 install 'fast-sugiyama[full]'   # requires python >= 3.11

# XNNPACK — per-layer accuracy demo, zero code change
python -m executorch.backends.xnnpack.debugger.observatory \
    --output-html /tmp/mv2/obs_report.html \
    --lens_recipe=accuracy \
    examples/xnnpack/aot_compiler.py \
    --model_name=mv2 --delegate --quantize --output_dir /tmp/mv2

# Qualcomm — same pattern
python -m executorch.backends.qualcomm.debugger.observatory \
    --output-html obs_report.html \
    --lens_recipe=accuracy \
    examples/qualcomm/oss_scripts/mobilevit_v2.py \
    --backend htp --model SM8650 -d ./imagenet-mini-val/ \
    -b build-android/ --compile_only

# Reload a saved Raw Capture into a fresh HTML (uses current lens code)
python -m executorch.devtools.observatory visualize \
    --input-json run.json --output-html run.html

Pre-generated HTML reports on a matrix of models are linked from the demo index (see RFC §3).

Review guidance

Test plan

Unit tests under devtools/observatory/tests/ pass
End-to-end XNNPACK MV2 run produces expected HTML report
End-to-end Qualcomm MobileViT v2 run produces expected HTML report
visualize reload produces HTML equivalent to the original run
Lint / typecheck
Pre-generated reports on model matrix spot-checked manually

pytorch-bot · 2026-05-05T04:06:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19288

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Ubuntu services are down

⚠️ 11 Awaiting Approval

As of commit 1dd5421 with merge base 0a113f8:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-05-05T04:06:54Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


…otstrap

Three fixes for HTML report correctness and size:

1. Base64-encode HtmlBlock.content in the JSON payload so </script> and other special characters cannot corrupt the outer <script> tag. The JS runtime decodes with atob() before innerHTML assignment (03_blocks.js, renderHtmlCompare).
2. Gzip+base64 compress the full JSON payload when it exceeds 8 KB (observatory.py _compress_payload). The browser decompresses via DecompressionStream inside an async IIFE, which also moves the Observatory runtime execution to after the FX viewer bundle is injected, fixing the "FXGraphViewer unavailable" race condition that existed when Script 3 ran before Script 2 finished awaiting (html_template.py).
3. Base64-encode resources.js[] entries in generate_ui_test_harness.py so the test harness goes through the same pipeline as production reports instead of bypassing it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
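The two encoding steps in this commit can be sketched in a few lines of Python. This is a hedged illustration, not the code in observatory.py: the function names `encode_html_block`, `pack_payload`, and `unpack_payload` and the envelope keys are invented here; only the 8 KB threshold and the gzip+base64 scheme come from the commit message.

```python
import base64
import gzip
import json

# The commit message says the payload is compressed "when it exceeds 8 KB".
COMPRESS_THRESHOLD_BYTES = 8 * 1024


def encode_html_block(content: str) -> str:
    """Base64-encode HTML so sequences like </script> never appear verbatim."""
    return base64.b64encode(content.encode("utf-8")).decode("ascii")


def pack_payload(payload: dict) -> dict:
    """Return a small envelope; large payloads are gzip+base64 compressed."""
    raw = json.dumps(payload).encode("utf-8")
    if len(raw) <= COMPRESS_THRESHOLD_BYTES:
        return {"encoding": "json", "data": raw.decode("utf-8")}
    packed = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"encoding": "gzip+base64", "data": packed}


def unpack_payload(envelope: dict) -> dict:
    """Inverse of pack_payload (the browser side would use DecompressionStream)."""
    if envelope["encoding"] == "gzip+base64":
        return json.loads(gzip.decompress(base64.b64decode(envelope["data"])))
    return json.loads(envelope["data"])


small = {"html": encode_html_block("<b>ok</b></script>")}
big = {"rows": ["x" * 100] * 200}
small_env = pack_payload(small)
big_env = pack_payload(big)
```

Because base64 output uses only letters, digits, `+`, `/`, and `=`, the encoded block cannot contain a `<` character, which is what protects the outer script tag.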

Two bugs fixed:

1. Wrong initial camera on first load: viewer.init() was called synchronously before the browser had laid out the container, so getBoundingClientRect() returned {width:0, height:0} and the camera was placed far off-screen. Fixed by deferring init() to the next animation frame (requestAnimationFrame).
2. All viewer state lost on every record switch: destroyGraphRuntime() was unconditionally destroying every live viewer. Camera, selected node, active layers, colorBy, and zoom were all reset.

Fix: hybrid viewer cache.

- Single-record graph blocks use a live DOM cache keyed by (recordIndex, lensName, blockId). On navigate-away the wrapper is detached from the DOM but the viewer stays alive in state.viewerCache. On return the wrapper is re-appended and a resize rAF is queued: no re-init, no re-layout, full state preserved. LRU eviction at 10 viewers.
- Compare-mode viewers are always freshly created (keeping N side-by-side viewers alive would multiply memory cost). Instead a lightweight state snapshot {camera, selectedNodeId, activeExtensions, colorBy} is saved to state.compareStateCache on every statechange event. On re-entry each new viewer is seeded from the snapshot: selectNode+animate if a node was selected, setState({camera}) otherwise.

README section 12 documents the memory budget and trade-off rationale.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
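The single-record cache described here is a keyed LRU cache with a capacity of 10. The real implementation lives in the report's JavaScript runtime; the Python sketch below only illustrates the eviction policy, and the `ViewerCache` class and its `evicted` list are hypothetical names for this example.

```python
from collections import OrderedDict


class ViewerCache:
    """LRU cache keyed by (recordIndex, lensName, blockId); illustrative only."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.evicted = []  # keys whose viewers would be destroyed for real

    def get(self, key):
        if key not in self._entries:
            return None
        # A hit makes the entry the most recently used one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, viewer):
        self._entries[key] = viewer
        self._entries.move_to_end(key)
        # Evict from the least recently used end until within capacity.
        while len(self._entries) > self.capacity:
            old_key, _ = self._entries.popitem(last=False)
            self.evicted.append(old_key)


cache = ViewerCache(capacity=2)
cache.put((0, "graph", "b0"), "viewer-0")
cache.put((1, "graph", "b0"), "viewer-1")
cache.get((0, "graph", "b0"))            # touch record 0 so it stays warm
cache.put((2, "graph", "b0"), "viewer-2")  # pushes out record 1
```

Returning to record 0 finds the live viewer, while record 1 would need a fresh init on its next visit.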

…acy lenses

Introduces a `python -m executorch.backends.qualcomm.debugger.observatory.cli` runner that wraps any standard Qualcomm example script (e.g. swin_v2_t.py) to automatically enable full Observatory debugging with no script modifications.

Key changes:

PipelineGraphCollectorLens (new):
- Absorbs ETRecordAutoCollector from the deleted auto_collect.py, moving all monkey-patching out of the Observatory core framework and into a lens.
- Patches framework-level functions (torch.export.export, prepare_pt2e, convert_pt2e, to_edge_transform_and_lower, ETRecord.add_*) to auto-collect graph snapshots at each compilation stage: Exported Float, Annotated Model, Calibrated Model, Quantized Model, Edge, Transformed Edge, ETRecord records.
- Forces generate_etrecord=True in to_edge_transform_and_lower to ensure ETRecord collection fires automatically.
- Framework-level patches work for all backends (QNN, XNNPack, CoreML, etc.).

AccuracyLens (new):
- Ports AccuracyEvaluationLens from legacy debugging_utils to new Observatory interfaces (ViewList/TableBlock).
- Adds MaskedTokenAccuracy metric and MLMEvaluator for masked language model scripts (bert, roberta, distilbert, albert, eurobert).
- Auto-patches get_imagenet_dataset and get_masked_language_model_dataset to capture targets, covering 24 of ~30 standard oss_scripts.
- Auto-detects task type (classification vs MLM), post_process function, and default metrics from model output format.

CLI runner (new):
- cli.py + __main__.py: parse observatory flags, register lenses, wrap target script in Observatory.enable_context(), run via runpy.run_path(), generate HTML + JSON report to {artifact}/observatory_report.{html,json}.
- Flags: --no-accuracy, --no-report, --report-title.

observatory.py:
- Remove ETRecordAutoCollector.install/uninstall calls; patching is now delegated to PipelineGraphCollectorLens when registered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
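The patch-and-restore pattern that this commit moves into a lens can be sketched as follows. This is a generic illustration under stated assumptions: `PatchSet` and the toy `toy_export` namespace are invented for the example and stand in for patching functions such as torch.export.export; the lens's actual install/uninstall code is not shown in this PR description.

```python
import functools
import types


class PatchSet:
    """Wraps functions to report their results, and can undo every wrap."""

    def __init__(self):
        self._originals = []

    def patch(self, owner, attr, on_result):
        original = getattr(owner, attr)

        @functools.wraps(original)
        def wrapper(*args, **kwargs):
            result = original(*args, **kwargs)
            on_result(result)  # e.g. hand the artifact to a collector
            return result      # callers see unchanged behavior

        setattr(owner, attr, wrapper)
        self._originals.append((owner, attr, original))

    def uninstall(self):
        # Restore in reverse order so stacked patches unwind cleanly.
        while self._originals:
            owner, attr, original = self._originals.pop()
            setattr(owner, attr, original)


# Hypothetical stand-in for a framework module with an export function.
toy_export = types.SimpleNamespace(export=lambda model: f"exported({model})")

collected = []
patches = PatchSet()
patches.patch(toy_export, "export",
              lambda artifact: collected.append(("Exported Float", artifact)))

first = toy_export.export("mv2")   # collected as a side effect
patches.uninstall()
second = toy_export.export("mv3")  # no longer collected
```

Keeping the originals in one place is what makes the patching reversible, which matters when a session ends or a lens is not registered.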

…st "Exported Float" record

AccuracyLens was not producing metrics in HTML reports because the evaluator was configured in a build_executorch_binary POST-hook, but all Observatory.collect() calls happen DURING that function, before the evaluator existed.

The fix removes the build_executorch_binary patch entirely and instead configures the evaluator lazily when AccuracyLens.observe() first sees the "Exported Float" record:

1. Extract the float model from the ExportedProgram artifact
2. Use captured dataset (from get_imagenet_dataset patch) as primary source, or fall back to sample inputs captured by PipelineGraphCollectorLens
3. Auto-detect task type, post_process, and metrics
4. Compute golden outputs and build the evaluator

PipelineGraphCollectorLens now also captures the sample input tuple from torch.export.export(mod, args, ...) as _last_export_inputs, providing a fallback dataset for AccuracyLens when dataset loader patches don't fire (e.g., custom datasets, non-Qualcomm backends).

Changes:
- accuracy.py: remove _install_build_binary_patch(), remove _captured_model, add _configure_from_float_model(), update observe() with lazy init
- pipeline_graph_collector.py: add _last_export_inputs, capture args[1] in patched_export, clear on uninstall/clear
- LENSES.md: new reference doc covering all lenses, observation points, patching strategy, accuracy lens lazy configuration, data source fallback strategy, and custom usage examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…screen

Five UI improvements to the Observatory HTML report:

1. Auto-hide sidebar (main.css, 02_layout.js). The left index pane is now position:fixed and hidden by default (translateX(-100%)). A 12px invisible trigger strip on the left edge reveals it on hover; the pane itself stays open while hovered. Main content takes the full viewport width, so no left margin is needed.
2. Auto-hide header (main.css, 02_layout.js). The header is now position:fixed and hidden by default (translateY(-100%)). A 10px invisible trigger strip at the top edge reveals it on hover. Both panels overlay content rather than consuming layout space, maximising the graph viewer area.
3. Theme sync Observatory → fx_viewer (04_actions.js, 03_blocks.js). setTheme() now propagates the theme to all live mountedViewers via viewer.setTheme(theme) and to all mountedCompares by iterating compare.viewers. New viewers receive themeName in their initial state so they open in the current theme without a separate setTheme call.
4. Compare snap merge-on-write (03_blocks.js).
   Root cause: all viewers in a compare group share one compareStateCache entry. Each viewer registers a statechange listener that overwrites the snap. When viewer B pans after viewer A selects a node, viewer B's statechange fires with selectedNodeId=null and wipes the selection. On the next restore both viewers see null → both zoomToFit().
   Fix: the statechange write is now a merge. selectedNodeId is only updated when the incoming value is non-null, preserving any selection set by any viewer until another viewer explicitly selects a different node. camera/activeExtensions/colorBy are always overwritten with the latest value.
   Restore priority order (unchanged logic, now actually reachable):
   - node exists in this viewer's graph → selectNode + animate
   - node not found (different record) or no selection → zoomToFit()
   - no snapshot at all → init() default positioning
5. Fullscreen button: correct API key (03_blocks.js). Previous implementation passed ui.controls.fullscreenButton, which is not a recognised fx_viewer config key. Correct key per RFC_FX_VIEWER_API_INTERFACE.md and fx_graph_viewer.js default (fullscreen: { enabled: true, button: false }) is layout.fullscreen.button. Changed ViewerCtor.create() call to layout: { preset, fullscreen: { button: true } }.

README section 12.2 updated to document the merge-on-write behaviour, the corrected restore priority order (no longer uses setState({camera}) as fallback), and the per-viewer node-existence check semantics.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
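The merge-on-write rule from item 4 is small enough to state as code. The sketch below is a Python rendering of the described behavior, not the JavaScript in 03_blocks.js; the function name `merge_snapshot` is an assumption, while the snapshot keys are the ones named in the commit message.

```python
def merge_snapshot(cached, incoming):
    """Merge a statechange event into the shared compare snapshot.

    camera, activeExtensions, and colorBy always take the latest value;
    selectedNodeId is only replaced by a non-null incoming value.
    """
    merged = dict(cached or {})
    for key in ("camera", "activeExtensions", "colorBy"):
        if key in incoming:
            merged[key] = incoming[key]
    if incoming.get("selectedNodeId") is not None:
        merged["selectedNodeId"] = incoming["selectedNodeId"]
    merged.setdefault("selectedNodeId", None)
    return merged


# Viewer A selects a node, then viewer B pans without any selection.
snap = merge_snapshot(None, {"camera": {"x": 0}, "selectedNodeId": "conv1"})
snap = merge_snapshot(snap, {"camera": {"x": 5}, "selectedNodeId": None})
```

With a plain overwrite, the second event would have erased `conv1`; with the merge, the selection survives and only the camera moves.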

…ple stats, worst-index sharing

## What changed

### 1. Table view cleanup

Remove (diff) entries from the per-record accuracy table. Diffs already appear in the left panel index via check_index_diffs(); duplicating them in the table added noise without value.

### 2. PSNR cap at 100.0 dB

Raw PSNR above 100 dB (e.g. 128 dB for near-zero observer error in the Annotated Model stage) is not meaningfully different from perfect match and produced confusing display. PSNR.MAX_PSNR = 100.0 gives a uniform ceiling: perfect match → 100.0, real quantization degradation → actual dB below 100.

### 3. Redesigned Metric base class

Replace the Protocol stub with a proper base class:

    class Metric:
        higher_is_better: bool = True  # controls worst-case direction
        def calculate_per_sample(self, predictions) -> List[float]: ...
        def calculate(self, predictions) -> float:  # mean of per-sample
        def worst_index(self, per_sample) -> int:   # argmin or argmax

higher_is_better encodes each metric's direction knowledge:

    True  → worst = argmin (PSNR, cosine_sim, TopK: lower is worse)
    False → worst = argmax (MSE, AbsErr: higher is worse)

All existing metrics (PSNR, CosineSimilarity, TopKAccuracy, MaskedTokenAccuracy) are refactored to implement calculate_per_sample() instead of calculate().

### 4. New metrics: MSE and AbsErr

Both demonstrate higher_is_better=False and are added to the default evaluator alongside PSNR and CosineSimilarity whenever golden outputs are available.

### 5. Per-sample statistics in Evaluator.evaluate()

When dataset has >1 sample, each metric emits three additional keys:

    {name}_min: best sample value
    {name}_max: worst sample value (in the metric's own direction)
    {name}_worst_idx: dataset index of the worst-performing sample

Single-sample datasets emit only the primary mean value (no _min/_max/_worst_idx) to keep the digest clean for the common fallback case.

### 6. Cross-lens worst-index sharing via AccuracyLens._worst_indices

AccuracyLens now maintains a class-level dict:

    _worst_indices: Dict[str, int]  # {metric_name: dataset_index}

Updated after every evaluate() call, cleared on session end / Observatory.clear(). Future lenses (e.g. per-layer accuracy analysis) read it during their own observe() without re-running inference:

    from .accuracy import AccuracyLens
    worst = AccuracyLens._worst_indices.get("psnr")  # int or None

This follows the same pattern as PipelineGraphCollectorLens._last_export_inputs. AccuracyLens must be registered before any lens that reads _worst_indices.

### 7. Frontend: second table for worst indices

_AccuracyFrontend.record() now emits two TableBlocks:
- "Accuracy" (order=20): all metric values + min/max, no worst_idx keys
- "Worst Input Index (per metric)" (order=21): stripped worst_idx values, only shown when dataset has >1 sample

check_index_diffs() extended with "mse" and "abs_err" keys.

### 8. LENSES.md updated

Documents: PSNR cap rationale, higher_is_better table, per-sample statistics contract, _worst_indices cross-lens sharing pattern with code example, updated expected metric behavior table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
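A runnable approximation of the Metric base class from section 3 may help reviewers check the direction logic. This sketch follows the signatures quoted in the commit message, but several details are assumptions: predictions are modeled as (prediction, golden) pairs of plain float lists, the PSNR peak is taken as the largest absolute golden value (falling back to 1.0), and `_mse` is a helper invented here. The PR's own formulas may differ.

```python
import math
from typing import List, Sequence, Tuple

# Assumed input shape: one (prediction, golden) pair per dataset sample.
Pair = Tuple[Sequence[float], Sequence[float]]


class Metric:
    higher_is_better: bool = True  # controls which end counts as "worst"

    def calculate_per_sample(self, predictions: List[Pair]) -> List[float]:
        raise NotImplementedError

    def calculate(self, predictions: List[Pair]) -> float:
        per_sample = self.calculate_per_sample(predictions)
        return sum(per_sample) / len(per_sample)  # mean of per-sample values

    def worst_index(self, per_sample: List[float]) -> int:
        pick = min if self.higher_is_better else max  # argmin or argmax
        return pick(range(len(per_sample)), key=per_sample.__getitem__)


def _mse(pred, gold):
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)


class MSE(Metric):
    higher_is_better = False

    def calculate_per_sample(self, predictions):
        return [_mse(pred, gold) for pred, gold in predictions]


class PSNR(Metric):
    MAX_PSNR = 100.0  # ceiling from the commit message

    def calculate_per_sample(self, predictions):
        values = []
        for pred, gold in predictions:
            err = _mse(pred, gold)
            if err == 0.0:
                values.append(self.MAX_PSNR)  # perfect match hits the cap
                continue
            peak = max(abs(g) for g in gold) or 1.0  # assumed peak definition
            values.append(min(10.0 * math.log10(peak * peak / err), self.MAX_PSNR))
        return values


data = [([1.0, 2.0], [1.0, 2.0]), ([1.0, 2.5], [1.0, 2.0])]
psnr_values = PSNR().calculate_per_sample(data)
mse_values = MSE().calculate_per_sample(data)
```

Both metrics point at sample 1 as the worst input, even though one picks the minimum and the other the maximum.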

## Compare view redesign (FXGraphCompare + FXCompareTaskbar)

The previous implementation had three interacting layout bugs:

1. FXCompareTaskbar injected its div as a child of the CSS grid container, making it a grid item and pushing viewer columns into wrong positions.
2. sharedTaskbar.enabled defaulted to true, activating the taskbar on every existing FXGraphCompare.create() call including JS08 and JS99.
3. layout.tiled defaulted to true, restructuring viewer DOM unconditionally.

The compare view is now redesigned around a single owned DOM shell:

    layout.container
      .fx-compare-root (flex column, created by FXGraphCompare)
        .fx-compare-taskbar (optional, only when sharedTaskbar.enabled)
        .fx-compare-grid (CSS grid, repeat(N, 1fr) columns)
          .fx-compare-col (one per viewer, flex column)
            .fx-compare-col-header
            .fx-compare-minimap-row (fixed height, uniform across all cols)
              viewer.minimapRenderer.container (moved here)
            .fx-compare-canvas-row (flex:1, uniform across all cols)
              viewer.mainArea (moved here)
        .fx-compare-info-bar (single shared merged info panel)

Key design decisions:

- FXGraphCompare owns the compare DOM entirely. It moves viewer.mainArea and viewer.minimapRenderer.container into compare columns and hides viewer.wrapper. DOM snapshots (parent + nextSibling) are recorded before any move; destroy() calls _teardownCompareDOM() to restore every element to its original position.
- Uniform row heights are guaranteed structurally: all minimap rows are the same fixed height (layout.minimapHeight, default 180px); all canvas rows share flex:1 in the same flex column, so they expand to identical remaining space. No per-column height negotiation needed.
- sharedTaskbar.enabled defaults to false (opt-in). Existing callers are unaffected. FXCompareTaskbar prepends to .fx-compare-root, not to the user's container, so it is never a grid item.
- Tiled layout is the only compare layout. setTiled() and setCompact() are no-ops kept for backward compatibility. applyTiledLayout() removed from FXGraphViewer. All .fx-tiled CSS removed.
- A ResizeObserver on each .fx-compare-canvas-row keeps canvases sized correctly on window resize and column count changes.
- Merged info panel (_updateMergedInfo) renders a diff table into .fx-compare-info-bar after selection sync. Rows where values differ across graphs are highlighted amber (.fx-diff).
- Selection sync supports three modes: 'none', 'id' (default), 'layer' (match by extensions[layer].nodes[id].info[field] value; topologically last on ties).

## UI fixes (Issues 1–5)

Issue 1: Button icon conflicts
- Zoom to Fit icon changed from ⛶ (U+26F6) to ⤢ (U+2922, NORTH EAST AND SOUTH WEST ARROW) to distinguish it from the fullscreen button.
- Clear Selection button removed entirely: redundant (search clears on empty input, canvas click deselects) and its ✖ icon conflicted with fullscreen exit ✕.

Issue 2: Split layout proportions + canvas min-width
- Minimap default height changed from 500px to 240px (was 1:1 with info panel; now ~1:2 ratio, info panel takes remaining flex:1 space).
- .fx-main-area gets min-width: 60% so sidebar cannot consume most of the wrapper.
- Sidebar width cap in setupResizer() changed from containerRect.width - 200 to containerRect.width * 0.4, enforcing canvas always gets ≥60% of width.

Issue 3: Minimap fails to render in custom div / observatory
- Added ResizeObserver on MinimapRenderer.container. When the container transitions from 0 to non-zero size (collapsed section expanded, custom div appended), calls resize() + generateThumbnail() + render(). Handles all deferred-visibility cases without polling.

Issue 4: Tiled compare mode (see redesign above).

Issue 5: ADV02 info panel height instability in slot mode
- When slots.info is set, FXGraphViewer now applies overflow:hidden and min-height:0 to the slot element, and height:100%/overflow-y:auto to the info panel, preventing grid row expansion on content growth.
- ADV02 HTML updated: min-height:0 added to right-column grid container and slot divs.

## Test case updates (harness_testcases.py)

- JS08: removed compact/sync-theme controls (now handled by compare API); updated to new FXGraphCompare API (sync.mode:'id'); uses hidden mount divs so FXGraphCompare can build its own DOM shell.
- ADV04: new test case demonstrating compare with shared taskbar, sync by ID, and merged info panel.
- JS99: updated mount IDs, removed stale c99_sync_theme/c99_compact handlers, theme change now applies to both viewers.

## Documentation

- README.md: replaced stale Compare section with full architecture reference including DOM structure, ownership rules, lifecycle, interaction control table, sync modes, and merged info panel description.
- templates/README.md: added Compare View DOM and Ownership section with DOM tree, ownership rules, resize handling, and interaction ownership table.
- RFC_FX_VIEWER_API_INTERFACE.md: updated Compare API section with new config shape and semantics.
- examples/FX_VIEWER_API_TESTCASES.md: added ADV04 test case description.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


… dedup, compare.js split, themes→runtime

5 independently scoped changes:

1. Move verbose JSDoc narrative blocks (USE CASES / ALGORITHM / UX) from 6 JS files to templates/README.md under a new Runtime Internals section. Each class now has a single-line description. ~390 lines removed from JS.
2. Add fxEsc() HTML-escaping helper to runtime.js. Apply it to all innerHTML construction sites in ui_manager.js (rebuildLayersMenu, renderLegend, updateInfoPanel, updateEdgeInfoPanel) and fx_graph_viewer.js (_rebuildLayersMenu, _rebuildSyncSelect, _updateMergedInfo). Search highlight spans are intentionally left as raw HTML with a comment.
3. Add GraphDataStore.computeBoundsForNodes(nodeIds) and two private helpers _collect2HopNeighbors / _collectEdgeNeighbors on ViewerController. Refactor zoomToFit() to use them, eliminating two identical 15-line bounds loops.
4. Split fx_graph_viewer.js: move FXCompareTaskbar + FXGraphCompare to new templates/compare.js (loads after fx_graph_viewer.js). Update ordered_files in exporter.py and generate_api_test_harness.py.
5. Rename themes.js → runtime.js (better reflects content: fxOn/fxOffAll/fxEsc + THEMES). Update ordered_files in both Python files and README.

Authored with Claude


Surfaces the inherent 2-step design (runtime collection → JSON, JSON → HTML) explicitly in the CLI. Adds a `visualize` subcommand that converts an existing JSON to HTML without re-running the export script, and a `--json-only` flag for CI/storage use cases. Fixes export order so JSON is always written before HTML. Adds USAGE.md covering zero-config e2e, two-step workflow, lens config, manual collection points, and demo script modes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fixes highlighted nodes (path, selected edge, search candidates, target) to use scale-relative sizes instead of fixed pixel values, so highlights remain visible at all zoom levels. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replaces the 983-line technical-reference README with a progressive-disclosure document (concept -> workflow -> usage -> extension -> reference links). Moves all contract tables, API references, JS callbacks, and performance notes into a new REFERENCE.md. New README introduces Observatory to new users before diving into architecture details. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Move the Observatory debugging framework and fx_viewer graph visualization from backends/qualcomm/ to devtools/ so all ExecuTorch backends can use them. Backend-specific patches (QNN ptq_calibrate, XNNPACK quantize, QNN dataset capture) are extracted into their respective backend directories with a register_backend_patches() hook mechanism.

Each backend maintains its own CLI runner (--accuracy opt-in flag):
- python -m executorch.devtools.observatory (generic)
- python -m executorch.backends.qualcomm.debugger.observatory (QNN)
- python -m executorch.backends.xnnpack.debugger.observatory (XNNPACK)

Includes RFC document at devtools/observatory/RFC_OBSERVATORY_SHARED_FRAMEWORK.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Add observe_pass, a decorator that wraps any PassBase subclass or callable pass to automatically collect input/output graphs via Observatory. Names are derived from the class/function name and deduplicated by collect() itself (#2, #3, ...) so repeated calls never silently overwrite records.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
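The naming and deduplication behavior can be sketched with a toy collector. Everything below is hypothetical scaffolding: `ToyObservatory`, the `/input` and `/output` name suffixes, the ` #2` suffix format, and the decorator taking an explicit collector argument are choices made for this example, not the shipped observe_pass API.

```python
import functools


class ToyObservatory:
    """Stand-in collector whose collect() never overwrites an existing record."""

    def __init__(self):
        self.records = {}

    def collect(self, name, artifact):
        unique, n = name, 1
        while unique in self.records:
            n += 1
            unique = f"{name} #{n}"  # suffix format is an assumption
        self.records[unique] = artifact
        return unique


def observe_pass(observatory):
    """Decorator factory: record a pass's input and output under its name."""

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(graph):
            # Copy the input so later in-place edits cannot alter the record.
            observatory.collect(f"{fn.__name__}/input", list(graph))
            result = fn(graph)
            observatory.collect(f"{fn.__name__}/output", list(result))
            return result

        return wrapper

    return decorate


obs = ToyObservatory()


@observe_pass(obs)
def drop_noops(graph):
    return [node for node in graph if node != "noop"]


drop_noops(["conv", "noop", "relu"])
drop_noops(["fc", "noop"])
names = list(obs.records)
```

Doing the deduplication inside collect() rather than in the decorator means direct collect calls get the same protection.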

…ayer accuracy Add `default_sync` field to `GraphCompareSpec` so lenses can declare their preferred compare sync strategy. The per-layer accuracy lens now defaults to `mode: "layer"` on `sparse_match_key` instead of the generic auto mode, giving immediate node correspondence when comparing records side-by-side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace the vendored grandalf fork with the Rust-backed fast-sugiyama package and rewrite the post-layout compaction in exporter.py to produce a natural top-down edge flow.

Layout changes in _compute_layout_with_ext_lines / _compact_components:
- Drive layout via fast-sugiyama's from_edges + rect_pack_layouts.
- Run per-layer spine cohesion (chain detection across real and dummy nodes, iterative mean-pull, pure-A overlap repair) so linear op chains render as straight spines.
- Flip Phase 6's y so graph inputs / placeholders land at the top of the canvas and outputs / sinks at the bottom.
- Anchor edge endpoints to the bottom-midpoint of the source and top-midpoint of the target, giving each node a predictable dock.

Also refresh the README's dependency / layout section to point at fast-sugiyama[all] and drop the grandalf install instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1. Use chain detection and iterative spine refinement to avoid overlap and restore compactness and alignment.
2. Chain detection leverages node depth (rank) to detect the longest possible chains.
3. Chains are attracted to each other by sharing start and end nodes; in spine refinement, the mean attraction center is shifted based on end nodes.

…st-aware node text

The per-layer accuracy lens was coloring nodes and table cells with a color range recomputed independently per record, so the same cosine/PSNR/MSE/abs_err value mapped to different colors in different records and broke side-by-side comparison. Separately, fx_viewer was drawing node labels in theme.text regardless of node fill, so extension layers (e.g. this lens) that assigned dark or near-white fills produced unreadable text.

Changes:

devtools/observatory/lenses/per_layer_accuracy.py: `_MetricNumericColorRule` now accepts an optional `fixed_range`; range resolution is factored into `_resolve_range()`. `analyze()` aggregates per-metric (vmin, vmax) across every record's rows once via `_aggregate_metric_ranges()`, stashes them in `AnalysisResult.global_data["metric_ranges"]`, and threads the cosine_sim range into each per-record `_build_metric_extension(...)`. The HTML metrics table is now parameterized by `metric_ranges` and the frontend `record()` reads them from `analysis["global"]`, so table and graph extension share one scale and the 5-point legend auto-reflects the unified range. Per-record fallback is preserved for callers without analysis context.

devtools/fx_viewer/templates/runtime.js: new `fxReadableTextColor(hex)` helper using WCAG 2.x relative luminance (0.2126·R + 0.7152·G + 0.0722·B on linearized sRGB). Returns `#111111` or `#f8f8f8` by higher-contrast test, or `null` on malformed input so callers can fall back to theme defaults. Palette matches the lens HTML table's `_text_color_for_bg` for visual consistency.

devtools/fx_viewer/templates/canvas_renderer.js: the node draw loop now tracks `renderedFill` through the selected/preview/hovered/input/output shading branches and picks the label ink from `fxReadableTextColor(renderedFill)` whenever the node carries a custom `fill_color` from any extension; nodes without an extension color keep the theme default. No payload/schema changes, so the contrast rule applies retroactively to every existing extension layer.

devtools/observatory/tests/test_per_layer_accuracy_lens.py: added `test_analyze_produces_unified_metric_ranges_across_records`: builds two synthetic digests, asserts `global_data["metric_ranges"]` spans the union across records, and asserts that a node with the same metric value in both records receives the same `fill_color` (proving the color no longer depends on which record the node happens to appear in).

Review order: the lens module is the load-bearing change. Start there, then the matching test, then the two fx_viewer templates, which are local and additive.

Authored with Claude Code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
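The contrast rule behind `fxReadableTextColor` can be reproduced in Python for anyone who wants to sanity-check a fill color. This is a sketch of the WCAG 2.x formulas named in the commit message, not a port of runtime.js: the function names are invented here, and accepting only six-digit `#rrggbb` strings is an assumption about what counts as well-formed input.

```python
import re
from typing import Optional

_HEX6 = re.compile(r"^#([0-9a-fA-F]{6})$")  # assumed accepted input format


def _linearize(channel_8bit: int) -> float:
    """sRGB channel (0-255) to linear light, per WCAG 2.x."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> Optional[float]:
    match = _HEX6.match(hex_color or "")
    if match is None:
        return None
    digits = match.group(1)
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(lum_a: float, lum_b: float) -> float:
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def readable_text_color(fill_hex: str) -> Optional[str]:
    """Pick the dark or light ink with the higher contrast against the fill."""
    fill = relative_luminance(fill_hex)
    if fill is None:
        return None  # caller falls back to the theme default
    dark, light = "#111111", "#f8f8f8"
    dark_contrast = contrast_ratio(fill, relative_luminance(dark))
    light_contrast = contrast_ratio(fill, relative_luminance(light))
    return dark if dark_contrast >= light_contrast else light


on_white = readable_text_color("#ffffff")
on_black = readable_text_color("#000000")
on_garbage = readable_text_color("not-a-color")
```

Comparing the two contrast ratios directly, rather than thresholding luminance at a fixed value, keeps the choice correct for mid-tone fills as well.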


meta-cla bot added the CLA Signed label on May 5, 2026. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

quic-boyuc force-pushed the boyuc/observatory_draft_demo branch from 53faf3c to 2f5a882 on May 5, 2026 04:33

quic-boyuc and others added 26 commits May 5, 2026 12:34

Add fx_viewer in executorch

2a99352

Figure out plan for integration of observatory with fx_viewer

46be9ce

fx_viewer: introduce state-driven JS runtime API and embed config

1719b6e

fx_viewer/examples: add unified API harness generator and testcases

41babb3

fx_viewer/examples: add per-layer accuracy demo and observatory calls…

55e69f1

…ite fix

fx_viewer/docs: document RFC API surface and runtime architecture

1d67726

fx_viewer: harden viewer lifecycle and compare sync behavior

0337eed

fx_viewer: simplify runtime layer mutation refresh paths

a10c2ee

fx_viewer: align comment and formatting style with existing codebase

7c74bec

fx_viewer: clean docs artifacts and restore merged README guidance

17704ba

fx_viewer: reduce runtime boilerplate with shared listener and UI hel…

9a212c6

…pers

fx_viewer/examples: add tutorial-style API learning ladder and mixed …

f9ecab3

…demo

observatory: scaffold RFC contracts and planning docs

537ecb3

observatory: add core runtime, minimal lenses, and ETRecord auto-coll…

1c53d60

…ect hook

observatory(ui): split report runtime into topic JS template files

4f46ad5

observatory: add demos, UI harness, and smoke tests

6da102c

Integrate fx_graph into observatory

96f61d3

- Use python as clean API and function arguments
- Fix bug in html_template.py (\n --> \\n)

fx_viewer: update JS 08 compare testcase with shared taskbar toggle; …

f984d69

…reduce roberta/vit data sizes Authored with Claude

quic-boyuc and others added 25 commits May 5, 2026 12:34

fx_viewer/observatory: adjust minimap layout for usability

2113428

observatory: merge analyze-only graph layers and align graph_color de…

570ada1

…faults

observatory / fx_viewer: optimized coloring and theme

e90e26a

Add per-layer accuracy lens with PSNR-based graph/table UI

b5b7732

Add payload relayout API and observatory graph relayout integration

0e08d07

Add per-metric per-layer accuracy graph layers and docs

aec0a40

observatory: update document and cli options to feat new module struc…

8e6be76

…ture

Bug Fixes

23b566e

Fix json float nan issue, simplify cli, prepare rfc draft

5f62daf

Fix typo for --lens_recipe option, update readme and example

a04dae4

Deleted out-dated RFC and test files

6517525

Update readme and cli experience, remove previous edits on existing e…

5f25b92

…xamples

Remove intermediate md files

f59a113

Handle both XNNPACK quantization import paths

f3a085b

Adjust graph block size in Observatory template

f02dd78

Add backend specific readme.md

6a3afad

quic-boyuc force-pushed the boyuc/observatory_draft_demo branch from 2f5a882 to 6a3afad on May 5, 2026 04:34

quic-boyuc added 4 commits May 5, 2026 17:51

Fix Observatory stack trace links

0936bca

observatory: fix qnn accuracy lens hook

ecaefd2

Fix compare graph layout refresh

d29ad3c

Improve compare info table readability

1dd5421

quic-boyuc wants to merge 65 commits into pytorch:main from CodeLinaro:boyuc/observatory_draft_demo
