Boyuc/observatory draft demo #19288
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19288
Branch updated from 53faf3c to 2f5a882.
- Use Python as the clean API and for function arguments
- Fix bug in html_template.py (\n --> \\n)
…otstrap

Three fixes for HTML report correctness and size:
1. Base64-encode HtmlBlock.content in the JSON payload so </script> and other special characters cannot corrupt the outer <script> tag. The JS runtime decodes with atob() before innerHTML assignment (03_blocks.js, renderHtmlCompare).
2. Gzip+base64 compress the full JSON payload when it exceeds 8 KB (observatory.py _compress_payload). The browser decompresses via DecompressionStream inside an async IIFE, which also moves the Observatory runtime execution to after the FX viewer bundle is injected. This fixes the "FXGraphViewer unavailable" race condition that existed when Script 3 ran before Script 2 finished awaiting (html_template.py).
3. Base64-encode resources.js[] entries in generate_ui_test_harness.py so the test harness goes through the same pipeline as production reports instead of bypassing it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
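The escaping and size fixes above can be sketched in Python as a round trip. This is a minimal illustration under assumptions: the `{"encoding", "data"}` wrapper and all function names are hypothetical and are not the actual `_compress_payload` signature in observatory.py; only the 8 KB threshold and the base64 / gzip+base64 choices come from the commit message. `decompress_payload` stands in for the browser's `DecompressionStream` path.

```python
import base64
import gzip
import json

COMPRESS_THRESHOLD = 8 * 1024  # compress only payloads larger than 8 KB


def encode_html_block(content: str) -> str:
    """Base64-encode raw HTML so '</script>' cannot terminate the host <script> tag."""
    return base64.b64encode(content.encode("utf-8")).decode("ascii")


def compress_payload(payload: dict) -> dict:
    """Hypothetical stand-in for _compress_payload: gzip+base64 only when large."""
    raw = json.dumps(payload).encode("utf-8")
    if len(raw) <= COMPRESS_THRESHOLD:
        return {"encoding": "json", "data": raw.decode("utf-8")}
    packed = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"encoding": "gzip+base64", "data": packed}


def decompress_payload(wrapped: dict) -> dict:
    """Python mirror of the browser-side decode path."""
    if wrapped["encoding"] == "json":
        return json.loads(wrapped["data"])
    return json.loads(gzip.decompress(base64.b64decode(wrapped["data"])))
```

Because base64 output only uses `[A-Za-z0-9+/=]`, neither an HTML block nor the compressed payload can contain a literal `</script>`.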
Two bugs fixed:
1. Wrong initial camera on first load — viewer.init() was called
synchronously before the browser had laid out the container, so
getBoundingClientRect() returned {width:0, height:0} and the camera
was placed far off-screen. Fixed by deferring init() to the next
animation frame (requestAnimationFrame).
2. All viewer state lost on every record switch — destroyGraphRuntime()
was unconditionally destroying every live viewer. Camera, selected
node, active layers, colorBy, and zoom were all reset.
Fix: hybrid viewer cache.
Single-record graph blocks use a live DOM cache keyed by
(recordIndex, lensName, blockId). On navigate-away the wrapper is
detached from the DOM but the viewer stays alive in state.viewerCache.
On return the wrapper is re-appended and a resize rAF is queued — no
re-init, no re-layout, full state preserved. LRU eviction at 10 viewers.
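The single-record cache described above behaves like a small LRU map keyed by `(recordIndex, lensName, blockId)`. The real cache lives in the JS runtime (`state.viewerCache`); this Python model is only an illustration, and the class name, `get_or_create`, and the `destroy` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Tuple

ViewerKey = Tuple[int, str, str]  # (recordIndex, lensName, blockId)


class ViewerCache:
    """LRU cache of live viewers; the least recently used one is destroyed on overflow."""

    def __init__(self, capacity: int = 10, destroy: Callable[[object], None] = lambda v: None):
        self.capacity = capacity
        self._destroy = destroy
        self._items: "OrderedDict[ViewerKey, object]" = OrderedDict()

    def get_or_create(self, key: ViewerKey, create: Callable[[], object]) -> object:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]  # reuse: no re-init, state preserved
        viewer = create()
        self._items[key] = viewer
        if len(self._items) > self.capacity:
            _, evicted = self._items.popitem(last=False)  # oldest entry
            self._destroy(evicted)
        return viewer
```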
Compare-mode viewers are always freshly created (keeping N side-by-side
viewers alive would multiply memory cost). Instead a lightweight state
snapshot {camera, selectedNodeId, activeExtensions, colorBy} is saved
to state.compareStateCache on every statechange event. On re-entry each
new viewer is seeded from the snapshot: selectNode+animate if a node was
selected, setState({camera}) otherwise.
README section 12 documents the memory budget and trade-off rationale.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…acy lenses
Introduces a `python -m executorch.backends.qualcomm.debugger.observatory.cli`
runner that wraps any standard Qualcomm example script (e.g. swin_v2_t.py)
to automatically enable full Observatory debugging with no script modifications.
Key changes:
PipelineGraphCollectorLens (new):
- Absorbs ETRecordAutoCollector from the deleted auto_collect.py, moving all
monkey-patching out of the Observatory core framework and into a lens.
- Patches framework-level functions (torch.export.export, prepare_pt2e,
convert_pt2e, to_edge_transform_and_lower, ETRecord.add_*) to auto-collect
graph snapshots at each compilation stage: Exported Float, Annotated Model,
Calibrated Model, Quantized Model, Edge, Transformed Edge, ETRecord records.
- Forces generate_etrecord=True in to_edge_transform_and_lower to ensure
ETRecord collection fires automatically.
- Framework-level patches work for all backends (QNN, XNNPack, CoreML, etc.).
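The auto-collection approach above boils down to wrapping a module-level function so that each call records its result, with an uninstall step that restores the original. A minimal sketch, assuming a generic `collect(name, artifact)` callback in place of `Observatory.collect()`; `AutoCollectPatcher` is hypothetical, and the real lens patches specific functions such as `torch.export.export` and `prepare_pt2e`.

```python
import functools
from typing import Any, Callable, List, Tuple


class AutoCollectPatcher:
    """Wrap functions so each call records a snapshot of its result."""

    def __init__(self, collect: Callable[[str, Any], None]):
        self._collect = collect
        self._originals: List[Tuple[Any, str, Callable]] = []

    def patch(self, owner: Any, attr: str, record_name: str) -> None:
        original = getattr(owner, attr)

        @functools.wraps(original)
        def wrapper(*args, **kwargs):
            result = original(*args, **kwargs)
            self._collect(record_name, result)  # snapshot after the stage runs
            return result

        self._originals.append((owner, attr, original))
        setattr(owner, attr, wrapper)

    def uninstall(self) -> None:
        # Restore in reverse order so stacked patches unwind correctly.
        while self._originals:
            owner, attr, original = self._originals.pop()
            setattr(owner, attr, original)
```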
AccuracyLens (new):
- Ports AccuracyEvaluationLens from legacy debugging_utils to new Observatory
interfaces (ViewList/TableBlock).
- Adds MaskedTokenAccuracy metric and MLMEvaluator for masked language model
scripts (bert, roberta, distilbert, albert, eurobert).
- Auto-patches get_imagenet_dataset and get_masked_language_model_dataset to
capture targets, covering 24 of ~30 standard oss_scripts.
- Auto-detects task type (classification vs MLM), post_process function, and
default metrics from model output format.
CLI runner (new):
- cli.py + __main__.py: parse observatory flags, register lenses, wrap target
script in Observatory.enable_context(), run via runpy.run_path(), generate
HTML + JSON report to {artifact}/observatory_report.{html,json}.
- Flags: --no-accuracy, --no-report, --report-title.
observatory.py:
- Remove ETRecordAutoCollector.install/uninstall calls; patching is now
delegated to PipelineGraphCollectorLens when registered.
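The CLI runner flow above (parse Observatory flags, then run the unmodified target script under a session) can be sketched with `argparse` and `runpy.run_path`. Only the three flag names come from the commit message; the parser layout, the `run` function, and the `session` parameter (a stand-in for `Observatory.enable_context()`) are assumptions, and report generation is omitted.

```python
import argparse
import contextlib
import runpy
import sys
from typing import Callable, ContextManager, List


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="observatory")
    p.add_argument("--no-accuracy", action="store_true")
    p.add_argument("--no-report", action="store_true")
    p.add_argument("--report-title", default=None)
    p.add_argument("script")
    # Everything after the script path is forwarded to the script untouched.
    p.add_argument("script_args", nargs=argparse.REMAINDER)
    return p


def run(argv: List[str], session: Callable[[], ContextManager] = contextlib.nullcontext) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    saved_argv = sys.argv
    sys.argv = [args.script, *args.script_args]  # target script sees its own argv
    try:
        with session():  # stand-in for Observatory.enable_context()
            runpy.run_path(args.script, run_name="__main__")
    finally:
        sys.argv = saved_argv
    return args
```

Running with `run_name="__main__"` makes the target's `if __name__ == "__main__":` block execute, which is what lets an example script run with no modifications.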
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…st "Exported Float" record

AccuracyLens was not producing metrics in HTML reports because the evaluator was configured in a build_executorch_binary POST-hook, but all Observatory.collect() calls happen DURING that function, before the evaluator existed. The fix removes the build_executorch_binary patch entirely and instead configures the evaluator lazily when AccuracyLens.observe() first sees the "Exported Float" record:
1. Extract the float model from the ExportedProgram artifact
2. Use captured dataset (from get_imagenet_dataset patch) as primary source, or fall back to sample inputs captured by PipelineGraphCollectorLens
3. Auto-detect task type, post_process, and metrics
4. Compute golden outputs and build the evaluator

PipelineGraphCollectorLens now also captures the sample input tuple from torch.export.export(mod, args, ...) as _last_export_inputs, providing a fallback dataset for AccuracyLens when dataset loader patches don't fire (e.g., custom datasets, non-Qualcomm backends).

Changes:
- accuracy.py: remove _install_build_binary_patch(), remove _captured_model, add _configure_from_float_model(), update observe() with lazy init
- pipeline_graph_collector.py: add _last_export_inputs, capture args[1] in patched_export, clear on uninstall/clear
- LENSES.md: new reference doc covering all lenses, observation points, patching strategy, accuracy lens lazy configuration, data source fallback strategy, and custom usage examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
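The lazy configuration steps above can be sketched as follows. This is a simplified model under assumptions: models are plain callables rather than `ExportedProgram` artifacts, task-type and metric auto-detection are skipped, the summed absolute error is a toy metric, and `LazyAccuracyLens` with its attribute names is hypothetical.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


class LazyAccuracyLens:
    """Hypothetical sketch: configure the evaluator on the first "Exported Float" record."""

    TRIGGER = "Exported Float"

    def __init__(self, captured_dataset: Optional[Sequence[Tuple]] = None):
        self.captured_dataset = captured_dataset  # set by a dataset-loader patch, if it fired
        self.last_export_inputs: Optional[Tuple] = None  # set by the export patch (fallback)
        self._dataset: List[Tuple] = []
        self.golden: Optional[List[Any]] = None
        self.results: dict = {}

    def _configure(self, float_model: Callable[..., Any]) -> None:
        if self.captured_dataset:
            self._dataset = list(self.captured_dataset)  # primary source
        elif self.last_export_inputs is not None:
            self._dataset = [self.last_export_inputs]  # fallback: one sample
        self.golden = [float_model(*inputs) for inputs in self._dataset]

    def observe(self, name: str, model: Callable[..., Any]) -> None:
        if self.golden is None:
            if name != self.TRIGGER:
                return  # no evaluator yet, so there is nothing to compare against
            self._configure(model)
        outputs = [model(*inputs) for inputs in self._dataset]
        # Toy metric for illustration: summed absolute error against golden outputs.
        self.results[name] = sum(abs(o - g) for o, g in zip(outputs, self.golden))
```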
…screen
Five UI improvements to the Observatory HTML report:
1. Auto-hide sidebar (main.css, 02_layout.js)
The left index pane is now position:fixed and hidden by default
(translateX(-100%)). A 12px invisible trigger strip on the left edge
reveals it on hover; the pane itself stays open while hovered. Main
content takes the full viewport width — no left margin needed.
2. Auto-hide header (main.css, 02_layout.js)
The header is now position:fixed and hidden by default
(translateY(-100%)). A 10px invisible trigger strip at the top edge
reveals it on hover. Both panels overlay content rather than consuming
layout space, maximising the graph viewer area.
3. Theme sync Observatory → fx_viewer (04_actions.js, 03_blocks.js)
setTheme() now propagates the theme to all live mountedViewers via
viewer.setTheme(theme) and to all mountedCompares by iterating
compare.viewers. New viewers receive themeName in their initial state
so they open in the current theme without a separate setTheme call.
4. Compare snap merge-on-write (03_blocks.js)
Root cause: all viewers in a compare group share one compareStateCache
entry. Each viewer registers a statechange listener that overwrites the
snap. When viewer B pans after viewer A selects a node, viewer B's
statechange fires with selectedNodeId=null and wipes the selection.
On the next restore both viewers see null → both zoomToFit().
Fix: the statechange write is now a merge. selectedNodeId is only
updated when the incoming value is non-null, preserving any selection
set by any viewer until another viewer explicitly selects a different
node. camera/activeExtensions/colorBy are always overwritten with the
latest value.
Restore priority order (unchanged logic, now actually reachable):
- node exists in this viewer's graph → selectNode + animate
- node not found (different record) or no selection → zoomToFit()
- no snapshot at all → init() default positioning
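The merge-on-write rule and the restore priority above can be modelled in a few lines. The actual logic is JS in 03_blocks.js; this Python version is an illustration only, the function names are hypothetical, and the returned strings simply name the action a viewer would take.

```python
from typing import Any, Dict, Optional, Set


def merge_snapshot(snap: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Merge-on-write: a null selection never wipes another viewer's selection."""
    merged = dict(snap)
    for key in ("camera", "activeExtensions", "colorBy"):
        if key in incoming:
            merged[key] = incoming[key]  # always take the latest value
    if incoming.get("selectedNodeId") is not None:
        merged["selectedNodeId"] = incoming["selectedNodeId"]
    return merged


def restore_action(snap: Optional[Dict[str, Any]], node_ids: Set[str]) -> str:
    """Per-viewer restore priority."""
    if snap is None:
        return "init"  # no snapshot at all: default positioning
    node = snap.get("selectedNodeId")
    if node is not None and node in node_ids:
        return "selectNode"  # node exists in this viewer's graph
    return "zoomToFit"  # node missing (different record) or no selection
```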
5. Fullscreen button — correct API key (03_blocks.js)
Previous implementation passed ui.controls.fullscreenButton which is
not a recognised fx_viewer config key. Correct key per
RFC_FX_VIEWER_API_INTERFACE.md and fx_graph_viewer.js default
(fullscreen: { enabled: true, button: false }) is
layout.fullscreen.button. Changed ViewerCtor.create() call to
layout: { preset, fullscreen: { button: true } }.
README section 12.2 updated to document the merge-on-write behaviour,
the corrected restore priority order (no longer uses setState({camera})
as fallback), and the per-viewer node-existence check semantics.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ple stats, worst-index sharing
## What changed
### 1. Table view cleanup
Remove (diff) entries from the per-record accuracy table. Diffs already appear
in the left panel index via check_index_diffs() — duplicating them in the table
added noise without value.
### 2. PSNR cap at 100.0 dB
Raw PSNR above 100 dB (e.g. 128 dB for near-zero observer error in the
Annotated Model stage) is not meaningfully different from perfect match and
produced confusing display. PSNR.MAX_PSNR = 100.0 gives a uniform ceiling:
perfect match → 100.0, real quantization degradation → actual dB below 100.
### 3. Redesigned Metric base class
Replace the Protocol stub with a proper base class:
    class Metric:
        higher_is_better: bool = True  # controls worst-case direction
        def calculate_per_sample(self, predictions) -> List[float]: ...
        def calculate(self, predictions) -> float: ...  # mean of per-sample
        def worst_index(self, per_sample) -> int: ...  # argmin or argmax
higher_is_better encodes each metric's direction knowledge:
True → worst = argmin (PSNR, cosine_sim, TopK — lower is worse)
False → worst = argmax (MSE, AbsErr — higher is worse)
All existing metrics (PSNR, CosineSimilarity, TopKAccuracy, MaskedTokenAccuracy)
are refactored to implement calculate_per_sample() instead of calculate().
### 4. New metrics: MSE and AbsErr
Both demonstrate higher_is_better=False and are added to the default evaluator
alongside PSNR and CosineSimilarity whenever golden outputs are available.
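A runnable version of the Metric base class, the PSNR cap, and an MSE metric might look like the sketch below. Assumptions: each prediction is an `(output, golden)` pair of flat float sequences, and PSNR uses the peak absolute golden value as its signal peak; the real metrics may define the peak and inputs differently.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (output, golden) for one sample


class Metric:
    name = "metric"
    higher_is_better: bool = True  # controls worst-case direction

    def calculate_per_sample(self, predictions: List[Pair]) -> List[float]:
        raise NotImplementedError

    def calculate(self, predictions: List[Pair]) -> float:
        per_sample = self.calculate_per_sample(predictions)
        return sum(per_sample) / len(per_sample)  # mean of per-sample values

    def worst_index(self, per_sample: List[float]) -> int:
        pick = min if self.higher_is_better else max  # argmin or argmax
        return pick(range(len(per_sample)), key=per_sample.__getitem__)


def _mse(out: Sequence[float], gold: Sequence[float]) -> float:
    return sum((o - g) ** 2 for o, g in zip(out, gold)) / len(gold)


class MSE(Metric):
    name = "mse"
    higher_is_better = False  # larger error is worse

    def calculate_per_sample(self, predictions: List[Pair]) -> List[float]:
        return [_mse(out, gold) for out, gold in predictions]


class PSNR(Metric):
    name = "psnr"
    MAX_PSNR = 100.0  # ceiling: a perfect match reports 100.0 dB

    def calculate_per_sample(self, predictions: List[Pair]) -> List[float]:
        scores = []
        for out, gold in predictions:
            mse = _mse(out, gold)
            peak = max(abs(g) for g in gold) or 1.0  # assumed peak definition
            if mse == 0:
                scores.append(self.MAX_PSNR)
            else:
                scores.append(min(self.MAX_PSNR, 10 * math.log10(peak ** 2 / mse)))
        return scores
```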
### 5. Per-sample statistics in Evaluator.evaluate()
When dataset has >1 sample, each metric emits three additional keys:
{name}_min — best sample value
{name}_max — worst sample value (in the metric's own direction)
{name}_worst_idx — dataset index of the worst-performing sample
Single-sample datasets emit only the primary mean value (no _min/_max/_worst_idx)
to keep the digest clean for the common fallback case.
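The per-sample contract above, combined with the class-level worst-index sharing described next, can be sketched as follows. For brevity this sketch emits only the mean and `{name}_worst_idx` (not `_min`/`_max`), and `SampleMetric`, `evaluate`, and `WorstIndexRegistry` are hypothetical names rather than the actual `Evaluator` or `AccuracyLens` code.

```python
from typing import Callable, Dict, List, Sequence


class WorstIndexRegistry:
    # Class-level dict shared by every reader, mirroring the _worst_indices pattern.
    _worst_indices: Dict[str, int] = {}


class SampleMetric:
    """Hypothetical minimal metric: a name, a direction, and a per-sample score function."""

    def __init__(self, name: str, fn: Callable[[float, float], float], higher_is_better: bool):
        self.name = name
        self.fn = fn
        self.higher_is_better = higher_is_better

    def per_sample(self, outputs: Sequence[float], golden: Sequence[float]) -> List[float]:
        return [self.fn(o, g) for o, g in zip(outputs, golden)]


def evaluate(metrics: List[SampleMetric], outputs: Sequence[float], golden: Sequence[float]) -> Dict[str, float]:
    digest: Dict[str, float] = {}
    for m in metrics:
        scores = m.per_sample(outputs, golden)
        digest[m.name] = sum(scores) / len(scores)  # primary value: mean over samples
        if len(scores) > 1:  # single-sample runs keep only the mean
            pick = min if m.higher_is_better else max  # argmin or argmax
            worst = pick(range(len(scores)), key=scores.__getitem__)
            digest[f"{m.name}_worst_idx"] = worst
            WorstIndexRegistry._worst_indices[m.name] = worst  # readable by later lenses
    return digest
```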
### 6. Cross-lens worst-index sharing via AccuracyLens._worst_indices
AccuracyLens now maintains a class-level dict:
_worst_indices: Dict[str, int] # {metric_name: dataset_index}
Updated after every evaluate() call, cleared on session end / Observatory.clear().
Future lenses (e.g. per-layer accuracy analysis) read it during their own
observe() without re-running inference:
from .accuracy import AccuracyLens
worst = AccuracyLens._worst_indices.get("psnr") # int or None
This follows the same pattern as PipelineGraphCollectorLens._last_export_inputs.
AccuracyLens must be registered before any lens that reads _worst_indices.
### 7. Frontend: second table for worst indices
_AccuracyFrontend.record() now emits two TableBlocks:
- "Accuracy" (order=20): all metric values + min/max, no worst_idx keys
- "Worst Input Index (per metric)" (order=21): stripped worst_idx values,
only shown when dataset has >1 sample
check_index_diffs() extended with "mse" and "abs_err" keys.
### 8. LENSES.md updated
Documents: PSNR cap rationale, higher_is_better table, per-sample statistics
contract, _worst_indices cross-lens sharing pattern with code example, updated
expected metric behavior table.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Compare view redesign (FXGraphCompare + FXCompareTaskbar)
The previous implementation had three interacting layout bugs:
1. FXCompareTaskbar injected its div as a child of the CSS grid container,
making it a grid item and pushing viewer columns into wrong positions.
2. sharedTaskbar.enabled defaulted to true, activating the taskbar on every
existing FXGraphCompare.create() call including JS08 and JS99.
3. layout.tiled defaulted to true, restructuring viewer DOM unconditionally.
The compare view is now redesigned around a single owned DOM shell:
    layout.container
      .fx-compare-root (flex column, created by FXGraphCompare)
        .fx-compare-taskbar (optional, only when sharedTaskbar.enabled)
        .fx-compare-grid (CSS grid, repeat(N, 1fr) columns)
          .fx-compare-col (one per viewer, flex column)
            .fx-compare-col-header
            .fx-compare-minimap-row (fixed height, uniform across all cols)
              viewer.minimapRenderer.container (moved here)
            .fx-compare-canvas-row (flex:1, uniform across all cols)
              viewer.mainArea (moved here)
        .fx-compare-info-bar (single shared merged info panel)
Key design decisions:
- FXGraphCompare owns the compare DOM entirely. It moves viewer.mainArea and
viewer.minimapRenderer.container into compare columns and hides viewer.wrapper.
DOM snapshots (parent + nextSibling) are recorded before any move; destroy()
calls _teardownCompareDOM() to restore every element to its original position.
- Uniform row heights are guaranteed structurally: all minimap rows are the same
fixed height (layout.minimapHeight, default 180px); all canvas rows share
flex:1 in the same flex column, so they expand to identical remaining space.
No per-column height negotiation needed.
- sharedTaskbar.enabled defaults to false (opt-in). Existing callers are
unaffected. FXCompareTaskbar prepends to .fx-compare-root, not to the user's
container, so it is never a grid item.
- Tiled layout is the only compare layout. setTiled() and setCompact() are
no-ops kept for backward compatibility. applyTiledLayout() removed from
FXGraphViewer. All .fx-tiled CSS removed.
- A ResizeObserver on each .fx-compare-canvas-row keeps canvases sized
correctly on window resize and column count changes.
- Merged info panel (_updateMergedInfo) renders a diff table into
.fx-compare-info-bar after selection sync. Rows where values differ across
graphs are highlighted amber (.fx-diff).
- Selection sync supports three modes: 'none', 'id' (default), 'layer'
(match by extensions[layer].nodes[id].info[field] value; topologically last
on ties).
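The three selection-sync modes above can be modelled as a small lookup. This is a Python illustration of JS behaviour under assumptions: extensions are plain nested dicts shaped like `extensions[layer]["nodes"][id]["info"][field]`, and the target graph's topological order is supplied as a list; all function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def _info(ext: Dict[str, Any], layer: str, node_id: str) -> Dict[str, Any]:
    # Walk extensions[layer]["nodes"][node_id]["info"], tolerating missing levels.
    return ext.get(layer, {}).get("nodes", {}).get(node_id, {}).get("info", {})


def find_layer_match(value: Any, ext: Dict[str, Any], layer: str, field: str,
                     topo_order: List[str]) -> Optional[str]:
    match = None
    for node_id in topo_order:  # later hits overwrite earlier ones: topologically last wins
        info = _info(ext, layer, node_id)
        if field in info and info[field] == value:
            match = node_id
    return match


def sync_target(mode: str, selected_id: str, src_ext: Dict[str, Any], dst_ext: Dict[str, Any],
                dst_topo: List[str], layer: Optional[str] = None,
                field: Optional[str] = None) -> Optional[str]:
    if mode == "none":
        return None
    if mode == "id":
        return selected_id if selected_id in dst_topo else None
    if mode == "layer":
        value = _info(src_ext, layer, selected_id).get(field)
        return None if value is None else find_layer_match(value, dst_ext, layer, field, dst_topo)
    raise ValueError(f"unknown sync mode: {mode}")
```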
## UI fixes (Issues 1–5)
Issue 1 — Button icon conflicts:
- Zoom to Fit icon changed from ⛶ (U+26F6) to ⤢ (U+2922, NORTH EAST AND
SOUTH WEST ARROW) to distinguish it from the fullscreen button.
- Clear Selection button removed entirely — redundant (search clears on empty
input, canvas click deselects) and its ✖ icon conflicted with fullscreen
exit ✕.
Issue 2 — Split layout proportions + canvas min-width:
- Minimap default height changed from 500px to 240px (was 1:1 with info panel;
now ~1:2 ratio, info panel takes remaining flex:1 space).
- .fx-main-area gets min-width: 60% so sidebar cannot consume most of the
wrapper.
- Sidebar width cap in setupResizer() changed from containerRect.width - 200
to containerRect.width * 0.4, enforcing canvas always gets ≥60% of width.
Issue 3 — Minimap fails to render in custom div / observatory:
- Added ResizeObserver on MinimapRenderer.container. When the container
transitions from 0 to non-zero size (collapsed section expanded, custom div
appended), calls resize() + generateThumbnail() + render(). Handles all
deferred-visibility cases without polling.
Issue 4 — Tiled compare mode (see redesign above).
Issue 5 — ADV02 info panel height instability in slot mode:
- When slots.info is set, FXGraphViewer now applies overflow:hidden and
min-height:0 to the slot element, and height:100%/overflow-y:auto to the
info panel, preventing grid row expansion on content growth.
- ADV02 HTML updated: min-height:0 added to right-column grid container and
slot divs.
## Test case updates (harness_testcases.py)
- JS08: removed compact/sync-theme controls (now handled by compare API);
updated to new FXGraphCompare API (sync.mode:'id'); uses hidden mount divs
so FXGraphCompare can build its own DOM shell.
- ADV04: new test case demonstrating compare with shared taskbar, sync by ID,
and merged info panel.
- JS99: updated mount IDs, removed stale c99_sync_theme/c99_compact handlers,
theme change now applies to both viewers.
## Documentation
- README.md: replaced stale Compare section with full architecture reference
including DOM structure, ownership rules, lifecycle, interaction control
table, sync modes, and merged info panel description.
- templates/README.md: added Compare View DOM and Ownership section with DOM
tree, ownership rules, resize handling, and interaction ownership table.
- RFC_FX_VIEWER_API_INTERFACE.md: updated Compare API section with new config
shape and semantics.
- examples/FX_VIEWER_API_TESTCASES.md: added ADV04 test case description.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…reduce roberta/vit data sizes

Authored with Claude
… dedup, compare.js split, themes→runtime

5 independently scoped changes:
1. Move verbose JSDoc narrative blocks (USE CASES / ALGORITHM / UX) from 6 JS files to templates/README.md under a new Runtime Internals section. Each class now has a single-line description. ~390 lines removed from JS.
2. Add fxEsc() HTML-escaping helper to runtime.js. Apply it to all innerHTML construction sites in ui_manager.js (rebuildLayersMenu, renderLegend, updateInfoPanel, updateEdgeInfoPanel) and fx_graph_viewer.js (_rebuildLayersMenu, _rebuildSyncSelect, _updateMergedInfo). Search highlight spans are intentionally left as raw HTML with a comment.
3. Add GraphDataStore.computeBoundsForNodes(nodeIds) and two private helpers _collect2HopNeighbors / _collectEdgeNeighbors on ViewerController. Refactor zoomToFit() to use them, eliminating two identical 15-line bounds loops.
4. Split fx_graph_viewer.js: move FXCompareTaskbar + FXGraphCompare to new templates/compare.js (loads after fx_graph_viewer.js). Update ordered_files in exporter.py and generate_api_test_harness.py.
5. Rename themes.js → runtime.js (better reflects content: fxOn/fxOffAll/fxEsc + THEMES). Update ordered_files in both Python files and README.

Authored with Claude
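The bounds and neighbourhood helpers in item 3 can be sketched in Python. Assumptions: node geometry is stored as `(x, y, width, height)` with a top-left origin, adjacency is undirected, and the function names are hypothetical ports of the JS helpers rather than their actual code.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def compute_bounds_for_nodes(rects: Dict[str, Rect], node_ids: Iterable[str]) -> Optional[Box]:
    """Union bounding box of the given nodes; unknown ids are skipped."""
    boxes = [rects[n] for n in node_ids if n in rects]
    if not boxes:
        return None
    return (
        min(x for x, _, _, _ in boxes),
        min(y for _, y, _, _ in boxes),
        max(x + w for x, _, w, _ in boxes),
        max(y + h for _, y, _, h in boxes),
    )


def collect_2hop_neighbors(adjacency: Dict[str, List[str]], node_id: str) -> Set[str]:
    """The node, its direct neighbours, and their neighbours."""
    result = {node_id}
    frontier = {node_id}
    for _ in range(2):
        frontier = {m for n in frontier for m in adjacency.get(n, [])} - result
        result |= frontier
    return result
```

A zoom-to-selection then reduces to `compute_bounds_for_nodes(rects, collect_2hop_neighbors(adj, selected))`, which is the single shared loop that replaces the duplicated bounds code.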
Surfaces the inherent 2-step design (runtime collection → JSON, JSON → HTML) explicitly in the CLI. Adds a `visualize` subcommand that converts an existing JSON to HTML without re-running the export script, and a `--json-only` flag for CI/storage use cases. Fixes export order so JSON is always written before HTML. Adds USAGE.md covering zero-config e2e, two-step workflow, lens config, manual collection points, and demo script modes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
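The two-step flow (collect to JSON, then JSON to HTML) and the JSON-before-HTML export order can be sketched as below. The `visualize` subcommand and `--json-only` flag come from the commit message; the `collect` subcommand name, argument names, and the `export` helper are assumptions, and HTML rendering is reduced to a callback.

```python
import argparse
import json
from typing import Callable, List


def build_cli() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="observatory")
    sub = parser.add_subparsers(dest="command", required=True)
    collect = sub.add_parser("collect", help="run a script, write JSON (then HTML)")
    collect.add_argument("--json-only", action="store_true")
    collect.add_argument("script")
    viz = sub.add_parser("visualize", help="convert an existing JSON report to HTML")
    viz.add_argument("json_path")
    return parser


def export(report: dict, stem: str, json_only: bool, render_html: Callable[[dict], str]) -> List[str]:
    written = []
    json_path = stem + ".json"
    with open(json_path, "w") as f:
        json.dump(report, f)  # JSON first: it is the reloadable source of truth
    written.append(json_path)
    if not json_only:
        html_path = stem + ".html"
        with open(html_path, "w") as f:
            f.write(render_html(report))
        written.append(html_path)
    return written
```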
Fixes highlighted nodes (path, selected edge, search candidates, target) to use scale-relative sizes instead of fixed pixel values, so highlights remain visible at all zoom levels. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the 983-line technical-reference README with a progressive-disclosure document (concept -> workflow -> usage -> extension -> reference links). Moves all contract tables, API references, JS callbacks, and performance notes into a new REFERENCE.md. New README introduces Observatory to new users before diving into architecture details. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move the Observatory debugging framework and fx_viewer graph visualization from backends/qualcomm/ to devtools/ so all ExecuTorch backends can use them. Backend-specific patches (QNN ptq_calibrate, XNNPACK quantize, QNN dataset capture) are extracted into their respective backend directories with a register_backend_patches() hook mechanism.

Each backend maintains its own CLI runner (--accuracy opt-in flag):
- python -m executorch.devtools.observatory (generic)
- python -m executorch.backends.qualcomm.debugger.observatory (QNN)
- python -m executorch.backends.xnnpack.debugger.observatory (XNNPACK)

Includes RFC document at devtools/observatory/RFC_OBSERVATORY_SHARED_FRAMEWORK.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add observe_pass, a decorator that wraps any PassBase subclass or callable pass to automatically collect input/output graphs via Observatory. Names are derived from the class/function name and deduplicated by collect() itself (#2, #3, ...) so repeated calls never silently overwrite records. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
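A simplified sketch of the decorator and collect-side name deduplication, handling plain callables only (not `PassBase` subclasses). Assumptions: the explicit `collector` argument, the `Collector` class, the `/input` and `/output` record suffixes, and the `" #N"` suffix format are all hypothetical; the real decorator goes through `Observatory.collect()`. The input is deep-copied because passes often mutate graphs in place.

```python
import copy
import functools
from typing import Any, Callable, Dict, List, Tuple


class Collector:
    """Hypothetical stand-in for Observatory.collect(); dedups repeated names."""

    def __init__(self):
        self.records: List[Tuple[str, Any]] = []
        self._counts: Dict[str, int] = {}

    def collect(self, name: str, artifact: Any) -> str:
        n = self._counts.get(name, 0) + 1
        self._counts[name] = n
        unique = name if n == 1 else f"{name} #{n}"  # never overwrite an earlier record
        self.records.append((unique, artifact))
        return unique


def observe_pass(collector: Collector):
    def decorate(pass_fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
        label = getattr(pass_fn, "__name__", type(pass_fn).__name__)

        @functools.wraps(pass_fn)
        def wrapper(graph):
            collector.collect(f"{label}/input", copy.deepcopy(graph))  # snapshot before mutation
            result = pass_fn(graph)
            collector.collect(f"{label}/output", result)
            return result

        return wrapper

    return decorate
```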
…ayer accuracy Add `default_sync` field to `GraphCompareSpec` so lenses can declare their preferred compare sync strategy. The per-layer accuracy lens now defaults to `mode: "layer"` on `sparse_match_key` instead of the generic auto mode, giving immediate node correspondence when comparing records side-by-side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the vendored grandalf fork with the Rust-backed fast-sugiyama package and rewrite the post-layout compaction in exporter.py to produce a natural top-down edge flow.

Layout changes in _compute_layout_with_ext_lines / _compact_components:
- Drive layout via fast-sugiyama's from_edges + rect_pack_layouts.
- Run per-layer spine cohesion (chain detection across real and dummy nodes, iterative mean-pull, pure-A overlap repair) so linear op chains render as straight spines.
- Flip Phase 6's y so graph inputs / placeholders land at the top of the canvas and outputs / sinks at the bottom.
- Anchor edge endpoints to the bottom-midpoint of the source and top-midpoint of the target, giving each node a predictable dock.

Also refresh the README's dependency / layout section to point at fast-sugiyama[all] and drop the grandalf install instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Use chain detection and iterative spine refinement to avoid overlap and restore compactness and alignment.
2. Chain detection leverages node depth (rank) to detect the longest possible chains.
3. Chains are attracted to each other by sharing start and end nodes; in spine refinement, the mean attraction center is shifted based on the end nodes.
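The chain detection and mean-pull ideas above can be illustrated with a deliberately simplified sketch. Assumptions: a chain link `u -> v` requires a single out-edge from `u`, a single in-edge into `v`, and adjacent ranks; refinement pulls each chain's x positions part-way toward the chain mean. It does not model dummy nodes, overlap repair, or the end-node shift of the attraction center, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def detect_chains(edges: List[Tuple[str, str]], rank: Dict[str, int]) -> List[List[str]]:
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    def links(u: str) -> bool:
        # u -> v continues a chain: single out-edge, single in-edge, adjacent ranks.
        return len(succ[u]) == 1 and len(pred[succ[u][0]]) == 1 and rank[succ[u][0]] == rank[u] + 1

    chains = []
    for node in sorted(rank, key=rank.get):  # visit by rank so each chain starts at its head
        if len(pred[node]) == 1 and links(pred[node][0]):
            continue  # interior node of a chain that was already started
        chain = [node]
        while links(chain[-1]):
            chain.append(succ[chain[-1]][0])
        if len(chain) > 1:
            chains.append(chain)
    return chains


def pull_to_spine(x: Dict[str, float], chains: List[List[str]],
                  alpha: float = 0.5, iters: int = 10) -> Dict[str, float]:
    x = dict(x)
    for _ in range(iters):
        for chain in chains:
            mean = sum(x[n] for n in chain) / len(chain)
            for n in chain:
                x[n] += alpha * (mean - x[n])  # move part-way toward the spine
    return x
```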
…st-aware node text

The per-layer accuracy lens was coloring nodes and table cells with a color range recomputed independently per record, so the same cosine/PSNR/MSE/abs_err value mapped to different colors in different records and broke side-by-side comparison. Separately, fx_viewer was drawing node labels in theme.text regardless of node fill, so extension layers (e.g. this lens) that assigned dark or near-white fills produced unreadable text.

Changes:

devtools/observatory/lenses/per_layer_accuracy.py: `_MetricNumericColorRule` now accepts an optional `fixed_range`; range resolution is factored into `_resolve_range()`. `analyze()` aggregates per-metric (vmin, vmax) across every record's rows once via `_aggregate_metric_ranges()`, stashes them in `AnalysisResult.global_data["metric_ranges"]`, and threads the cosine_sim range into each per-record `_build_metric_extension(...)`. The HTML metrics table is now parameterized by `metric_ranges` and the frontend `record()` reads them from `analysis["global"]`, so table and graph extension share one scale and the 5-point legend auto-reflects the unified range. Per-record fallback is preserved for callers without analysis context.

devtools/fx_viewer/templates/runtime.js: new `fxReadableTextColor(hex)` helper using WCAG 2.x relative luminance (0.2126·R + 0.7152·G + 0.0722·B on linearized sRGB). Returns `#111111` or `#f8f8f8` by higher-contrast test, or `null` on malformed input so callers can fall back to theme defaults. Palette matches the lens HTML table's `_text_color_for_bg` for visual consistency.

devtools/fx_viewer/templates/canvas_renderer.js: the node draw loop now tracks `renderedFill` through the selected/preview/hovered/input/output shading branches and picks the label ink from `fxReadableTextColor(renderedFill)` whenever the node carries a custom `fill_color` from any extension; nodes without an extension color keep the theme default. No payload/schema changes, so the contrast rule applies retroactively to every existing extension layer.

devtools/observatory/tests/test_per_layer_accuracy_lens.py: added `test_analyze_produces_unified_metric_ranges_across_records`, which builds two synthetic digests, asserts `global_data["metric_ranges"]` spans the union across records, and asserts that a node with the same metric value in both records receives the same `fill_color` (proving the color no longer depends on which record the node happens to appear in).

Review order: the lens module is the load-bearing change, so start there, then the matching test, then the two fx_viewer templates, which are local and additive.

Authored with Claude Code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
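The contrast rule behind `fxReadableTextColor` can be ported to Python as a sketch. Assumptions: the function names are hypothetical, 3-digit shorthand hex is accepted, and the sRGB linearization threshold is 0.04045; the luminance weights, the two ink colors, and the `None`-on-malformed behaviour come from the description above. Contrast uses the WCAG ratio `(L_hi + 0.05) / (L_lo + 0.05)`.

```python
from typing import Optional

DARK, LIGHT = "#111111", "#f8f8f8"


def _linearize(c8: int) -> float:
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> Optional[float]:
    h = hex_color.strip().lstrip("#")
    if len(h) == 3:
        h = "".join(ch * 2 for ch in h)  # expand shorthand like "abc"
    if len(h) != 6:
        return None
    try:
        r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    except ValueError:
        return None
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(l1: float, l2: float) -> float:
    hi, lo = max(l1, l2), min(l1, l2)
    return (hi + 0.05) / (lo + 0.05)


def readable_text_color(fill_hex: str) -> Optional[str]:
    bg = relative_luminance(fill_hex)
    if bg is None:
        return None  # malformed: caller falls back to the theme default
    dark, light = relative_luminance(DARK), relative_luminance(LIGHT)
    return DARK if contrast_ratio(bg, dark) >= contrast_ratio(bg, light) else LIGHT
```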
Branch updated from 2f5a882 to 6a3afad.
[RFC Draft] Observatory: a unified debugging framework for ExecuTorch
RFC: rfc_concise.md · reference.md
This is the reference POC accompanying the Observatory RFC — draft PR opened so reviewers can read the RFC and run the code side by side.
Summary
Observatory turns per-backend debugging scripts into one shared flow: each backend contributes lenses (Python extensions that implement debugging logic end-to-end — capture, analyze, render); the framework handles the session lifecycle, report assembly, and the outputs.
fx_viewer is a standalone, dependency-free FX-graph renderer used as Observatory's graph view, also usable by anyone with a torch.fx graph. Design rationale, architecture, and extension patterns are in the RFC. This PR is what the design looks like when you build it.
Relationship to the RFC
json_frontend, --compare) and features (Analyzed Report JSON, cross-time regression) as first-class design elements even where the code hasn't caught up yet. RFC §8 is the canonical reference for what's implemented vs proposed.
What's in this PR (current demo scope)
- devtools/observatory/
  - observatory.py: session lifecycle, nested config stack, capture store, report assembly
  - interfaces.py: Lens protocol, typed frontend block contracts
  - graph_hub.py: base graph + analyze-phase overlay merge; fx_viewer bridge
  - cli.py: generic CLI with collect and visualize modes
  - observe_pass.py: @observe_pass decorator
- devtools/fx_viewer/: FX extraction, Sugiyama layout, extension-layer API, canvas-based JS runtime
- devtools/observatory/lenses/: graph, metadata, accuracy, per_layer_accuracy, stack_trace, pipeline_graph_collector, graph_color
- backends/qualcomm/debugger/observatory/
- backends/xnnpack/debugger/observatory/
- Observatory.enable_context(...) context manager, @observe_pass decorator, direct Observatory.collect(name, artifact)
- export_json + visualize reload path)

What's in the RFC but NOT in this PR
See RFC §8 for the full list with rationale. Headliners:
- json_frontend + Analyzed Report (JSON): the second frontend hook for LLM triage, CI analytics, and dashboards.
- --compare CLI mode: cross-time regression over archived Raw Captures.
- qnn_intermediate_debugger.py.
- .pte diff, size analysis, ETDump-fed runtime lenses, ADB capture.
- fx_viewer exporters.
- --compare flow.
- fx_viewer for streaming event use cases.

Each item has a natural landing point in the Lens protocol or CLI; no breaking changes to what ships here.
How to try it
Pre-generated HTML reports on a matrix of models are linked from the demo index (see RFC §3).
Review guidance
Suggested reading order for reviewers:
RFC §1–§4 — problem, demo, and architecture in one sitting.
devtools/observatory/interfaces.py — the Lens protocol as actually coded.
devtools/observatory/observatory.py — how the Core drives the protocol.
A shipped lens (suggested: lenses/per_layer_accuracy.py) — the protocol exercised end-to-end.
One backend contribution (backends/qualcomm/debugger/observatory/) — what a backend plugs in.
On a first pass, the generated JS/CSS under devtools/fx_viewer/templates/ can be treated as an implementation detail — the Python surface is where the design lives.
Test plan
Unit tests under devtools/observatory/tests/ pass
End-to-end XNNPACK MV2 run produces expected HTML report
End-to-end Qualcomm MobileViT v2 run produces expected HTML report
visualize reload produces HTML equivalent to the original run
Lint / typecheck
Pre-generated reports on model matrix spot-checked manually