feat: chart generation throughput#696
Conversation
Replace sticky progress bars with a bounded ANSI/asciichart throughput panel that plots records per second per generation column. Default progress_bar to enabled and add a local demo config plus screenshot for PR review. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
|
MkDocs preview: https://bdf16b4e.dd-docs-preview.pages.dev Fern preview: https://nvidia-preview-pr-696.docs.buildwithfern.com/nemo/datadesigner
|
Code Review: PR #696 — feat: chart generation throughputSummaryReplaces the existing sticky fill-bar progress display with a bounded ANSI throughput chart panel rendered via
FindingsCorrectness / Behavior
Design / Conventions
Tests
Performance
Security
VerdictApprove with minor changes. The rewrite is well-scoped, the rendering math is reasonable, and tests cover the new surface adequately. Before merge I'd ask for:
The remaining points (constant naming, label-escape hardening, lost wrap-test coverage) are nice-to-haves rather than blockers. |
Greptile SummaryThis PR replaces the old sticky ANSI progress bars with a bounded throughput panel (
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/progress/terminal/throughput_panel.py | New 873-line TUI panel: ANSI chart via asciichartpy, per-column progress bars, model-alias table, feedback marker overlay, throttled redraws (0.75s min interval), handler wrapping for log interleaving, and TTY fallback. Logic is well-structured; exit intentionally leaves the drawn panel visible. |
| packages/data-designer-engine/src/data_designer/engine/progress/reporter.py | AsyncProgressReporter correctly throttles bar redraws (0.75s), filters events by run_id, and closes subscriptions in a finally block. No issues found. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py | Single-panel async progress wiring, run_id correlation, and reporter.close() in finally are all correct. Adds KeyboardInterrupt to request_cancel bridging and current_generation_column/cancel_event context propagation. |
| packages/data-designer-config/src/data_designer/config/run_config.py | Adds display_tui (default True), progress_bar deprecated shim with DeprecationWarning on read/write/constructor use. Model validator handles backward compatibility via setdefault. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Switches from StickyProgressBar to TerminalThroughputPanel and adds KeyboardInterrupt-aware future awaiting. Sync path creates one panel per column batch, leaving multiple panels on screen for multi-column runs; model-alias table will also be empty on the sync path. |
| packages/data-designer-engine/src/data_designer/engine/models/usage_events.py | New module-level pub/sub bus for token usage events; thread-safe with proper unsubscribe pattern, exception isolation per callback. |
| packages/data-designer-engine/src/data_designer/engine/observability.py | Adds run_id to RuntimeCorrelation and a parallel pub/sub bus for request admission events. Same thread-safe pattern as usage_events.py. |
| packages/data-designer/src/data_designer/cli/commands/create.py | Adds --tui/--no-tui optional flag (default None = defer to RunConfig). Clean integration. |
| packages/data-designer-engine/src/data_designer/engine/progress/tracker.py | ProgressTracker refactored to new progress package; double-lock pattern in _record_completion is sound (interval advancement is atomic in first lock, preventing duplicate log emissions). |
Sequence Diagram
sequenceDiagram
participant CLI as CLI create
participant GC as GenerationController
participant DB as DatasetBuilder
participant AS as AsyncTaskScheduler
participant TTP as TerminalThroughputPanel
participant APR as AsyncProgressReporter
participant MF as ModelFacade
participant UE as usage_events bus
CLI->>GC: "run_create(tui=override)"
GC->>DB: create(...)
DB->>AS: "__init__(display_tui=True, run_id=...)"
AS->>TTP: TerminalThroughputPanel()
AS->>APR: AsyncProgressReporter(trackers, bar, run_id)
APR->>UE: subscribe_token_usage(_record_token_usage)
DB->>AS: run()
AS->>TTP: __enter__() [activates, wraps log handlers]
loop per async task
AS->>MF: acompletion(...)
MF->>UE: emit_token_usage_event(event, run_id)
UE->>APR: _record_token_usage(event)
APR->>TTP: record_model_usage(...)
AS->>APR: record_success(column)
APR->>APR: _maybe_report() [throttled 0.75s]
APR->>TTP: update_many(...)
end
AS->>APR: log_final() then close() [unsubscribes]
AS->>TTP: __exit__() [panel stays visible]
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py:1453-1458
**Sync-path creates a new panel per column, leaving all panels on screen**
Each call to `_fan_out_with_async` / `_fan_out_with_threads` constructs a fresh `TerminalThroughputPanel` and enters its context manager. Because `TerminalThroughputPanel.__exit__` intentionally does not clear its drawn lines (it leaves the final state as a persistent summary), every column batch in a sync-engine run adds ~22 lines of chart to the terminal before drawing the next one. A dataset with five LLM columns will leave five stacked panels on screen. The async engine avoids this because `AsyncTaskScheduler` creates a single panel for all columns. Consider clearing the panel in `__exit__` on the sync path, or constructing one shared panel above these call sites that is reused across columns.
### Issue 2 of 2
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py:1453-1460
**Model-alias table always empty in sync-engine runs**
`ModelFacade._track_usage` calls `emit_token_usage_event` on every completion, but that event bus is only subscribed by `AsyncProgressReporter.__init__`, which is never constructed in the sync path. `TerminalThroughputPanel` here receives record-count updates via `ProgressTracker.update`, but no subscriber calls `record_model_usage`, so the model-alias table in the panel will remain blank for all sync-engine generations. The async path renders the table correctly. This is worth a note in the PR for clarity if the empty model table is intentional for the sync path.
Reviews (3): Last reviewed commit: "test: make throughput redraw assertion l..." | Re-trigger Greptile
Throttle active TTY redraws, sample rates over larger windows, smooth and fit chart series, bound rate history, and harden panel tests. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Render the progress legend as a stable table with live token-rate columns, attribute model usage to active generation columns across async bridge boundaries, and cancel the async scheduler cleanly on KeyboardInterrupt. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Emit synthetic token usage in the credential-free progress panel demo so the live token-rate columns are visible, and accept output-only provider usage as a progress event. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Replace the pipe-delimited progress legend with a native spaced layout, keep column alignment via computed widths, mute the header row, and refresh the PR screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Show raw column names in the async progress panel legend because the table header already provides the column context, and refresh the PR screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Label the progress legend's now and average record-rate columns as rec/s and refresh the screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
…hput-panel # Conflicts: # packages/data-designer-config/tests/config/test_run_config.py # uv.lock
Code Review — PR #696: feat: chart generation throughputSummaryThis PR replaces the old Footprint: 32 files, +2188/-668. Engine, config, and CLI packages all touched. Layering direction (interface → engine → config) is preserved. FindingsBehaviour / correctness
Style / conventions
TestingCoverage is strong:
Gaps / nice-to-haves:
Performance
SecurityNothing new exposed. ANSI sanitisation in VerdictLooks good to merge with minor follow-ups:
Architectural direction is sound: the new |
Summary
This PR replaces the old fill/disappear terminal progress bars with a bounded ANSI/asciichart throughput panel that is enabled by default through the canonical
RunConfig.display_tui=Truesetting.RunConfig.progress_barremains supported as a deprecated compatibility shim with a warning, and the CLI exposesdata-designer create --tui / --no-tuias a per-run override.Usage
Default CLI behavior
The TUI is enabled by default for
createruns when the process is attached to a terminal. In non-TTY environments, such as redirected output or many notebook contexts, the renderer falls back instead of trying to draw the live panel.Force the TUI for one CLI run
Use
--tuiwhen the saved or programmatic run config disables the panel but this terminal run should show it.Disable the TUI for one CLI run
Use
--no-tuifor logs-only runs, CI jobs, captured terminal output, or any case where the live panel would be noisy.Configure from Python
display_tuiis the canonical run config option. It defaults toTrue; set it explicitly when creating datasets from a Python program.Disable from Python
Deprecated compatibility shim
Existing callers that still set
progress_barget the same behavior throughdisplay_tui, plus aDeprecationWarning.Changes
Added
data_designer.engine.progress.trackerforProgressTrackerdata_designer.engine.progress.reporterforAsyncProgressReporterdata_designer.engine.progress.terminal.throughput_panelforTerminalThroughputPanelRunConfig.display_tui, defaulting toTrue, as the canonical run config switch for the terminal throughput panel.RunConfig.progress_barcompatibility shim that maps todisplay_tuiand emitsDeprecationWarningon constructor use, property reads, and property writes.asciichartpyas an engine dependency for terminal chart rendering.rpm,in tok/s,out tok/s, and red feedback markers.data-designer create --tui / --no-tuito force the create command's TUI setting for a single CLI run.Changed
run_config.display_tui; production code no longer readsrun_config.progress_bar.now rec/s,avg rec/s, and completion progress; token rates are shown in the model-alias table.AsyncTaskScheduleracceptsdisplay_tuifor async panel setup and passes its run id intoAsyncProgressReporter, so global token/admission events from other runs do not pollute the active panel.packages/data-designer-engine/tests/engine/progress/.Removed
data_designer.engine.dataset_builders.utils..scratch/only.Attention Areas
throughput_panel.py- terminal rendering, chart/table layout, redraw throttling, feedback marker projection, and TTY fallback behavior.reporter.py- progress aggregation, token/admission event subscriptions, cleanup, and run-id filtering.async_scheduler.py- async progress setup, run correlation, cancellation behavior, anddisplay_tuiplumbing.create.py- CLI--tui / --no-tuioverride surface.run_config.py- canonicaldisplay_tuifield plus deprecatedprogress_barshim behavior.pyproject.toml- adds theasciichartpyruntime dependency.Validation
Focused config, engine progress/scheduler, CLI, and interface tests were run after merging the latest
origin/main; the branch is pushed and up to date.Description updated with AI