Publish per-target diagnostics artifact for us-data parity comparisons #133

@PavelMakarchuk

Problem

The calibration diagnostics dashboard can currently consume the public microplex-us artifacts under artifacts/, but those files only expose summary-level parity/regression information:

  • pe_us_data_rebuild_parity.json
  • live_pe_us_data_rebuild_checkpoint_modelpass_regression_summary_20260410.json
  • live_pe_us_data_rebuild_checkpoint_national_irs_other_drilldown_20260410.json

The parity artifact reports headline counts and losses (for example n_targets_kept, n_national_targets, n_state_targets, win rates, and aggregate loss metrics), but it does not include the target-level rows. Because neither the microplex H5 nor the full target diagnostics are public, downstream tools cannot produce an accurate, complete diff between microplex and policyengine-us-data at the target or aggregate level.

Requested artifact

Please publish a run-level artifact that contains the full per-target comparison against the same calibration target oracle used by policyengine-us-data. Either of these would work:

  1. A full target diagnostics JSON/CSV, committed or otherwise publicly downloadable.
  2. The microplex output H5, publicly downloadable, so downstream services can compute the same diagnostics.

Suggested target diagnostics schema

A row-oriented JSON or CSV would be easiest for dashboards/API consumers. Suggested fields:

  • run_id / artifact_id
  • baseline_dataset, e.g. enhanced_cps_2024.h5
  • candidate_dataset, e.g. microplex
  • target_id
  • variable
  • entity
  • geography / geo_level / state where applicable
  • period
  • target_value
  • us_data_aggregate
  • microplex_aggregate
  • us_data_absolute_error
  • microplex_absolute_error
  • us_data_relative_error
  • microplex_relative_error
  • delta_absolute_error
  • delta_relative_error
  • loss_contribution or equivalent weighted term
  • family / target group
  • in_loss
  • supported_by_microplex
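
To make the error and delta fields concrete, here is a minimal sketch of how one row could be derived from the three raw values. It is not the microplex implementation. The function name `build_row`, the relative-error definition (absolute error divided by the absolute target value), and the sign convention for deltas (microplex minus us-data, so negative means microplex is closer) are all assumptions for illustration.

```python
from typing import Optional


def build_row(
    target_id: str,
    target_value: float,
    us_data_aggregate: float,
    microplex_aggregate: float,
) -> dict:
    """Hypothetical helper: derive the error and delta fields for one target."""
    us_abs = abs(us_data_aggregate - target_value)
    mp_abs = abs(microplex_aggregate - target_value)

    # Assumption: relative error is undefined (None) when the target is zero.
    denom = abs(target_value)
    us_rel: Optional[float] = us_abs / denom if denom else None
    mp_rel: Optional[float] = mp_abs / denom if denom else None

    return {
        "target_id": target_id,
        "target_value": target_value,
        "us_data_aggregate": us_data_aggregate,
        "microplex_aggregate": microplex_aggregate,
        "us_data_absolute_error": us_abs,
        "microplex_absolute_error": mp_abs,
        "us_data_relative_error": us_rel,
        "microplex_relative_error": mp_rel,
        # Assumed sign convention: microplex minus us-data.
        "delta_absolute_error": mp_abs - us_abs,
        "delta_relative_error": None if us_rel is None else mp_rel - us_rel,
    }


# Made-up numbers, purely illustrative.
row = build_row("example_target", 1000.0, 950.0, 1020.0)
```

Whatever conventions the published artifact actually uses, documenting the sign of the delta fields and the handling of zero-valued targets would save consumers from guessing.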

Why this matters

The calibration diagnostics dashboard now has a microplex-vs-us-data page and API endpoint, but it can only show rollups from the current public artifacts. Analysts want to answer questions like:

  • Which exact calibration targets does microplex beat or miss relative to us-data?
  • Which aggregates drive the loss delta?
  • Are failures concentrated in IRS SOI, state AGI distributions, ACA, SNAP, CTC, etc.?
  • For a state/federal reform analysis, which upstream calibration nodes are trustworthy or suspect?
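
As a rough illustration of what target-level rows would enable, the sketch below groups rows by family and counts which dataset is closer to each target. It assumes rows carry the family, us_data_absolute_error, and microplex_absolute_error fields from the suggested schema. The function name `summarize_by_family`, the family labels, and the sample values are invented for the example.

```python
from collections import defaultdict


def summarize_by_family(rows: list[dict]) -> dict:
    """Hypothetical rollup: per-family win counts and net absolute-error delta."""
    summary = defaultdict(
        lambda: {
            "n_targets": 0,
            "microplex_wins": 0,
            "us_data_wins": 0,
            "net_delta_absolute_error": 0.0,
        }
    )
    for row in rows:
        stats = summary[row["family"]]
        us_err = row["us_data_absolute_error"]
        mp_err = row["microplex_absolute_error"]
        stats["n_targets"] += 1
        if mp_err < us_err:
            stats["microplex_wins"] += 1
        elif us_err < mp_err:
            stats["us_data_wins"] += 1
        # Ties count toward neither dataset.
        stats["net_delta_absolute_error"] += mp_err - us_err
    return dict(summary)


# Made-up rows, purely illustrative.
sample_rows = [
    {"family": "irs_soi", "us_data_absolute_error": 50.0, "microplex_absolute_error": 20.0},
    {"family": "irs_soi", "us_data_absolute_error": 10.0, "microplex_absolute_error": 40.0},
    {"family": "snap", "us_data_absolute_error": 5.0, "microplex_absolute_error": 5.0},
]
by_family = summarize_by_family(sample_rows)
```

None of this can be computed from the current summary-level artifacts, because the per-target errors are not exposed.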

Without target-level rows or an H5, the dashboard has to label the view as aggregate-only and cannot provide a full per-target/per-aggregate diff.

Downstream consumer

This is needed by PolicyEngine/calibration-diagnostics PR #9 and follow-up API work. Once this artifact exists, the dashboard can add an endpoint like /microplex/targets or /microplex/diff to expose the full comparison table.
